Sequence composition diversity in Alaskan glacier and other metagenomes

Author(s):  
Sulbha Choudhari ◽  
Roman J Dial ◽  
Dibyendu Kumar ◽  
Daniel H Shain ◽  
Andrey Grigoriev

Metagenomics by next generation sequencing has become an important tool for interrogating complex microbial communities. In this study we analyzed several pairs of metagenomic samples obtained by different methods and observed biases, resulting in different nucleotide composition of the sequenced reads. The pairwise sample comparison was based on the principal component analysis of dinucleotide word frequencies in sequences obtained from different platforms. We found bias in the sequences obtained from the different platforms for the amplified hypervariable regions in 16S rRNA but not in shotgun metagenome reads aligned to such hypervariable regions. The differences and consistency of the distributions of the nucleotides suggest that the biases are likely due to a combination of biases introduced by PCR and different sequencing protocols, and they are related to the GC content of the reads produced. For this reason, caution should be exercised when interpreting the results of comparative metagenomics studies, as they may vary depending on the sequencing technology.
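As a rough illustration of the input to such an analysis, the following Python sketch computes dinucleotide word frequencies and GC content for a single read. It is not the authors' pipeline: the function names and the choice to skip words containing ambiguous bases (such as N) are assumptions made for this example. The resulting 16-element vectors, one per read or sample, are the kind of feature matrix a principal component analysis could take as input.

```python
from collections import Counter
from itertools import product

# The 16 possible dinucleotide words, in a fixed order (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(read):
    """Return relative frequencies of the 16 dinucleotides in one read.

    Overlapping 2-mers are counted; words containing anything other than
    A, C, G, or T are ignored (an assumption of this sketch).
    """
    read = read.upper()
    counts = Counter(read[i:i + 2] for i in range(len(read) - 1))
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def gc_content(read):
    """Return the fraction of G and C among the unambiguous bases of a read."""
    read = read.upper()
    unambiguous = sum(read.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (read.count("G") + read.count("C")) / unambiguous


if __name__ == "__main__":
    example_read = "ACGTTGCAAC"  # made-up read, for illustration only
    print(dict(zip(DINUCLEOTIDES, dinucleotide_frequencies(example_read))))
    print(gc_content(example_read))
```

Comparing these vectors between platforms, alongside GC content, is one way to look for the kind of composition bias the abstract describes.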


Planta Medica ◽  
2020 ◽  
Vol 86 (10) ◽  
pp. 674-685 ◽  
Author(s):  
Ping Geng ◽  
Jianghao Sun ◽  
Pei Chen ◽  
Eric Brand ◽  
James Frame ◽  
...  

Abstract: Maca (Lepidium meyenii, synonym L. peruvianum) was analyzed using a systematic approach employing principal component analysis of flow injection mass spectrometry fingerprints (no chromatographic separation) to guide the selection of samples for metabolite profiling and DNA next generation sequencing. Samples consisted of 39 commercial maca supplements from 11 manufacturers, 31 unprocessed maca tubers grown in Peru and China, and a historic non-tuber maca sample from Peru. Principal component analysis of flow injection mass spectrometry fingerprints initially placed all the maca samples in three classes with similar chemical composition: commercial maca samples, tubers grown in Peru, and tubers grown in China. Metabolite profiling identified 67 compounds in the negative mode and 51 compounds in the positive mode. Compounds identified by metabolite profiling (macamides, glucosinolates, amino acids, fatty acids, polyunsaturated fatty acids, saccharides, imidazoles) were then used to identify ions in the flow injection mass spectrometry fingerprints. The tuber fingerprints were analyzed by factorial multivariate analysis of variance revealing that black, red, and yellow maca from Peru and black and yellow maca from China were compositionally different with respect to color and country. Critical ions were identified that allowed for the differentiation of maca between colors from the same country or between two countries with the same color. Genetically, all samples were confirmed to be L. meyenii based on next generation sequencing at three gene regions (ITS2, psbA, and trnL) and comparison to recorded sequences of vouchered standards.


2012 ◽  
Vol 10 (05) ◽  
pp. 1250015 ◽  
Author(s):  
ZEEHASHAM RASHEED ◽  
HUZEFA RANGWALA

Next-generation sequencing technologies have allowed researchers to determine the collective genomes of microbial communities co-existing within diverse ecological environments. Varying species abundance, length and complexities within different communities, coupled with the discovery of new species, make the problem of taxonomic assignment to short DNA sequence reads extremely challenging. We have developed a new sequence composition-based taxonomic classifier using extreme learning machines, referred to as TAC-ELM, for metagenomic analysis. TAC-ELM uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. The input features consist of GC content and oligonucleotides. TAC-ELM is evaluated on two metagenomic benchmarks with sequence read lengths reflecting the traditional and current sequencing technologies. Our empirical results indicate the strength of the developed approach, which outperforms state-of-the-art taxonomic classifiers in terms of accuracy and implementation complexity. We also perform experiments that evaluate the pervasive case within metagenome analysis, where a species may not have been previously sequenced or discovered and will not exist in the reference genome databases. TAC-ELM was also combined with BLAST to show improved classification results. Code and Supplementary Results: http://www.cs.gmu.edu/~mlbio/TAC-ELM (BSD License).
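The general idea of an extreme learning machine can be sketched in a few lines of Python: the hidden-layer weights are drawn at random and left fixed, and only the output weights are learned, here by ridge-regularized least squares. This is a generic illustration and not the TAC-ELM code. The class name `TinyELM`, the sigmoid activation, the ridge term, and the two-feature toy data are all assumptions of this sketch. In a real setting the inputs would be GC content and oligonucleotide frequencies per read.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    `a` is an n x n matrix and `b` an n x m matrix, both as lists of rows.
    """
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


class TinyELM:
    """Hypothetical minimal extreme learning machine classifier."""

    def __init__(self, n_inputs, n_hidden, n_classes, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Random hidden-layer weights and biases; these are never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.bias = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.n_classes = n_classes
        self.ridge = ridge
        self.beta = None  # output weights, learned in fit()

    def _hidden(self, x):
        """Sigmoid activations of the fixed random hidden layer."""
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.w, self.bias)]

    def fit(self, xs, labels):
        """Learn output weights from (H^T H + ridge * I) beta = H^T T."""
        h = [self._hidden(x) for x in xs]
        k = len(self.w)
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        # T is the one-hot label matrix, so H^T T sums activations per class.
        htt = [[sum(row[i] for row, y in zip(h, labels) if y == c)
                for c in range(self.n_classes)] for i in range(k)]
        self.beta = solve_linear_system(hth, htt)
        return self

    def predict(self, x):
        """Return the class index with the highest output score."""
        h = self._hidden(x)
        scores = [sum(h[i] * self.beta[i][c] for i in range(len(h)))
                  for c in range(self.n_classes)]
        return max(range(self.n_classes), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up two-feature toy data; not taken from any benchmark.
    xs = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.9, 0.9), (0.8, 0.9), (0.9, 0.8)]
    ys = [0, 0, 0, 1, 1, 1]
    model = TinyELM(n_inputs=2, n_hidden=10, n_classes=2).fit(xs, ys)
    print([model.predict(x) for x in xs])
```

Because training reduces to a single linear solve, fitting is fast, which is the property the abstract refers to when it says the weights are learned quickly.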


2018 ◽  
Author(s):  
Jonas Meisner ◽  
Anders Albrechtsen

Abstract: We here present two methods for inferring population structure and admixture proportions in low depth next generation sequencing data. Inference of population structure is essential in both population genetics and association studies and is often performed using principal component analysis or clustering-based approaches. Next-generation sequencing methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through principal component analysis in an iterative approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.


2021 ◽  
Vol 29 (2) ◽  
pp. 81-96
Author(s):  
Sri Wening ◽  
Heri Adriwan Siregar ◽  
Edy Suprianto ◽  
Dani Setyawan ◽  
Hernawan Y Rahmadi ◽  
...  

The search for DNA markers associated with desired traits in Elaeis oleifera, for introgression of those traits into the Elaeis guineensis genome, requires polymorphic DNA markers. To produce polymorphic DNA markers in large numbers, genomic SNPs were identified by resequencing 12 sample individuals from an E. guineensis x E. oleifera hybrid population (OxG hybrids), namely wild-type E. oleifera, interspecific F1 hybrids, pseudo-backcrosses, and advanced E. guineensis material, using next generation sequencing (NGS). Reads (the base sequences "read" by, that is, output by, the NGS machine) from the 12 samples were of good quality, and 96% of the total filtered reads could be demultiplexed and assigned to the appropriate sample. After filtering and trimming, 84% of reads could be used for genome mapping, yielding 5.7X to 10.42X genome coverage. Of the 34,410,224 SNPs identified, 98.7% were non-coding variants and, by location, 69.1% of all SNPs were intergenic. A total of 5,618 of the SNPs produced were validated using targeted genotyping by sequencing on 500 sample individuals. Of the SNPs used, 74% were of high quality, being read in at least 95% of samples. Principal component analysis using these SNPs was able to identify the genetic background of each sample. This validation leads to the conclusion that SNP identification through resequencing produces high-quality SNPs that can be used to develop DNA markers to assist selection in E. guineensis x E. oleifera breeding populations.


Author(s):  
Brian Cross

A relatively new entry in the field of microscopy is the Scanning X-Ray Fluorescence Microscope (SXRFM). Using this type of instrument (e.g. Kevex Omicron X-ray Microprobe), one can obtain multiple elemental x-ray images from the analysis of heterogeneous materials. The SXRFM obtains images by collimating an x-ray beam (e.g. 100 μm diameter) and then scanning the sample with a high-speed x-y stage. To speed up image acquisition, data are acquired "on-the-fly" by slew-scanning the stage along the x-axis, like a TV or SEM scan. To reduce the overhead from "fly-back," the images can be acquired by bi-directional scanning of the x-axis, which leaves very little overhead for re-positioning the sample stage. The image acquisition rate is then dominated by the x-ray acquisition rate, so the total x-ray image acquisition rate of the SXRFM is comparable to that of an SEM. Although the x-ray spatial resolution of the SXRFM is worse than that of an SEM (say 100 vs. 2 μm), there are several other advantages.
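The difference between a conventional raster scan with fly-back and a bi-directional scan can be illustrated with a small Python sketch. It only models the order in which pixel positions are visited and the resulting stage travel in pixel units. The function names are hypothetical, and nothing here reflects the actual control software of any instrument.

```python
def raster_scan(n_cols, n_rows):
    """Visit every row left to right; the stage flies back after each row."""
    return [(x, y) for y in range(n_rows) for x in range(n_cols)]


def bidirectional_scan(n_cols, n_rows):
    """Visit even rows left to right and odd rows right to left."""
    path = []
    for y in range(n_rows):
        xs = range(n_cols) if y % 2 == 0 else range(n_cols - 1, -1, -1)
        path.extend((x, y) for x in xs)
    return path


def stage_travel(path):
    """Total stage movement along a path, summed over x and y, in pixel units."""
    return sum(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(path, path[1:]))


if __name__ == "__main__":
    cols, rows = 100, 100  # arbitrary image size chosen for illustration
    print(stage_travel(raster_scan(cols, rows)))
    print(stage_travel(bidirectional_scan(cols, rows)))
```

In this simplified model the bi-directional path never retraces a row, so the only movement beyond the data-collecting sweeps is the single-pixel step to the next row.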


VASA ◽  
2012 ◽  
Vol 41 (5) ◽  
pp. 333-342 ◽  
Author(s):  
Kirchberger ◽  
Finger ◽  
Müller-Bühl

Background: The Intermittent Claudication Questionnaire (ICQ) is a short questionnaire for the assessment of health-related quality of life (HRQOL) in patients with intermittent claudication (IC). The objective of this study was to translate the ICQ into German and to investigate the psychometric properties of the German ICQ version in patients with IC. Patients and methods: The original English version was translated using a forward-backward method. The resulting German version was reviewed by the author of the original version and an experienced clinician. Finally, it was tested for clarity with 5 German patients with IC. The German ICQ was administered to a sample of 81 patients. The sample consisted of 58.0 % male patients with a median age of 71 years and a median IC duration of 36 months. Tests of feasibility included completeness of questionnaires, completion time, and ratings of clarity, length and relevance. Reliability was assessed through a retest in 13 patients at 14 days, and analysis of Cronbach’s alpha for internal consistency. Construct validity was investigated using principal component analysis. Concurrent validity was assessed by correlating the ICQ scores with the Short Form 36 Health Survey (SF-36) as well as clinical measures. Results: The ICQ was completely filled in by 73 subjects (90.1 %) with an average completion time of 6.3 minutes. Cronbach’s alpha coefficient reached 0.75. Intra-class correlation for test-retest reliability was r = 0.88. Principal component analysis resulted in a 3 factor solution. The first factor explained 51.5 % of the total variation and all items had loadings of at least 0.65 on it. The ICQ was significantly associated with the SF-36 and treadmill-walking distances whereas no association was found for resting ABPI. Conclusions: The German version of the ICQ demonstrated good feasibility, satisfactory reliability and good validity. Responsiveness should be investigated in further validation studies.
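For readers unfamiliar with the internal-consistency measure mentioned above, Cronbach's alpha for k items is commonly computed as k / (k - 1) × (1 − sum of item variances / variance of total scores). The Python sketch below implements that standard formula using sample variances. The function name and the toy scores are made up for illustration and are unrelated to the ICQ data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a questionnaire.

    `item_scores` is a list of items, each a list with one score per
    respondent (all items must cover the same respondents, in order).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Two hypothetical items answered by four respondents.
    toy_items = [[1, 2, 3, 4], [2, 1, 4, 3]]
    print(cronbach_alpha(toy_items))
```

Values closer to 1 indicate that the items vary together across respondents, which is why the coefficient is read as a measure of internal consistency.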

