Sequence composition diversity in Alaskan glacier and other metagenomes

10.7287/peerj.preprints.734v1 ◽

2014 ◽

Cited By ~ 1

Author(s):

Sulbha Choudhari ◽

Roman J Dial ◽

Dibyendu Kumar ◽

Daniel H Shain ◽

Andrey Grigoriev

Keyword(s):

Principal Component Analysis ◽

Gc Content ◽

Principal Component ◽

Nucleotide Composition ◽

Sequencing Technology ◽

Sequence Composition ◽

Comparative Metagenomics ◽

Hypervariable Regions ◽

Word Frequencies ◽

Generation Sequencing

Metagenomics by next generation sequencing has become an important tool for interrogating complex microbial communities. In this study we analyzed several pairs of metagenomic samples obtained by different methods and observed biases, resulting in different nucleotide composition of the sequenced reads. The pairwise sample comparison was based on the principal component analysis of dinucleotide word frequencies in sequences obtained from different platforms. We found bias in the sequences obtained from the different platforms for the amplified hypervariable regions in 16S rRNA but not in shotgun metagenome reads aligned to such hypervariable regions. The differences and consistency of the distributions of the nucleotides suggest that the biases are likely due to a combination of biases introduced by PCR and different sequencing protocols, and they are related to the GC content of the reads produced. For this reason, caution should be exercised when interpreting the results of comparative metagenomics studies, as they may vary depending on the sequencing technology.
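The comparison described in this abstract (principal component analysis of dinucleotide word frequencies) can be illustrated with a minimal sketch. The toy reads, the function names `dinucleotide_frequencies` and `pca_scores`, and the use of an SVD-based PCA are assumptions made for illustration; the abstract does not specify the authors' implementation.

```python
from itertools import product

import numpy as np

# All 16 dinucleotide "words" over the DNA alphabet, in a fixed order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(read):
    """Return the 16 overlapping dinucleotide frequencies of one read."""
    read = read.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(read) - 1):
        word = read[i:i + 2]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return np.array([counts[w] / total if total else 0.0 for w in DINUCLEOTIDES])


def pca_scores(matrix, n_components=2):
    """Project rows onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Hypothetical toy reads: two AT-rich and two GC-rich.
reads = ["ATATATATTA", "TTATAATATA", "GCGCGGCCGC", "CCGGCGCGCG"]
freqs = np.vstack([dinucleotide_frequencies(r) for r in reads])
scores = pca_scores(freqs)
```

In this toy case the first principal component separates the AT-rich reads from the GC-rich ones, which mirrors the abstract's point that composition differences are related to GC content.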


Characterization of Maca (Lepidium meyenii/Lepidium peruvianum) Using a Mass Spectral Fingerprinting, Metabolomic Analysis, and Genetic Sequencing Approach

Planta Medica ◽

10.1055/a-1161-0372 ◽

2020 ◽

Vol 86 (10) ◽

pp. 674-685 ◽

Cited By ~ 1

Author(s):

Ping Geng ◽

Jianghao Sun ◽

Pei Chen ◽

Eric Brand ◽

James Frame ◽

...

Keyword(s):

Mass Spectrometry ◽

Fatty Acids ◽

Principal Component Analysis ◽

Next Generation Sequencing ◽

Flow Injection ◽

Metabolite Profiling ◽

Principal Component ◽

Next Generation ◽

Lepidium Meyenii ◽

Generation Sequencing

Maca (Lepidium meyenii, synonym L. peruvianum) was analyzed using a systematic approach employing principal component analysis of flow injection mass spectrometry fingerprints (no chromatographic separation) to guide the selection of samples for metabolite profiling and DNA next generation sequencing. Samples consisted of 39 commercial maca supplements from 11 manufacturers, 31 unprocessed maca tubers grown in Peru and China, and a historic non-tuber maca sample from Peru. Principal component analysis of flow injection mass spectrometry fingerprints initially placed all the maca samples in three classes with similar chemical composition: commercial maca samples, tubers grown in Peru, and tubers grown in China. Metabolite profiling identified 67 compounds in the negative mode and 51 compounds in the positive mode. Compounds identified by metabolite profiling (macamides, glucosinolates, amino acids, fatty acids, polyunsaturated fatty acids, saccharides, imidazoles) were then used to identify ions in the flow injection mass spectrometry fingerprints. The tuber fingerprints were analyzed by factorial multivariate analysis of variance, revealing that black, red, and yellow maca from Peru and black and yellow maca from China were compositionally different with respect to color and country. Critical ions were identified that allowed for the differentiation of maca between colors from the same country or between two countries with the same color. Genetically, all samples were confirmed to be L. meyenii based on next generation sequencing at three gene regions (ITS2, psbA, and trnL) and comparison to recorded sequences of vouchered standards.


METAGENOMIC TAXONOMIC CLASSIFICATION USING EXTREME LEARNING MACHINES

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720012500151 ◽

2012 ◽

Vol 10 (05) ◽

pp. 1250015 ◽

Cited By ~ 18

Author(s):

ZEEHASHAM RASHEED ◽

HUZEFA RANGWALA

Keyword(s):

Reference Genome ◽

Species Abundance ◽

Gc Content ◽

Extreme Learning Machines ◽

Taxonomic Assignment ◽

Sequence Composition ◽

Metagenome Analysis ◽

Learning Machines ◽

Sequencing Technologies ◽

Generation Sequencing

Next-generation sequencing technologies have allowed researchers to determine the collective genomes of microbial communities co-existing within diverse ecological environments. Varying species abundance, length and complexities within different communities, coupled with discovery of new species makes the problem of taxonomic assignment to short DNA sequence reads extremely challenging. We have developed a new sequence composition-based taxonomic classifier using extreme learning machines referred to as TAC-ELM for metagenomic analysis. TAC-ELM uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. The input features consist of GC content and oligonucleotides. TAC-ELM is evaluated on two metagenomic benchmarks with sequence read lengths reflecting the traditional and current sequencing technologies. Our empirical results indicate the strength of the developed approach, which outperforms state-of-the-art taxonomic classifiers in terms of accuracy and implementation complexity. We also perform experiments that evaluate the pervasive case within metagenome analysis, where a species may not have been previously sequenced or discovered and will not exist in the reference genome databases. TAC-ELM was also combined with BLAST to show improved classification results. Code and Supplementary Results: http://www.cs.gmu.edu/~mlbio/TAC-ELM (BSD License).
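The general idea of an extreme learning machine on composition features can be sketched as follows. This is not TAC-ELM itself: the class name `TinyELM`, the choice of dinucleotides as the oligonucleotide features, the tanh activation, the hidden-layer size, and the toy reads and labels are all assumptions for illustration. The sketch only shows the core ELM step, in which hidden-layer weights are drawn at random and left fixed while the output weights are obtained in closed form with a pseudoinverse.

```python
from itertools import product

import numpy as np

# Assumed oligonucleotide vocabulary: all 16 dinucleotides.
KMERS = ["".join(p) for p in product("ACGT", repeat=2)]


def features(read):
    """GC content followed by normalized dinucleotide counts (17 values)."""
    read = read.upper()
    gc = sum(base in "GC" for base in read) / len(read)
    counts = np.array(
        [sum(read[i:i + 2] == k for i in range(len(read) - 1)) for k in KMERS],
        dtype=float,
    )
    return np.concatenate(([gc], counts / max(counts.sum(), 1.0)))


class TinyELM:
    """Single-hidden-layer network with random, untrained hidden weights."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Random input weights and biases are drawn once and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot class targets
        # Output weights: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)


# Hypothetical toy training set: label 0 = AT-rich taxon, 1 = GC-rich taxon.
reads = ["ATATATATTA", "TTATAATATA", "GCGCGGCCGC", "CCGGCGCGCG"]
y = np.array([0, 0, 1, 1])
X = np.vstack([features(r) for r in reads])
model = TinyELM().fit(X, y)
```

Because no iterative training is involved, fitting reduces to one matrix pseudoinverse, which is consistent with the abstract's emphasis on learning the weights quickly.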


Inferring Population Structure and Admixture Proportions in Low Depth NGS Data

10.1101/302463 ◽

2018 ◽

Cited By ~ 5

Author(s):

Jonas Meisner ◽

Anders Albrechtsen

Keyword(s):

Principal Component Analysis ◽

Population Structure ◽

Next Generation Sequencing ◽

Principal Component ◽

Component Analysis ◽

Allele Frequencies ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

We present two methods for inferring population structure and admixture proportions in low depth next generation sequencing data. Inference of population structure is essential in both population genetics and association studies and is often performed using principal component analysis or clustering-based approaches. Next-generation sequencing methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through principal component analysis in an iterative approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
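Working on genotype likelihoods rather than called genotypes can be illustrated with a simplified sketch. It is not the PCAngsd algorithm: it uses a single population-wide allele frequency per site with a Hardy-Weinberg prior, whereas the abstract describes an iterative scheme based on individual allele frequencies. The function names, array layout, and toy values are assumptions.

```python
import numpy as np


def expected_dosage(likelihoods, freq):
    """Posterior expected allele count per individual and site.

    likelihoods: array (n_individuals, n_sites, 3) of genotype likelihoods
                 for 0, 1, and 2 copies of the allele.
    freq:        array (n_sites,) of allele frequencies, assumed 0 < freq < 1.
    """
    likelihoods = np.asarray(likelihoods, dtype=float)
    freq = np.asarray(freq, dtype=float)
    # Hardy-Weinberg genotype prior per site, shape (n_sites, 3).
    prior = np.stack([(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2], axis=-1)
    posterior = likelihoods * prior
    posterior /= posterior.sum(axis=-1, keepdims=True)
    return posterior @ np.array([0.0, 1.0, 2.0])


def pca_from_dosage(dosage, n_components=2):
    """Principal component scores of the site-centered dosage matrix."""
    centered = dosage - dosage.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Hypothetical toy input: 3 individuals, 2 sites, completely uninformative reads.
freq = np.array([0.5, 0.2])
flat = np.full((3, 2, 3), 1 / 3)
dosage_flat = expected_dosage(flat, freq)
```

With uninformative likelihoods the expected dosage falls back to the prior mean, twice the allele frequency, which shows how low depth data are pulled toward the frequency estimate instead of being forced into a hard genotype call.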


Genomic SNP identification in an Elaeis guineensis x Elaeis oleifera population

Jurnal Penelitian Kelapa Sawit ◽

10.22302/iopri.jur.jpks.v29i2.148 ◽

2021 ◽

Vol 29 (2) ◽

pp. 81-96

Author(s):

Sri Wening ◽

Heri Adriwan Siregar ◽

Edy Suprianto ◽

Dani Setyawan ◽

Hernawan Y Rahmadi ◽

...

Keyword(s):

Principal Component Analysis ◽

Next Generation Sequencing ◽

Elaeis Guineensis ◽

Principal Component ◽

Genotyping By Sequencing ◽

Component Analysis ◽

Next Generation ◽

Elaeis Oleifera ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Finding DNA markers associated with desirable traits in Elaeis oleifera, in order to introgress those traits into the Elaeis guineensis genome, requires polymorphic DNA markers. To generate a large number of polymorphic DNA markers, genomic SNPs were identified by resequencing 12 sample individuals from an E. guineensis x E. oleifera hybrid population (OxG hybrids), namely wild-type E. oleifera, interspecific F1 hybrids, pseudo-backcrosses, and advanced E. guineensis material, using next generation sequencing (NGS). Reads (the base sequences output by the NGS machine) from the 12 samples were of good quality, and 96% of the total filtered reads could be demultiplexed and assigned to the corresponding sample. After filtering and trimming, 84% of reads could be used for genome mapping, yielding 5.7X to 10.42X genome coverage. Of the 34,410,224 SNPs identified, 98.7% were non-coding variants, and by location, 69.1% of all SNPs were intergenic. A total of 5,618 of the identified SNPs were validated using targeted genotyping by sequencing on 500 sample individuals. Of the SNPs used, 74% were of high quality, having been read in at least 95% of samples. Principal component analysis using these SNPs was able to identify the genetic background of each sample. This validation indicates that SNP identification by resequencing yields high-quality SNPs that can be used to develop DNA markers to assist selection in E. guineensis x E. oleifera breeding populations.


Scanning x-ray fluorescence microscopy and principal component analysis

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100133394 ◽

1992 ◽

Vol 50 (2) ◽

pp. 1752-1753

Author(s):

Brian Cross

Keyword(s):

Principal Component Analysis ◽

Fluorescence Microscopy ◽

Spatial Resolution ◽

High Speed ◽

Image Acquisition ◽

Principal Component ◽

Fluorescence Microscope ◽

Acquisition Rate ◽

X Ray ◽

Speed Up

A relatively new entry, in the field of microscopy, is the Scanning X-Ray Fluorescence Microscope (SXRFM). Using this type of instrument (e.g. Kevex Omicron X-ray Microprobe), one can obtain multiple elemental x-ray images, from the analysis of materials which show heterogeneity. The SXRFM obtains images by collimating an x-ray beam (e.g. 100 μm diameter), and then scanning the sample with a high-speed x-y stage. To speed up the image acquisition, data is acquired "on-the-fly" by slew-scanning the stage along the x-axis, like a TV or SEM scan. To reduce the overhead from "fly-back," the images can be acquired by bi-directional scanning of the x-axis. This results in very little overhead with the re-positioning of the sample stage. The image acquisition rate is dominated by the x-ray acquisition rate. Therefore, the total x-ray image acquisition rate, using the SXRFM, is very comparable to an SEM. Although the x-ray spatial resolution of the SXRFM is worse than an SEM (say 100 vs. 2 μm), there are several other advantages.


A German version of the Intermittent Claudication Questionnaire (ICQ): cultural adaptation and validation

VASA ◽

10.1024/0301-1526/a000218 ◽

2012 ◽

Vol 41 (5) ◽

pp. 333-342 ◽

Cited By ~ 3

Author(s):

Kirchberger ◽

Finger ◽

Müller-Bühl

Keyword(s):

Principal Component Analysis ◽

Intermittent Claudication ◽

Completion Time ◽

Short Form ◽

Principal Component ◽

Component Analysis ◽

German Version ◽

Average Completion Time ◽

Sf 36 ◽

Related Quality

Background: The Intermittent Claudication Questionnaire (ICQ) is a short questionnaire for the assessment of health-related quality of life (HRQOL) in patients with intermittent claudication (IC). The objective of this study was to translate the ICQ into German and to investigate the psychometric properties of the German ICQ version in patients with IC. Patients and methods: The original English version was translated using a forward-backward method. The resulting German version was reviewed by the author of the original version and an experienced clinician. Finally, it was tested for clarity with 5 German patients with IC. A sample of 81 patients were administered the German ICQ. The sample consisted of 58.0 % male patients with a median age of 71 years and a median IC duration of 36 months. Test of feasibility included completeness of questionnaires, completion time, and ratings of clarity, length and relevance. Reliability was assessed through a retest in 13 patients at 14 days, and analysis of Cronbach's alpha for internal consistency. Construct validity was investigated using principal component analysis. Concurrent validity was assessed by correlating the ICQ scores with the Short Form 36 Health Survey (SF-36) as well as clinical measures. Results: The ICQ was completely filled in by 73 subjects (90.1 %) with an average completion time of 6.3 minutes. Cronbach's alpha coefficient reached 0.75. Intra-class correlation for test-retest reliability was r = 0.88. Principal component analysis resulted in a 3-factor solution. The first factor explained 51.5 % of the total variation and all items had loadings of at least 0.65 on it. The ICQ was significantly associated with the SF-36 and treadmill-walking distances whereas no association was found for resting ABPI. Conclusions: The German version of the ICQ demonstrated good feasibility, satisfactory reliability and good validity. Responsiveness should be investigated in further validation studies.
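The internal-consistency statistic reported in this abstract, Cronbach's alpha, follows the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. A minimal sketch is given below; the function name and the toy response matrix are hypothetical and unrelated to the study's data.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]  # number of questionnaire items
    item_variance_sum = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical toy responses: 3 respondents, 2 items.
toy = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy)
```

For the toy matrix each item has sample variance 1 and the total scores (3, 3, 6) have sample variance 3, so alpha = 2 * (1 - 2/3) = 2/3.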


Review of Three-Mode Principal Component Analysis: Theory and Applications, Vol. 2.

Contemporary Psychology ◽

10.1037/022425 ◽

1984 ◽

Vol 29 (11) ◽

pp. 915-916

Author(s):

William R Koch

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Analysis Theory


Principal component analysis of variation pattern among Cola nitida, Cola acuminata (Sterculiaceae) and their interspecific hybrids

Feddes Repertorium ◽

10.1002/fedr.4911110312 ◽

2000 ◽

Vol 111 (3-4) ◽

pp. 183-188

Author(s):

P. O. Adebola

Keyword(s):

Principal Component Analysis ◽

Interspecific Hybrids ◽

Principal Component ◽

Component Analysis ◽

Variation Pattern ◽

Analysis Of Variation ◽

Cola Acuminata


Nonlinear analysis of electroencephalograms of healthy people during driving test based on symplectic principal component analysis method

PsycEXTRA Dataset ◽

10.1037/e573792014-017 ◽

2014 ◽

Author(s):

Min Lei ◽

Guang Meng ◽

Nilanjan Sarkar ◽

Jing Fan ◽

Josh Wade ◽

...

Keyword(s):

Principal Component Analysis ◽

Nonlinear Analysis ◽

Principal Component ◽

Component Analysis ◽

Healthy People ◽

Analysis Method ◽

Principal Component Analysis Method
