Efficient genotype compression and analysis of large genetic variation datasets

2015 ◽  
Author(s):  
Ryan M Layer ◽  
Neil Kindlon ◽  
Konrad J Karczewski ◽  
Exome Aggregation Consortium ExAC ◽  
Aaron R Quinlan

The economy of human genome sequencing has catalyzed ambitious efforts to interrogate the genomes of large cohorts in search of new insight into the genetic basis of disease. This manuscript introduces Genotype Query Tools (GQT) as a new indexing strategy and toolset that addresses an analytical bottleneck by enabling interactive analyses based on genotypes, phenotypes and sample relationships. Speed improvements are achieved by operating directly on a compressed genotype index without decompression. GQT's data compression ratios increase favorably with cohort size and relative analysis performance improves in kind. We demonstrate substantial performance improvements over state-of-the-art tools using datasets from the 1000 Genomes Project (46-fold), the Exome Aggregation Consortium (443-fold), and simulated datasets of up to 100,000 genomes (218-fold). Furthermore, we show that this indexing strategy facilitates population and statistical genetics measures such as principal component analysis and burden tests. Based on its computational efficiency and by complementing existing toolsets, GQT provides a flexible framework for current and future analyses of massive genome datasets.

Author(s):  
J. M. Paque ◽  
R. Browning ◽  
P. L. King ◽  
P. Pianetta

Geological samples typically contain many minerals (phases) with multiple element compositions. A complete analytical description should give the number of phases present, the volume occupied by each phase in the bulk sample, the average and range of composition of each phase, and the bulk composition of the sample. A practical approach to providing such a complete description is from quantitative analysis of multi-elemental x-ray images. With the advances in recent years in the speed and storage capabilities of laboratory computers, large quantities of data can be efficiently manipulated. Commercial software and hardware presently available allow simultaneous collection of multiple x-ray images from a sample (up to 16 for the Kevex Delta system). Thus, high resolution x-ray images of the majority of the detectable elements in a sample can be collected. The use of statistical techniques, including principal component analysis (PCA), can provide insight into mineral phase composition and the distribution of minerals within a sample.


2002 ◽  
Vol 11 (3) ◽  
pp. 205-217 ◽  
Author(s):  
Brenda K. Smith Richards ◽  
Brenda N. Belton ◽  
Angela C. Poole ◽  
James J. Mancuso ◽  
Gary A. Churchill ◽  
...  

The present study investigated the inheritance of dietary fat, carbohydrate, and kilocalorie intake traits in an F2 population derived from an intercross between C57BL/6J (fat-preferring) and CAST/EiJ (carbohydrate-preferring) mice. Mice were phenotyped for self-selected food intake in a paradigm which provided for 10 days a choice between two macronutrient diets containing 78/22% of energy as a composite of either fat/protein or carbohydrate/protein. Quantitative trait locus (QTL) analysis identified six significant loci for macronutrient intake: three for fat intake on chromosomes (Chrs) 8 (Mnif1), 18 (Mnif2), and X (Mnif3), and three for carbohydrate intake on Chrs 17 (Mnic1), 6 (Mnic2), and X (Mnic3). An absence of interactions among these QTL suggests the existence of separate mechanisms controlling the intake of fat and carbohydrate. Two significant QTL for cumulative kilocalorie intake, adjusted for baseline body weight, were found on Chrs 17 (Kcal1) and 18 (Kcal2). Without body weight adjustment, another significant kcal locus appeared on distal Chr 2 (Kcal3). These macronutrient and kilocalorie QTL, with the exception of loci on Chrs 8 and X, encompassed chromosomal regions influencing body weight gain and adiposity in this F2 population. These results provide new insight into the genetic basis of naturally occurring variation in nutrient intake phenotypes.


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 2936 ◽  
Author(s):  
Xianghao Zhan ◽  
Xiaoqing Guan ◽  
Rumeng Wu ◽  
Zhan Wang ◽  
You Wang ◽  
...  

As alternative herbal medicine soars in popularity around the world, a fast and convenient means of classifying and evaluating herbal medicines is needed. In this work, an electronic nose system with seven classification algorithms is used to discriminate between 12 categories of herbal medicines. The results show that these herbal medicines can be successfully classified, with support vector machine (SVM) and linear discriminant analysis (LDA) outperforming other algorithms in terms of accuracy. When principal component analysis (PCA) is used to lower the number of dimensions, the time cost for classification can be reduced while the data is visualized. Afterwards, conformal predictions based on 1NN (1-Nearest Neighbor) and 3NN (3-Nearest Neighbor) (CP-1NN and CP-3NN) are introduced. CP-1NN and CP-3NN provide additional, yet significant and reliable, information by giving the confidence and credibility associated with each prediction without sacrificing accuracy. This research provides insight into the construction of a herbal medicine flavor library and gives methods and reference for future works.


Agronomy ◽  
2020 ◽  
Vol 10 (3) ◽  
pp. 423
Author(s):  
Yaolong Yang ◽  
Xin Xu ◽  
Mengchen Zhang ◽  
Qun Xu ◽  
Yue Feng ◽  
...  

The japonica rice in Northeast China is famous because of its high quality. Eating and cooking qualities (ECQs) are the most important factors that determine cooked rice quality. However, the genetic basis of ECQ of japonica varieties in Northeast China needs further study. In this study, 200 japonica varieties that are widely distributed in Northeast China were collected to evaluate the physicochemical indices of grain ECQs. The distribution of each trait was concentrated without large variations. Correlation analysis indicated that gel consistency (GC) had a significantly negative correlation with gelatinization temperature (GT). By integrating various analyses including kinship calculation, principal component analysis (PCA), linkage disequilibrium (LD) analysis, and original parent investigation, we found that the japonica varieties in Northeast China exhibited a narrow genetic basis. An association study for grain ECQs was performed and eight quantitative trait loci (QTLs) were detected. ALK was the major locus that regulated GT and also significantly affected GC. Through LD and expression pattern analysis, one possible candidate gene (LOC_Os02g29980) was predicted; it requires further research for validation. Additionally, a different allele of Wx was identified in the variety CH4126, and ALK was not fixed in these japonica varieties. These results further elucidate the genetic basis of ECQs of japonica varieties in Northeast China and provide local breeders with some assistance for improving ECQs of rice grain in rice breeding.


2020 ◽  
Author(s):  
Nian Liu ◽  
Li Huang ◽  
Weigang Chen ◽  
Bei Wu ◽  
Manish K. Pandey ◽  
...  

Abstract Background: Peanut is one of the primary sources of vegetable oil worldwide, and enhancing oil content is the main objective of several peanut breeding programs around the world. Tightly linked markers are required for faster development of high oil content peanut varieties through genomics-assisted breeding (GAB), and association mapping is one of the promising approaches for discovery of such associated markers. Results: An association mapping panel consisting of 292 peanut varieties extensively distributed in China was phenotyped for oil content and genotyped with 583 polymorphic SSR markers. These markers amplified 3663 alleles with an average of 6.28 alleles per locus. The structure, phylogenetic relationship, and principal component analysis (PCA) indicated two subgroups differentiated mainly by geographic region. Genome-wide association analysis identified 12 associated markers, including one (AGGS1014_2) with a highly stable association explaining up to 9.94% of phenotypic variance (PVE) across multiple environments. Interestingly, the frequency of the favorable alleles for the 12 associated markers showed a geographic difference. Two associated markers (AGGS1014_2 and AHGS0798) with 6.90-9.94% PVE were verified to enhance oil content in an independent RIL population and also indicated selection during the breeding program. Conclusion: This study provided insights into the genetic basis of oil content in peanut and verified two highly associated SSR markers to facilitate marker-assisted selection for developing high oil content peanut varieties.


Animals ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 1309
Author(s):  
Veronika Kharzinova ◽  
Arsen Dotsev ◽  
Anastasiya Solovieva ◽  
Olga Sergeeva ◽  
Georgiy Bryzgalov ◽  
...  

To examine the genetic diversity and population structure of domestic reindeer, using the BovineHD BeadChip, we genotyped reindeer individuals belonging to the Nenets breed of the five main breeding regions, the Even breed of the Republic of Sakha, the Evenk breed of the Krasnoyarsk and Yakutia regions, and the Chukotka breed of the Chukotka region and its within-breed ecotype, namely, the Chukotka–Khargin, which is bred in Yakutia. The Chukotka reindeer was shown to have the lowest genetic diversity in terms of the allelic richness and heterozygosity indicators. The principal component analysis (PCA) results are consistent with the neighbor-net tree topology, dividing the reindeer into groups according to their habitat location and origin of the breed. Admixture analysis indicated a genetic structuring of two groups of Chukotka origin, the Even breed and most of the geographical groups of the Nenets breed, with the exception of the Murmansk reindeer, the gene pool of which was comprised of the Nenets and apparently the native Sami reindeer. The presence of a genetic component of the Nenets breed in some reindeer inhabiting the Krasnoyarsk region was detected. Our results provide a deeper insight into the current intra-breeding reindeer genetic diversity, which is an important requirement for future reindeer herding strategies and for animal adaptation to environmental changes.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
David Aguilar-Benitez ◽  
Inés Casimiro-Soriguer ◽  
Ana M. Torres

Abstract Pod dehiscence causes important yield losses in cultivated crops and therefore has been a key trait strongly selected against in crop domestication. In spite of the growing knowledge on the genetic basis of dehiscence in different crops, no information is available so far for faba bean. Here we conduct the first comprehensive study for faba bean pod dehiscence by combining linkage mapping, comparative genomics, QTL analysis and histological examination of mature pods. Mapping of dehiscence-related genes revealed conservation of syntenic blocks among different legumes. Three QTLs were identified in faba bean chromosomes II, IV and VI, although none of them was stable across years. Histological analysis supports the convergent phenotypic evolution previously reported in cereals and related legume species but revealed a more complex pattern in faba bean. Contrary to common bean and soybean, the faba bean dehiscence zone appears to show functional equivalence to that described in crucifers. The lignified wall fiber layer, which is absent in the paucijuga primitive line Vf27, or less lignified and vacuolated in other dehiscent lines, appears to act as the major force triggering pod dehiscence in this species. While our findings provide new insight into the mechanisms underlying faba bean dehiscence, full understanding of the molecular bases will require further studies combining precise phenotyping with genomic analysis.


2017 ◽  
Vol 63 (12) ◽  
pp. 961-969 ◽  
Author(s):  
Hui Xia ◽  
Qiongwei Tang ◽  
Jie Song ◽  
Jiang Ye ◽  
Haizhen Wu ◽  
...  

Small colony variants (SCVs) are a commonly observed subpopulation of bacteria that have a small colony size and distinctive biochemical characteristics. SCVs are more resistant than the wild type to some antibiotics and usually cause persistent infections in the clinic. SCV studies have been very active during the past 2 decades, especially those on Staphylococcus aureus SCVs. However, fewer studies on Escherichia coli SCVs exist, so we studied an E. coli SCV obtained during an experiment involving the deletion of the yigP locus. PCR and DNA sequencing revealed that the SCV was attributable to a defect in the yigP function. Furthermore, we investigated the antibiotic resistance profile of the E. coli SCV, which showed increased erythromycin, kanamycin, and d-cycloserine resistance, but collateral sensitivity to ampicillin, polymyxin, chloramphenicol, tetracycline, rifampin, and nalidixic acid. We tried to determine the association between yigP and the pleiotropic antibiotic resistance of the SCV by analyzing biofilm formation, cellular morphology, and coenzyme Q (Q8) production. Our results indicated that impaired Q8 biosynthesis was the primary factor that contributed to the increased resistance and collateral sensitivity of the SCV. This study offers a novel genetic basis for E. coli SCVs and an insight into the development of alternative antimicrobial strategies for clinical therapy.


2007 ◽  
Vol 13 (5) ◽  
pp. 623-634 ◽  
Author(s):  
J. Christopher Fromme ◽  
Mariella Ravazzola ◽  
Susan Hamamoto ◽  
Mohammed Al-Balwi ◽  
Wafaa Eyaid ◽  
...  

Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 273-273 ◽  
Author(s):  
Yasunobu Nagata ◽  
Masashi Sanada ◽  
Ayana Kon ◽  
Kenichi Yoshida ◽  
Yuichi Shiraishi ◽  
...  

Abstract 273 Myelodysplastic syndromes (MDS) are a heterogeneous group of myeloid neoplasms showing a frequent transition to acute myeloid leukemia. Although they are discriminated from de novo AML by the presence of a preleukemic period and dysplastic cell morphology, the difference in molecular genetics between both neoplasms has not been fully elucidated because of the similar spectrum of gene mutations. In this regard, the recent discovery of frequent pathway mutations (45∼90%) involving the RNA splicing machinery in MDS and related myeloid neoplasms, together with their low mutation rate in de novo AML, provided a novel insight into the distinct molecular pathogenesis of both neoplasms. Thus far, eight components of the RNA splicing machinery have been identified as the targets of gene mutations, among which U2AF35, SF3B1, SRSF2 and ZRSR2 show the highest mutation rates in MDS and CMML. Meanwhile, the frequency of mutations shows a substantial variation among disease subtypes, although the genetic/biological basis for these differences has not been clarified; SF3B1 mutations explain >90% of the spliceosome gene mutations in RARS and RCMD-RS, while mutations of U2AF35 and ZRSR2 are rare in these categories (<5%) but common in CMML (16%) and MDS without increased ring sideroblasts (20%). On the other hand, SRSF2 mutations are most frequent in CMML (30%), compared with other subtypes (<10%) (p<0.001) (Yoshida K, et al, unpublished data). To gain insight into the genetic basis for these differences, we extensively explored spectrums of gene mutations in a set of 161 samples with MDS and related myeloid neoplasms, in which mutations of 10 genes thus far identified as major targets in MDS were examined and their frequencies were compared with regard to the species of mutated components of the splicing machinery. The mutation status of the 161 specimens was determined using target exon enrichment followed by massively parallel sequencing.
In total, 86 mutations in the 8 components of the splicing machinery were identified in 81 (50%) of the samples. Mutations in 4 genes, U2AF35 (N = 20), SRSF2 (N = 31), SF3B1 (N = 15) and ZRSR2 (N = 10), explained most of the mutations, with much lower mutation rates for SF3A1 (N = 3), PRPF40B (N = 3), U2AF65 (N = 3) and SF1 (N = 1). Conspicuously, the 4 most frequently mutated components of the splicing machinery were mutated in 76 of the 161 cases (47.2%) in a mutually exclusive manner. On the other hand, 172 mutations of the 10 common targets were identified among 117, including 41 TET2 (25%), 32 RUNX1 (20%), 26 ASXL1 (16%), 24 RAS (NRAS/KRAS) (15%), 22 TP53 (14%), 17 IDH1/2 (10%), 10 CBL (6%) and 10 EZH2 (6%) mutations. We examined the difference between the major spliceosome mutations in terms of the number of accompanying mutations in the 10 common gene targets. The possible bias from the difference in disease subtypes was compensated for by multiple regression. The SRSF2 mutations were more frequently associated with accompanying gene mutations, with a significantly higher number of those mutations (N=29; OR 6.2; 95%CI 1.1–35) compared with the U2AF35 mutations (N=14) (p=0.038). Commonly involving the E/A splicing complexes, these splicing pathway mutations lead to compromised 3' splice site recognition. However, individual mutations may still have different impacts on cell functions, which could contribute to the determination of discrete disease phenotypes. It has been demonstrated that SRSF2 is involved in the regulation of DNA stability and that depletion of SRSF2 can lead to DNA hypermutability, which may explain the higher number of accompanying gene mutations in SRSF2-mutated cases than in cases with other spliceosome gene mutations. In conclusion, massively parallel resequencing confirmed that SRSF2-mutated cases significantly overlapped with common mutations, which may help to disclose the genetic basis of MDS and related myeloid neoplasms.
Disclosures: No relevant conflicts of interest to declare.
