Choosing subsamples for sequencing studies by minimizing the average distance to the closest leaf

2015 ◽  
Author(s):  
Jonathan T. L. Kang ◽  
Peng Zhang ◽  
Sebastian Zöllner ◽  
Noah A. Rosenberg

Imputation of genotypes in a study sample can make use of sequenced or densely genotyped external reference panels consisting of individuals that are not from the study sample. It can also employ internal reference panels, incorporating a subset of individuals from the study sample itself. Internal panels offer an advantage over external panels, as they can reduce imputation errors arising from genetic dissimilarity between a population of interest and a second, distinct population from which the external reference panel has been constructed. As the cost of next-generation sequencing decreases, internal reference panel selection is becoming increasingly feasible. However, it is not clear how best to select individuals to include in such panels. We introduce a new method for selecting an internal reference panel, minimizing the average distance to the closest leaf (ADCL), and compare its performance relative to an earlier algorithm: maximizing phylogenetic diversity (PD). Employing both simulated data and sequences from the 1000 Genomes Project, we show that ADCL provides a significant improvement in imputation accuracy, especially for imputation of sites with low-frequency alleles. This improvement in imputation accuracy is robust to changes in reference panel size, marker density, and length of the imputation target region.
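
As a rough illustration of the ADCL criterion named in the abstract above, the following Python sketch scores a candidate panel by the average, over all leaves, of the distance to the nearest panel member, and finds the best panel by exhaustive search. The pairwise distance matrix, the function names `adcl` and `min_adcl_panel`, and the choice to average over all leaves (panel members contribute zero) are assumptions made for illustration; the abstract does not describe the authors' actual algorithm or data structures.

```python
from itertools import combinations

def adcl(dist, panel):
    """Average, over all leaves, of the distance to the closest panel member."""
    n = len(dist)
    return sum(min(dist[i][j] for j in panel) for i in range(n)) / n

def min_adcl_panel(dist, k):
    """Exhaustive search over all size-k subsets (assumption: small inputs only)."""
    n = len(dist)
    return min(combinations(range(n), k), key=lambda panel: adcl(dist, panel))

# Hypothetical pairwise distances among five leaves:
# two tight pairs (0,1) and (2,3) plus one distant leaf (4).
dist = [
    [0, 1, 8, 8, 9],
    [1, 0, 8, 8, 9],
    [8, 8, 0, 1, 9],
    [8, 8, 1, 0, 9],
    [9, 9, 9, 9, 0],
]

panel = min_adcl_panel(dist, 2)
print(panel, adcl(dist, panel))
```

With these toy distances the search picks one leaf from each tight pair rather than the single most distant leaf, which is the kind of behaviour that distinguishes an average-distance criterion from one that rewards total diversity. Exhaustive search grows combinatorially, so a realistic panel size would need a heuristic or a tree-based method.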

2020 ◽  
Vol 60 (8) ◽  
pp. 999
Author(s):  
Lianjie Hou ◽  
Wenshuai Liang ◽  
Guli Xu ◽  
Bo Huang ◽  
Xiquan Zhang ◽  
...  

A low-density single-nucleotide polymorphism (LD-SNP) panel is an effective way to reduce the cost of genomic selection in animal breeding. The present study proposes a new type of LD-SNP panel, the mixed low-density (MLD) panel, which simultaneously considers SNPs with substantial effects estimated by Bayes method B (BayesB) across many traits and an evenly spaced distribution. Simulated and real data were used to compare the imputation accuracy and genomic-selection accuracy of the two types of LD-SNP panels. For simulated data, genotype imputation results showed that the number of quantitative trait loci (QTL) had a limited influence on imputation accuracy, and only for MLD panels; the evenly spaced (ELD) panel was not affected by the number of QTL. For real data, ELD performed slightly better than MLD when the panel contained 500 or 1000 SNPs. However, this advantage vanished quickly as the density increased. For simulated data, genomic selection using BayesB showed that MLD performed much better than ELD when the number of QTL was 100. For real data, MLD also outperformed ELD for growth and carcass traits when using BayesB. In conclusion, the MLD strategy is superior to ELD in genomic selection under most situations.


2019 ◽  
Author(s):  
Seong-Keun Yoo ◽  
Chang-Uk Kim ◽  
Hie Lim Kim ◽  
Sungjae Kim ◽  
Jong-Yeon Shin ◽  
...  

Abstract Genotype imputation using the reference panel is a cost-effective strategy to fill millions of missing genotypes for the purpose of various genetic analyses. Here, we present the Northeast Asian Reference Database (NARD), including whole-genome sequencing data of 1,781 individuals from Korea, Mongolia, Japan, China, and Hong Kong. NARD provides the genetic diversities of Korean (n=850) and Mongolian (n=386) ancestries that were not present in the 1000 Genomes Project Phase 3 (1KGP3). We combined and re-phased the genotypes from NARD and 1KGP3 to construct a union set of haplotypes. This approach established a robust imputation reference panel for the Northeast Asian populations, which yields the greatest imputation accuracy of rare and low-frequency variants compared with the existing panels. Also, we illustrate that NARD can potentially improve disease variant discovery by reducing pathogenic candidates. Overall, this study provides a decent reference panel for the genetic studies in Northeast Asia.


2017 ◽  
Vol 25 (7) ◽  
pp. 869-876 ◽  
Author(s):  
Mario Mitt ◽  
Mart Kals ◽  
Kalle Pärn ◽  
Stacey B Gabriel ◽  
Eric S Lander ◽  
...  

2019 ◽  
Vol 11 (1) ◽  
Author(s):  
Seong-Keun Yoo ◽  
Chang-Uk Kim ◽  
Hie Lim Kim ◽  
Sungjae Kim ◽  
Jong-Yeon Shin ◽  
...  

Abstract Here, we present the Northeast Asian Reference Database (NARD), including whole-genome sequencing data of 1779 individuals from Korea, Mongolia, Japan, China, and Hong Kong. NARD provides the genetic diversity of Korean (n = 850) and Mongolian (n = 384) ancestries that were not present in the 1000 Genomes Project Phase 3 (1KGP3). We combined and re-phased the genotypes from NARD and 1KGP3 to construct a union set of haplotypes. This approach established a robust imputation reference panel for Northeast Asians, which yields the greatest imputation accuracy of rare and low-frequency variants compared with the existing panels. NARD imputation panel is available at https://nard.macrogen.com/.


2007 ◽  
Vol 38 (7) ◽  
pp. 11-17
Author(s):  
Ronald M. Aarts

Conventionally, the ultimate goal in loudspeaker design has been to obtain a flat frequency response over a specified frequency range. This can be achieved by carefully selecting the main loudspeaker parameters such as the enclosure volume, the cone diameter, the moving mass and the crucial “force factor”. For loudspeakers in small cabinets the results of this design procedure appear to be quite inefficient, especially at low frequencies. This paper describes a new solution to this problem. It consists of the combination of a highly non-linear preprocessing of the audio signal and the use of a so-called low-force-factor loudspeaker. This combination yields a strongly increased efficiency, at least over a limited frequency range, at the cost of a somewhat altered sound quality. An analytically tractable optimality criterion has been defined and has been verified by the design of an experimental loudspeaker. This has a much higher efficiency and a higher sensitivity than current low-frequency loudspeakers, while its cabinet can be much smaller.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yang Yang ◽  
Hongli Tian ◽  
Rui Wang ◽  
Lu Wang ◽  
Hongmei Yi ◽  
...  

Molecular marker technology is used widely in plant variety discrimination, molecular breeding, and other fields. To lower the cost of testing and improve the efficiency of data analysis, molecular marker screening is very important. Screening usually involves two phases: the first to control loci quality and the second to reduce loci quantity. To reduce loci quantity, an appraisal index that is very sensitive to a specific scenario is necessary to select loci combinations. In this study, we focused on loci combination screening for plant variety discrimination. A loci combination appraisal index, variety discrimination power (VDP), is proposed, and three statistical methods, probability-based VDP (P-VDP), comparison-based VDP (C-VDP), and ratio-based VDP (R-VDP), are described and compared. The results using the simulated data showed that VDP was sensitive to statistical populations with convergence toward the same variety, and the total probability of discrimination power (TDP) method was effective only for partial populations. R-VDP was more sensitive to statistical populations with convergence toward various varieties than P-VDP and C-VDP, which both had the same sensitivity; TDP was not sensitive at all. With the real data, R-VDP values for sorghum, wheat, maize, and rice begin to show a downward tendency when the number of loci is 20, 7, 100, and 100, respectively, while for P-VDP and C-VDP (which give the same results) the numbers are 6, 4, 9, and 19, respectively, and for TDP the numbers are 6, 4, 4, and 11, respectively. For the variety threshold setting, R-VDP values of loci combinations with different numbers of loci responded evenly to different thresholds. C-VDP values responded unevenly to different thresholds, and the extent of the response increased as the number of loci decreased. All the methods gave underestimations when data were missing, with systematic errors increasing in the order TDP, C-VDP, R-VDP.
We concluded that VDP was a better loci combination appraisal index than TDP for plant variety discrimination and the three VDP methods have different applications. We developed the software called VDPtools, which can calculate the values of TDP, P-VDP, C-VDP, and R-VDP. VDPtools is publicly available at https://github.com/caurwx1/VDPtools.git.


2022 ◽  
Author(s):  
Lars Wienbrandt ◽  
David Ellinghaus

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, software with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS studies, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.


2017 ◽  
Vol 59 (2) ◽  
pp. 189-194 ◽  
Author(s):  
Choongbeom Choi ◽  
Sung Jun Joe ◽  
Anna S. Mattila

Empirical research shows that customers form price evaluations by comparing the actual price with a reference price. The relative use of an internal reference price (IRP) versus an external reference price (ERP) is an important issue in the lodging industry due to the popularity of price-comparison–based advertising. Although prior literature shows that demographic factors influence the relative use of IRP and ERP, the impact of gender on the relationship between reference prices and price evaluations has received scant attention in both hospitality and marketing contexts. Drawing on the agency-communal theory, the current research examines the effect of gender on the use of IRP and ERP in price evaluations. The findings indicate that males are more susceptible to IRP than to ERP, whereas females are only influenced by ERP. Relevant managerial implications are drawn in terms of pricing and promotional strategies.


2021 ◽  
pp. 1-29
Author(s):  
Lisa Lorentz ◽  
Kaian Unwalla ◽  
David I. Shore

Abstract Successful interaction with our environment requires accurate tactile localization. Although we seem to localize tactile stimuli effortlessly, the processes underlying this ability are complex. This is evidenced by the crossed-hands deficit, in which tactile localization performance suffers when the hands are crossed. The deficit results from the conflict between an internal reference frame, based in somatotopic coordinates, and an external reference frame, based in external spatial coordinates. Previous evidence in favour of the integration model employed manipulations to the external reference frame (e.g., blindfolding participants), which reduced the deficit by reducing conflict between the two reference frames. The present study extends this finding by asking blindfolded participants to visually imagine their crossed arms as uncrossed. This imagery manipulation further decreased the magnitude of the crossed-hands deficit by bringing information in the two reference frames into alignment. This imagery manipulation differentially affected males and females, which was consistent with the previously observed sex difference in this effect: females tend to show a larger crossed-hands deficit than males and females were more impacted by the imagery manipulation. Results are discussed in terms of the integration model of the crossed-hands deficit.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Dimitrios Kleftogiannis ◽  
Danliang Ho ◽  
Jun Xian Liew ◽  
Polly S. Y. Poon ◽  
Anna Gan ◽  
...  

Abstract Analysis of circulating cell-free DNA (cfDNA) has opened new opportunities for characterizing tumour mutational landscapes with many applications in genomic-driven oncology. We developed a customized targeted cfDNA sequencing approach for breast cancer (BC) using unique molecular identifiers (UMIs) for error correction. Our assay, spanning a 284.5 kb target region, is combined with a novel freely-licensed bioinformatics pipeline that provides detection of low-frequency variants, and reliable identification of copy number variations (CNVs) directly from plasma DNA. We first evaluated our pipeline on reference samples. Then in a cohort of 35 BC patients our approach detected actionable driver and clonal variants at low variant frequency levels in cfDNA that were concordant (77%) with sequencing of primary and/or metastatic solid tumour sites. We also detected ERBB2 gene CNVs used for HER2 subtype classification with 80% precision compared to immunohistochemistry. Further, we evaluated fragmentation profiles of cfDNA in BC and observed distinct differences compared to data from healthy individuals. Our results show that the developed assay addresses the majority of tumour associated aberrations directly from plasma DNA, and thus may be used to elucidate genomic alterations in liquid biopsy studies.

