Parentage assignment with genotyping‐by‐sequencing data

Exclusion and genomic relatedness methods for assignment of parentage using genotyping-by-sequencing data

10.1101/582585 ◽

2019 ◽

Author(s):

K. G. Dodds ◽

J. C. McEwan ◽

R. Brauning ◽

T. C. van Stijn ◽

S. J. Rowe ◽

...

Keyword(s):

Genotyping By Sequencing ◽

Parentage Analysis ◽

Random Selection ◽

Parentage Assignment ◽

Sequencing Data ◽

Putative Parent ◽

The Difference ◽

Genomic Relatedness ◽

Selection Of ◽

Mismatch Rate

Summary: Genotypes are often used to assign parentage in agricultural and ecological settings. Sequencing can be used to obtain genotypes but does not provide unambiguous genotype calls, especially when sequencing depth is low in order to reduce costs. In that case, standard parentage analysis methods no longer apply. A strategy for using low-depth sequencing data for parentage assignment is developed here. It entails the use of relatedness estimates along with a metric termed excess mismatch rate which, for parent-offspring pairs or trios, is the difference between the observed mismatch rate and the rate expected under a model of inheritance and allele reads without error. When more than one putative parent has similar statistics, bootstrapping can provide a measure of the relatedness similarity. Putative parent-offspring trios can be further checked for consistency by comparing the offspring’s estimated inbreeding to half the parent relatedness. Suitable thresholds are required for each metric. These methods were applied to a deer breeding operation consisting of two herds of different breeds. Relatedness estimates were more in line with expectation when the herds were analysed separately than when combined, although this did not alter which parents were the best matches with each offspring. Parentage results were largely consistent with those based on a microsatellite parentage panel with three discordant parent assignments out of 1561. Two models are investigated to allow the parentage metrics to be calculated with non-random selection of alleles. The tools and strategies given here allow parentage to be assigned from low-depth sequencing data.
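
The two checks named in this summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the default tolerance, and the example numbers are hypothetical, and the expected mismatch rate is taken as an input rather than derived from the paper's inheritance and read-sampling model.

```python
def excess_mismatch_rate(observed_mismatches, n_compared, expected_rate):
    """Observed mismatch rate minus the rate expected under the assumed model.

    expected_rate is supplied by the caller; this sketch does not compute it.
    """
    if n_compared <= 0:
        raise ValueError("n_compared must be positive")
    return observed_mismatches / n_compared - expected_rate


def trio_inbreeding_gap(offspring_inbreeding, parent_relatedness):
    """Offspring's estimated inbreeding minus half the relatedness of the parents."""
    return offspring_inbreeding - 0.5 * parent_relatedness


def trio_is_consistent(offspring_inbreeding, parent_relatedness, tolerance=0.1):
    """True when the gap is within a tolerance (hypothetical threshold value)."""
    gap = trio_inbreeding_gap(offspring_inbreeding, parent_relatedness)
    return abs(gap) <= tolerance


if __name__ == "__main__":
    # Illustrative numbers only: 30 mismatches over 1000 compared markers,
    # with an expected rate of 0.02 under the assumed model.
    print(excess_mismatch_rate(30, 1000, 0.02))
    # Parents related at 0.5 and offspring inbreeding estimated at 0.25.
    print(trio_is_consistent(0.25, 0.5))
```

As the summary notes, a suitable threshold is needed for each metric, so the tolerance here would have to be chosen for the data at hand.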

Parentage assignment with genotyping-by-sequencing data

10.1101/270561 ◽

2018 ◽

Cited By ~ 3

Author(s):

Andrew Whalen ◽

Gregor Gorjanc ◽

John M Hickey

Keyword(s):

False Positive Rate ◽

Simulated Data ◽

Genotyping By Sequencing ◽

Unrelated Individual ◽

Parentage Assignment ◽

Sequencing Data ◽

High Coverage ◽

Array Data ◽

Assignment Method ◽

Low Coverage

Abstract: In this paper we evaluate using genotype-by-sequencing (GBS) data to perform parentage assignment in lieu of traditional array data. The use of GBS data raises two issues: First, for low-coverage GBS data, it may not be possible to call the genotype at many loci, a critical first step for detecting opposing homozygous markers. Second, the amount of sequencing coverage may vary across individuals, making it challenging to directly compare the likelihood scores between putative parents. To address these issues we extend the probabilistic framework of Huisman (2017) and evaluate putative parents by comparing their (potentially noisy) genotypes to a series of proposal distributions. These distributions describe the expected genotype probabilities for the relatives of an individual. We assign putative parents as a parent if they are classified as a parent (as opposed to e.g., an unrelated individual), and if the assignment score passes a threshold. We evaluated this method on simulated data and found that (1) high-coverage GBS data performs similarly to array data and requires only a small number of markers to correctly assign parents and (2) low-coverage GBS data (as low as 0.1x) can also be used, provided that it is obtained across a large number of markers. When analysing the low-coverage GBS data, we also found a high number of false positives if the true parent is not contained within the list of candidate parents, but that this false positive rate can be greatly reduced by hand tuning the assignment threshold. We provide this parentage assignment method as a standalone program called AlphaAssign.
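
For context, the opposing-homozygote check that this abstract contrasts with its probabilistic approach can be sketched in a few lines of Python. This is the traditional exclusion-style count on called genotypes, not the AlphaAssign method; the genotype coding (0, 1, 2 copies of an allele, `None` for an uncalled locus) and the function name are assumptions made for illustration.

```python
def opposing_homozygote_rate(parent, offspring):
    """Fraction of jointly called loci where the two individuals are opposite homozygotes.

    Genotypes are coded 0, 1 or 2; None marks a locus with no genotype call.
    Returns None when no locus is called in both individuals.
    """
    compared = 0
    opposing = 0
    for p, o in zip(parent, offspring):
        if p is None or o is None:
            continue  # uncalled loci cannot be compared
        compared += 1
        if {p, o} == {0, 2}:
            opposing += 1  # a true parent cannot be 0 when the offspring is 2, or vice versa
    if compared == 0:
        return None
    return opposing / compared


if __name__ == "__main__":
    candidate = [0, 2, 1, None, 2]
    child = [2, 2, 1, 0, 0]
    print(opposing_homozygote_rate(candidate, child))
```

With low-coverage data many loci would be `None` and the called genotypes would be unreliable, which is the limitation the abstract gives for this kind of check.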

Exclusion and Genomic Relatedness Methods for Assignment of Parentage Using Genotyping-by-Sequencing Data

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400501 ◽

2019 ◽

Vol 9 (10) ◽

pp. 3239-3247 ◽

Cited By ~ 3

Author(s):

Ken G. Dodds ◽

John C. McEwan ◽

Rudiger Brauning ◽

Tracey C. van Stijn ◽

Suzanne J. Rowe ◽

...

Keyword(s):

Genotyping By Sequencing ◽

Parentage Analysis ◽

Random Selection ◽

Parentage Assignment ◽

Sequencing Data ◽

Putative Parent ◽

The Difference ◽

Genomic Relatedness ◽

Selection Of ◽

Mismatch Rate

Genotypes are often used to assign parentage in agricultural and ecological settings. Sequencing can be used to obtain genotypes but does not provide unambiguous genotype calls, especially when sequencing depth is low in order to reduce costs. In that case, standard parentage analysis methods no longer apply. A strategy for using low-depth sequencing data for parentage assignment is developed here. It entails the use of relatedness estimates along with a metric termed excess mismatch rate which, for parent-offspring pairs or trios, is the difference between the observed mismatch rate and the rate expected under a model of inheritance and allele reads without error. When more than one putative parent has similar statistics, bootstrapping can provide a measure of the relatedness similarity. Putative parent-offspring trios can be further checked for consistency by comparing the offspring’s estimated inbreeding to half the parent relatedness. Suitable thresholds are required for each metric. These methods were applied to a deer breeding operation consisting of two herds of different breeds. Relatedness estimates were more in line with expectation when the herds were analyzed separately than when combined, although this did not alter which parents were the best matches with each offspring. Parentage results were largely consistent with those based on a microsatellite parentage panel with three discordant parent assignments out of 1561. Two models are investigated to allow the parentage metrics to be calculated with non-random selection of alleles. The tools and strategies given here allow parentage to be assigned from low-depth sequencing data.

Parentage Analysis in Giant Grouper (Epinephelus lanceolatus) Using Microsatellite and SNP Markers from Genotyping-by-Sequencing Data

Genes ◽

10.3390/genes12071042 ◽

2021 ◽

Vol 12 (7) ◽

pp. 1042

Author(s):

Zhuoying Weng ◽

Yang Yang ◽

Xi Wang ◽

Lina Wu ◽

Sijie Hua ◽

...

Keyword(s):

Fishery Management ◽

Genotyping By Sequencing ◽

Parentage Analysis ◽

Snp Markers ◽

Individual Identification ◽

Pedigree Information ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Polymorphic Snps ◽

Mixed Family

Pedigree information is necessary for the maintenance of diversity in wild and captive populations. An accurate pedigree is determined by molecular marker-based parentage analysis, which may be influenced by the polymorphism and number of markers, integrity of samples, relatedness of parents, or different analysis programs. Here, we describe the first development of 208 single nucleotide polymorphisms (SNPs) and 11 microsatellites for giant grouper (Epinephelus lanceolatus) using genotyping-by-sequencing (GBS), and compare the power of SNPs and microsatellites for parentage and relatedness analysis, based on a mixed family composed of 4 candidate females, 4 candidate males and 289 offspring. CERVUS, PAPA and COLONY were used for mutual verification. We found that SNPs had better potential for relatedness estimation, exclusion of non-parentage and individual identification than microsatellites, and that > 98% accuracy of parentage assignment could be achieved with 100 polymorphic SNPs (MAF cut-off < 0.4) or 10 polymorphic microsatellites (mean Ho = 0.821, mean PIC = 0.651). This study provides a reference for the development of molecular markers for parentage analysis using next-generation sequencing, and contributes to molecular breeding, fishery management and population conservation.

AFLAP: Assembly-Free Linkage Analysis Pipeline using k-mers from whole genome sequencing data

10.1101/2020.09.14.296525 ◽

2020 ◽

Author(s):

Kyle Fletcher ◽

Lin Zhang ◽

Juliana Gil ◽

Rongkui Han ◽

Keri Cavanaugh ◽

...

Keyword(s):

Linkage Analysis ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genetic Map ◽

Genotyping By Sequencing ◽

Genetic Maps ◽

Whole Genome ◽

Sequencing Data ◽

Analysis Pipeline ◽

Genome Assemblies

Abstract. Background: Genetic maps are an important resource for validation of genome assemblies, trait discovery, and breeding. Next generation sequencing has enabled production of high-density genetic maps constructed with 10,000s of markers. Most current approaches require a genome assembly to identify markers. Our Assembly Free Linkage Analysis Pipeline (AFLAP) removes this requirement by using uniquely segregating k-mers as markers to rapidly construct a genotype table and perform subsequent linkage analysis. This avoids potential biases including preferential read alignment and variant calling. Results: The performance of AFLAP was determined in simulations and contrasted to a conventional workflow. We tested AFLAP using 100 F2 individuals of Arabidopsis thaliana, sequenced to low coverage. Genetic maps generated using k-mers contained over 130,000 markers that were concordant with the genomic assembly. The utility of AFLAP was then demonstrated by generating an accurate genetic map using genotyping-by-sequencing data of 235 recombinant inbred lines of Lactuca spp. AFLAP was then applied to 83 F1 individuals of the oomycete Bremia lactucae, sequenced to >5x coverage. The genetic map contained over 90,000 markers ordered in 19 large linkage groups. This genetic map was used to fragment, order, orient, and scaffold the genome, resulting in a much-improved reference assembly. Conclusions: AFLAP can be used to generate high density linkage maps and improve genome assemblies of any organism when a mapping population is available using whole genome sequencing or genotyping-by-sequencing data. Genetic maps produced for B. lactucae were accurately aligned to the genome and guided significant improvements of the reference assembly.

Ancient introgression between distantly related white oaks (Quercus sect Quercus) shows evidence of climate-associated asymmetric gene exchange

Journal of Heredity ◽

10.1093/jhered/esab053 ◽

2021 ◽

Author(s):

Scott T O’Donnell ◽

Sorel T Fitz-Gibbon ◽

Victoria L Sork

Keyword(s):

Gene Flow ◽

Genotyping By Sequencing ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Single Nucleotide ◽

Scrub Oak ◽

Population Genetic Inference ◽

Genetic Inference ◽

California Floristic Province ◽

Python Package

Abstract: Ancient introgression can be an important source of genetic variation that shapes the evolution and diversification of many taxa. Here, we estimate the timing, direction and extent of gene flow between two distantly related oak species in the same section (Quercus sect. Quercus). We estimated these demographic events using genotyping-by-sequencing (GBS) data, which generated 25,702 single nucleotide polymorphisms (SNPs) for 24 individuals of California scrub oak (Quercus berberidifolia) and 23 individuals of Engelmann oak (Q. engelmannii). We tested several scenarios involving gene flow between these species using the diffusion approximation-based population genetic inference framework and model-testing approach of the Python package DaDi. We found that the most likely demographic scenario includes a bottleneck in Q. engelmannii that coincides with asymmetric gene flow from Q. berberidifolia into Q. engelmannii. Given that the timing of this gene flow coincides with the advent of a Mediterranean-type climate in the California Floristic Province, we propose that changing precipitation patterns and seasonality may have favored the introgression of climate-associated genes from the endemic into the non-endemic California oak.

Colonization history of the Canary Islands endemic Lavatera acerifolia, (Malvaceae) unveiled with genotyping‐by‐sequencing data and niche modelling

Journal of Biogeography ◽

10.1111/jbi.13808 ◽

2020 ◽

Vol 47 (4) ◽

pp. 993-1005 ◽

Cited By ~ 1

Author(s):

Irene Villa‐Machío ◽

Alejandro G. Fernández de Castro ◽

Javier Fuertes‐Aguilar ◽

Gonzalo Nieto Feliner

Keyword(s):

Canary Islands ◽

Genotyping By Sequencing ◽

Sequencing Data ◽

Niche Modelling ◽

Colonization History ◽

History Of

GIbPSs: a toolkit for fast and accurate analyses of genotyping-by-sequencing data without a reference genome

Molecular Ecology Resources ◽

10.1111/1755-0998.12510 ◽

2016 ◽

Vol 16 (4) ◽

pp. 979-990 ◽

Cited By ~ 12

Author(s):

A. Hapke ◽

D. Thiele

Keyword(s):

Reference Genome ◽

Genotyping By Sequencing ◽

Sequencing Data

Genetic sex assignment in wild populations using genotyping-by-sequencing data: A statistical threshold approach

Molecular Ecology Resources ◽

10.1111/1755-0998.12767 ◽

2018 ◽

Vol 18 (2) ◽

pp. 179-190 ◽

Cited By ~ 9

Author(s):

William R. Stovall ◽

Helen R. Taylor ◽

Michael Black ◽

Stefanie Grosser ◽

Kim Rutherford ◽

...

Keyword(s):

Genotyping By Sequencing ◽

Wild Populations ◽

Sequencing Data ◽

Sex Assignment ◽

Threshold Approach

Short-read fastA files dataset from complexity-reduced genotyping by sequencing data of bacterial isolates from a public hospital in Australia

Data in Brief ◽

10.1016/j.dib.2019.104273 ◽

2019 ◽

Vol 25 ◽

pp. 104273

Author(s):

Berenice Talamantes-Becerra ◽

Jason Carling ◽

Karina Kennedy ◽

Michelle E. Gahan ◽

Arthur Georges

Keyword(s):

Public Hospital ◽

Genotyping By Sequencing ◽

Bacterial Isolates ◽

Sequencing Data ◽

Short Read
