Applying Small-Scale DNA Signatures as an Aid in Assembling Soybean Chromosome Sequences

Advances in Bioinformatics ◽

10.1155/2010/976792 ◽

2010 ◽

Vol 2010 ◽

pp. 1-7

Author(s):

Myron Peto ◽

David M. Grant ◽

Randy C. Shoemaker ◽

Steven B. Cannon

Keyword(s):

Quality Control ◽

Binding Energy ◽

DNA Binding ◽

Large Scale ◽

Repetitive Sequences ◽

Soybean Genome ◽

Small Scale ◽

Whole Genome ◽

Soybean Chromosome ◽

A Genome

Previous work has established a genomic signature based on relative counts of the 16 possible dinucleotides. Until now, it has been generally accepted that the dinucleotide signature is characteristic of a genome and is relatively homogeneous across a genome. However, we found some local regions of the soybean genome with a signature differing widely from that of the rest of the genome. Those regions were mostly centromeric and pericentromeric, and enriched for repetitive sequences. We found that DNA binding energy also presented large-scale patterns across soybean chromosomes. These two patterns were helpful during assembly and quality control of soybean whole genome shotgun scaffold sequences into chromosome pseudomolecules.
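
The dinucleotide signature described above can be sketched in a few lines. The abstract only says "relative counts", so the observed-over-expected normalization used here is an assumption (it is one common way to define such a signature), and the function names `dinucleotide_signature` and `signature_distance` are hypothetical, chosen for illustration only.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_signature(seq):
    """Return {dinucleotide: observed/expected ratio} for one sequence.

    Assumption: 'relative count' is taken as f(XY) / (f(X) * f(Y)).
    Positions containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    signature = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
        observed = di[x + y] / n_di if n_di else 0.0
        # Dinucleotides whose expected frequency is zero are reported as 0.0.
        signature[x + y] = observed / expected if expected else 0.0
    return signature


def signature_distance(sig_a, sig_b):
    """Mean absolute difference across the 16 dinucleotides."""
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / len(sig_a)
```

Under these assumptions, comparing the signature of a local window against the genome-wide signature with `signature_distance` would flag regions, such as the repeat-rich pericentromeric ones mentioned above, whose composition departs from the rest of the genome.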

Pharmaceutical Industry in Uganda: A Review of the Common GMP Non-conformances during Regulatory Inspections

10.5703/1288284317442 ◽

2021 ◽

Author(s):

Nasser Lubowa ◽

Zita Ekeocha ◽

Stephen Robert Byrn ◽

Kari L Clase

Keyword(s):

Quality Control ◽

Pharmaceutical Industry ◽

Large Scale ◽

Production Capacity ◽

Parameter Analysis ◽

Small Scale ◽

Middle Income ◽

Substandard Medicines ◽

The Status ◽

Pharmaceutical Industries

The prevalence of substandard medicines in Africa is high but not well documented. Low and Middle-Income Countries (LMICs) are likely to face considerable challenges with substandard medications. Africa faces inadequate drug regulatory practices, and in general, compliance with Good Manufacturing Practices (GMP) in most of the pharmaceutical industries is lacking. The majority of pharmaceutical manufacturers in developing countries are often overwhelmed by the GMP requirements and therefore are unable to operate in line with internationally acceptable standards. Non-conformances observed during regulatory inspections indicate the status of compliance with GMP requirements. The study aimed to identify the GMP non-conformances during regulatory inspections and gaps in the production of pharmaceuticals locally manufactured in Uganda by reviewing the 50 available GMP reports of 21 local pharmaceutical companies in Uganda from 2016. A binary logistic generalized estimating equations (GEE) model was applied to estimate the association between the odds of a company failing to comply with the GMP requirements and non-conformances under each GMP inspection parameter. Analysis using dummy estimation in linear regression was used to determine the relationship between the selected variables (GMP inspection parameters) and the production capacity of the local pharmaceutical industry. Oral liquids, external liquid preparations, powders, creams, and ointments were the main categories of products manufactured locally. The results indicated that 86% of the non-conformances were major, 11% were minor, and 3% were critical. The majority of the non-conformances were related to production (30.1%), documentation (24.5%), and quality control (17.6%). Regression results indicated that each non-conformance under premises, equipment, and utilities was associated with roughly 7-fold higher odds of the manufacturer failing to comply with the GMP standards (aOR=6.81, P=0.001).
The results showed that major non-conformances were significantly higher in industries of small scale (B=6.77, P=0.02) and medium scale (B=8.40, P=0.04), as compared to those of large scale. This study highlights the failures in quality assurance systems and stagnated GMP improvements in these industries that need to be addressed by the manufacturers with support from the regulator. The addition of risk assessment to critical production and quality control operations and establishment of appropriate corrective and preventive actions as part of quality management systems are required to ensure that quality pharmaceuticals are manufactured locally.

Quality control challenges post covid-19 crisis: an integrated IoT and IoP approach

Journal of Physics Conference Series ◽

10.1088/1742-6596/2161/1/012001 ◽

2022 ◽

Vol 2161 (1) ◽

pp. 012001

Author(s):

Harshit Sharma ◽

G Sumathi

Keyword(s):

Quality Control ◽

Six Sigma ◽

Damage Assessment ◽

Large Scale ◽

Smart Manufacturing ◽

Small Scale ◽

Industrial IoT ◽

Customer Trust ◽

Near Future

Covid-19 is arguably the biggest pandemic in history, and it has left many challenges to be dealt with. One of the biggest challenges post Covid-19 is quality control. This paper discusses some of these challenges and proposes solutions based on an integrated internet of things (IoT) and internet of protocols (IoP) approach, shows its implementation in industry, and thereby presents it as a solution for damage assessment. The six-sigma rule is also analysed with the help of an IoT-enabled quality control system. After the Covid crisis, it is important for every institution to regain customer trust, so the quality of materials should be maintained, and IoT enables this. The unification of industrial IoT (IIoT) and Industry 4.0 is also discussed, as this unification is the next evolution of smart manufacturing and digital technologies. This methodology can accelerate innovation in applications for overcoming post-Covid challenges in the near future. Small-scale and large-scale companies that use this methodology can also adhere to the six-sigma criterion.

Resolving the Full Spectrum of Human Genome Variation using Linked-Reads

10.1101/230946 ◽

2017 ◽

Cited By ~ 8

Author(s):

Patrick Marks ◽

Sarah Garcia ◽

Alvaro Martinez Barrio ◽

Kamila Belhocine ◽

Jorge Bernate ◽

...

Keyword(s):

Human Genome ◽

Large Scale ◽

De Novo ◽

Simultaneous Detection ◽

Whole Genome ◽

Structural Variations ◽

Full Spectrum ◽

Short Read ◽

Short Reads ◽

A Genome

Large-scale population-based analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole genome sequencing. However, standard short-read approaches, used primarily due to accuracy, throughput and costs, fail to give a complete picture of a genome. They struggle to identify large, balanced structural events, cannot access repetitive regions of the genome and fail to resolve the human genome into its two haplotypes. Here we describe an approach that retains long-range information while harnessing the advantages of short reads. Starting from only ~1 ng of DNA, we produce barcoded short-read libraries. The use of novel informatic approaches allows for the barcoded short reads to be associated with the long molecules of origin, producing a novel datatype known as ‘Linked-Reads’. This approach allows for simultaneous detection of small and large variants from a single Linked-Read library. We have previously demonstrated the utility of whole genome Linked-Reads (lrWGS) for performing diploid, de novo assembly of individual genomes (Weisenfeld et al. 2017). In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. We demonstrate the ability of Linked-Reads to reconstruct megabase-scale haplotypes and to recover parts of the genome that are typically inaccessible to short reads, including phenotypically important genes such as STRC, SMN1 and SMN2. We demonstrate the ability of both lrWGS and Linked-Read Whole Exome Sequencing (lrWES) to identify complex structural variations, including balanced events, single exon deletions, and single exon duplications. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.

A fully automated approach for quality control of cancer mutations in the era of high-resolution whole genome sequencing

10.1101/2021.02.13.429885 ◽

2021 ◽

Author(s):

Jacob Househam ◽

William CH Cross ◽

Giulio Caravagna

Keyword(s):

Quality Control ◽

Data Quality ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Control Method ◽

Point Mutations ◽

Basic Research ◽

Whole Genome ◽

Colorectal Cancer Patients

Cancer is a global health issue that places enormous demands on healthcare systems. Basic research, the development of targeted treatments, and the utility of DNA sequencing in clinical settings have been significantly improved with the introduction of whole genome sequencing. However, the broad applications of this technology come with complications. To date there has been very little standardisation in how data quality is assessed, leading to inconsistencies in analyses and disparate conclusions. Manual checking and complex consensus calling strategies often do not scale to large sample numbers, which leads to procedural bottlenecks. To address this issue, we present a quality control method that integrates point mutations, copy numbers, and other metrics into a single quantitative score. We demonstrate its power on 1,065 whole genomes from a large-scale pan-cancer cohort, and on multi-region data of two colorectal cancer patients. We highlight how our approach significantly improves the generation of cancer mutation data, providing visualisations for cross-referencing with other analyses. Our approach is fully automated, designed to work downstream of any bioinformatic pipeline, and can automate tool parameterization, paving the way for fast computational assessment of data quality in the era of whole genome sequencing.

Customized optical mapping by CRISPR–Cas9 mediated DNA labeling with multiple sgRNAs

Nucleic Acids Research ◽

10.1093/nar/gkaa1088 ◽

2020 ◽

Author(s):

Heba Z Abid ◽

Eleanor Young ◽

Jennifer McCaffrey ◽

Kaitlin Raseley ◽

Dharma Varapula ◽

...

Keyword(s):

Genome Mapping ◽

Repetitive Sequences ◽

Genomic Region ◽

Sequence Motif ◽

Whole Genome ◽

Long Distance ◽

DNA Labeling ◽

Guide RNAs ◽

Multiple Loci ◽

A Genome

Whole-genome mapping technologies have been developed as a complementary tool to provide scaffolds for genome assembly and structural variation analysis (1,2). We recently introduced a novel DNA labeling strategy based on a CRISPR–Cas9 genome editing system, which can target any 20 bp sequence. The labeling strategy is specifically useful in targeting repetitive sequences and sequences not accessible to other labeling methods. In this report, we present customized mapping strategies that extend the applications of CRISPR–Cas9 DNA labeling. We first design a CRISPR–Cas9 labeling strategy to interrogate and differentiate the single allele differences in NGG protospacer adjacent motifs (PAM sequences). Combined with sequence motif labeling, we can pinpoint the single-base differences in highly conserved sequences. In the second strategy, we design mapping patterns across a genome by selecting sets of specific single-guide RNAs (sgRNAs) for labeling multiple loci of a genomic region or a whole genome. By developing and optimizing a single-tube synthesis of multiple sgRNAs, we demonstrate the utility of CRISPR–Cas9 mapping with 162 sgRNAs targeting the 2 Mb Haemophilus influenzae chromosome. These CRISPR–Cas9 mapping approaches could be particularly useful for applications in defining long-distance haplotypes and pinpointing the breakpoints in large structural variants in complex genomes and microbial mixtures.
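
To make the PAM-dependent labeling idea concrete, the following minimal sketch lists candidate sites where a 20-base target is immediately followed by an NGG motif. It is an illustration only, not the authors' pipeline: the function name `find_ngg_targets` is hypothetical, and scanning only the forward strand is a simplifying assumption.

```python
def find_ngg_targets(seq, protospacer_len=20):
    """List (start, protospacer, pam) for every NGG-adjacent site.

    Assumptions: forward strand only; the PAM is the three bases directly
    3' of the protospacer, and it matches when its last two bases are GG.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - protospacer_len - 3 + 1):
        pam_start = start + protospacer_len
        pam = seq[pam_start:pam_start + 3]
        if pam[1:] == "GG":
            hits.append((start, seq[start:pam_start], pam))
    return hits
```

A single-base change inside the PAM (for example TGG to TGA) removes the site from the result, which is the kind of allele difference the first strategy above is described as distinguishing.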

A large-scale whole-genome sequencing analysis reveals highly specific genome editing by both Cas9 and Cpf1 nucleases in rice

10.1101/292086 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xu Tang ◽

Guanqing Liu ◽

Jianping Zhou ◽

Qiurong Ren ◽

Qi You ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Editing ◽

Genome Sequencing ◽

Large Scale ◽

High Specificity ◽

Whole Genome ◽

Sequencing Analysis ◽

Guide RNAs ◽

A Genome ◽

Multiple Sample

Targeting specificity has been an essential issue for applying genome editing systems in functional genomics, precision medicine and plant breeding. Understanding the scope of off-target mutations in Cas9 or Cpf1-edited crops is critical for research and regulation. In plants, only a limited number of studies have used whole-genome sequencing (WGS) to test off-target effects of Cas9. However, the cause of the numerous mutations discovered is still controversial. Furthermore, WGS-based off-target analysis of Cpf1 has not been reported in any higher organism to date. Here, we conducted a WGS analysis of 34 plants edited by Cas9 and 15 plants edited by Cpf1 in T0 and T1 generations, along with 20 diverse control plants, in rice, a major food crop with a genome size of ~380 Mb. The sequencing depth ranged from 45X to 105X, with a read mapping rate above 96%. Our results clearly show that most mutations in edited plants were created by the tissue culture process, which caused ~102 to 148 single nucleotide variations (SNVs) and ~32 to 83 insertions/deletions (indels) per plant. Among 12 Cas9 single guide RNAs (sgRNAs) and 3 Cpf1 CRISPR RNAs (crRNAs) assessed by WGS, only one Cas9 sgRNA resulted in off-target mutations in T0 lines at sites predicted by computer programs. Moreover, we found no evidence for bona fide off-target mutations due to continued expression of Cas9 or Cpf1 with guide RNAs in the T1 generation. Taken together, our comprehensive and rigorous analysis of WGS big data across multiple sample types suggests that both Cas9 and Cpf1 nucleases are very specific in generating targeted DNA modifications and that off-targeting can be avoided by designing guide RNAs with high specificity.

Evolution of binding preferences among whole-genome duplicated transcription factors

10.1101/2021.07.27.453962 ◽

2021 ◽

Author(s):

Tamar Gera ◽

Felix Jonas ◽

Roye More ◽

Naama Barkai

Keyword(s):

Transcription Factors ◽

DNA Binding ◽

Genome Duplication ◽

Transcriptional Networks ◽

Whole Genome ◽

DNA Binding Domains ◽

Binding Domains ◽

A Genome ◽

Differential Binding ◽

Genome Scale

Throughout evolution, new transcription factors (TFs) emerge by gene duplication, promoting growth and rewiring of transcriptional networks. How TF duplicates diverge is known for only a few studied cases. To provide a genome-scale view, we considered the 35% of budding yeast TFs classified as whole-genome duplication (WGD)-retained paralogs. Using high-resolution profiling, we find that ~60% of paralogs evolved differential binding preferences. We show that this divergence results primarily from variations outside the DNA binding domains (DBDs), while DBD preferences remain largely conserved. Analysis of non-WGD orthologs revealed that ancestral preferences are unevenly split between duplicates, while new targets are acquired preferentially by the least conserved paralog (biased sub/neo-functionalization). Dimer-forming paralogs evolved mostly one-sided dependency, while other paralogs interacted through low-magnitude DNA-binding competition that minimized paralog interference. We discuss the implications of our findings for the evolutionary design of transcriptional networks.

Zanthoxylum-specific whole genome duplication and recent activity of transposable elements in the highly repetitive paleotetraploid Z. bungeanum genome

Horticulture Research ◽

10.1038/s41438-021-00665-1 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Shijing Feng ◽

Zhenshan Liu ◽

Jian Cheng ◽

Zihe Li ◽

Lu Tian ◽

...

Keyword(s):

Whole Genome Duplication ◽

Genome Duplication ◽

Repetitive Sequences ◽

Close Relative ◽

Specific Gene ◽

Gene Gain ◽

Whole Genome ◽

Genome Duplication Event ◽

A Genome ◽

The Impact

Zanthoxylum bungeanum is an important spice and medicinal plant that is unique for its accumulation of abundant secondary metabolites, which create a characteristic aroma and tingling sensation in the mouth. Owing to the high proportion of repetitive sequences, high heterozygosity, and increased chromosome number of Z. bungeanum, the assembly of its chromosomal pseudomolecules is extremely challenging. Here, we present a genome sequence for Z. bungeanum, with a dramatically expanded size of 4.23 Gb, assembled into 68 chromosomes. This genome is approximately tenfold larger than that of its close relative Citrus sinensis. After the divergence of Zanthoxylum and Citrus, the lineage-specific whole-genome duplication event η-WGD approximately 26.8 million years ago (MYA) and the recent transposable element (TE) burst ~6.41 MYA account for the substantial genome expansion in Z. bungeanum. The independent Zanthoxylum-specific WGD event was followed by numerous fusion/fission events that shaped the genomic architecture. Integrative genomic and transcriptomic analyses suggested that prominent species-specific gene family expansions and changes in gene expression have shaped the biosynthesis of sanshools, terpenoids, and anthocyanins, which contribute to the special flavor and appearance of Z. bungeanum. In summary, the reference genome provides a valuable model for studying the impact of WGDs with recent TE activity on gene gain and loss and genome reconstruction and provides resources to accelerate Zanthoxylum improvement.

Under-representation of repetitive sequences in whole-genome shotgun sequence databases: an illustration using a recently acquired transposable element

Genome ◽

10.1139/g11-088 ◽

2012 ◽

Vol 55 (2) ◽

pp. 172-175 ◽

Cited By ~ 3

Author(s):

Akihiko Koga

Keyword(s):

Southern Blot Analysis ◽

Repetitive Sequences ◽

Whole Genome Shotgun ◽

Whole Genome ◽

Sequence Database ◽

Shotgun Sequence ◽

Whole Genome Shotgun Sequence ◽

A Genome ◽

Genome Shotgun Sequence ◽

Sequence Databases

It is widely accepted as a conceptual matter that repetitive sequences, especially those with high sequence homogeneity among copies, tend to be under-represented in whole-genome shotgun sequence databases, because of the difficulty of assembling sequence reads into contigs. Although this is easily inferred, there has been no quantitative illustration of this phenomenon. An example using a database in current use is expected to contribute to an intuitive understanding of how serious the under-representation is. The present study provides the first quantitative example (in the case of 16 copies of virtually identical 4.7-kb sequences in a genome of 7 × 10^8 bp) by comparing the results of BLAST searches of a sequence database (contig N50: 9.8 kb) with those of Southern blot analysis of genomic DNA. This has revealed that the internal regions of the repetitive sequences are under-represented to a striking extent.
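
For readers unfamiliar with the contig N50 figure quoted above, the sketch below shows the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The function name `n50` is hypothetical, and the abstract does not say how the database's value was computed.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("n50() needs at least one contig length")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
```

With an N50 of 9.8 kb, many contigs are only about twice the length of the 4.7-kb repeat, which fits the point that reads from near-identical copies are hard to place in separate contigs.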

DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework

Computational and Mathematical Methods in Medicine ◽

10.1155/2020/7231205 ◽

2020 ◽

Vol 2020 ◽

pp. 1-7

Author(s):

Po-Jung Huang ◽

Jui-Huan Chang ◽

Hou-Hsien Lin ◽

Yu-Xuan Li ◽

Chi-Ching Lee ◽

...

Keyword(s):

Genome Analysis ◽

Genetic Variants ◽

Large Scale ◽

Sequence Data ◽

Classification Model ◽

Whole Genome Sequence ◽

Small Scale ◽

Whole Genome ◽

Gold Standard Method ◽

Computing Framework

Although sequencing a human genome has become affordable, identifying genetic variants from whole-genome sequence data is still a hurdle for researchers without adequate computing equipment or bioinformatics support. GATK is a gold standard method for the identification of genetic variants and has been widely used in genome projects and population genetic studies for many years. More recently, the Google Brain team developed a new method, DeepVariant, which uses deep neural networks to construct an image classification model to identify genetic variants. However, the superior accuracy of DeepVariant comes at the cost of computational intensity, largely constraining its applications. Accordingly, we present DeepVariant-on-Spark to optimize resource allocation, enable multi-GPU support, and accelerate the processing of the DeepVariant pipeline. To make DeepVariant-on-Spark more accessible to everyone, we have deployed DeepVariant-on-Spark to the Google Cloud Platform (GCP). Users can deploy DeepVariant-on-Spark on the GCP within 20 minutes by following our instructions and start to analyze at least ten whole-genome sequencing datasets using free credits provided by the GCP. DeepVariant-on-Spark is freely available for small-scale genome analysis using a cloud-based computing framework, which is suitable for pilot testing or preliminary study, while retaining the flexibility and scalability needed for large-scale sequencing projects.
