Purple: A Computational Workflow for Strategic Selection of Peptides for Viral Diagnostics Using MS-Based Targeted Proteomics

Johanna Lechner; Felix Hartkopf; Pauline Hiort; Andreas Nitsche; Marica Grossegesse; Joerg Doellinger; Bernhard Y. Renard; Thilo Muth

doi:10.3390/v11060536

Viruses ◽

10.3390/v11060536 ◽

2019 ◽

Vol 11 (6) ◽

pp. 536 ◽

Cited By ~ 3

Author(s):

Johanna Lechner ◽

Felix Hartkopf ◽

Pauline Hiort ◽

Andreas Nitsche ◽

Marica Grossegesse ◽

...

Keyword(s):

Sequence Data ◽

Software Tool ◽

High Sensitivity ◽

Search Space ◽

Candidate Selection ◽

Virus Species ◽

Targeted Proteomics ◽

Global Threat ◽

Peptide Candidate ◽

Viral Diagnostics

Emerging virus diseases present a global threat to public health. To detect viral pathogens in time-critical scenarios, accurate and fast diagnostic assays are required. Such assays can now be established using mass spectrometry-based targeted proteomics, by which viral proteins can be rapidly detected from complex samples down to the strain level with high sensitivity and reproducibility. Developing such targeted assays involves tedious steps of peptide candidate selection, peptide synthesis, and assay optimization. Peptide selection requires extensive preprocessing by comparing candidate peptides against a large search space of background proteins. Here we present Purple (Picking unique relevant peptides for viral experiments), a software tool for selecting target-specific peptide candidates directly from given proteome sequence data. It comes with an intuitive graphical user interface, various parameter options and a threshold-based filtering strategy for homologous sequences. Purple enables peptide candidate selection across various taxonomic levels and filtering against backgrounds of varying complexity. Its functionality is demonstrated using data from different virus species and strains. Our software makes it possible to build taxon-specific targeted assays and paves the way to time-efficient and robust viral diagnostics using targeted proteomics.
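The core idea of peptide candidate selection described in this abstract can be illustrated with a minimal Python sketch. It is not Purple's implementation: the function names, the trypsin cleavage rule, the length window and the exact-match background filter are all assumptions made for illustration, and the abstract's threshold-based filtering of homologous sequences is replaced here by a much simpler exact comparison.

```python
import re


def tryptic_digest(protein, min_len=7, max_len=25):
    """In-silico digest: cleave after K or R unless the next residue is P.

    Returns the set of peptides whose length lies in [min_len, max_len].
    The cleavage rule and the length window are illustrative assumptions.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if min_len <= len(p) <= max_len}


def select_unique_peptides(target_proteins, background_proteins,
                           min_len=7, max_len=25):
    """Keep target peptides that never occur in the background digest.

    Exact string matching stands in for a homology-aware filter.
    """
    target = set()
    for seq in target_proteins:
        target |= tryptic_digest(seq, min_len, max_len)

    background = set()
    for seq in background_proteins:
        background |= tryptic_digest(seq, min_len, max_len)

    return sorted(target - background)
```

Calling `select_unique_peptides` with a list of viral protein sequences as the target and host or related-virus sequences as the background returns the candidates that are specific to the target under these simplified rules. A realistic workflow would also need to handle near-identical (homologous) peptides, which exact matching cannot detect.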


On-Shelf Utility Mining of Sequence Data

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3457570 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-31

Author(s):

Chunkai Zhang ◽

Zilin Du ◽

Yuting Yang ◽

Wensheng Gan ◽

Philip S. Yu

Keyword(s):

High Efficiency ◽

Sequence Data ◽

Real Life ◽

Search Space ◽

Upper Bounds ◽

Utility Mining ◽

Limited Memory ◽

Time Periods ◽

High Utility ◽

Synthetic Datasets

Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods have a bias toward items that have longer on-shelf time as they have a greater chance to generate a high utility. To eliminate the bias, the problem of on-shelf utility mining (OSUM) is introduced. In this article, we focus on the task of OSUM of sequence data, where the sequential database is divided into several partitions according to time periods and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies to reduce the search space and avoid redundant calculation with two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed for facilitating the calculation of upper bounds and utilities. Substantial experimental results on certain real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applications owing to its high efficiency.


Limited Genetic Diversity Detected in Middle East Respiratory Syndrome-Related Coronavirus Variants Circulating in Dromedary Camels in Jordan

Viruses ◽

10.3390/v13040592 ◽

2021 ◽

Vol 13 (4) ◽

pp. 592

Author(s):

Stephanie N. Seifert ◽

Jonathan E. Schulz ◽

Stacy Ricklefs ◽

Michael Letko ◽

Elangeni Yabba ◽

...

Keyword(s):

Genetic Diversity ◽

Middle East ◽

United Arab Emirates ◽

Sequence Data ◽

Case Fatality ◽

High Sensitivity ◽

Middle East Respiratory Syndrome ◽

Full Genome Sequence ◽

Genome Sequences ◽

Dromedary Camels

Middle East respiratory syndrome-related coronavirus (MERS-CoV) is a persistent zoonotic pathogen with frequent spillover from dromedary camels to humans in the Arabian Peninsula, resulting in limited outbreaks of MERS with a high case-fatality rate. Full genome sequence data from camel-derived MERS-CoV variants show diverse lineages circulating in domestic camels with frequent recombination. More than 90% of the available full MERS-CoV genome sequences derived from camels are from just two countries, the Kingdom of Saudi Arabia (KSA) and United Arab Emirates (UAE). In this study, we employ a novel method to amplify and sequence the partial MERS-CoV genome with high sensitivity from nasal swabs of infected camels. We recovered more than 99% of the MERS-CoV genome from field-collected samples with greater than 500 TCID50 equivalent per nasal swab from camel herds sampled in Jordan in May 2016. Our subsequent analyses of 14 camel-derived MERS-CoV genomes show a striking lack of genetic diversity circulating in Jordan camels relative to MERS-CoV genome sequences derived from large camel markets in KSA and UAE. The low genetic diversity detected in Jordan camels during our study is consistent with a lack of endemic circulation in these camel herds and reflective of data from MERS outbreaks in humans dominated by nosocomial transmission following a single introduction as reported during the 2015 MERS outbreak in South Korea. Our data suggest transmission of MERS-CoV among two camel herds in Jordan in 2016 following a single introduction event.


Genetic characterization of Tilligerry and Mitchell River viruses (Caractérisation génétique des virus Tilligerry et Mitchell River)

Revue d’élevage et de médecine vétérinaire des pays tropicaux ◽

10.19182/remvt.10060 ◽

2009 ◽

Vol 62 (2-4) ◽

pp. 151

Author(s):

M. Belaganahalli ◽

S. Maan ◽

P. P.C. Mertens

Keyword(s):

Reverse Genetics ◽

Sequence Data ◽

Phylogenetic Analyses ◽

Taxonomic Status ◽

Cross Reactivity ◽

Emerging Diseases ◽

Full Length ◽

Virus Species ◽

Livestock Farming

Viruses that are normally safely contained within their host species can emerge due to intense livestock farming, trade, travel, climate change and encroachment of human activities into new environments. The unexpected emergence of bluetongue virus (BTV), the prototype species of the genus Orbivirus, in economically important livestock species (sheep and cattle) across the whole of Europe (since 1998) indicates that other orbiviruses represent a potential further threat to animal and human populations in Europe and elsewhere. The genus Orbivirus is the largest within the family Reoviridae, containing 22 virus species, as well as 14 unclassified orbiviruses, some of which may represent additional or novel species. The orbiviruses are transmitted primarily by arthropod vectors (e.g. Culicoides, mosquitoes or ticks). Viral genome sequence data provide a basis for virus taxonomy and diagnostic test development, and make it possible to address fundamental questions concerning virus biology, pathogenesis, virulence and evolution that can be further explored in mutation and reverse genetics studies. Genome sequences also provide criteria for the classification of novel isolates within individual Orbivirus species, as well as the identification of different serotypes, topotypes, reassortants and even closely related but distinct virus lineages. Full-length genome characterization of Tilligerry virus (TILV), a member of the Eubenangee virus species, and Mitchell River virus (MRV), a member of the Warrego virus species, has revealed highly conserved 5’ and 3’ terminal hexanucleotide sequences. Phylogenetic analyses of orbivirus T2 ‘sub-core-shell’ protein sequences reinforce the hypothesis that this protein is an important evolutionary marker for these viruses. The T2 protein shows high levels of amino acid (AA) sequence identity (> 91%) within a single Orbivirus species / serogroup, which can be used for species identification.
The T2-protein gene has therefore been given priority in sequencing studies. The T2 protein of TILV is closely related to that of Eubenangee virus (~91% identity), confirming that they are both members of the same Eubenangee virus species. Although TILV is reported to be related to BTV in serological assays, the TILV T2 protein shows only 68-70% AA identity to BTV. This supports its current classification within a different serogroup (Eubenangee). Warrego virus and MRV are currently classified as two distinct members (different serotypes) within the Warrego virus species. However, they show only about 79% AA identity in their T2 protein (based on partial sequences). It is therefore considered likely that they could be reclassified as members of distinct Orbivirus species. The taxonomic classification of MRV will be reviewed after generating full length sequences for the entire genomes of both viruses. The taxonomic status of each of these viruses will also be tested further by co-infections and attempts to create reassortants between them (only viruses belonging to the same species can reassort their genome segments). TILV and MRV are the first viruses from their respective serogroups / virus species to be genetically fully characterized, and will provide a basis for the further characterization / identification of additional viruses within each group / species. These data will assist in the development of specific diagnostic assays and potentially in control of emerging diseases. The sequences generated will also help to evaluate current diagnostic [reverse transcriptase - polymerase chain reaction (RT-PCR)] tests for BTV, African horse sickness virus, epizootic haemorrhagic disease virus, etc., in silico, by identifying any possibility of cross reactivity.
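The identity figures quoted in this abstract (for example roughly 91% within a species, 68-70% between TILV and BTV, about 79% between Warrego virus and MRV) are percent amino acid identities between aligned T2 protein sequences. A minimal Python sketch of that calculation follows. It assumes the two sequences have already been aligned to equal length with `-` as the gap character, skips gapped columns, and uses a hypothetical helper `same_species_by_t2` with a 91% default cut-off taken from the abstract purely for illustration. It is not the authors' pipeline.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must come from the same pairwise or multiple alignment,
    so they have equal length. Ignoring gapped columns is one of several
    common conventions and is an assumption here.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def same_species_by_t2(aligned_a, aligned_b, threshold=91.0):
    """Illustrative rule: identity above the threshold suggests one species."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

Because the abstract treats a value of about 91% as consistent with membership of the same species, a hard cut-off such as the one above should be read as a rough screening aid and not as a classification criterion.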


An iterative and automated computational pipeline for untargeted strain-level identification using MS/MS spectra from pathogenic samples

10.1101/812313 ◽

2019 ◽

Author(s):

Mathias Kuhring ◽

Joerg Doellinger ◽

Andreas Nitsche ◽

Thilo Muth ◽

Bernhard Y. Renard

Keyword(s):

Statistical Power ◽

Sequence Data ◽

A Priori ◽

Search Space ◽

Strain Level ◽

Reference Sequence ◽

Viral Origin ◽

Identification Of Species ◽

Taxonomic Assignments

Untargeted accurate strain-level classification of a priori unidentified organisms using tandem mass spectrometry is a challenging task. Reference databases often lack taxonomic depth, limiting peptide assignments to the species level. However, the extension with detailed strain information increases runtime and decreases statistical power. In addition, larger databases contain a higher number of similar proteomes. We present TaxIt, an iterative workflow to address the increasing search space required for MS/MS-based strain-level classification of samples with unknown taxonomic origin. TaxIt first applies reference sequence data for initial identification of species candidates, followed by automated acquisition of relevant strain sequences for low-level classification. Furthermore, proteome similarities resulting in ambiguous taxonomic assignments are addressed with an abundance weighting strategy to improve candidate confidence. We apply our iterative workflow on several samples of bacterial and viral origin. In comparison to non-iterative approaches using unique peptides or advanced abundance correction, TaxIt identifies microbial strains correctly in all examples presented (with one tie), thereby demonstrating the potential for untargeted and deeper taxonomic classification. TaxIt makes extensive use of public, unrestricted and continuously growing sequence resources such as the NCBI databases and is available under open-source license at https://gitlab.com/rki_bioinformatics.


VisFeature: a stand-alone program for visualizing and analyzing statistical features of biological sequences

Bioinformatics ◽

10.1093/bioinformatics/btz689 ◽

2019 ◽

Cited By ~ 3

Author(s):

Jun Wang ◽

Pu-Feng Du ◽

Xin-Yu Xue ◽

Guang-Ping Li ◽

Yuan-Ke Zhou ◽

...

Keyword(s):

Sequence Data ◽

Software Tool ◽

Data Retrieval ◽

Supplementary Information ◽

Statistical Features ◽

Biological Sequence ◽

Sequence Alignments ◽

Multiple Sequence ◽

Source Codes ◽

Multiple Sequence Alignments

Summary: Many efforts have been made in developing bioinformatics algorithms to predict functional attributes of genes and proteins from their primary sequences. One challenge in this process is to intuitively analyze and to understand the statistical features that have been selected by heuristic or iterative methods. In this paper, we developed VisFeature, which aims to be a helpful software tool that allows users to intuitively visualize and analyze statistical features of all types of biological sequence, including DNA, RNA and proteins. VisFeature also integrates sequence data retrieval, multiple sequence alignments and statistical feature generation functions. Availability and implementation: VisFeature is a desktop application that is implemented using JavaScript/Electron and R. The source code of VisFeature is freely accessible from the GitHub repository (https://github.com/wangjun1996/VisFeature). The binary release, which includes an example dataset, can be freely downloaded from the same GitHub repository (https://github.com/wangjun1996/VisFeature/releases). Contact: [email protected] or [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.


Detection of long repeat expansions from PCR-free whole-genome sequence data

10.1101/093831 ◽

2016 ◽

Cited By ~ 3

Author(s):

Egor Dolzhenko ◽

Joke J.F.A. van Vugt ◽

Richard J. Shaw ◽

Mitchell A. Bekritsky ◽

Marka van Blitterswijk ◽

...

Keyword(s):

Fragile X Syndrome ◽

Sequence Data ◽

Fragile X ◽

Software Tool ◽

Whole Genome Sequence ◽

Read Length ◽

Whole Genome ◽

Wild Type ◽

Short Read ◽

Repeat Expansions

Identifying large repeat expansions such as those that cause amyotrophic lateral sclerosis (ALS) and Fragile X syndrome is challenging for short-read (100-150 bp) whole genome sequencing (WGS) data. A solution to this problem is an important step towards integrating WGS into precision medicine. We have developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3,001 ALS patients who have been tested for the presence of the C9orf72 repeat expansion with repeat-primed PCR (RP-PCR). Taking the RP-PCR calls as the ground truth, our WGS-based method identified pathogenic repeat expansions with 98.1% sensitivity and 99.7% specificity. Further inspection identified that all 11 conflicts were resolved as errors in the original RP-PCR results. Compared against this updated result, ExpansionHunter correctly classified all (212/212) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2,786/2,789) of the wild type samples were correctly classified as wild type by this method, with the remaining three identified as possible expansions. We further applied our algorithm to a set of 144 samples where every sample had one of eight different pathogenic repeat expansions, including examples associated with fragile X syndrome, Friedreich’s ataxia and Huntington’s disease, and correctly flagged all of the known repeat expansions. Finally, we tested the accuracy of our method for short repeats by comparing our genotypes with results from 860 samples sized using fragment length analysis and determined that our calls were >95% accurate. ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.
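The percentages reported against the updated RP-PCR result follow directly from the counts in the abstract. The short Python sketch below reproduces them. It assumes that both "expansion" and "potential expansion" calls count as positive calls, so that the 2,789 − 2,786 = 3 flagged wild-type samples are treated as false positives. This reading is an assumption made for illustration, and the function names are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truly expanded samples that were flagged."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of truly wild-type samples that were called wild type."""
    return true_negatives / (true_negatives + false_positives)


# Counts taken from the abstract (comparison against the updated result).
expanded_total = 212       # all flagged as expansion or potential expansion
wild_type_total = 2789
wild_type_correct = 2786

sens = sensitivity(expanded_total, 0)
spec = specificity(wild_type_correct, wild_type_total - wild_type_correct)

print(f"sensitivity: {100 * sens:.1f}%")
print(f"specificity: {100 * spec:.1f}%")
```

The two cohorts add up to the 3,001 patients mentioned earlier in the abstract (212 + 2,789), which is a useful consistency check on the counts.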


An RNA Virome associated to the Golden Orb-weaver Spider Nephila clavipes

10.1101/140814 ◽

2017 ◽

Author(s):

Humberto J. Debat

Keyword(s):

Sequence Data ◽

Rna Virus ◽

Virus Species ◽

Nephila Clavipes ◽

Differential Distribution ◽

Intrinsic Nature ◽

Arthropod Species ◽

Organ Specific ◽

Ethical Approval ◽

Venom Glands

The golden orb-weaver spider Nephila clavipes, known for its sexual size dimorphism, is abundant and widespread in the New World. The first annotated genome of orb-weaver spiders, exploring N. clavipes, has recently been reported. The study, focused primarily on the diversity of silk-specific genes, shed light on the complex evolutionary history of spiders. Furthermore, a robust transcriptome analysis provided a massive resource for N. clavipes RNA survey. Here, I present evidence of viral sequences corresponding to the first 10 extant virus species associated to N. clavipes and, indeed, nephilids. The putatively new species are linked to ssRNA positive-strand viruses, such as Picornavirales, and to ssRNA negative-strand and dsRNA viruses. In addition, I detected sequence data of new strains of two recently reported arthropod viruses, which complemented and extended the corresponding sequence references. The identified viruses appear to be complete and potentially functional, and present the typical architecture and consistent viral domains. The intrinsic nature of the detected sequences and their absence in the recently generated genome assembly suggest that they correspond to bona fide RNA virus sequences. The available RNA data made it possible, for the first time, to perform a tissue/organ-specific analysis of virus loads/presence in spiders, suggesting a complex spatial and differential distribution of the tentative viruses, encompassing the spider brain and also silk and venom glands. Until recently, the virus landscape associated to spiders remained elusive. The discovered viruses described here provide only a fragmented glimpse of the potential magnitude of the Aranea virosphere. Future studies should focus not only on complementing and expanding these findings, but also on addressing the potential ecological role of these viruses, which might influence the biology of these outstanding arthropod species. Funding statement: The author received no specific funding for this study.


GAPPadder: A Sensitive Approach for Closing Gaps on Draft Genomes with Short Sequence Reads

10.1101/125534 ◽

2017 ◽

Author(s):

Chong Chu ◽

Xin Li ◽

Yufeng Wu

Keyword(s):

Sequence Data ◽

Bacterial Genome ◽

Software Tool ◽

Sea Bass ◽

Short Sequence ◽

Asian Sea Bass ◽

Long Reads ◽

Local Assembly ◽

Genomic Repeats ◽

Gap Closing

Background: Closing gaps in draft genomes is an important post-processing step in genome assembly. It leads to more complete genomes, which benefits downstream genome analysis such as annotation and genotyping. Several tools have been developed for gap closing. However, these tools don’t fully utilize the information contained in the sequence data. For example, while it is known that many gaps are caused by genomic repeats, existing tools often ignore many sequence reads that originate from a repeat-related gap. Results: In this paper, we propose a new approach called GAPPadder for gap closing. The main advantage of GAPPadder is that it uses more information in sequence data for gap closing. In particular, GAPPadder finds and uses reads that originate from repeat-related gaps. We show that these repeat-associated reads are useful for gap closing, even though they are ignored by all existing tools. Other main features of GAPPadder include utilizing the information in sequence reads with different insert sizes and performing two-stage local assembly of gap sequences. We compare GAPPadder with GapCloser, GapFiller and Sealer on one bacterial genome, human chromosome 14 and the human whole genome with paired-end and mate-paired reads with both short and long insert sizes. Empirical results show that GAPPadder can close more gaps than these existing tools. Besides closing gaps on draft genomes assembled only from short sequence reads, GAPPadder can also be used to close gaps for draft genomes assembled with long reads. We show GAPPadder can close gaps on the bed bug genome and the Asian sea bass genome that are assembled partially and fully with long reads, respectively. We also show GAPPadder is efficient in both time and memory usage. The software tool, GAPPadder, is available for download at https://github.com/Reedwarbler/GAPPadder.


Comparative analysis of the complete genome sequences of Helicoverpa zea and Helicoverpa armigera single-nucleocapsid nucleopolyhedroviruses

Journal of General Virology ◽

10.1099/0022-1317-83-3-673 ◽

2002 ◽

Vol 83 (3) ◽

pp. 673-684 ◽

Cited By ~ 75

Author(s):

Xinwen Chen ◽

W.-J. Zhang ◽

J. Wong ◽

G. Chun ◽

A. Lu ◽

...

Keyword(s):

Nucleotide Sequence ◽

Helicoverpa Armigera ◽

Helicoverpa Zea ◽

Sequence Data ◽

Biological Data ◽

Open Reading Frames ◽

Virus Species ◽

Coding Regions ◽

Helicoverpa Armígera ◽

Small Orfs

The complete nucleotide sequence of Helicoverpa zea single-nucleocapsid nucleopolyhedrovirus (HzSNPV) has been determined (130869 bp) and compared to the nucleotide sequence of Helicoverpa armigera (Ha) SNPV. These two genomes are very similar in their nucleotide (97% identity) and amino acid (99% identity) sequences. The coding regions are much more conserved than the non-coding regions. In HzSNPV/HaSNPV, the 63 open reading frames (ORFs) present in all baculoviruses sequenced so far are much more conserved than other ORFs. HzSNPV has four additional small ORFs compared with HaSNPV, one of these (Hz42) being in a correct transcriptional context. The major differences between HzSNPV and HaSNPV are found in the sequence and organization of the homologous regions (hrs) and the baculovirus repeat ORFs (bro genes). The sequence identity between the HzSNPV and HaSNPV hrs ranges from 90% (hr1) to almost 100% (hr5) and the hrs differ in the presence/absence of one or more type A and/or B repeats. The three HzSNPV bro genes differ significantly from those in HaSNPV and may have been acquired independently in the ancestral past. The sequence data suggest strongly that HzSNPV and HaSNPV are variants of the same virus species, a conclusion that is supported by the physical and biological data.


SARS-CoV-2 antigens expressed in plants detect antibody responses in COVID-19 patients

10.1101/2020.08.04.20167940 ◽

2020 ◽

Cited By ~ 1

Author(s):

Mohau S. Makatsa ◽

Marius B. Tincho ◽

Jerome M. Wendoh ◽

Sherazaan D. Ismail ◽

Rofhiwa Nesamari ◽

...

Keyword(s):

High Sensitivity ◽

High Specificity ◽

Viral Proteins ◽

Antibody Responses ◽

Enzyme Linked Immunosorbent Assay ◽

Robust Detection ◽

Recombinant Plant ◽

Specific Igg ◽

African Patients ◽

Global Threat

Background: The SARS-CoV-2 pandemic has swept the world and poses a significant global threat to lives and livelihoods, with over 16 million confirmed cases and at least 650 000 deaths from COVID-19 in the first 7 months of the pandemic. Developing tools to measure seroprevalence and understand protective immunity to SARS-CoV-2 is a priority. We aimed to develop a serological assay using plant-derived recombinant viral proteins, which represent important tools in less-resourced settings. Methods: We established an indirect enzyme-linked immunosorbent assay (ELISA) using the S1 and receptor-binding domain (RBD) portions of the spike protein from SARS-CoV-2, expressed in Nicotiana benthamiana. We measured antibody responses in sera from South African patients (n=77) who had tested positive by PCR for SARS-CoV-2. Samples were taken a median of six weeks after the diagnosis, and the majority of participants had mild and moderate COVID-19 disease. In addition, we tested the reactivity of pre-pandemic plasma (n=58) and compared the performance of our in-house ELISA with a commercial assay. We also determined whether our assay could detect SARS-CoV-2-specific IgG and IgA in saliva. Results: We demonstrate that SARS-CoV-2-specific immunoglobulins are readily detectable using recombinant plant-derived viral proteins, in patients who tested positive for SARS-CoV-2 by PCR. Reactivity to S1 and RBD was detected in 51 (66%) and 48 (62%) of participants, respectively. Notably, we detected 100% of samples identified as having S1-specific antibodies by a validated, high sensitivity commercial ELISA, and OD values were strongly and significantly correlated between the two assays. For the pre-pandemic plasma, 1/58 (1.7%) of samples were positive, indicating a high specificity for SARS-CoV-2 in our ELISA. SARS-CoV-2-specific IgG correlated significantly with IgA and IgM responses. Endpoint titers of S1- and RBD-specific immunoglobulins ranged from 1:50 to 1:3200. S1-specific IgG and IgA were found in saliva samples from convalescent volunteers. Conclusions: We demonstrate that recombinant SARS-CoV-2 proteins produced in plants enable robust detection of SARS-CoV-2 humoral responses. This assay can be used for seroepidemiological studies and to measure the strength and durability of antibody responses to SARS-CoV-2 in infected patients in our setting.
