Genome ARTIST: a robust, high-accuracy aligner tool for mapping transposon insertions and self-insertions

Mapping Intimacies ◽

10.1101/024976 ◽

2015 ◽

Author(s):

Alexandru Al. Ecovoiu ◽

Iulian Constantin Ghionoiu ◽

Andrei Mihai Ciuca ◽

Attila Cristian Ratiu

Keyword(s):

Insertional Mutagenesis ◽

Genomic Sequence ◽

Fruit Fly ◽

High Accuracy ◽

Model Organisms ◽

Robust Solution ◽

Small Indels ◽

Mapping Tool ◽

Transposon Insertions ◽

User Friendly

A critical task in insertional mutagenesis experiments performed on model organisms is mapping the hits of artificial transposons (ATs) with nucleotide-level accuracy. Mapping errors may occur when sequencing artifacts or mutations such as SNPs and small indels are present very close to the junction between a genomic sequence and a transposon inverted repeat (TIR). Another specific problem in insertional mutagenesis is the mapping of transposon self-insertions and, to the best of our knowledge, there is no publicly available mapping tool designed to analyze such molecular events. We developed Genome ARTIST, a pairwise gapped aligner tool that addresses both issues by means of an original, robust mapping strategy. Genome ARTIST is not designed to use NGS data but to analyze AT insertions obtained in small to medium-scale mutagenesis experiments. Genome ARTIST employs a heuristic approach to find DNA sequence similarities and uses a multi-step implementation of an adapted Smith-Waterman algorithm to compute the mapping alignments. The user experience is enhanced by easily customizable parameters and a user-friendly interface that describes the genomic landscape surrounding the insertion. Genome ARTIST works with many bacterial and eukaryotic genomes available in the Ensembl and GenBank repositories. The tool specifically exploits the sequence annotation data provided by FlyBase for Drosophila melanogaster (the fruit fly), which enables mapping of insertions relative to various genomic features such as natural transposons. Genome ARTIST was tested against other alignment tools using relevant query sequences derived from the D. melanogaster and Mus musculus (mouse) genomes. Real and simulated query sequences were also compared, showing that Genome ARTIST is a very robust solution for mapping transposon insertions.
Genome ARTIST is a stand-alone, user-friendly application designed for high-accuracy mapping of transposon insertions and self-insertions. The tool is also useful for routine alignment tasks such as detecting SNPs or checking the specificity of primers and probes. Genome ARTIST is open-source software and is available for download at www.genomeartist.ro and at www.bioinformatics.org.
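
For readers unfamiliar with the Smith-Waterman algorithm mentioned in the abstract, the following is a minimal textbook sketch of local alignment with a linear gap penalty. It is not Genome ARTIST's multi-step adapted implementation, which the abstract does not describe in detail; the function name `smith_waterman` and the scoring values (match, mismatch, gap) are assumptions chosen for illustration.

```python
def smith_waterman(query, reference, match=2, mismatch=-1, gap=-2):
    """Textbook Smith-Waterman local alignment (illustrative scoring values).

    Returns (best_score, reference_start, aligned_query, aligned_reference),
    where reference_start is the 0-based start of the alignment in `reference`.
    """
    rows, cols = len(query) + 1, len(reference) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == reference[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + gap,      # gap in the reference
                score[i][j - 1] + gap,      # gap in the query
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    aligned_q, aligned_r = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if query[i - 1] == reference[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned_q.append(query[i - 1])
            aligned_r.append(reference[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_r.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_r.append(reference[j - 1])
            j -= 1

    return best, j, "".join(reversed(aligned_q)), "".join(reversed(aligned_r))
```

In a mapping setting, the query would be the genomic part of a sequenced junction read and the reference a candidate genomic interval; the returned start coordinate then points to where the local alignment begins in that interval.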

BREC: an R package/Shiny app for automatically identifying heterochromatin boundaries and estimating local recombination rates along chromosomes

BMC Bioinformatics ◽

10.1186/s12859-021-04233-1 ◽

2021 ◽

Vol 22 (S6) ◽

Author(s):

Yasmine Mansour ◽

Annie Chateau ◽

Anna-Sophie Fiston-Lavier

Keyword(s):

Data Quality ◽

Data Science ◽

Fruit Fly ◽

R Package ◽

Model Organisms ◽

Data Quality Control ◽

Recombination Rates ◽

Functional Dynamics ◽

Shiny App ◽

User Friendly

Background: Meiotic recombination is a vital biological process that plays an essential role in the structural and functional dynamics of genomes. Genomes exhibit highly variable recombination profiles along chromosomes, associated with several chromatin states. However, eu-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. As a result, the accurate local recombination rates needed to address evolutionary questions are lacking. Results: Here, we propose an automated computational tool, based on the Marey maps method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is non-genome-specific and runs even on non-model genomes as long as genetic and physical maps are available. BREC is purely statistical and data-driven, so good input data quality remains a strong requirement. Therefore, a data pre-processing module (data quality control and cleaning) is provided. Experiments show that BREC handles different marker density and distribution issues. Conclusions: BREC's heterochromatin boundaries have been validated against experimentally generated cytological equivalents on the genome of the fruit fly Drosophila melanogaster, for which BREC returns congruent values. BREC's recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We provide BREC as an R package and a user-friendly Shiny web application, yielding a fast, easy-to-use, and broadly accessible resource. The BREC R package is available at the GitHub repository https://github.com/GenomeStructureOrganization.
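
The Marey map idea referred to above plots genetic position (cM) against physical position (Mb) for each marker, so that the local recombination rate is the slope of that relationship. The sketch below is a deliberately simplified illustration using finite differences between adjacent markers; it is not BREC's procedure (BREC is an R package whose fitting and boundary-detection steps are not detailed in the abstract), and the function name `local_recombination_rates` and the input layout are assumptions.

```python
def local_recombination_rates(markers):
    """Estimate local recombination rates (cM/Mb) from a Marey map.

    `markers` is an iterable of (physical_mb, genetic_cm) pairs (assumed layout).
    Returns a list of (interval_midpoint_mb, rate_cm_per_mb) tuples, one per
    pair of adjacent markers with distinct physical positions.
    """
    ordered = sorted(markers)  # order markers by physical position
    rates = []
    for (mb0, cm0), (mb1, cm1) in zip(ordered, ordered[1:]):
        span = mb1 - mb0
        if span <= 0:
            continue  # skip markers that share a physical position
        # Slope of the Marey map over this interval.
        rates.append(((mb0 + mb1) / 2, (cm1 - cm0) / span))
    return rates
```

Finite differences are very sensitive to noisy or misplaced markers, which is consistent with the abstract's emphasis on input data quality and on a pre-processing step; a smoothed curve fit would normally replace the raw slopes used here.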

BREC: An R package/Shiny app for automatically identifying heterochromatin boundaries and estimating local recombination rates along chromosomes

10.1101/2020.06.29.178095 ◽

2020 ◽

Author(s):

Yasmine Mansour ◽

Annie Chateau ◽

Anna-Sophie Fiston-Lavier

Keyword(s):

Data Quality ◽

Data Science ◽

Fruit Fly ◽

R Package ◽

Model Organisms ◽

Data Quality Control ◽

Recombination Rates ◽

Functional Dynamics ◽

Shiny App ◽

User Friendly

Motivation: Meiotic recombination is a vital biological process that plays an essential role in the structural and functional dynamics of genomes. Genomes exhibit highly variable recombination profiles along chromosomes, associated with several chromatin states. However, eu-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. As a result, the accurate local recombination rates needed to address evolutionary questions are lacking. Results: Here, we propose an automated computational tool, based on the Marey maps method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is non-genome-specific and runs even on non-model genomes as long as genetic and physical maps are available. BREC is purely statistical and data-driven, so good input data quality remains a strong requirement. Therefore, a data pre-processing module (data quality control and cleaning) is provided. Experiments show that BREC handles different marker density and distribution issues. BREC's heterochromatin boundaries have been validated against experimentally generated cytological equivalents on the genome of the fruit fly Drosophila melanogaster, for which BREC returns congruent values. BREC's recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We provide BREC as an R package and a user-friendly Shiny web application, yielding a fast, easy-to-use, and broadly accessible resource. Availability: The BREC R package is available at the GitHub repository https://github.com/ymansour21/BREC.

Shotgun Cloning of Transposon Insertions in the Genome of Caenorhabditis elegans

Comparative and Functional Genomics ◽

10.1002/cfg.392 ◽

2004 ◽

Vol 5 (3) ◽

pp. 225-229 ◽

Cited By ~ 8

Author(s):

Alexander M. van der Linden ◽

Ronald H. A. Plasterk

Keyword(s):

Transposable Elements ◽

Large Scale ◽

Insertional Mutagenesis ◽

Genomic Sequence ◽

Mutagenesis Screen ◽

Mutator Strain ◽

C Elegans ◽

Large Numbers ◽

Transposon Insertions ◽

Insertional Mutagenesis Screen

We present a strategy to identify and map large numbers of transposon insertions in the genome of Caenorhabditis elegans. Our approach makes use of the mutator strain mut-7, which has germline-transposition activity of the Tc1/mariner family of transposons, a display protocol to detect new transposon insertions, and the availability of the genomic sequence of C. elegans. From a pilot insertional mutagenesis screen, we have obtained 351 new Tc1 transposons inserted in or near 219 predicted C. elegans genes. The strategy presented provides an approach to isolate insertions of natural transposable elements in many C. elegans genes and to create a large-scale collection of C. elegans mutants.

Circadian Control of Global Transcription

BioMed Research International ◽

10.1155/2015/187809 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8 ◽

Cited By ~ 13

Author(s):

Shujing Li ◽

Luoying Zhang

Keyword(s):

Transcription Factors ◽

Environmental Conditions ◽

Fruit Fly ◽

Biological Pathways ◽

Model Organisms ◽

Molecular Clocks ◽

Rhythmic Expression ◽

The Earth ◽

Circadian Control ◽

And Behavior

Circadian rhythms exist in most if not all organisms on Earth and manifest in various aspects of physiology and behavior. These rhythmic processes are believed to be driven by endogenous molecular clocks that regulate the rhythmic expression of clock-controlled genes (CCGs). CCGs constitute a significant portion of the genome and are involved in diverse biological pathways. The transcription of CCGs is tuned by the rhythmic actions of transcription factors and by circadian alterations in chromatin. Here, we review the circadian control of CCG transcription in five widely used model organisms: cyanobacterium, fungus, plant, fruit fly, and mouse. Comparing the similarities and differences among the five organisms could help us better understand the function of the circadian clock, as well as its output mechanisms adapted to meet the demands of diverse environmental conditions.

A cross-species approach for the identification of Drosophila male sterility genes

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab183 ◽

2021 ◽

Author(s):

Kimihide Ibaraki ◽

Mihoko Nakatsuka ◽

Takashi Ohsako ◽

Masahide Watanabe ◽

Yu Miyazaki ◽

...

Keyword(s):

Male Sterility ◽

Focal Point ◽

Fruit Fly ◽

Male Reproduction ◽

Model Organisms ◽

Germline Stem Cells ◽

New Genes ◽

Starting Point ◽

Sperm Development ◽

Sterility Genes

Abstract Male reproduction encompasses many essential cellular processes and interactions. As a focal point for these events, sperm offer opportunities for advancing our understanding of sexual reproduction at multiple levels during development. Using male sterility genes identified in human, mouse and fruit fly databases as a starting point, 103 Drosophila melanogaster genes were screened for their association with male sterility by tissue-specific RNAi knockdown and CRISPR/Cas9-mediated mutagenesis. This list included 56 genes associated with male infertility in the human databases, but not found in the Drosophila database, resulting in the discovery of 63 new genes associated with male fertility in Drosophila. The phenotypes identified were categorized into six distinct classes affecting sperm development. Interestingly, the second largest class (Class VI) caused sterility despite apparently normal testis and sperm morphology suggesting that these proteins may have functions in the mature sperm following spermatogenesis. We focused on one such gene, Rack 1, and found that it plays an important role in two developmental periods, in early germline cells or germline stem cells and in spermatogenic cells or sperm. Taken together, many genes are yet to be identified and their role in male reproduction, especially after ejaculation, remains to be elucidated in Drosophila, where a wealth of data from human and other model organisms would be useful.

The Making of Long-Lasting Memories: A Fruit Fly Perspective

Frontiers in Behavioral Neuroscience ◽

10.3389/fnbeh.2021.662129 ◽

2021 ◽

Vol 15 ◽

Author(s):

Camilla Roselli ◽

Mani Ramaswami ◽

Tamara Boto ◽

Isaac Cervantes-Sandoval

Keyword(s):

Learning And Memory ◽

Neural Activity ◽

Molecular Mechanisms ◽

De Novo ◽

Fruit Fly ◽

Memory Formation ◽

Synaptic Activity ◽

Model Organisms ◽

Transient Wave ◽

Control Of Gene Expression

Understanding the nature of the molecular mechanisms underlying memory formation, consolidation, and forgetting is one of the fascinating questions in modern neuroscience. The encoding, stabilization, and elimination of memories rely on the structural reorganization of synapses. These changes enable the facilitation or depression of neural activity in response to the acquisition of new information. In other words, these changes affect the weight of specific nodes within a neural network. We know that these plastic reorganizations require de novo protein synthesis in the context of long-term memory (LTM). This process depends on neural activity triggered by the learned experience. The use of model organisms like Drosophila melanogaster has proven essential for advancing our knowledge in the field of neuroscience. Flies offer an optimal combination of a simpler nervous system, composed of a limited number of cells, and the ability to display complex behaviors. Studies in Drosophila neuroscience, which have expanded over several decades, have been critical for understanding the cellular and molecular mechanisms leading to the synaptic and behavioral plasticity occurring in the context of learning and memory. This is possible thanks to sophisticated technical approaches that enable precise control of gene expression in the fruit fly as well as neural manipulation, like chemogenetics, thermogenetics, or optogenetics. The search for the identity of genes expressed as a result of memory acquisition has been an active interest since the origins of behavioral genetics. From screenings of more or less specific candidates to broader studies based on transcriptome analysis, our understanding of the genetic control behind LTM has expanded exponentially in the past years. Here we review recent literature regarding how the formation of memories induces a rapid, extensive and, in many cases, transient wave of transcriptional activity.
After a consolidation period, transcriptome changes seem more stable and likely represent the synthesis of new proteins. The complexity of the circuitry involved in memory formation and consolidation is such that there are localized changes in neural activity, both regarding temporal dynamics and the nature of the neurons and subcellular locations affected, hence inducing specific temporal and localized changes in protein expression. Different types of neurons are recruited at different times into memory traces. In LTM, the synthesis of new proteins is required in specific subsets of cells. This de novo translation can take place in the somatic cytoplasm and/or locally in distinct zones of compartmentalized synaptic activity, depending on the nature of the proteins and the plasticity-inducing processes that occur. We will also review recent advances in understanding how localized changes are confined to the relevant synapse. These recent studies have led to exciting discoveries regarding proteins that were not previously implicated in learning and memory processes. This invaluable information will lead to future functional studies on the roles that hundreds of new molecular actors play in modulating neural activity.

Containment strategies for synthetic gene drive organisms and impacts on gene flow.

10.1079/9781789247480.0010 ◽

2021 ◽

pp. 137-152

Author(s):

Lei Pei ◽

Markus Schmidt

Keyword(s):

Gene Flow ◽

Fungal Infections ◽

Basic Research ◽

Fruit Fly ◽

Mitigation Strategies ◽

Synthetic Gene ◽

Model Organisms ◽

Gene Drive ◽

Comprehensive Overview ◽

Gene Drives

Abstract Gene drives, particularly synthetic gene drives, may help to address some important challenges by efficiently altering specific sections of DNA in entire populations of wild organisms. Here we review the current development of synthetic gene drives, especially RNA-guided synthetic gene drives based on the CRISPR nuclease Cas. Particular focus is placed on their possible applications in agriculture (e.g. disease resistance, weed control management), ecosystem conservation (e.g. invasive species control), health (e.g. to combat insect-borne and fungal infections), and basic research in model organisms (e.g. Saccharomyces, fruit fly, and zebrafish). The physical, chemical, biological, and ecological containment strategies that might help to confine these gene drive-modified organisms are then explored. Gene flow from gene drive-derived organisms to the environment is discussed, and possible mitigation strategies for gene drive research are explored. Last but not least, the regulatory context and opinions from key stakeholders (regulators, scientists, and concerned organizations) are reviewed, aiming to provide a more comprehensive overview of the field.

A Universal, Genomewide GuideFinder for CRISPR/Cas9 Targeting in Microbial Genomes

mSphere ◽

10.1128/msphere.00086-20 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Michelle Spoto ◽

Changhui Guan ◽

Elizabeth Fleming ◽

Julia Oh

Keyword(s):

Gene Function ◽

Large Scale ◽

Essential Gene ◽

Bacterial Species ◽

Bacterial Genome ◽

Model Organisms ◽

Design Parameters ◽

Bacterial Genomes ◽

Wide Range ◽

User Friendly

ABSTRACT The CRISPR/Cas system has significant potential to facilitate gene editing in a variety of bacterial species. CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) represent modifications of the CRISPR/Cas9 system utilizing a catalytically inactive Cas9 protein for transcription repression and activation, respectively. While CRISPRi and CRISPRa have tremendous potential to systematically investigate gene function in bacteria, few programs are specifically tailored to identify guides in draft bacterial genomes genomewide. Furthermore, few programs offer open-source code with flexible design parameters for bacterial targeting. To address these limitations, we created GuideFinder, a customizable, user-friendly program that can design guides for any annotated bacterial genome. GuideFinder designs guides from NGG protospacer-adjacent motif (PAM) sites for any number of genes by the use of an annotated genome and FASTA file input by the user. Guides are filtered according to user-defined design parameters and removed if they contain any off-target matches. Iteration with lowered parameter thresholds allows the program to design guides for genes that did not produce guides with the more stringent parameters, one of several features unique to GuideFinder. GuideFinder can also identify paired guides for targeting multiplicity, whose validity we tested experimentally. GuideFinder has been tested on a variety of diverse bacterial genomes, finding guides for 95% of genes on average. Moreover, guides designed by the program are functionally useful—focusing on CRISPRi as a potential application—as demonstrated by essential gene knockdown in two staphylococcal species. Through the large-scale generation of guides, this open-access software will improve accessibility to CRISPR/Cas studies of a variety of bacterial species. 
IMPORTANCE With the explosion in our understanding of human and environmental microbial diversity, corresponding efforts to understand gene function in these organisms are strongly needed. CRISPR/Cas9 technology has revolutionized interrogation of gene function in a wide variety of model organisms. Efficient CRISPR guide design is required for systematic gene targeting. However, existing tools are not adapted for the broad needs of microbial targeting, which include extraordinary species and subspecies genetic diversity, the overwhelming majority of which is characterized by draft genomes. In addition, flexibility in guide design parameters is important to consider the wide range of factors that can affect guide efficacy, many of which can be species and strain specific. We designed GuideFinder, a customizable, user-friendly program that addresses the limitations of existing software and that can design guides for any annotated bacterial genome with numerous features that facilitate guide design in a wide variety of microorganisms.
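
The core step described above, locating candidate guides next to NGG protospacer-adjacent motif (PAM) sites, can be illustrated with a short sketch. This is not GuideFinder's code: the function names, the 20-nt guide length, and the output format are assumptions for illustration, and the sketch omits the annotation handling, parameter-based filtering, and off-target screening that the abstract describes.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_ngg_guides(seq, guide_len=20):
    """List candidate guides immediately 5' of an NGG PAM on both strands.

    Returns (strand, start, guide) tuples. `start` is 0-based and, for the
    "-" strand, is given in reverse-complement coordinates (an illustrative
    simplification). guide_len=20 is an assumed default.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    guides = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        # The lookahead lets overlapping candidate sites all be reported.
        for hit in pattern.finditer(strand_seq):
            guides.append((strand, hit.start(), hit.group(1)))
    return guides
```

A real design tool would then restrict candidates to annotated genes and discard guides with off-target matches elsewhere in the genome, as the abstract notes.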

miR2Diabetes: A Literature-Curated Database of microRNA Expression Patterns, in Diabetic Microvascular Complications

Genes ◽

10.3390/genes10100784 ◽

2019 ◽

Vol 10 (10) ◽

pp. 784 ◽

Cited By ~ 1

Author(s):

Sungjin Park ◽

SeongRyeol Moon ◽

Kiyoung Lee ◽

Ie Byung Park ◽

Dae Ho Lee ◽

...

Keyword(s):

Association Studies ◽

Microvascular Complications ◽

Expression Patterns ◽

Disease Model ◽

Model Organisms ◽

Web Interface ◽

Diabetic Microvascular Complications ◽

User Friendly ◽

Rats And Mice ◽

Kidney Liver

microRNAs (miRNAs) have been established as critical regulators of the pathogenesis of diabetes mellitus (DM), and diabetes microvascular complications (DMCs). However, manually curated databases for miRNAs, and DM (including DMCs) association studies, have yet to be established. Here, we constructed a user-friendly database, “miR2Diabetes,” equipped with a graphical web interface for simple browsing or searching manually curated annotations. The annotations in our database cover 14 DM and DMC phenotypes, involving 156 miRNAs, by browsing diverse sample origins (e.g., blood, kidney, liver, and other tissues). Additionally, we provide miRNA annotations for disease-model organisms (including rats and mice), of DM and DMCs, for the purpose of improving knowledge of the biological complexity of these pathologies. We assert that our database will be a comprehensive resource for miRNA biomarker studies, as well as for prioritizing miRNAs for functional validation, in DM and DMCs, with likely extension to other diseases.

DiscoSnp++: de novo detection of small variants from raw unassembled read set(s)

10.1101/209965 ◽

2017 ◽

Cited By ~ 12

Author(s):

Pierre Peterlongo ◽

Chloé Riou ◽

Erwan Drezen ◽

Claire Lemaitre

Keyword(s):

Reference Genome ◽

De Novo ◽

Model Organisms ◽

Small Indels ◽

Desktop Computers ◽

Resource Requirements ◽

Computational Resources ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Source Of Information

Motivation: Next Generation Sequencing (NGS) data provide unprecedented access to life mechanisms. In particular, these data enable the detection of polymorphisms such as SNPs and indels. As these polymorphisms represent a fundamental source of information in agronomy, environment, or medicine, their detection in NGS data is now a routine task. The main methods for their prediction usually need a reference genome. However, non-model organisms and highly divergent genomes, such as those in cancer studies, are extensively investigated. Results: We propose DiscoSnp++, in which we revisit the DiscoSnp algorithm. DiscoSnp++ is designed for detecting and ranking all kinds of SNPs and small indels from raw read set(s). It outputs files in FASTA and VCF formats. In particular, predicted variants can be automatically localized afterwards on a reference genome if available. Its usage is extremely simple and its low resource requirements make it usable on common desktop computers. Results show that DiscoSnp++ performs better than state-of-the-art methods in terms of computational resources and in terms of result quality. An important novelty is the de novo detection of indels, for which we obtained 99% precision when calling indels on simulated human datasets and 90% recall on high-confidence indels from the Platinum dataset. License: GNU Affero General Public License. Availability: https://github.com/GATB/[email protected]
