Nucleotide Sequence Database: Latest Research Papers

Connecting molecular sequences to their voucher specimens

10.37044/osf.io/93qf4 ◽ 2021

Author(s): Quentin John Groom ◽ Mathias Dillen ◽ Pieter Huybrechts ◽ Rukaya Johaadien ◽ Niki Kyriakopoulou ◽ ...

Keyword(s): Sequence Data ◽ Environmental Data ◽ Sequence Database ◽ Human In The Loop ◽ European Nucleotide Archive ◽ Voucher Specimens ◽ Data Elements ◽ Machine Readable ◽ And Training ◽ Nucleotide Sequence Database

When sequencing molecules from an organism, it is standard practice to create voucher specimens. This ensures that the results are repeatable and that the identification of the organism can be verified. It also means that the sequence data can be linked to a whole host of other data related to the specimen, including traits, other sequences, environmental data, and geography. It is therefore critical that explicit, preferably machine-readable, links exist between voucher specimens and sequences. However, such links do not exist in the databases of the International Nucleotide Sequence Database Collaboration (INSDC). If it were possible to create permanent bidirectional links between specimens and sequences, it would not only make data more findable, but would also open new avenues for research. In the Biohackathon we built a semi-automated workflow to take specimen data from the Meise Herbarium and search for references to those specimens in the European Nucleotide Archive (ENA). We achieved this by matching data elements of the specimen and sequence together and by adding a “human-in-the-loop” process whereby possible matches could be confirmed. Although we found that it was possible to discover and match sequences to their vouchers in our collection, we encountered many problems of data standardization, missing data and errors. These problems make the process unreliable and unsuitable for rediscovering all the possible links that exist. Ultimately, improved standards and training would remove the need for retrospective relinking of specimens with their sequences. Therefore, we make some tentative recommendations for how this could be achieved in the future.
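The matching step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' workflow: the record fields (`barcode`, `collector_number`, `specimen_voucher`, `collected_by`, `organism`), the scoring rule, and the `review_at` threshold are all assumptions chosen for illustration, and the sample values are made up. The sketch counts how many data elements agree between a specimen and a sequence record, and puts every plausible pair in a queue for a person to confirm rather than accepting any link automatically.

```python
import re
from dataclasses import dataclass


@dataclass
class Specimen:
    # Hypothetical specimen fields, not taken from the paper.
    barcode: str
    collector: str
    collector_number: str
    scientific_name: str


@dataclass
class SequenceRecord:
    # Hypothetical sequence-record fields, not taken from the paper.
    accession: str
    specimen_voucher: str
    collected_by: str
    organism: str


def normalize(text: str) -> str:
    """Lower-case the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def contains(needle: str, haystack: str) -> bool:
    """True if a non-empty needle occurs in the haystack."""
    return bool(needle) and needle in haystack


def score(specimen: Specimen, record: SequenceRecord) -> int:
    """Count how many data elements agree (0 to 3)."""
    voucher = normalize(record.specimen_voucher)
    points = 0
    # Element 1: barcode or collector number appears in the voucher string.
    if contains(normalize(specimen.barcode), voucher) or contains(
        normalize(specimen.collector_number), voucher
    ):
        points += 1
    # Element 2: collector surname appears in the collector or voucher text.
    name_parts = specimen.collector.split()
    surname = normalize(name_parts[-1]) if name_parts else ""
    if contains(surname, normalize(record.collected_by)) or contains(surname, voucher):
        points += 1
    # Element 3: identical scientific name after normalization.
    if normalize(specimen.scientific_name) == normalize(record.organism):
        points += 1
    return points


def triage(specimens, records, review_at=2):
    """Return (barcode, accession, score) candidates for human review, best first."""
    queue = []
    for specimen in specimens:
        for record in records:
            points = score(specimen, record)
            if points >= review_at:
                queue.append((specimen.barcode, record.accession, points))
    return sorted(queue, key=lambda candidate: -candidate[2])


if __name__ == "__main__":
    specimens = [Specimen("BR0000012345678", "J. Doe", "1234", "Coffea arabica")]
    records = [
        SequenceRecord("XX000001", "J. Doe 1234 (BR)", "J. Doe", "Coffea arabica"),
        SequenceRecord("XX000002", "", "", "Coffea arabica"),
    ]
    for candidate in triage(specimens, records):
        print(candidate)
```

Substring matching on short collector numbers will produce false positives, which is one reason a sketch like this only proposes candidates and leaves the decision to a reviewer.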


The international nucleotide sequence database collaboration

Nucleic Acids Research ◽ 10.1093/nar/gkaa967 ◽ 2020 ◽ Vol 49 (D1) ◽ pp. D121-D124

Author(s): Masanori Arita ◽ Ilene Karsch-Mizrachi ◽ Guy Cochrane

Keyword(s): Nucleotide Sequence ◽ Sequence Data ◽ Data Bank ◽ National Institutes Of Health ◽ Sequence Database ◽ National Library ◽ Nucleotide Sequence Data ◽ International Nucleotide Sequence Database ◽ European Nucleotide Archive ◽ Nucleotide Sequence Database

Abstract: The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been the core infrastructure for collecting and providing nucleotide sequence data and metadata for >30 years. Three partner organizations, the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA, have been collaboratively maintaining the INSDC for the benefit of not only science but all types of community worldwide.


NCBI Taxonomy: a comprehensive update on curation, resources and tools

Database ◽ 10.1093/database/baaa062 ◽ 2020 ◽ Vol 2020 ◽ Cited By ~ 5

Author(s): Conrad L Schoch ◽ Stacy Ciufo ◽ Mikhail Domrachev ◽ Carol L Hotton ◽ Sivakumar Kannan ◽ ...

Keyword(s): Nucleotide Sequence ◽ Protein Sequence ◽ Ncbi Taxonomy ◽ Sequence Database ◽ External Resources ◽ International Nucleotide Sequence Database ◽ Sql Database ◽ Data Elements ◽ Taxonomic Groups ◽ Nucleotide Sequence Database

Abstract: The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy


DDBJ Database updates and computational infrastructure enhancement

Nucleic Acids Research ◽ 10.1093/nar/gkz982 ◽ 2019

Author(s): Osamu Ogasawara ◽ Yuichi Kodama ◽ Jun Mashima ◽ Takehide Kosuge ◽ Takatomo Fujisawa

Keyword(s): Nucleotide Sequence ◽ Graphics Processing Units ◽ Large Scale ◽ Sequence Data ◽ Sequence Database ◽ Cloud Infrastructure ◽ File Transfer ◽ Nucleotide Sequence Data ◽ Starting Point ◽ Nucleotide Sequence Database

Abstract: The Bioinformation and DDBJ Center (https://www.ddbj.nig.ac.jp) in the National Institute of Genetics (NIG) maintains a primary nucleotide sequence database as a member of the International Nucleotide Sequence Database Collaboration (INSDC) in partnership with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The NIG operates the NIG supercomputer as a computational basis for the construction of DDBJ databases and as a large-scale computational resource for Japanese biologists and medical researchers. To accommodate the rapidly growing amount of deoxyribonucleic acid (DNA) nucleotide sequence data, NIG replaced its supercomputer system, which is designed for big data analysis of genome data, in early 2019. The new system is equipped with 30 PB of DNA data archiving storage; large-scale parallel distributed file systems (13.8 PB in total) and 1.1 PFLOPS computation nodes and graphics processing units (GPUs). Moreover, as a starting point for developing a multi-cloud infrastructure for bioinformatics, we have also installed an automatic file transfer system that allows users to prevent data lock-in and to achieve cost/performance balance by exploiting the most suitable environment from among the supercomputer and public clouds for different workloads.


Construction of a Comprehensive Database from the Existing Viral Sequences Available from the International Nucleotide Sequence Database Collaboration

The Human Virome - Methods in Molecular Biology ◽ 10.1007/978-1-4939-8682-8_16 ◽ 2018 ◽ pp. 231-243

Author(s): Rodrigo García-López

Keyword(s): Nucleotide Sequence ◽ Sequence Database ◽ International Nucleotide Sequence Database ◽ Comprehensive Database ◽ Viral Sequences ◽ Nucleotide Sequence Database


The international nucleotide sequence database collaboration

Nucleic Acids Research ◽ 10.1093/nar/gkx1097 ◽ 2017 ◽ Vol 46 (D1) ◽ pp. D48-D51 ◽ Cited By ~ 80

Author(s): Ilene Karsch-Mizrachi ◽ Toshihisa Takagi ◽ Guy Cochrane

Keyword(s): Biological Sciences ◽ Nucleotide Sequence ◽ Sequence Data ◽ Data Bank ◽ National Institutes Of Health ◽ Sequence Database ◽ Nucleotide Sequence Data ◽ International Nucleotide Sequence Database ◽ European Nucleotide Archive ◽ Nucleotide Sequence Database

Abstract: For more than 30 years, the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been committed to capturing, preserving and providing access to comprehensive public domain nucleotide sequence and associated metadata, which enables discovery in biomedicine, biodiversity and biological sciences. Since 1987, the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA, have worked collaboratively to enable access to nucleotide sequence data in standardized formats for the worldwide scientific community. In this article, we reiterate the principles of the INSDC collaboration and briefly summarize the trends of the archival content.


Globalizing Genomics: The Origins of the International Nucleotide Sequence Database Collaboration

Journal of the History of Biology ◽ 10.1007/s10739-017-9490-y ◽ 2017 ◽ Vol 51 (4) ◽ pp. 657-691 ◽ Cited By ~ 6

Author(s): Hallam Stevens

Keyword(s): Nucleotide Sequence ◽ Sequence Database ◽ International Nucleotide Sequence Database ◽ Nucleotide Sequence Database


EMBL nucleotide sequence database

The Dictionary of Genomics, Transcriptomics and Proteomics ◽ 10.1002/9783527678679.dg03797 ◽ 2015 ◽ pp. 1-1 ◽ Cited By ~ 1

Keyword(s): Nucleotide Sequence ◽ Sequence Database ◽ Embl Nucleotide Sequence Database ◽ Nucleotide Sequence Database


The International Nucleotide Sequence Database Collaboration

Nucleic Acids Research ◽ 10.1093/nar/gkv1323 ◽ 2015 ◽ Vol 44 (D1) ◽ pp. D48-D50 ◽ Cited By ~ 100

Author(s): Guy Cochrane ◽ Ilene Karsch-Mizrachi ◽ Toshihisa Takagi ◽ International Nucleotide Sequence Database Collaboration

Keyword(s): Nucleotide Sequence ◽ Sequence Database ◽ International Nucleotide Sequence Database ◽ Nucleotide Sequence Database


The International Nucleotide Sequence Database Collaboration

Nucleic Acids Research ◽ 10.1093/nar/gks1084 ◽ 2012 ◽ Vol 41 (D1) ◽ pp. D21-D24 ◽ Cited By ~ 93

Author(s): Y. Nakamura ◽ G. Cochrane ◽ I. Karsch-Mizrachi

Keyword(s): Nucleotide Sequence ◽ Sequence Database ◽ International Nucleotide Sequence Database ◽ Nucleotide Sequence Database
