Computational approaches to standard-compliant biofilm data for reliable analysis and integration

Ana Margarida Sousa; Andreia Ferreira; Nuno F. Azevedo; Maria Olivia Pereira; Anália Lourenço

doi:10.1515/jib-2012-203

Computational approaches to standard-compliant biofilm data for reliable analysis and integration

Journal of Integrative Bioinformatics ◽

10.1515/jib-2012-203 ◽

2012 ◽

Vol 9 (3) ◽

pp. 57-68 ◽

Cited By ~ 3

Author(s):

Ana Margarida Sousa ◽

Andreia Ferreira ◽

Nuno F. Azevedo ◽

Maria Olivia Pereira ◽

Anália Lourenço

Keyword(s):

Computational Modelling ◽

Colony Morphology ◽

Experimental Procedure ◽

Data Intensive ◽

Data Interchange ◽

Share Data ◽

Semantic Resources ◽

Areas Of Interest ◽

Machine Readable ◽

Readable Format

Summary The study of microorganism consortia, also known as biofilms, is associated to a number of applications in biotechnology, ecotechnology and clinical domains. Nowadays, biofilm studies are heterogeneous and data-intensive, encompassing different levels of analysis. Computational modelling of biofilm studies has become thus a requirement to make sense of these vast and ever-expanding biofilm data volumes.The rationale of the present work is a machine-readable format for representing biofilm studies and supporting biofilm data interchange and data integration. This format is supported by the Biofilm Science Ontology (BSO), the first ontology on biofilms information. The ontology is decomposed into a number of areas of interest, namely: the Experimental Procedure Ontology (EPO) which describes biofilm experimental procedures; the Colony Morphology Ontology (CMO) which characterises morphologically microorganism colonies; and other modules concerning biofilm phenotype, antimicrobial susceptibility and virulence traits. The overall objective behind BSO is to develop semantic resources to capture, represent and share data on biofilms and related experiments in a regularized fashion manner. Furthermore, the present work also introduces a framework in assistance of biofilm data interchange and analysis - BiofOmics (http://biofomics.org) - and a public repository on colony morphology signatures - MorphoCol (http://stardust.deb.uminho.pt/morphocol).

Download Full-text

Epi Archive: automated data collection of notifiable disease data

Online Journal of Public Health Informatics ◽

10.5210/ojphi.v9i1.7615 ◽

2017 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Nicholas Generous ◽

Geoffrey Fairchild ◽

Hari Khalsa ◽

Byron Tasseff ◽

James Arnold

Keyword(s):

Public Health ◽

Time Series ◽

State Level ◽

Notifiable Disease ◽

Public Health Departments ◽

Data Intensive ◽

Share Data ◽

Data Source ◽

Machine Readable ◽

Political Economic

ObjectiveLANL has built a software program that automatically collectsglobal notifiable disease data—particularly data stored in files—andmakes it available and shareable within the Biosurveillance Ecosystem(BSVE) as a new data source. This will improve the prediction andearly warning of disease events and other applications.IntroductionMost countries do not report national notifiable disease data in amachine-readable format. Data are often in the form of a file thatcontains text, tables and graphs summarizing weekly or monthlydisease counts. This presents a problem when information is neededfor more data intensive approaches to epidemiology, biosurveillanceand public health as exemplified by the Biosurveillance Ecosystem(BSVE).While most nations do likely store their data in a machine-readableformat, the governments are often hesitant to share data openly fora variety of reasons that include technical, political, economic, andmotivational issues [1]. For example, an attempt by LANL to obtaina weekly version of openly available monthly data, reported by theAustralian government, resulted in an onerous bureaucratic reply. Theobstacles to obtaining data included: paperwork to request data fromeach of the Australian states and territories, a long delay to obtaindata (up to 3 months) and extensive limitations on the data’s use thatprohibit collaboration and sharing. This type of experience whenattempting to contact public health departments or ministries of healthfor data is not uncommon.A survey conducted by LANL of notifiable disease data reportingin 52 countries identified only 10 as being machine-readable and42 being reported in pdf files on a regular basis. Within the 42 nationsthat report in pdf files, 32 report in a structured, tabular format and10 in a non-structured way.As a result, LANL has developed a tool-Epi Archive (formerlyknown as EPIC)-to automatically and continuously collect globalnotifiable disease data and make it readily accesible.MethodsWe conducted a survey of the national notifiable disease reportingsystems notating how the data is reported in two important dimensions:date standards and case definitions.The development of software to regularly ingests notifiabledisease data frand makes this data available involved four main stepsscraping, extracting, parsing and persisting.For scraping: we would examine website designs and determinereporting mechanisms for each country/website as well as what variesacross the reporting mechanisms. We then designed and wrote codeto automate the downloading of report pdf files, for each country.We stored report pdfs along with appropriate metadata for extractingand parsing.For extracting: we developed software that can extract notifiabledisease data presented in tabular form from a pdf file. We combinedthe methodology of figure placement detection with the in-housedeveloped table extraction and annotation heuristics.For parsing: we determined what to extract from each pdf dataset from the survey conducted. We then parsed the extracted datainto uniform data structures correctly accommodating the dimensionssurveyed and the various human languages. This task involvedingesting notifiable disease data in many disparate formats extractedfrom pdf files and coalescing the data into a standardized format.For persisting: We then store the data in the Epi ArchivePostgreSQL database and make it available through the BSVE.ResultsThe EpiArchive tool currently contains subnational notifiabledisease data from 10 nations. When a user accesses the EpiArchivesite, they are prompted with four fields: country, region, disease,and date duration. These fields allow the user to specify the location(down to the state level), the disease of interest, and the durationof interest. Upon form submission, a time series is generated fromthe users’ specifications. The generated time series can then bedownloaded into a csv file if a user is interested in performingpersonal analysis. Additionally, the data from EpiArchive can bereached through an API.ConclusionsLANL as part of a currently funded DTRA effort so that it willautomatically and continuously collect global notifiable diseasedata—particularly data stored in pdf files—and make it available andshareable within the Biosurveillance Ecosystem (BSVE) as a newdata source. This will provide data to analytics and users that willimprove the prediction and early warning of disease events and otherapplications.

Download Full-text

CFTR Lifecycle Map—A Systems Medicine Model of CFTR Maturation to Predict Possible Active Compound Combinations

International Journal of Molecular Sciences ◽

10.3390/ijms22147590 ◽

2021 ◽

Vol 22 (14) ◽

pp. 7590

Author(s):

Liza Vinhoven ◽

Frauke Stanke ◽

Sylvia Hafkemeyer ◽

Manuel Manfred Nietert

Keyword(s):

Large Scale ◽

Synergistic Effects ◽

Small Scale ◽

Systems Medicine ◽

Promising Candidate ◽

Cftr Mutations ◽

Machine Readable ◽

High Throughput Screens ◽

Readable Format ◽

Machine Readable Format

Different causative therapeutics for CF patients have been developed. There are still no mutation-specific therapeutics for some patients, especially those with rare CFTR mutations. For this purpose, high-throughput screens have been performed which result in various candidate compounds, with mostly unclear modes of action. In order to elucidate the mechanism of action for promising candidate substances and to be able to predict possible synergistic effects of substance combinations, we used a systems biology approach to create a model of the CFTR maturation pathway in cells in a standardized, human- and machine-readable format. It is composed of a core map, manually curated from small-scale experiments in human cells, and a coarse map including interactors identified in large-scale efforts. The manually curated core map includes 170 different molecular entities and 156 reactions from 221 publications. The coarse map encompasses 1384 unique proteins from four publications. The overlap between the two data sources amounts to 46 proteins. The CFTR Lifecycle Map can be used to support the identification of potential targets inside the cell and elucidate the mode of action for candidate substances. It thereby provides a backbone to structure available data as well as a tool to develop hypotheses regarding novel therapeutics.

Download Full-text

biotoolsSchema: a formalized schema for bioinformatics software description

GigaScience ◽

10.1093/gigascience/giaa157 ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Jon Ison ◽

Hans Ienasescu ◽

Emil Rydza ◽

Piotr Chmura ◽

Kristoffer Rapacki ◽

...

Keyword(s):

Scientific Community ◽

Life Science ◽

Technical Information ◽

Heterogeneous Data ◽

Digital Resources ◽

Data Intensive ◽

Research Software ◽

Bioinformatics Software ◽

Scientific Context ◽

Machine Readable

Abstract Background Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description—and cataloguing—of bioinformatics resources. Findings Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability. Conclusions biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.

Download Full-text

KiMoSys 2.0: an upgraded database for submitting, storing and accessing experimental data for kinetic modeling

Database ◽

10.1093/database/baaa093 ◽

2020 ◽

Vol 2020 ◽

Author(s):

Hugo Mochão ◽

Pedro Barahona ◽

Rafael S Costa

Keyword(s):

Experimental Data ◽

Metabolic Networks ◽

Model Simulation ◽

Web Interface ◽

Concentration Data ◽

Web Based ◽

Share Data ◽

Visualization Tools ◽

Filter Mechanism ◽

Machine Readable

Abstract The KiMoSys (https://kimosys.org), launched in 2014, is a public repository of published experimental data, which contains concentration data of metabolites, protein abundances and flux data. It offers a web-based interface and upload facility to share data, making it accessible in structured formats, while also integrating associated kinetic models related to the data. In addition, it also supplies tools to simplify the construction process of ODE (Ordinary Differential Equations)-based models of metabolic networks. In this release, we present an update of KiMoSys with new data and several new features, including (i) an improved web interface, (ii) a new multi-filter mechanism, (iii) introduction of data visualization tools, (iv) the addition of downloadable data in machine-readable formats, (v) an improved data submission tool, (vi) the integration of a kinetic model simulation environment and (vii) the introduction of a unique persistent identifier system. We believe that this new version will improve its role as a valuable resource for the systems biology community. Database URL: www.kimosys.org

Download Full-text

Optical Music Analysis for Printed Music Score and Handwritten Music Manuscript

Visual Perception of Music Notation ◽

10.4018/978-1-59140-298-5.ch004 ◽

2011 ◽

pp. 108-127 ◽

Cited By ~ 5

Author(s):

Kia Ng

Keyword(s):

Domain Knowledge ◽

Imaging System ◽

Data Representation ◽

Divide And Conquer ◽

Music Score ◽

Segmentation Approach ◽

Printed Music ◽

High Level ◽

Machine Readable ◽

Readable Format

This chapter describes an optical document imaging system to transform paper-based music scores and manuscripts into machine-readable format and a restoration system to touch-up small imperfections (for example broken stave lines and stems), to restore deteriorated master copy for reprinting. The chapter presents a brief background of this field, discusses the main obstacles, and presents the processes involved for printed music scores processing; using a divide-and-conquer approach to sub-segment compound musical symbols (e.g., chords) and inter-connected groups (e.g., beamed quavers) into lower-level graphical primitives (e.g., lines and ellipses) before recognition and reconstruction. This is followed by discussions on the developments of a handwritten manuscripts prototype with a segmentation approach to separate handwritten musical primitives. Issues and approaches for recognition, reconstruction and revalidation using basic music syntax and high-level domain knowledge, and data representation are also presented.

Download Full-text

A Design Tool for Business Process Design and Representation

Selected Readings on Information Technology Management ◽

10.4018/978-1-60566-092-9.ch010 ◽

2010 ◽

pp. 160-177

Author(s):

Roberto Paiano ◽

Anna Lisa Guido

Keyword(s):

Business Process ◽

Process Design ◽

Design Tool ◽

Formal Representation ◽

Middle Point ◽

Business Process Design ◽

Semantic Web Technology ◽

Web Information System ◽

Machine Readable ◽

Readable Format

In this chapter the focus is on business process design as middle point between requirement elicitation and implementation of a Web information system. We face both the problem of the notation to adopt in order to represent in a simple way the business process and the problem of a formal representation, in a machine-readable format, of the design. We adopt Semantic Web technology to represent process and we explain how this technology has been used to reach our goals.

Download Full-text

A Design Tool for Business Process Design and Representation

Electronic Business ◽

10.4018/978-1-60566-056-1.ch029 ◽

2009 ◽

pp. 451-468

Author(s):

Roberto Paiano ◽

Anna Lisa Guido

Keyword(s):

Business Process ◽

Process Design ◽

Design Tool ◽

Formal Representation ◽

Middle Point ◽

Business Process Design ◽

Semantic Web Technology ◽

Web Information System ◽

Machine Readable ◽

Readable Format

Download Full-text

Dynamical entropic analysis of scientific concepts

Journal of Information Science ◽

10.1177/0165551520972034 ◽

2020 ◽

pp. 016555152097203

Author(s):

Artem Chumachenko ◽

Boris Kreminskyi ◽

Iurii Mosenkis ◽

Alexander Yakimenko

Keyword(s):

Scientific Progress ◽

Knowledge Mapping ◽

Scientific Publications ◽

Actual Structure ◽

Information Theoretic ◽

Semantic Component ◽

Human Interpretation ◽

Basic Concepts ◽

Machine Readable ◽

Readable Format

In the present era of information, the problem of effective knowledge retrieval from a collection of scientific documents becomes especially important for continuous scientific progress. The information available in scientific publications traditionally consists of bibliometric metadata and its semantic component such as title, abstract and text. While the former having a machine-readable format usually used for knowledge mapping and pattern recognition, the latter designed for human interpretation and analysis. Only a few studies use full-text analysis, based on carefully selected scientific ontology, to map the actual structure of the scientific knowledge or uncover similarities between documents. Unfortunately, the presence of common (basic) concepts across semantically unrelated documents creates spurious connections between different topics. We revise the known method based on the entropic information-theoretic measure used for selecting basic concepts and propose to analyse the dynamics of Shannon entropy for more rigorous sorting of concepts by their generality.

Download Full-text