Epi Archive: Automated Synthesis of Global Notifiable Disease Data

2019 ◽  
Vol 11 (1) ◽  
Author(s):  
Hari S. Khalsa ◽  
Sergio Rene Cordova ◽  
Nicholas Generous

Objective
Automatically collect and synthesize global notifiable disease data and make it available to humans and computers. Provide the data on the web and within the Biosurveillance Ecosystem (BSVE) as a novel data stream. These data have many applications, including improving the prediction and early warning of disease events.

Introduction
Government reporting of notifiable disease data is common and widespread, though most countries do not report in a machine-readable format. This is despite the WHO International Health Regulations stating that “[e]ach State Party shall notify WHO, by the most efficient means of communication available” [1]. Data are often in the form of a file that contains text, tables and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data-intensive approaches to epidemiology, biosurveillance and public health. While most nations likely store incident data in a machine-readable format, governments can be hesitant to share data openly for a variety of reasons, including technical, political, economic, and motivational issues [2].
A survey conducted by LANL of notifiable disease data reporting in over fifty countries identified only a few websites that report data in a machine-readable format. The majority (>70%) produce reports as PDF files on a regular basis. The bulk of the PDF reports present data in a structured tabular format, while some report in natural language or graphical charts.
The structure and format of PDF reports change often; this adds to the complexity of identifying and parsing the desired data.
Not all websites publish in English, and it is common to find typos and clerical errors.
LANL has developed a tool, Epi Archive, to collect global notifiable disease data automatically and continuously and make it uniform and readily accessible.

Methods
A survey of the national notifiable disease reporting systems is periodically conducted, noting how the data are reported and in what formats. We determined the minimal metadata that is required to contextualize incident counts properly, as well as optional metadata that is commonly found.
The development of software to regularly ingest notifiable disease data and make it available involves three to four main steps: scraping, detecting, parsing and persisting.
Scraping: We examine website design and determine reporting mechanisms for each country/website, as well as what varies across the reporting mechanisms. We then design and write code to automate the downloading of data for each country. We store all artifacts presented as files (PDF, XLSX, etc.) in their original form, along with appropriate metadata for parsing and data provenance.
Detecting: This step is required when parsing structured non-machine-readable data, such as tabular data in PDF files. We combine the Nurminen methodology of PDF table detection with in-house heuristics to find the desired data within PDF reports [3].
Parsing: We determine what to extract from each dataset and parse these data into uniform data structures, correctly accommodating the variations in metadata (e.g., time interval definitions) and the various human languages.
Persisting: We store the data in the Epi Archive database and make it available on the internet and through the BSVE. The data are persisted into a structured and normalized SQL database.

Results
Epi Archive currently contains national and/or subnational notifiable disease data from thirty-nine nations.
When a user accesses the Epi Archive site, they are able to peruse, chart and download data by country, subregion, disease and time interval. Access to a cached version of the original artifacts (e.g., PDF files), a link to the source and additional metadata is also available through the user interface. Finally, to ensure machine-readability, the data from Epi Archive can be reached through a REST API. http://epiarchive.bsvgateway.org/

Conclusions
LANL, as part of a currently funded DTRA effort, is automatically and continually collecting global notifiable disease data. While thirty-nine nations are in production, more will be brought online in the near future. These data are already being utilized and have many applications, including improving the prediction and early warning of disease events.

References
[1] WHO International Health Regulations, edition 3. http://apps.who.int/iris/bitstream/10665/246107/1/9789241580496-eng.pdf
[2] van Panhuis WG, Paul P, Emerson C, et al. A systematic review of barriers to data sharing in public health. BMC Public Health. 2014. 14:1144. doi:10.1186/1471-2458-14-1144
[3] Nurminen, Anssi. "Algorithmic extraction of data in tables in PDF documents." (2013).
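
The parsing and persisting steps described in the Methods above can be illustrated with a minimal sketch. This is not the Epi Archive implementation: the table and column names, the `DISEASE_NAMES` lookup, the `IncidentCount` record, and the sample row are all hypothetical, and it assumes a source that reports by ISO week. It shows one way a row extracted from a report could be normalized (disease name translated, week number turned into explicit dates, case count cleaned) and stored in a normalized SQL schema, here SQLite in memory.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical lookup from source-language disease names to one canonical name.
DISEASE_NAMES = {"masern": "measles", "measles": "measles", "dengue": "dengue"}


@dataclass(frozen=True)
class IncidentCount:
    country: str
    disease: str
    start: date  # first day of the reporting interval
    end: date    # last day of the reporting interval (inclusive)
    cases: int


def iso_week_interval(year: int, week: int) -> tuple[date, date]:
    """Return the Monday-to-Sunday span of an ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=6)


def parse_row(country: str, raw: dict) -> IncidentCount:
    """Normalize one extracted table row into a uniform record."""
    disease = DISEASE_NAMES[raw["disease"].strip().lower()]
    start, end = iso_week_interval(int(raw["year"]), int(raw["week"]))
    cases = int(raw["cases"].replace(",", "").strip())
    return IncidentCount(country, disease, start, end, cases)


# Hypothetical normalized schema: names live in lookup tables, counts refer to them.
SCHEMA = """
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE disease (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE incident_count (
    country_id INTEGER NOT NULL REFERENCES country(id),
    disease_id INTEGER NOT NULL REFERENCES disease(id),
    start_date TEXT NOT NULL,
    end_date   TEXT NOT NULL,
    cases      INTEGER NOT NULL,
    UNIQUE (country_id, disease_id, start_date, end_date)
);
"""


def lookup_id(conn: sqlite3.Connection, table: str, name: str) -> int:
    """Return the id for a name, inserting it first if it is new."""
    # `table` is one of two fixed names chosen by the caller, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    return row[0]


def persist(conn: sqlite3.Connection, rec: IncidentCount) -> None:
    """Store a record; re-ingesting the same interval overwrites the old count."""
    conn.execute(
        "INSERT OR REPLACE INTO incident_count VALUES (?, ?, ?, ?, ?)",
        (
            lookup_id(conn, "country", rec.country),
            lookup_id(conn, "disease", rec.disease),
            rec.start.isoformat(),
            rec.end.isoformat(),
            rec.cases,
        ),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample row, shaped like text pulled from a report table.
raw_row = {"disease": " Masern ", "year": "2018", "week": "2", "cases": "1,204"}
record = parse_row("Germany", raw_row)
persist(conn, record)
conn.commit()
```

Storing explicit start and end dates, rather than a week number, is one way to accommodate sources that define reporting intervals differently; the uniqueness constraint keeps repeated ingestion of the same report from duplicating counts.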

2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Hari S. Khalsa ◽  
Sergio Cordova ◽  
Nicholas Generous ◽  
Prabhu S. Khalsa ◽  
Byron Tasseff ◽  
...  

Objective
LANL has built software that automatically collects global notifiable disease data, synthesizes the data, and makes it available to humans and computers within the Biosurveillance Ecosystem (BSVE) as a novel data stream. These data have many applications, including improving the prediction and early warning of disease events.

Introduction
Most countries do not report national notifiable disease data in a machine-readable format. Data are often in the form of a file that contains text, tables and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data-intensive approaches to epidemiology, biosurveillance and public health.
While most nations likely store incident data in a machine-readable format, governments are often hesitant to share data openly for a variety of reasons that include technical, political, economic, and motivational issues [1].
A survey conducted by LANL of notifiable disease data reporting in over fifty countries identified only a few websites that report data in a machine-readable format. The majority (>70%) produce reports as PDF files on a regular basis. The bulk of the PDF reports present data in a structured tabular format, while some report in natural language.
The structure and format of PDF reports change often; this adds to the complexity of identifying and parsing the desired data. Not all websites publish in English, and it is common to find typos and clerical errors.
LANL has developed a tool, Epi Archive, to collect global notifiable disease data automatically and continuously and make it uniform and readily accessible.

Methods
We conducted a survey of the national notifiable disease reporting systems, noting how the data are reported and in what formats.
We determined the minimal metadata that is required to contextualize incident counts properly, as well as optional metadata that is commonly found.
The development of software to regularly ingest notifiable disease data and make it available involves three or four main steps: scraping, detecting, parsing and persisting.
Scraping: We examined website design and determined reporting mechanisms for each country/website, as well as what varies across the reporting mechanisms. We then designed and wrote code to automate the downloading of the data for each country. We store all artifacts presented as files (PDF, XLSX, etc.) in their original form, along with appropriate metadata for parsing and data provenance.
Detecting: This step is required when parsing structured non-machine-readable data, such as tabular data in PDF files. We combined the Nurminen methodology of PDF table detection with in-house heuristics to find the desired data within PDF reports [2].
Parsing: We determined what to extract from each dataset and parsed these data into uniform data structures, correctly accommodating the variations in metadata (e.g., time interval definitions) and the various human languages.
Persisting: We store the data in the Epi Archive database and make it available on the internet and through the BSVE. The data are persisted into a structured and normalized SQL database.

Results
The Epi Archive tool currently contains national and/or subnational notifiable disease data from twenty nations. When a user accesses the Epi Archive site, they are prompted with four fields: country, subregion, disease of interest, and date range. Upon form submission, a time series is generated from the user's specifications. The generated time series can then be downloaded as a CSV file if the user is interested in performing their own analysis.
Additionally, the data from Epi Archive can be reached through a REST API (Representational State Transfer Application Programming Interface).

Conclusions
LANL, as part of a currently funded DTRA effort, is automatically and continually collecting global notifiable disease data. While 20 nations are in production, more will be brought online in the near future. These data are already being utilized and will have many applications, including improving the prediction and early warning of disease events.

References
[1] van Panhuis WG, Paul P, Emerson C, et al. A systematic review of barriers to data sharing in public health. BMC Public Health. 2014. 14:1144. doi:10.1186/1471-2458-14-1144
[2] Nurminen, Anssi. "Algorithmic extraction of data in tables in PDF documents." (2013).
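
The query described in the Results above (four fields in, a time series out, then a CSV download) can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the `time_series` and `to_csv` helpers, and the sample counts are hypothetical and made up for the example, and nothing here reflects the actual Epi Archive site or its REST API.

```python
import csv
import io
from datetime import date

# Made-up records standing in for rows read from the database.
RECORDS = [
    {"country": "Germany", "subregion": "Bavaria", "disease": "measles",
     "week_start": date(2018, 1, 8), "cases": 4},
    {"country": "Germany", "subregion": "Bavaria", "disease": "measles",
     "week_start": date(2018, 1, 1), "cases": 2},
    {"country": "Germany", "subregion": "Berlin", "disease": "measles",
     "week_start": date(2018, 1, 1), "cases": 7},
    {"country": "Germany", "subregion": "Bavaria", "disease": "measles",
     "week_start": date(2018, 3, 5), "cases": 9},
]


def time_series(records, country, subregion, disease, start, end):
    """Select records matching the four form fields, ordered by date."""
    selected = [
        r for r in records
        if r["country"] == country
        and r["subregion"] == subregion
        and r["disease"] == disease
        and start <= r["week_start"] <= end
    ]
    return sorted(selected, key=lambda r: r["week_start"])


def to_csv(series):
    """Serialize a time series as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["week_start", "cases"])
    for r in series:
        writer.writerow([r["week_start"].isoformat(), r["cases"]])
    return buffer.getvalue()


series = time_series(
    RECORDS, "Germany", "Bavaria", "measles", date(2018, 1, 1), date(2018, 1, 31)
)
csv_text = to_csv(series)
print(csv_text)
```

Writing ISO 8601 dates into the CSV keeps the export sortable as plain text and unambiguous across locales, which matters when the sources themselves use differing date conventions.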


2017 ◽  
Vol 22 (24) ◽  
Author(s):  
Rosa M López-Gigosos ◽  
Marina Segura ◽  
Rosa M Díez-Díaz ◽  
Isabel Ureña ◽  
David Urzay ◽  
...  

The international maritime traffic of people and goods has often contributed to the spread of pathogens affecting public health. The Maritime Declaration of Health (MDH), according to the International Health Regulations (IHR) (2005), is a document containing data related to the state of health on board a ship during passage and on arrival at port. It is a useful tool for early detection of public health risks. The main objective of our study was to evaluate compliance with the model provided in the IHR, focusing on the format and degree of completion of MDH forms received at Spanish ports. We reviewed the content of 802 MDH forms submitted to nine Spanish ports between October 2014 and March 2015. Study results show that 22% of MDH forms presented did not comply with the recommended model and 39% were incomplete. The proportion of cargo ships with correct and complete MDH forms was lower than that of passenger ships; in particular, the nine health questions were answered less frequently by cargo ships than by passenger ships (63% vs 90%, p value < 0.001). The appropriate demand and usage of MDH forms by competent authorities should improve the quality of the document as a tool and improve risk assessment.


2014 ◽  
Vol 3 (3) ◽  
pp. 266 ◽  
Author(s):  
Anne Harwood Peruski ◽  
Maureen Birmingham ◽  
Chawalit Tantinimitkul ◽  
Ladawan Chungsamanukool ◽  
Preecha Chungsamanukool ◽  
...  

2020 ◽  
Vol 5 (6) ◽  
pp. e002502 ◽  
Author(s):  
Lucia Mullen ◽  
Christina Potter ◽  
Lawrence O Gostin ◽  
Anita Cicero ◽  
Jennifer B Nuzzo

Introduction
Nine events have been assessed for potential declaration of a Public Health Emergency of International Concern (PHEIC). A PHEIC is defined as an extraordinary event that constitutes a public health risk to other states through international spread and requires a coordinated international response. The WHO Director-General convenes Emergency Committees (ECs) to provide their advice on whether an event constitutes a PHEIC. The EC rationales have been criticised for being non-transparent and contradictory to the International Health Regulations (IHR). This first comprehensive analysis of EC rationale provides recommendations to increase clarity of EC decisions which will strengthen the IHR and WHO’s legitimacy in future outbreaks.

Methods
66 EC statements were reviewed from nine public health outbreaks of influenza A, Middle East respiratory syndrome coronavirus, polio, Ebola virus disease, Zika, yellow fever and coronavirus disease-2019. Statements were analysed to determine which of the three IHR criteria were noted as contributing towards the EC’s justification on whether to declare a PHEIC and what language was used to explain the decision.

Results
Interpretations of the criteria were often vague and applied inconsistently. ECs often failed to describe and justify which criteria had been satisfied.

Discussion
Guidelines must be developed for the standardised interpretation of IHR core criteria. The ECs must clearly identify and justify which criteria have contributed to their rationale for or against PHEIC declaration.

Conclusion
Striving for more consistency and transparency in EC justifications would benefit future deliberations and provide more understanding and support for the process.


2021 ◽  
Vol 60 (91) ◽  
pp. 271-286
Author(s):  
Jovana Blešić

The World Health Organization (WHO) is one of the UN specialized agencies. Its work and functions gained even more importance in 2020 with the emergence of the coronavirus. The eyes of the entire international community focused on this organization and its Director-General. Since then, its efficiency has been subject to various forms of criticism. In this paper, the author first provides a brief overview of this organization and its significance. The central part of the paper focuses on the activities of the WHO during the Covid-19 pandemic, through the clarification of the concept of public health emergency of international concern and the use of the International Health Regulations. Finally, the author discusses the possible reform of this body. The aim of this paper is to familiarize readers with the World Health Organization and put its activities in the context of the ongoing Covid-19 pandemic.

