Epi Archive: Automated Synthesis of Global Notifiable Disease Data

Hari S. Khalsa; Sergio Rene Cordova; Nicholas Generous

doi:10.5210/ojphi.v11i1.9707

Online Journal of Public Health Informatics, 2019, Vol 11 (1)

Keyword(s): Public Health; Early Warning; International Health; Time Interval; International Health Regulations; Notifiable Disease; Machine Readable; Health Regulations; Readable Format; Machine Readable Format

Objective

Automatically collect and synthesize global notifiable disease data and make it available to humans and computers. Provide the data on the web and within the Biosurveillance Ecosystem (BSVE) as a novel data stream. These data have many applications, including improving the prediction and early warning of disease events.

Introduction

Government reporting of notifiable disease data is common and widespread, though most countries do not report in a machine-readable format. This is despite the WHO International Health Regulations stating that “[e]ach State Party shall notify WHO, by the most efficient means of communication available.” [1] Data are often in the form of a file that contains text, tables and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data-intensive approaches to epidemiology, biosurveillance and public health. While most nations likely store incident data in a machine-readable format, governments can be hesitant to share data openly for a variety of reasons, including technical, political, economic, and motivational issues [2].

A survey conducted by LANL of notifiable disease data reporting in over fifty countries identified only a few websites that report data in a machine-readable format. The majority (>70%) produce reports as PDF files on a regular basis. The bulk of the PDF reports present data in a structured tabular format, while some report in natural language or graphical charts.

The structure and format of PDF reports change often; this adds to the complexity of identifying and parsing the desired data. Not all websites publish in English, and it is common to find typos and clerical errors.

LANL has developed a tool, Epi Archive, to collect global notifiable disease data automatically and continuously and make it uniform and readily accessible.

Methods

We periodically survey national notifiable disease reporting systems, noting how the data are reported and in what formats. We determined the minimal metadata that is required to contextualize incident counts properly, as well as optional metadata that is commonly found.

The development of software to regularly ingest notifiable disease data and make it available involves three to four main steps: scraping, detecting, parsing and persisting.

Scraping: We examine website design and determine reporting mechanisms for each country/website, as well as what varies across the reporting mechanisms. We then design and write code to automate the downloading of data for each country. We store all artifacts presented as files (PDF, XLSX, etc.) in their original form, along with appropriate metadata for parsing and data provenance.

Detecting: This step is required when parsing structured non-machine-readable data, such as tabular data in PDF files. We combine the Nurminen methodology of PDF table detection with in-house heuristics to find the desired data within PDF reports [3].

Parsing: We determine what to extract from each dataset and parse these data into uniform data structures, correctly accommodating the variations in metadata (e.g., time interval definitions) and the various human languages.

Persisting: We store the data in the Epi Archive database and make it available on the internet and through the BSVE. The data are persisted into a structured and normalized SQL database.

Results

Epi Archive currently contains national and/or subnational notifiable disease data from thirty-nine nations. When a user accesses the Epi Archive site, they are able to peruse, chart and download data by country, subregion, disease and time interval. Access to a cached version of the original artifacts (e.g., PDF files), a link to the source and additional metadata are also available through the user interface. Finally, to ensure machine-readability, the data from Epi Archive can be reached through a REST API.

http://epiarchive.bsvgateway.org/

Conclusions

LANL, as part of a currently funded DTRA effort, is automatically and continually collecting global notifiable disease data. While thirty-nine nations are in production, more are being brought online in the near future. These data are already being utilized and have many applications, including improving the prediction and early warning of disease events.

References

[1] WHO International Health Regulations, edition 3. http://apps.who.int/iris/bitstream/10665/246107/1/9789241580496-eng.pdf

[2] van Panhuis WG, Paul P, Emerson C, et al. A systematic review of barriers to data sharing in public health. BMC Public Health. 2014. 14:1144. doi:10.1186/1471-2458-14-1144

[3] Nurminen, Anssi. "Algorithmic extraction of data in tables in PDF documents." (2013).
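
The persisting step described in the Methods above can be illustrated with a minimal sketch. It assumes a hypothetical three-table normalized schema in SQLite; the table names, column names, helper functions and the sample record are illustrative assumptions, not Epi Archive's actual schema or data.

```python
import sqlite3

# Hypothetical normalized schema; the abstract does not describe the real one.
SCHEMA = """
CREATE TABLE location (
    id INTEGER PRIMARY KEY,
    country TEXT NOT NULL,
    subregion TEXT NOT NULL DEFAULT '',
    UNIQUE (country, subregion)
);
CREATE TABLE disease (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE incident_count (
    location_id INTEGER NOT NULL REFERENCES location(id),
    disease_id INTEGER NOT NULL REFERENCES disease(id),
    interval_start TEXT NOT NULL,  -- ISO 8601 date
    interval_end TEXT NOT NULL,    -- ISO 8601 date
    cases INTEGER NOT NULL,
    source_url TEXT,               -- provenance of the original artifact
    PRIMARY KEY (location_id, disease_id, interval_start, interval_end)
);
"""


def get_or_create(conn, table, **cols):
    """Return the id of the row matching cols, inserting it if absent.

    Table and column names come from this module only, never from user input.
    """
    where = " AND ".join(f"{c} = ?" for c in cols)
    values = tuple(cols.values())
    row = conn.execute(f"SELECT id FROM {table} WHERE {where}", values).fetchone()
    if row:
        return row[0]
    names = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    cur = conn.execute(f"INSERT INTO {table} ({names}) VALUES ({marks})", values)
    return cur.lastrowid


def persist(conn, record):
    """Store one parsed count; re-ingesting the same interval overwrites it."""
    loc_id = get_or_create(conn, "location", country=record["country"],
                           subregion=record.get("subregion", ""))
    dis_id = get_or_create(conn, "disease", name=record["disease"])
    conn.execute(
        "INSERT OR REPLACE INTO incident_count VALUES (?, ?, ?, ?, ?, ?)",
        (loc_id, dis_id, record["interval_start"], record["interval_end"],
         record["cases"], record.get("source_url")),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up record for illustration only.
sample = {
    "country": "Exampleland",
    "disease": "measles",
    "interval_start": "2018-01-01",
    "interval_end": "2018-01-07",
    "cases": 12,
    "source_url": "https://example.org/report.pdf",
}
persist(conn, sample)
persist(conn, sample)  # ingesting the same report twice does not duplicate rows
```

Keeping locations and diseases in their own tables means each name is stored once, and the composite primary key on the counts table makes repeated ingestion of the same report idempotent in this sketch.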

Epi Archive: Automated Synthesis of Global Notifiable Disease Data

Online Journal of Public Health Informatics, 2018, Vol 10 (1). doi:10.5210/ojphi.v10i1.8323

Author(s): Hari S. Khalsa; Sergio Cordova; Nicholas Generous; Prabhu S. Khalsa; Byron Tasseff; ...

Keyword(s): Public Health; Early Warning; Application Programming Interface; Original Form; Time Interval; Website Design; Notifiable Disease; Machine Readable; Readable Format; Machine Readable Format

Objective

LANL has built software that automatically collects global notifiable disease data, synthesizes the data, and makes it available to humans and computers within the Biosurveillance Ecosystem (BSVE) as a novel data stream. These data have many applications, including improving the prediction and early warning of disease events.

Introduction

Most countries do not report national notifiable disease data in a machine-readable format. Data are often in the form of a file that contains text, tables and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data-intensive approaches to epidemiology, biosurveillance and public health.

While most nations likely store incident data in a machine-readable format, governments are often hesitant to share data openly for a variety of reasons, including technical, political, economic, and motivational issues [1].

A survey conducted by LANL of notifiable disease data reporting in over fifty countries identified only a few websites that report data in a machine-readable format. The majority (>70%) produce reports as PDF files on a regular basis. The bulk of the PDF reports present data in a structured tabular format, while some report in natural language.

The structure and format of PDF reports change often; this adds to the complexity of identifying and parsing the desired data. Not all websites publish in English, and it is common to find typos and clerical errors.

LANL has developed a tool, Epi Archive, to collect global notifiable disease data automatically and continuously and make it uniform and readily accessible.

Methods

We conducted a survey of the national notifiable disease reporting systems, noting how the data are reported and in what formats. We determined the minimal metadata that is required to contextualize incident counts properly, as well as optional metadata that is commonly found.

The development of software to regularly ingest notifiable disease data and make it available involves three or four main steps: scraping, detecting, parsing and persisting.

Scraping: We examined website design and determined reporting mechanisms for each country/website, as well as what varies across the reporting mechanisms. We then designed and wrote code to automate the downloading of the data for each country. We store all artifacts presented as files (PDF, XLSX, etc.) in their original form, along with appropriate metadata for parsing and data provenance.

Detecting: This step is required when parsing structured non-machine-readable data, such as tabular data in PDF files. We combined the Nurminen methodology of PDF table detection with in-house heuristics to find the desired data within PDF reports [2].

Parsing: We determined what to extract from each dataset and parsed these data into uniform data structures, correctly accommodating the variations in metadata (e.g., time interval definitions) and the various human languages.

Persisting: We store the data in the Epi Archive database and make it available on the internet and through the BSVE. The data are persisted into a structured and normalized SQL database.

Results

The Epi Archive tool currently contains national and/or subnational notifiable disease data from twenty nations. When a user accesses the Epi Archive site, they are prompted with four fields: country, subregion, disease of interest, and date range. Upon form submission, a time series is generated from the user's specifications. The generated graph can then be downloaded as a CSV file if the user wishes to perform their own analysis. Additionally, the data from Epi Archive can be reached through a REST API (Representational State Transfer Application Programming Interface).

Conclusions

LANL, as part of a currently funded DTRA effort, is automatically and continually collecting global notifiable disease data. While twenty nations are in production, more are being brought online in the near future. These data are already being utilized and will have many applications, including improving the prediction and early warning of disease events.

References

[1] van Panhuis WG, Paul P, Emerson C, et al. A systematic review of barriers to data sharing in public health. BMC Public Health. 2014. 14:1144. doi:10.1186/1471-2458-14-1144

[2] Nurminen, Anssi. "Algorithmic extraction of data in tables in PDF documents." (2013).
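
The Methods above mention accommodating variations in time interval definitions during parsing. As a minimal sketch of one such normalization, the snippet below converts a weekly or monthly report label into an explicit pair of start and end dates. It assumes ISO 8601 week numbering (weeks start on Monday); individual countries may define epidemiological weeks differently, and the function name `interval_bounds` is hypothetical rather than part of Epi Archive.

```python
import calendar
from datetime import date, timedelta


def interval_bounds(year, kind, number):
    """Return (start, end) dates, both inclusive, for a reporting interval.

    kind is "week" (ISO 8601 week number, an assumption) or "month" (1-12).
    """
    if kind == "week":
        start = date.fromisocalendar(year, number, 1)  # Monday of that ISO week
        return start, start + timedelta(days=6)
    if kind == "month":
        last_day = calendar.monthrange(year, number)[1]
        return date(year, number, 1), date(year, number, last_day)
    raise ValueError(f"unsupported interval kind: {kind!r}")


week_start, week_end = interval_bounds(2018, "week", 1)
month_start, month_end = interval_bounds(2016, "month", 2)
```

Storing explicit start and end dates, rather than a country-specific week or month label, lets counts reported under different interval definitions sit side by side in one uniform structure.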

WHO events addressing public health priorities: national focal points for International Health Regulations

Eastern Mediterranean Health Journal, 2017, Vol 23 (10), pp. 715-716. doi:10.26719/2017.23.10.715

Keyword(s): Public Health; International Health; Health Priorities; Focal Points; International Health Regulations; Health Regulations

The Maritime Declaration of Health (MDH) as a tool to detect maritime traffic-related health risks: analysis of MDH forms submitted to Spanish ports, October 2014 to March 2015

Eurosurveillance, 2017, Vol 22 (24). doi:10.2807/1560-7917.es.2017.22.24.30551

Author(s): Rosa M López-Gigosos; Marina Segura; Rosa M Díez-Díaz; Isabel Ureña; David Urzay; ...

Keyword(s): Public Health; Health Risks; International Health; P Value; International Health Regulations; Maritime Traffic; Study Results; Passenger Ships; Health Regulations

The international maritime traffic of people and goods has often contributed to the spread of pathogens affecting public health. The Maritime Declaration of Health (MDH), according to the International Health Regulations (IHR) (2005), is a document containing data related to the state of health on board a ship during passage and on arrival at port. It is a useful tool for early detection of public health risks. The main objective of our study was to evaluate compliance with the model provided in the IHR, focusing on the format and degree of completion of MDH forms received at Spanish ports. We reviewed the content of 802 MDH forms submitted to nine Spanish ports between October 2014 and March 2015. Study results show that 22% of MDH forms presented did not comply with the recommended model and 39% were incomplete. The proportion of cargo ships with correct and complete MDH forms was lower than that of passenger ships; thus, the nine health questions were answered less frequently by cargo ships than by passenger ships (63% vs 90%, p value < 0.001). The appropriate demand and usage of MDH forms by competent authorities should improve the quality of the document as a tool and improve risk assessment.

Strengthening public health laboratory capacity in Thailand for International Health Regulations (IHR) (2005)

WHO South-East Asia Journal of Public Health, 2014, Vol 3 (3), pp. 266. doi:10.4103/2224-3151.206749. Cited By ~ 1

Author(s): Anne Harwood Peruski; Maureen Birmingham; Chawalit Tantinimitkul; Ladawan Chungsamanukool; Preecha Chungsamanukool; ...

Keyword(s): Public Health; International Health; Public Health Laboratory; International Health Regulations; Laboratory Capacity; Health Regulations

Assessment of Public Health Events through International Health Regulations, United States, 2007–2011

Emerging Infectious Diseases, 2012, Vol 18 (7), pp. 1047-1053. doi:10.3201/eid1807.120231. Cited By ~ 11

Author(s): Katrin S. Kohl; Ray R. Arthur; Ralph O’Connor; Jose Fernandez

Keyword(s): Public Health; United States; International Health; International Health Regulations; Health Events; Health Regulations

An analysis of International Health Regulations Emergency Committees and Public Health Emergency of International Concern Designations

BMJ Global Health, 2020, Vol 5 (6), pp. e002502. doi:10.1136/bmjgh-2020-002502. Cited By ~ 4

Author(s): Lucia Mullen; Christina Potter; Lawrence O Gostin; Anita Cicero; Jennifer B Nuzzo

Keyword(s): Public Health; Influenza A; Ebola Virus; Virus Disease; International Health; Public Health Emergency; Director General; International Health Regulations; International Response; Health Regulations

Introduction

Nine events have been assessed for potential declaration of a Public Health Emergency of International Concern (PHEIC). A PHEIC is defined as an extraordinary event that constitutes a public health risk to other states through international spread and requires a coordinated international response. The WHO Director-General convenes Emergency Committees (ECs) to provide their advice on whether an event constitutes a PHEIC. The EC rationales have been criticised for being non-transparent and contradictory to the International Health Regulations (IHR). This first comprehensive analysis of EC rationale provides recommendations to increase clarity of EC decisions, which will strengthen the IHR and WHO’s legitimacy in future outbreaks.

Methods

Sixty-six EC statements were reviewed from nine public health outbreaks of influenza A, Middle East respiratory syndrome coronavirus, polio, Ebola virus disease, Zika, yellow fever and coronavirus disease-2019. Statements were analysed to determine which of the three IHR criteria were noted as contributing towards the EC’s justification on whether to declare a PHEIC and what language was used to explain the decision.

Results

Interpretation of the criteria was often vague and applied inconsistently. ECs often failed to describe and justify which criteria had been satisfied.

Discussion

Guidelines must be developed for the standardised interpretation of IHR core criteria. The ECs must clearly identify and justify which criteria have contributed to their rationale for or against PHEIC declaration.

Conclusion

Striving for more consistency and transparency in EC justifications would benefit future deliberations and provide more understanding and support for the process.

International Health Regulations: putting public health on the centre stage

The Lancet, 1999, Vol 354 (9195), pp. 2062. doi:10.1016/s0140-6736(05)76812-4. Cited By ~ 1

Author(s): Haroon Ashraf

Keyword(s): Public Health; International Health; International Health Regulations; Health Regulations

The World Health Organization: The (un)attainable ideal?

Zbornik radova Pravnog fakulteta Nis, 2021, Vol 60 (91), pp. 271-286. doi:10.5937/zrpfn0-33746

Author(s): Jovana Blešić

Keyword(s): Public Health; World Health Organization; International Health; Public Health Emergency; World Health; Director General; International Health Regulations; The World; Health Organization; Health Regulations

The World Health Organization (WHO) is one of the UN specialized agencies. Its work and functions gained even more importance in 2020 with the emergence of the coronavirus. The eyes of the entire international community focused on this organization and its Director General. Its efficiency has since been subject to various forms of criticism. In this paper, the author first provides a brief overview of this organization and its significance. The central part of the paper focuses on the activities of the WHO during the Covid-19 pandemic, through the clarification of the concept of public health emergency of international concern and the use of the International Health Regulations. Finally, the author discusses the possible reform of this body. The aim of this paper is to familiarize readers with the World Health Organization and put its activities in the context of the ongoing Covid-19 pandemic.

The New International Health Regulations: An Historic Development for International Law and Public Health

Global Health, 2017, pp. 489-498. doi:10.4324/9781315254227-36

Author(s): David P. Fidler; Lawrence O. Gostin

Keyword(s): Public Health; International Law; International Health; International Health Regulations; Historic Development; Health Regulations

WHO Lyon Office: Supporting countries in achieving the International Health Regulations (2005) core capacities for public health surveillance

Revue d'Épidémiologie et de Santé Publique, 2018, Vol 66, pp. S391. doi:10.1016/j.respe.2018.05.421

Author(s): J. Guerra; S. Cognat; F. Fuchs

Keyword(s): Public Health; Public Health Surveillance; International Health; Health Surveillance; International Health Regulations; Health Regulations