Creating and Maintaining TDWG Vocabularies using Spreadsheets

Author(s):  
Steven J Baskauf

Because TDWG vocabularies change and grow as they are developed by the community, it is nearly impossible to document their version history and generate both machine- and human-readable documentation by manually editing multiple documents in several formats. In this talk, I will provide an overview of the workflow that has been established to maintain vocabularies in accordance with the TDWG Standards Documentation and Vocabulary Maintenance specifications. I will show how vocabulary creators and maintainers can use simple CSV spreadsheets to create new vocabularies or to update existing ones. I will also provide an overview of the Python scripts that TDWG infrastructure maintainers use to process those spreadsheets into the authoritative files in TDWG's rs.tdwg.org GitHub repository, which serves as the data source for both machine-readable serializations of the vocabularies and human-readable standards documents.
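The CSV-to-serialization step described above can be sketched in miniature. The column names, sample terms, and Turtle output below are illustrative assumptions, not the actual rs.tdwg.org file layout:

```python
import csv
import io

# Hypothetical term list; real rs.tdwg.org CSVs have more columns
# (versioning, term IRIs, etc.) and live in the GitHub repository.
CSV_TEXT = """term_localName,label,definition
scientificName,Scientific Name,The full scientific name.
eventDate,Event Date,The date during which an Event occurred.
"""

def csv_to_turtle(csv_text, namespace="http://rs.tdwg.org/dwc/terms/"):
    """Render each CSV row as a small block of Turtle triples."""
    lines = ["@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."]
    for row in csv.DictReader(io.StringIO(csv_text)):
        iri = f"<{namespace}{row['term_localName']}>"
        lines.append(f'{iri} rdfs:label "{row["label"]}"@en ;')
        lines.append(f'    rdfs:comment "{row["definition"]}"@en .')
    return "\n".join(lines)

print(csv_to_turtle(CSV_TEXT))
```

The same CSV could of course feed an HTML template instead, which is how a single spreadsheet can drive both machine- and human-readable outputs.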

2018 · Vol 25 (9) · pp. 1240-1247
Author(s):  
Catherine Anne Grandy ◽  
Jennifer R Donnan ◽  
Justin T Peddle ◽  
Kristen Romme ◽  
Satpyul Kim ◽  
...  

Abstract
Objective: To identify and describe clinical drug data sources that have the potential to serve as a repository of information for developing drug knowledge translation products.
Methods: Two reviewers independently screened citations from PubMed and Embase, websites from the web search engine Google, and references from selected journals. Publicly licensed or non-proprietary data sources containing clinical drug information accessible in a machine-readable format were eligible. Data sources were assessed for their coverage across 18 pre-specified domains and 74 elements of clinical drug information.
Results: Of the 3369 unique citations or webpages screened, 44 drug information data sources were identified. Of these, 22 data sources met the study inclusion criteria. There was a mean of 4.5 (SD = 5.19) domains covered by each source and a mean of 10.9 (SD = 18) elements covered by each source. None of the data sources covered all domains, and eight elements were not addressed by any source. All of the data sources identified by the study are government or academic databases.
Conclusion: Our study demonstrated the availability of machine-readable clinical drug data that could help facilitate the creation of novel drug knowledge translation products. However, we identified clinical content gaps in the available non-proprietary drug information sources. Further evaluation of the quality of each data source would be necessary prior to incorporating these sources into any knowledge translation products intended for clinical use.
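The coverage assessment lends itself to a short worked example. The sources, domains, and counts below are invented; only the shape of the computation (per-source coverage counts, mean, SD, and unaddressed domains) mirrors the study's method:

```python
from statistics import mean, stdev

# Hypothetical coverage map: source -> set of covered domains.
# (The study assessed 22 sources against 18 domains and 74 elements.)
coverage = {
    "SourceA": {"dosing", "interactions", "indications"},
    "SourceB": {"dosing"},
    "SourceC": set(),
}

# Per-source coverage counts, then mean and standard deviation.
counts = [len(domains) for domains in coverage.values()]
print(f"mean domains covered: {mean(counts):.2f} (SD = {stdev(counts):.2f})")

# Domains not addressed by any source (the study found 8 such elements).
all_domains = {"dosing", "interactions", "indications", "pregnancy"}
uncovered = all_domains - set().union(*coverage.values())
print("uncovered:", sorted(uncovered))
```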


Author(s):  
Nicholas Generous ◽  
Geoffrey Fairchild ◽  
Hari Khalsa ◽  
Byron Tasseff ◽  
James Arnold

Objective: LANL has built a software program that automatically collects global notifiable disease data, particularly data stored in PDF files, and makes it available and shareable within the Biosurveillance Ecosystem (BSVE) as a new data source. This will improve the prediction and early warning of disease events and other applications.
Introduction: Most countries do not report national notifiable disease data in a machine-readable format. Data are often in the form of a file that contains text, tables, and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data-intensive approaches to epidemiology, biosurveillance, and public health, as exemplified by the Biosurveillance Ecosystem (BSVE). While most nations likely do store their data in a machine-readable format, governments are often hesitant to share data openly for a variety of reasons that include technical, political, economic, and motivational issues [1]. For example, an attempt by LANL to obtain a weekly version of openly available monthly data reported by the Australian government resulted in an onerous bureaucratic reply. The obstacles to obtaining the data included paperwork to request data from each of the Australian states and territories, a long delay to obtain data (up to 3 months), and extensive limitations on the data's use that prohibit collaboration and sharing. This type of experience when attempting to contact public health departments or ministries of health for data is not uncommon. A survey conducted by LANL of notifiable disease data reporting in 52 countries identified only 10 as machine-readable, with 42 reported in PDF files on a regular basis. Within the 42 nations that report in PDF files, 32 report in a structured, tabular format and 10 in a non-structured way. As a result, LANL has developed a tool, Epi Archive (formerly known as EPIC), to automatically and continuously collect global notifiable disease data and make it readily accessible.
Methods: We conducted a survey of the national notifiable disease reporting systems, noting how the data are reported along two important dimensions: date standards and case definitions. The development of software that regularly ingests notifiable disease data and makes it available involved four main steps: scraping, extracting, parsing, and persisting.
Scraping: We examined website designs and determined the reporting mechanisms for each country and website, as well as what varies across those mechanisms. We then designed and wrote code to automate the downloading of report PDF files for each country. We stored the report PDFs along with appropriate metadata for extracting and parsing.
Extracting: We developed software that can extract notifiable disease data presented in tabular form from a PDF file. We combined the methodology of figure placement detection with in-house-developed table extraction and annotation heuristics.
Parsing: We determined what to extract from each PDF data set based on the survey conducted. We then parsed the extracted data into uniform data structures, correctly accommodating the dimensions surveyed and the various human languages. This task involved ingesting notifiable disease data in many disparate formats extracted from PDF files and coalescing the data into a standardized format.
Persisting: We store the data in the Epi Archive PostgreSQL database and make it available through the BSVE.
Results: The Epi Archive tool currently contains subnational notifiable disease data from 10 nations. When a user accesses the Epi Archive site, they are prompted with four fields: country, region, disease, and date duration. These fields allow the user to specify the location (down to the state level), the disease of interest, and the duration of interest. Upon form submission, a time series is generated from the user's specifications. The generated time series can then be downloaded as a CSV file if a user is interested in performing their own analysis. Additionally, the data from Epi Archive can be reached through an API.
Conclusions: As part of a currently funded DTRA effort, LANL has built Epi Archive to automatically and continuously collect global notifiable disease data, particularly data stored in PDF files, and make it available and shareable within the Biosurveillance Ecosystem (BSVE) as a new data source. This will provide data to analytics and users that will improve the prediction and early warning of disease events and other applications.
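The four-step pipeline (scraping, extracting, parsing, persisting) can be sketched as follows. All function names, the record layout, and the sample row are hypothetical stand-ins, not Epi Archive's actual code:

```python
from dataclasses import dataclass

# A minimal sketch of the scrape -> extract -> parse -> persist pipeline.

@dataclass
class CaseCount:
    country: str
    region: str
    disease: str
    week: str       # e.g. an ISO week, one of the surveyed date standards
    count: int

def extract_tables(pdf_bytes):
    """Stand-in for the in-house table extraction; returns rows of cells."""
    # Real extraction combines figure-placement detection with table
    # heuristics; here we pretend the PDF yielded this one row.
    return [["New South Wales", "influenza", "2016-W32", "417"]]

def parse_rows(rows, country):
    """Coalesce extracted cells into the standardized record format."""
    return [CaseCount(country, r[0], r[1], r[2], int(r[3])) for r in rows]

def persist(records, store):
    """Stand-in for writing to the Epi Archive PostgreSQL database."""
    store.extend(records)

store = []
persist(parse_rows(extract_tables(b"%PDF..."), "Australia"), store)
print(store[0])
```

The standardized record type is what makes the per-country parsers interchangeable: however differently each nation formats its PDF tables, every parser emits the same structure.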


2021 · Vol 10 (1) · pp. 86
Author(s):  
Ali Najafi ◽  
Neda Emami ◽  
Taha Samad-Soltani

Introduction: The integration of rapidly expanding high-throughput omics technologies with electronic health records (EHRs) has created an unprecedented advantage in terms of acquiring routine healthcare data to accelerate genetic discovery. In this regard, the EHR can also provide several important advantages to omics research if the integration challenges are well handled. The main purpose of the present study was to review the available published knowledge in the related literature and then to classify and discuss stakeholders' requirements in this domain.
Material and Methods: First, a broad electronic search of all available literature in English on the topic was conducted in the Medline, Web of Science, Institute of Electrical and Electronics Engineers (IEEE), Scopus, and Cochrane databases. Then, stakeholders' requirements were tabulated, and finally, a word cloud was generated and analyzed to derive functional and non-functional cases.
Results: A total of 81 articles were included in the analysis. The integration requirements consisted of nine functional cases: a uniform approach to the interpretation of genetic tests, standardized terminologies and ontologies, structured data entry as much as possible, an integrated online patient portal, multiple data source handling, machine-readable storing and reporting, research-oriented requirements, pharmacogenomics decision support capabilities, and phenotyping algorithms and knowledge bases. In addition, there were three non-functional cases: interoperability of multiple systems; ethical, legal, and security factors; and big data computations.
Conclusion: The main challenges in this area have both semantic and technical themes. System developers can therefore help guarantee the success of such systems by overcoming these challenges.
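The word-cloud step can be illustrated with a small term tally, since a word cloud is at bottom a frequency count over the tabulated requirements. The requirement snippets and stopword list below are invented for illustration:

```python
from collections import Counter
import re

# Hypothetical snippets standing in for the tabulated stakeholder
# requirements from which functional cases were derived.
requirements = [
    "standardized terminologies and ontologies for genetic tests",
    "machine-readable storing and reporting of results",
    "standardized structured data entry",
    "pharmacogenomics decision support",
]

# Tokenize, drop a few stopwords, and count term frequencies.
tokens = Counter(
    word
    for text in requirements
    for word in re.findall(r"[a-z-]+", text.lower())
    if word not in {"and", "of", "for", "the"}
)
print(tokens.most_common(3))
```

The most frequent terms are what dominate the rendered cloud; clustering them by theme is the manual step that yields the functional and non-functional cases.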


1986 · Vol 65 (1) · pp. 9
Author(s):  
C.W. Painter

1969 · Vol 08 (01) · pp. 07-11
Author(s):  
H. B. Newcombe

Methods are described for deriving personal and family histories of birth, marriage, procreation, ill health and death, for large populations, from existing civil registrations of vital events and the routine records of ill health. Computers have been used to group together and »link« the separately derived records pertaining to successive events in the lives of the same individuals and families, rapidly and on a large scale. Most of the records employed are already available as machine-readable punch cards and magnetic tapes, for statistical and administrative purposes, and only minor modifications have been made to the manner in which these are produced. As applied to the population of the Canadian province of British Columbia (currently about 2 million people), these methods have already yielded substantial information on the risks of disease: a) in the population, b) in relation to various parental characteristics, and c) as correlated with previous occurrences in the family histories.
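The linkage step can be illustrated with a deterministic simplification. Real linkage of vital records, as developed in this line of work, weighs partial agreement on identifiers probabilistically; the exact-key match and sample records below are invented for illustration only:

```python
# Deterministic sketch: records from two event files link when they
# share an exact key (surname + birth year). Probabilistic linkage
# would instead score partial agreement across many identifiers.

def link_key(record):
    """Blocking key: normalized surname plus birth year."""
    return (record["surname"].upper(), record["birth_year"])

births = [{"surname": "Smith", "birth_year": 1921, "event": "birth"}]
marriages = [{"surname": "smith", "birth_year": 1921, "event": "marriage"}]

# Index one file by key, then probe it with the other file.
index = {}
for rec in births:
    index.setdefault(link_key(rec), []).append(rec)

linked = []
for rec in marriages:
    for match in index.get(link_key(rec), []):
        linked.append((match, rec))

print(f"{len(linked)} linked pair(s)")
```

Indexing one file before probing with the other is what lets the grouping run "rapidly and on a large scale" rather than comparing every record pair.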


2018 · Vol 10 (2) · pp. 269-295
Author(s):  
Sri Waluyo

This paper discusses the content of Q.S. al-Baqarah ([2]: 67-73). The data used in the preparation of this paper are both primary and secondary. The primary source is data obtained from the core source; in conducting a study of a verse, the primary data source is clearly the Qur'an itself, specifically Q.S. al-Baqarah ([2]: 67-73). Secondary data are data obtained from other sources that are still related to the problem and provide interpretation of the primary source. The method used in analyzing this paper is the tahlili method. This method describes the meaning contained in the Qur'an, verse by verse and letter after letter, according to the order in the Mushaf. The description includes the various aspects contained in the interpreted verses, such as the meaning of the vocabulary, the connotation of the sentence, the background of the verse's revelation, and its relation to other verses, both before and after. It also does not overlook the opinions that have been given regarding the interpretation of these verses, whether delivered by the Prophet, the companions, the tabi'in, or other commentators. This study shows that in Q.S. al-Baqarah ([2]: 67-73) there are values of moral education, which include: (1) morals in asking, (2) morals toward parents, (3) the patience of educators, (4) the honesty of educators, and (5) the obedience of learners.


Author(s):  
Anton Wahyudi

The novel Sepertiga Malam di Manhattan by Arumi E is very interesting to study. It is a novel about a family's struggle to attain happiness: the struggle to make the family happy while hoping for a baby. It is Arumi E's 27th and newest novel. Before writing it, Arumi E conducted research in the places described in the novel in order to achieve an engaging fictional story; much of the story was drawn from her travels, which makes it all the more interesting. The objective of this research is to describe (1) the autopoietic system and (2) the differentiation system in the novel Sepertiga Malam di Manhattan by Arumi E. The research method used is a descriptive qualitative method with a social systems approach; the method used by the researcher is the dialectical method. The data source used in this research is the novel Sepertiga Malam di Manhattan by Arumi E, published by Gramedia in 2018. Data collection in this study followed the steps of reading the novel; no special instrument was used to collect the data. There are two results of the study: (1) The autopoietic system in the novel concerns characters who have their own beliefs or rules in their lives and who do not want to follow the rules of others; they are more confident in their own way toward success and their purpose in life. (2) The differentiation system in the novel covers the handling of changes in the environment: the characters are able to adapt to a new environment whose culture differs from their original culture. This demonstrates the presence of both the autopoietic and differentiation systems in the novel Sepertiga Malam di Manhattan by Arumi E.


2019 · Vol 3 (2) · pp. 123-131
Author(s):  
Ervina CM Simatupang

The title of this study is Syntactic and Semantic Analysis of Slogans of Aviation Companies in ASEAN Countries. The aim of this study is to analyze and describe the slogans of aviation companies in ASEAN countries syntactically and semantically. The method used in this study is the descriptive method. Data sources are taken from the official websites of various aviation companies in ASEAN countries, and they are also taken from Wikipedia, as that website has listed them in a chart. The chart covers the profiles of the aviation companies in ASEAN countries. The theories used to analyze the data syntactically are from O


1997 · Vol 9 (1-3) · pp. 58-77
Author(s):  
Vitaly Kliatskine ◽  
Eugene Shchepin ◽  
Gunnar Thorvaldsen ◽  
Konstantin Zingerman ◽  
Valery Lazarev

In principle, printed source material should be made machine-readable with systems for Optical Character Recognition, rather than being typed once more. Off-the-shelf commercial OCR programs tend, however, to be inadequate for lists with a complex layout. The tax assessment lists that assess most nineteenth-century farms in Norway constitute one example among a series of valuable sources which can only be interpreted successfully with specially designed OCR software. This paper considers the problems involved in the recognition of material with a complex table structure, outlining a new algorithmic model based on ‘linked hierarchies’. Within the scope of this model, a variety of tables and layouts can be described and recognized. The ‘linked hierarchies’ model has been implemented in the ‘CRIPT’ OCR software system, which successfully reads tables with a complex structure from several different historical sources.
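A toy rendering of the 'linked hierarchies' idea: a table is described as a tree of regions (list, column group, column, cell), and regions can be linked so that structure recognized in one place guides recognition elsewhere. The class, region names, and link semantics below are assumptions about the model's general shape, not the CRIPT implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    """One node in a hierarchy describing part of a table layout."""
    name: str
    children: list = field(default_factory=list)
    linked_to: "Region | None" = None  # reuse another region's layout

    def leaves(self):
        """Flatten the hierarchy to its leaf region names."""
        if not self.children:
            return [self.name]
        return [leaf for c in self.children for leaf in c.leaves()]

# A column layout for one farm entry in a tax list...
farm_col = Region("farm", [Region("farm:name"), Region("farm:value")])
page = Region("tax_list", [farm_col])

# ...and a second page that reuses that layout via a link, so the
# recognizer need not rediscover the structure from scratch.
page2 = Region("tax_list_p2", linked_to=farm_col)

print(page.leaves())
```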

