Automatic Document Classification for Environmental Risk Assessment

Author(s):  
Kyle Painter ◽  
Steven Dutton ◽  
Elizabeth O Owens ◽  
Lyle Burgoon

Motivation: In environmental risk assessment, information about the potential health risks of chemicals released into the environment is compiled and distilled to inform public policy. The U.S. Environmental Protection Agency (EPA) produces Integrated Science Assessments (ISAs) that review the literature on air pollutants, including nitrogen oxides (NOx). That review process currently requires substantial human labor to evaluate thousands of potentially relevant documents published each year, a problem this study seeks to alleviate with automated topic classification methods.

Results: For this study, abstracts and titles of scientific documents about NOx were labeled by subject matter experts in four domains relevant to ISAs: toxicology, atmospheric science, epidemiology, and exposure science. In addition, documents not relevant to the four domains were included to simulate the background literature that should be filtered out of consideration. The labeled documents were used to train models with a Naive Bayes Multinomial classifier on the Weka data mining platform. Separate tests were performed using multi-class or single-class models, each with and without background literature. For the multi-class models, recall (% of all documents in a class that are classified correctly) for the scientific domains ranged between 74% and 94%, and precision (% of documents assigned to a class that actually belong to it) ranged between 38% and 93%; models trained with background literature performed worse than models trained without it. Single-class models had precision that ranged from 31% to 90% and recall that ranged from 84% to 98%, with better precision for models not using background literature but better overall recall for models using it. Single-class models generally performed better than multi-class models in recall, though multi-class models without the background screen tended to be best for precision.
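As a rough illustration of the approach described in the abstract above, the following Python sketch trains a multinomial Naive Bayes model on tokenized text and computes per-class precision and recall as the abstract defines them. The study itself used the Naive Bayes Multinomial classifier in Weka; this standalone version, its function names, the Laplace smoothing parameter `alpha`, and the toy documents are all assumptions made for illustration and do not reproduce the authors' pipeline, data, or results.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on lists of tokens (Laplace-smoothed)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"alpha": alpha, "vocab": vocab, "classes": {}}
    for cls, n_docs in class_counts.items():
        model["classes"][cls] = {
            "log_prior": math.log(n_docs / len(labels)),
            "counts": word_counts[cls],
            # Total tokens in the class plus the smoothing mass for every vocabulary word.
            "denom": sum(word_counts[cls].values()) + alpha * len(vocab),
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior; unseen words are ignored."""
    best_cls, best_score = None, -math.inf
    for cls, params in model["classes"].items():
        score = params["log_prior"]
        for tok in tokens:
            if tok in model["vocab"]:
                score += math.log((params["counts"][tok] + model["alpha"]) / params["denom"])
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


def precision_recall(y_true, y_pred, cls):
    """Precision: share of documents assigned to cls that belong to it.
    Recall: share of documents in cls that were assigned to it."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    predicted = sum(1 for p in y_pred if p == cls)
    actual = sum(1 for t in y_true if t == cls)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Hypothetical toy corpus (not from the study): two of the four domains.
train_docs = [
    ["rats", "inhalation", "dose"],
    ["mice", "dose", "lung"],
    ["cohort", "asthma", "hospital"],
    ["children", "asthma", "cohort"],
]
train_labels = ["toxicology", "toxicology", "epidemiology", "epidemiology"]
model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, ["dose", "rats"]))
```

A single-class (one-vs-rest) variant, as contrasted with the multi-class setup in the abstract, could be sketched by relabeling every document outside the target domain as a single "other" class before training.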

2014 ◽  

1983 ◽  
Vol 2 (1) ◽  
pp. 113-123 ◽  
Author(s):  
Gilbert S. Omenn

Effective and efficient assessment of the potential health effects from environmental chemicals requires well-validated tests to identify the toxicity of the chemical and then extensive scientific and analytical work to characterize the potential risk for humans in light of the relative potency of the chemical in test systems, the nature and routes of exposures for humans, and the differences in susceptibility across species and among human population subgroups. Such risk assessment is being applied increasingly for potential carcinogens and is beginning to be applied to mutagenic and teratogenic chemicals. One of the fertile areas for research is the study of mechanisms that may permit us to predict carcinogenic, mutagenic, or teratogenic activity of compounds from tests of each of these classes of end-points.


Apidologie ◽  
2003 ◽  
Vol 34 (2) ◽  
pp. 139-145 ◽  
Author(s):  
Henrik F. Brodsgaard ◽  
Camilla J. Brodsgaard ◽  
Henrik Hansen ◽  
Gábor L. Lövei

2008 ◽  
Vol 15 (5) ◽  
pp. 394-404 ◽  
Author(s):  
Stefan Scholz ◽  
Stephan Fischer ◽  
Ulrike Gündel ◽  
Eberhard Küster ◽  
Till Luckenbach ◽  
...  

2007 ◽  
Vol preprint (2009) ◽  
pp. 1
Author(s):  
Heike Schmitt ◽  
Tatiana Boucard ◽  
Jeanne Garric ◽  
John Jensen ◽  
Joanne Parrott ◽  
...  

2016 ◽  
Vol 88 (8) ◽  
pp. 713-830
Author(s):  
John H. Duffus ◽  
Michael Schwenk ◽  
Douglas M. Templeton

Abstract: The primary objective of this glossary is to give clear definitions for those who contribute to studies relevant to these disciplines, or who must interpret them, but are not themselves reproductive physiologists or physicians. This applies especially to chemists who need to understand the literature of reproductive and teratogenic effects of substances without recourse to a multiplicity of other glossaries or dictionaries. The glossary includes terms related to basic and clinical reproductive biology and teratogenesis, insofar as they are necessary for a self-contained document, particularly terms related to diagnosing, measuring, and understanding the effects of substances on the embryo, the fetus, and on the male and female reproductive systems. The glossary consists of about 1200 primary alphabetical entries and includes Annexes of common abbreviations and examples of chemicals with known effects on human reproduction and development. The authors hope that toxicologists, pharmacologists, medical practitioners, risk assessors, and regulatory authorities are among the groups who will find this glossary helpful, in addition to chemists. In particular, the glossary should facilitate the worldwide use of chemical terminology in relation to occupational and environmental risk assessment.

