DEXTER: A system that experiments with choices of training data using expert knowledge in the domain of DNA hydration

Dawn M. Cohen; Casimir Kulikowski; Helen Berman

doi:10.1007/bf00993380

Imitation Learning System Design with Small Training Data for Flexible Tool Manipulation

International Journal of Automation Technology ◽

10.20965/ijat.2021.p0669 ◽

2021 ◽

Vol 15 (5) ◽

pp. 669-677

Author(s):

Harumo Sasatake ◽

Ryosuke Tasaki ◽

Takahito Yamashita ◽

Naoki Uchiyama ◽

◽

...

Keyword(s):

Deep Learning ◽

Expert Knowledge ◽

Teaching Method ◽

Population Aging ◽

Developed Countries ◽

Learning System ◽

Training Data ◽

Robot Arm ◽

Flexible Tool ◽

Human Labor

Population aging has become a major problem in developed countries. As the labor force declines, robot arms are expected to replace human labor for simple tasks. A robotic arm attaches a tool specialized for a task and acquires the movement through teaching by an engineer with expert knowledge. However, the number of such engineers is limited; therefore, a teaching method that can be used by non-technical personnel is necessitated. As a teaching method, deep learning can be used to imitate human behavior and tool usage. However, deep learning requires a large amount of training data for learning. In this study, the target task of the robot is to sweep multiple pieces of dirt using a broom. The proposed learning system can estimate the initial parameters for deep learning based on experience, as well as the shape and physical properties of the tools. It can reduce the number of training data points when learning a new tool. A virtual reality system is used to move the robot arm easily and safely, as well as to create training data for imitation. In this study, cleaning experiments are conducted to evaluate the effectiveness of the proposed method. The experimental results confirm that the proposed method can accelerate the learning speed of deep learning and acquire cleaning ability using a small amount of training data.

Download Full-text

The LAILAPS Search Engine: Relevance Ranking in Life Science Databases

Journal of Integrative Bioinformatics ◽

10.1515/jib-2010-110 ◽

2010 ◽

Vol 7 (2) ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Matthias Lange ◽

Karl Spies ◽

Joachim Bargsten ◽

Gregor Haberhauer ◽

Matthias Klapperstück ◽

...

Keyword(s):

Neural Network ◽

Search Engine ◽

Life Science ◽

Expert Knowledge ◽

Feature Model ◽

Training Data ◽

Relevance Ranking ◽

User Interactions ◽

Daily Work ◽

Machine Learning Approach

SummarySearch engines and retrieval systems are popular tools at a life science desktop. The manual inspection of hundreds of database entries, that reflect a life science concept or fact, is a time intensive daily work. Hereby, not the number of query results matters, but the relevance does. In this paper, we present the LAILAPS search engine for life science databases. The concept is to combine a novel feature model for relevance ranking, a machine learning approach to model user relevance profiles, ranking improvement by user feedback tracking and an intuitive and slim web user interface, that estimates relevance rank by tracking user interactions. Queries are formulated as simple keyword lists and will be expanded by synonyms. Supporting a flexible text index and a simple data import format, LAILAPS can easily be used both as search engine for comprehensive integrated life science databases and for small in-house project databases.With a set of features, extracted from each database hit in combination with user relevance preferences, a neural network predicts user specific relevance scores. Using expert knowledge as training data for a predefined neural network or using users own relevance training sets, a reliable relevance ranking of database hits has been implemented.In this paper, we present the LAILAPS system, the concepts, benchmarks and use cases. LAILAPS is public available for SWISSPROT data at http://lailaps.ipk-gatersleben.de

Download Full-text

Literature Mining for Incorporating Inductive Bias in Biomedical Prediction Tasks (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i10.7264 ◽

2020 ◽

Vol 34 (10) ◽

pp. 13983-13984

Author(s):

Qizhen Zhang ◽

Audrey Durand ◽

Joelle Pineau

Keyword(s):

Machine Learning ◽

Prior Knowledge ◽

Expert Knowledge ◽

Training Data ◽

Literature Mining ◽

Prediction Task ◽

Inductive Bias ◽

Prediction Outcome ◽

Applications Of Machine Learning

Applications of machine learning in biomedical prediction tasks are often limited by datasets that are unrepresentative of the sampling population. In these situations, we can no longer rely only on the the training data to learn the relations between features and the prediction outcome. Our method proposes to learn an inductive bias that indicates the relevance of each feature to outcomes through literature mining in PubMed, a centralized source of biomedical documents. The inductive bias acts as a source of prior knowledge from experts, which we leverage by imposing an extra penalty for model weights that differ from this inductive bias. We empirically evaluate our method on a medical prediction task and highlight the importance of incorporating expert knowledge that can capture relations not present in the training data.

Download Full-text

Use of machine learning to identify relevant research publications in clinical oncology.

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.6558 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. 6558-6558

Author(s):

Fernando Jose Suarez Saiz ◽

Corey Sanders ◽

Rick J Stevens ◽

Robert Nielsen ◽

Michael W Britt ◽

...

Keyword(s):

Machine Learning ◽

Expert Knowledge ◽

Grey Literature ◽

Clinical Decision ◽

Training Data ◽

Surgical Approaches ◽

High Quality ◽

Test Set ◽

N Gram ◽

Abstract Content

6558 Background: Finding high-quality science to support decisions for individual patients is challenging. Common approaches to assess clinical literature quality and relevance rely on bibliometrics or expert knowledge. We describe a method to automatically identify clinically relevant, high-quality scientific citations using abstract content. Methods: We used machine learning trained on text from PubMed papers cited in 3 expert resources: NCCN, NCI-PDQ, and Hemonc.org. Balanced training data included text cited in at least two sources to form an “on topic” set (i.e., relevant and high quality), and an “off-topic” set, not cited in any of the above 3 sources. The off-topic set was published in lower ranked journals, using a citation-based score. Articles were part of an Oncology Clinical Trial corpus generated using a standard PubMed query. We used a gradient boosted-tree approach with a binary logistic supervised learning classification. Briefly, 988 texts were processed to produce a term frequency-inverse document frequency (tf-idf) n-gram representation of both the training and the test set (70/30 split). Ideal parameters were determined using 1000-fold cross validation. Results: Our model classified papers in the test set with 0.93 accuracy (95% CI (0.09:0.96) p ≤ 0.0001), with sensitivity 0.95 and specificity 0.91. Some false positives contained language considered clinically relevant that may have been missed or not yet included in expert resources. False negatives revealed a potential bias towards chemotherapy-focused research over radiation therapy or surgical approaches. Conclusions: Machine learning can be used to automatically identify relevant clinical publications from biographic databases, without relying on expert curation or bibliometric methods. The use of machine learning to identify relevant publications may reduce the time clinicians spend finding pertinent evidence for a patient. This approach is generalizable to cases where a corpus of high-quality publications that can serve as a training set exists or cases where document metadata is unreliable, as is the case of “grey” literature within oncology and beyond to other diseases. Future work will extend this approach and may integrate it into oncology clinical decision-support tools.

Download Full-text

Mapping Burned Areas in a Mediterranean Environment Using Soft Integration of Spectral Indices from High-Resolution Satellite Images

Earth Interactions ◽

10.1175/2010ei349.1 ◽

2010 ◽

Vol 14 (17) ◽

pp. 1-20 ◽

Cited By ~ 24

Author(s):

Mirco Boschetti ◽

Daniela Stroppiana ◽

Pietro Alessandro Brivio

Keyword(s):

High Resolution ◽

Satellite Images ◽

Expert Knowledge ◽

Training Data ◽

Burned Area ◽

Mediterranean Ecosystem ◽

Commission Error ◽

Spectral Indices ◽

Burned Areas ◽

High Resolution Satellite Images

Abstract This article presents a new method for burned area mapping using high-resolution satellite images in the Mediterranean ecosystem. In such a complex environment, high-resolution satellite images represent an appropriate data source for identifying fire-affected areas, and single postfire data are often the only available source of information. The method proposed here integrates several spectral indices into a fuzzy synthetic indicator of likelihood of burn. The indices are interpreted through fuzzy membership functions that have been derived with a partially data-driven approach exploiting training data and expert knowledge. The final map of fire-affected areas is produced by applying a region growing algorithm on the basis of seed pixels selected on a conservative threshold of the synthetic fuzzy score. The algorithm has been developed and tested on a set of Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) scenes acquired over Southern Italy. Validation showed that the accuracy of the burned area maps is comparable or even better [overall accuracy (OA) > 90%, K > 0.76] than that obtained with approaches based on single index thresholds adapted to each image. The method described here provides an automatic approach for mapping fire-affected areas with very few false alarms (low commission error), whereas omission errors are mainly related to undetected small burned areas and are located in heterogeneous sparse vegetation cover.

Download Full-text

Expert-augmented machine learning

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1906831117 ◽

2020 ◽

Vol 117 (9) ◽

pp. 4571-4577 ◽

Cited By ~ 10

Author(s):

Efstathios D. Gennatas ◽

Jerome H. Friedman ◽

Lyle H. Ungar ◽

Romain Pirracchio ◽

Eric Eaton ◽

...

Keyword(s):

Machine Learning ◽

Learning Strategy ◽

Expert Knowledge ◽

Decision Rules ◽

Total Sample ◽

Care Patient ◽

Training Data ◽

Empirical Risk ◽

Automated Method ◽

Out Of Sample

Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.

Download Full-text

Segmentation-driven Hierarchical RetinaNet for Detecting Protozoa in Micrograph

International Journal of Semantic Computing ◽

10.1142/s1793351x19400178 ◽

2019 ◽

Vol 13 (03) ◽

pp. 393-413

Author(s):

Khoa Pho ◽

Muhamad Kamal Mohammed Amin ◽

Atsuo Yoshitaka

Keyword(s):

Life Cycle ◽

Data Augmentation ◽

Expert Knowledge ◽

Animal Health ◽

Training Data ◽

Life Cycle Stages ◽

Detection And Identification ◽

Training Samples ◽

Augmentation Techniques ◽

The Cost

Protozoa detection and identification play important roles in many practical domains such as parasitology, scientific research, biological treatment processes, and environmental quality evaluation. Traditional laboratory methods for protozoan identification are time-consuming and require expert knowledge and expensive equipment. Another approach is using micrographs to identify the species of protozoans that can save a lot of time and reduce the cost. However, the existing methods in this approach only identify the species when the protozoan are already segmented. These methods study features of shapes and sizes. In this work, we detect and identify the images of cysts and oocysts of various species such as: Giardia lamblia, Iodamoeba butschilii, Toxoplasma gondi, Cyclospora cayetanensis, Balantidium coli, Sarcocystis, Cystoisospora belli and Acanthamoeba, which have round shapes in common and affect human and animal health seriously. We propose Segmentation-driven Hierarchical RetinaNet to automatically detect, segment, and identify protozoans in their micrographs. By applying multiple techniques such as transfer learning, and data augmentation techniques, and dividing training samples into life-cycle stages of protozoans, we successfully overcome the lack of data issue in applying deep learning for this problem. Even though there are at most 5 samples per life-cycle category in the training data, our proposed method still achieves promising results and outperforms the original RetinaNet on our protozoa dataset.

Download Full-text

Depth-to-Bedrock Map of China at a Spatial Resolution of 100 Meters

10.5194/essd-2018-103 ◽

2018 ◽

Cited By ~ 2

Author(s):

Fapeng Yan ◽

Wei Shangguan ◽

Jing Zhang ◽

Bifeng Hu

Keyword(s):

Land Surface ◽

Vegetation Index ◽

Prediction Models ◽

Expert Knowledge ◽

Lower Boundary ◽

Spatial Prediction ◽

Training Data ◽

Ensemble Prediction ◽

Gradient Boosting ◽

Soil Database

Abstract. Depth to bedrock serves as the lower boundary of soil, which influences or controls many of the Earth’s physical and chemical processes. It plays important roles in geology, hydrology, land surface processes, civil engineering, and other related fields. This paper describes the materials and methods to produce a high-resolution (100 m) depth-to-bedrock map of China. Observations were interpreted from borehole log data (ca. 6,382 locations) sampled from the Chinese National Important Geological Borehole Database. To fill in large sampling gaps, additional pseudo-observations generated based on expert knowledge were added. Then, we overlaid the training points on a stack of 133 covariates including climatic images, DEM-derived parameters, land-cover and land-use maps, MODIS surface reflectance bands, vegetation index images, and the Harmonized World Soil Database. Spatial prediction models were developed using the random forests and gradient boosting tree, and ensemble prediction results were then obtained by these two independently fitted models. Finally, uncertainty estimation was generated by the quantile regression forest model. The 10-fold cross-validation showed that the ensemble models explain 57 % of the variation in depth to bedrock. Based on comparison with depth-to-bedrock maps of China extracted from previous global predictions, our predictions showed higher accuracy. More observations, especially those in data-sparse areas, should be added to training data, and more covariates with high precision should be used to further improve the accuracy of spatial predictions. The resulting maps of this study are available on Figshare at the following DOI: https://doi.org/10.6084/m9.figshare.7011524.v1. And they are also available for download at http://globalchange.bnu.edu.cn/ .

Download Full-text

Performance Comparison of Rule Generation Method Substractive Clustering and Fuzzy C-Means Clustering on Sugeno's Inference for Stroke Risk Detection

MATICS ◽

10.18860/mat.v9i2.4587 ◽

2017 ◽

Vol 9 (2) ◽

pp. 72

Author(s):

Rekyan Regasari Mardi Putri ◽

Edy Santoso

Keyword(s):

Processing Time ◽

Fuzzy Inference ◽

Expert Knowledge ◽

Disease Risk ◽

Performance Comparison ◽

Training Data ◽

Fuzzy C Means ◽

Risk Detection ◽

Fuzzy C Means Clustering ◽

Grouping Algorithm

Abstract - Fuzzy Inference is one method that can solve the problem of uncertainty in a decision-making or classification well. In inference, fuzzy rules that represent the need of expert knowledge in the relevant fields, so that the classification given decision or be appropriate expert knowledge. However there are times when experts are less able to represent the rules of the appropriate knowledge or knowledge that there is need of too many rules, so we need a method that can generate rules based on the data given expert. At issue troke s disease risk detection, it also occurs because of the research that has been done by taking the direct rule of experts, it turns out less than the maximum accuracy, still 82.89%. Substractive methods Clustering and Fuzzy C-Means (FCM) could generate rules by grouping algorithm, in which the existing training data are grouped in common and the rules of the group raised. Differences in the two methods are in determining the center of the cluster and assign each incoming data which groups. Based on research that has been done, substractive average Clustering membrika better accuracy is 84.46%, while 73.81% FCM. However, in the processing time FCM faster at 16.75 seconds to give an average processing time of 13:02 seconds.

Download Full-text

Knowledge Based 2D Blade Design Using Multi-Objective Aerodynamic Optimization and a Neural Network

Volume 6: Turbo Expo 2007, Parts A and B ◽

10.1115/gt2007-28204 ◽

2007 ◽

Cited By ~ 6

Author(s):

Andre´ Huppertz ◽

Peter M. Flassig ◽

Robert J. Flassig ◽

Marius Swoboda

Keyword(s):

Neural Network ◽

Expert Knowledge ◽

Computation Time ◽

Training Data ◽

Special Focus ◽

Optimization Approach ◽

Aerodynamic Optimization ◽

Time Model ◽

Neural Network Approach ◽

The Neural Network

This paper presents a method to obtain optimized 2D blade sections using expert knowledge, a multi-criteria optimization approach and a neural network in an automated process. A special focus is put on neural networks, which are used to capture the complex correlations between aerodynamic and geometric parameters. The multi-criteria optimization is used to generate optimal training data for the neural network. The main objective of this investigation is to generate 2D blade sections from scratch including loss prediction using through flow quantities and a neural network approach without any CFD computations. First results are very promising in terms of computation time, model capacities and performance prediction of the neural network.

Download Full-text