Integrating Domain Knowledge in AI-Assisted Criminal Sentencing of Drug Trafficking Cases

Frontiers in Artificial Intelligence and Applications - Legal Knowledge and Information Systems ◽

10.3233/faia200861 ◽

2020 ◽

Author(s):

Tien-Hsuan Wu ◽

Ben Kao ◽

Anne S.Y. Cheung ◽

Michael M.K. Cheung ◽

Chen Wang ◽

...

Keyword(s):

Machine Learning ◽

Knowledge Management ◽

Drug Trafficking ◽

Domain Knowledge ◽

Legal Knowledge ◽

Criminal Sentencing ◽

Legal Cases ◽

Legal Rules ◽

Legal Domain ◽

Legal Judgments

Judgment prediction is the task of predicting various outcomes of legal cases, of which sentencing prediction is one of the most important yet difficult challenges. We study the applicability of machine learning (ML) techniques to predicting prison terms in drug trafficking cases. In particular, we study how legal domain knowledge can be integrated with ML models to construct highly accurate predictors. We illustrate how our criminal sentence predictors can be applied to address four important issues in legal knowledge management: (1) discovery of model drifts in legal rules, (2) identification of critical features in legal judgments, (3) fairness in machine predictions, and (4) explainability of machine predictions.


Evaluating Human versus Machine Learning Performance in a LegalTech Problem

Applied Sciences ◽

10.3390/app12010297 ◽

2021 ◽

Vol 12 (1) ◽

pp. 297

Author(s):

Tamás Orosz ◽

Renátó Vági ◽

Gergely Márk Csányi ◽

Dániel Nagy ◽

István Üveges ◽

...

Keyword(s):

Machine Learning ◽

Domain Knowledge ◽

Classification Problem ◽

Machine Learning Algorithms ◽

Added Value ◽

Learning Performance ◽

Production Environment ◽

Legal Domain ◽

The Cost

Many machine learning-based document processing applications have been published in recent years. Applying these methodologies can reduce the cost of labor-intensive tasks and induce changes in a company’s structure. An artificial intelligence-based application can replace the use of trainees and free up experts’ time, which can increase innovation inside the company by letting experts work on tasks with greater added value. However, the development cost of these methodologies can be high, and development is usually not a straightforward task. This paper presents the results of a survey in which a machine learning-based legal text labeler competed with multiple people with different levels of legal domain knowledge. The machine learning-based application used binary SVM-based classifiers to solve the multi-label classification problem. The methods used were encapsulated and deployed as a digital twin in a production environment. The results show that machine learning algorithms can be used effectively for monotonous tasks that nonetheless demand domain knowledge and attention. The results also suggest that embracing the machine learning-based solution can increase discoverability and enrich the value of data. The test confirmed that the accuracy of a machine learning-based system matches the long-term accuracy of legal experts, which makes it suitable for automating the working process.
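
The decomposition of a multi-label problem into one binary classifier per label, as the abstract describes, can be sketched as follows. This is a minimal illustration and not the authors' system: to stay dependency-free, a margin perceptron with hinge-style updates stands in for the binary SVM, and the function names, toy texts, and labels are all hypothetical.

```python
from collections import defaultdict


def train_binary(docs, targets, epochs=20):
    """Margin perceptron with hinge-style updates (a stand-in for a binary SVM).

    docs    -- list of token lists
    targets -- list of +1 / -1 values, one per document
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for words, y in zip(docs, targets):
            score = bias + sum(weights[w] for w in words)
            if y * score < 1:  # misclassified or inside the margin
                for w in words:
                    weights[w] += y
                bias += y
    return weights, bias


def train_one_vs_rest(texts, label_sets):
    """Train one binary model per label (hypothetical helper)."""
    docs = [text.split() for text in texts]
    labels = sorted(set().union(*label_sets))
    models = {}
    for label in labels:
        targets = [1 if label in assigned else -1 for assigned in label_sets]
        models[label] = train_binary(docs, targets)
    return models


def predict(models, text):
    """Return every label whose binary model gives a positive score."""
    words = text.split()
    return {
        label
        for label, (weights, bias) in models.items()
        if bias + sum(weights.get(w, 0.0) for w in words) > 0
    }


# Hypothetical toy data: each text may carry zero, one, or several labels.
texts = [
    "contract breach damages",
    "tax penalty",
    "contract tax clause",
    "custody hearing",
]
label_sets = [{"contract"}, {"tax"}, {"contract", "tax"}, set()]

models = train_one_vs_rest(texts, label_sets)
print(predict(models, "contract tax clause"))
```

Because each label gets its own independent model, a document can receive any combination of labels, including none, which is what distinguishes multi-label from multi-class classification.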


CIKM 2020 conference report

ACM SIGWEB Newsletter ◽

10.1145/3460304.3460305 ◽

2021 ◽

pp. 1-4

Author(s):

Mathieu D'Aquin ◽

Stefan Dietze

Keyword(s):

United States ◽

Machine Learning ◽

Knowledge Management ◽

Information Retrieval ◽

Conference Report ◽

The United States ◽

Knowledge Based ◽

The World ◽

Science Conference ◽

Information And Knowledge Management

The 29th ACM International Conference on Information and Knowledge Management (CIKM) was held online from the 19th to the 23rd of October 2020. CIKM is an annual computer science conference, focused on research at the intersection of information retrieval, machine learning, databases as well as semantic and knowledge-based technologies. Since it was first held in the United States in 1992, 28 conferences have been hosted in 9 countries around the world.


Machine Learning-Assisted Sampling of SERS Substrates Improves Data Collection Efficiency

Applied Spectroscopy ◽

10.1177/00037028211034543 ◽

2021 ◽

pp. 000370282110345

Author(s):

Tatu Rojalin ◽

Dexter Antonio ◽

Ambarish Kulkarni ◽

Randy P. Carney

Keyword(s):

Machine Learning ◽

Data Collection ◽

Domain Knowledge ◽

Collection Efficiency ◽

Point Of Care ◽

Automated Analysis ◽

Downstream Processing ◽

Machine Learning Algorithms ◽

Label Free ◽

Expert User

Surface-enhanced Raman scattering (SERS) is a powerful technique for sensitive label-free analysis of chemical and biological samples. While much recent work has established sophisticated automation routines using machine learning and related artificial intelligence methods, these efforts have largely focused on downstream processing (e.g., classification tasks) of previously collected data. While fully automated analysis pipelines are desirable, current progress is limited by cumbersome and manually intensive sample preparation and data collection steps. Specifically, a typical lab-scale SERS experiment requires the user to evaluate the quality and reliability of the measurement (i.e., the spectra) as the data are being collected. This need for expert user intuition is a major bottleneck that limits applicability of SERS-based diagnostics for point-of-care clinical applications, where trained spectroscopists are likely unavailable. While application-agnostic numerical approaches (e.g., signal-to-noise thresholding) are useful, there is an urgent need to develop algorithms that leverage expert user intuition and domain knowledge to simplify and accelerate data collection steps. To address this challenge, in this work, we introduce a machine learning-assisted method at the acquisition stage. We tested six common algorithms to measure best performance in the context of spectral quality judgment. For adoption into future automation platforms, we developed an open-source Python package tailored for rapid expert user annotation to train machine learning algorithms. We expect that this new approach to use machine learning to assist in data acquisition can serve as a useful building block for point-of-care SERS diagnostic platforms.
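
The signal-to-noise thresholding that the abstract cites as an application-agnostic baseline can be sketched as follows. This is only an illustration and not the authors' package: the SNR definition (highest point above the mean of a signal-free region, divided by that region's sample standard deviation), the names `snr` and `accept_spectrum`, the threshold of 3.0, and the example intensities are all assumptions.

```python
import statistics


def snr(spectrum, noise_region):
    """Estimate signal-to-noise ratio of one spectrum.

    spectrum     -- list of intensities
    noise_region -- (start, stop) index range assumed to contain no peaks
    """
    start, stop = noise_region
    noise = spectrum[start:stop]
    baseline = statistics.mean(noise)
    sigma = statistics.stdev(noise)  # sample standard deviation
    peak_height = max(spectrum) - baseline
    if sigma == 0:
        # Perfectly flat noise region: any peak is infinitely above it.
        return float("inf") if peak_height > 0 else 0.0
    return peak_height / sigma


def accept_spectrum(spectrum, noise_region, threshold=3.0):
    """Keep a spectrum only if its SNR reaches the (assumed) threshold."""
    return snr(spectrum, noise_region) >= threshold


# Hypothetical intensities: the first four points are treated as noise.
with_peak = [1.0, 3.0, 1.0, 3.0, 50.0, 2.0]
without_peak = [1.0, 3.0, 1.0, 3.0, 3.0, 2.0]

print(accept_spectrum(with_peak, (0, 4)))
print(accept_spectrum(without_peak, (0, 4)))
```

A fixed rule like this needs no training data, which is its appeal; the abstract's argument is that it cannot capture the judgment an expert applies when deciding whether a spectrum is usable.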


Innovative Approach to Build a Scalable Hybrid Model for Prescriptive Maintenance

10.2118/208031-ms ◽

2021 ◽

Author(s):

Richard Büssow ◽

Bruno Hain ◽

Ismael Al Nuaimi

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Big Data ◽

Hybrid Model ◽

Domain Knowledge ◽

Spare Parts ◽

Good Prediction ◽

First Principle ◽

Plant Data ◽

Rotating Equipment

Objective and Scope: Analysis of operational plant data requires experts to interpret detected anomalies, which are defined as unusual operation points. The next step on the digital transformation journey is to provide actionable insights into the data. Prescriptive Maintenance defines in advance which kind of detailed maintenance and spare parts will be required. This paper details the requirements for improving these predictions for rotating equipment and shows the potential to integrate the outcome into an operational workflow.
Methods, Procedures, Process: First principle or physics-based modelling provides additional insights into the data, since the results are directly interpretable. However, such approaches are typically assumed to be expensive to build and not scalable. A successful strategy is to identify and focus on the relevant equipment, and to model it in a hybrid model that combines first principle physics with machine learning. The model is trained using a machine learning approach with historic or current real plant data, to predict conditions which have not occurred before. The better the Artificial Intelligence is trained, the better the prediction will be.
Results, Observations, Conclusions: The general aim when operating a plant is the actual usage of operational data for process and maintenance optimization by advanced analytics. Typically a data-driven central oversight function supports operations and maintenance staff. A major lesson learned is that the results of a rather simple statistical approach to detecting anomalies fall short of expectations and are too labor intensive. It is a widespread misconception that being able to deal with big data is sufficient to achieve good prediction quality for Prescriptive Maintenance. What big data companies normally lack is domain knowledge, especially on plant-critical rotating equipment. Without domain knowledge, the relevant input into the model will have shortcomings, and hence so will its predictions. This paper gives an example of a refinery where the described hybrid model has been used.
Novel and Additive Information: First principle models are typically expensive to build and not scalable. This hybrid model approach, which combines first principle physics-based models with artificial intelligence and integrates them into an operational workflow, shows a new way forward.


A Comparative Analysis of the Prediction of Gas Condensate Dew Point Pressure Using Advanced Machine Learning Algorithms

10.2118/205997-ms ◽

2021 ◽

Author(s):

Thitaree Lertliangchai ◽

Birol Dindoruk ◽

Ligang Lu ◽

Xi Yang

Keyword(s):

Machine Learning ◽

Domain Knowledge ◽

Compositional Data ◽

Machine Learning Algorithms ◽

Empirical Correlation ◽

Dew Point ◽

Pvt Data ◽

Point Pressure ◽

Input Variables ◽

Dew Point Pressure

Dew point pressure (DPP) is a key variable that may be needed to predict the condensate-to-gas ratio behavior of a reservoir, along with some production/completion related issues, and to calibrate/constrain the EOS models for integrated modeling. However, DPP is a challenging property in terms of its predictability. Recognizing the complexities, we present a state-of-the-art method for DPP prediction using advanced machine learning (ML) techniques. We compare the outcomes of our methodology with those of published empirical correlation-based approaches on two datasets with small sizes and different inputs. Our ML method noticeably outperforms the correlation-based predictors while also showing its flexibility and robustness even with small training datasets, provided that various classes of fluids are represented within the datasets. We have collected the condensate PVT data from public domain resources and GeoMark RFDBASE, containing dew point pressure (the target variable) and, as input variables, the compositional data (mole percentage of each component), temperature, molecular weight (MW), and the MW and specific gravity (SG) of heptane plus. Using domain knowledge, before embarking on the study, we extensively checked the measurement quality and the outcomes using statistical techniques. We then apply advanced ML techniques to train predictive models with cross-validation to avoid overfitting the models to the small datasets. We compare our models against the best published DPP predictors with empirical correlation-based techniques. For fair comparisons, the correlation-based predictors are also trained using the underlying datasets. To improve the outcomes using the generalized input data, pseudo-critical properties and artificial proxy features are also employed.
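
Cross-validation on a small dataset, as the abstract describes, can be sketched as follows. This is a generic k-fold illustration under stated assumptions and not the authors' pipeline: the helper names `k_fold_indices` and `cross_val_mae` are hypothetical, a mean-value baseline stands in for the trained ML models, and the numbers are made up rather than PVT measurements. The splitter does not shuffle, so samples are assumed to be in random order already.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_mae(xs, ys, fit, k):
    """Mean absolute error over all held-out samples.

    fit -- callable taking (train_xs, train_ys) and returning a predictor
           function x -> prediction
    """
    errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors.extend(abs(model(xs[i]) - ys[i]) for i in test_idx)
    return sum(errors) / len(errors)


def fit_mean_baseline(train_xs, train_ys):
    """Stand-in model: always predict the mean of the training targets."""
    mean = sum(train_ys) / len(train_ys)
    return lambda x: mean


# Hypothetical inputs and targets, for illustration only.
xs = [10.0, 20.0, 30.0, 40.0]
ys = [1.0, 2.0, 3.0, 4.0]
print(cross_val_mae(xs, ys, fit_mean_baseline, k=2))
```

Every sample is held out exactly once, so the error estimate uses all of the data without ever scoring a model on a sample it was trained on, which matters most when the dataset is small.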


Water Pipe Failure Prediction: A Machine Learning Approach Enhanced By Domain Knowledge

Human and Machine Learning - Human–Computer Interaction Series ◽

10.1007/978-3-319-90403-0_18 ◽

2018 ◽

pp. 363-383 ◽

Cited By ~ 2

Author(s):

Bang Zhang ◽

Ting Guo ◽

Lelin Zhang ◽

Peng Lin ◽

Yang Wang ◽

...

Keyword(s):

Machine Learning ◽

Domain Knowledge ◽

Failure Prediction ◽

Learning Approach ◽

Water Pipe ◽

Pipe Failure ◽

Machine Learning Approach


Analytic continuation via domain knowledge free machine learning

Physical Review B ◽

10.1103/physrevb.98.245101 ◽

2018 ◽

Vol 98 (24) ◽

Cited By ~ 12

Author(s):

Hongkee Yoon ◽

Jae-Hoon Sim ◽

Myung Joon Han

Keyword(s):

Machine Learning ◽

Analytic Continuation ◽

Domain Knowledge ◽

Free Machine


Semantic Reconciliation between two Different Aspects of Law

Central and Eastern European eDem and eGov Days ◽

10.24989/ocg.v331.11 ◽

2018 ◽

Vol 331 ◽

pp. 131-140

Author(s):

Bálint Molnár

Keyword(s):

Life Events ◽

Integrated System ◽

Structural Elements ◽

Legal Cases ◽

Strongly Coupled ◽

Legal Rules ◽

Legal Documents ◽

Legal Standards ◽

The Government ◽

Government Portal

This paper presents a proposal for reconciling the warehouse of legal documents created during legislation with the Knowledge Warehouse, which is dedicated to assisting both citizens and public officers with the procedural legal rules of Public Administration in Hungary. The Knowledge Warehouse contains several thousand detailed rules that describe how to manage and handle life events of citizens. These descriptions can be considered generic legal cases within the legal procedures of authorities, and citizens trigger specific instances of the generic ones. The main purpose of the evolving Knowledge Warehouse is to enable citizens to start their specific legal cases either through the Web on the Government Portal or with the help of public officers. The Knowledge Warehouse will be extended with ontologies and semantic search capabilities. An Integrated System for Supporting of Codification will be created in an ongoing project and will serve as a sound basis for the National Warehouse of Legal Rules. The National Warehouse follows the prescriptions of the MetaLex legal standards for the representation of electronic legal documents. The two Warehouses are strongly coupled to each other. However, their syntactic and semantic structures differ profoundly. The representation of e-documents within the National Warehouse is in line with ELI, the European Legislation Identifier, and its ontologies and attached semantic descriptions concentrate on the structural elements of legal documents and their interpretation. The Knowledge Warehouse focuses on ontologies of life events and procedures of authorities to support semantic searching. The proposed solution tries to reconcile and integrate the two differing approaches.


Predicting Health Material Cognitive Accessibility Using Multidimensional Semantic Features and Readability Tools as Predicators (Preprint)

10.2196/preprints.29175 ◽

2021 ◽

Author(s):

Meng Ji ◽

Yanmeng Liu ◽

Tianyong Hao

Keyword(s):

Machine Learning ◽

Health Education ◽

Health Information ◽

Domain Knowledge ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Semantic Features ◽

Integrated Models ◽

Advanced Education ◽

Cognitive Accessibility

BACKGROUND: Much of current health information understandability research uses medical readability formulas (MRF) to assess the cognitive difficulty of health education resources. This is based on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, forms the sole barrier to health information access among the public. Our study challenged this by showing that, for readers from non-English speaking backgrounds with higher education attainment, semantic features of English health texts rather than medical jargon can explain the lack of cognitive access to health materials among readers with a better understanding of health terms yet limited exposure to English health education materials.
OBJECTIVE: Our study explored combining MRF and multidimensional semantic features (MSF) to develop machine learning algorithms that predict the actual level of cognitive accessibility of English health materials on health risks and diseases for specific populations. We compare algorithms to evaluate the cognitive accessibility of specialised health information for non-native English speakers with advanced education levels yet very limited exposure to English health education environments.
METHODS: We used 108 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from international health organization websites and rated by international tertiary students, we compared machine learning algorithms (decision tree, SVM, discriminant analysis, ensemble tree and logistic regression) after automatic hyperparameter optimization (grid search for the combination of hyperparameters with minimal classification error). We applied 10-fold cross-validation on the whole dataset for model training and testing, and calculated the AUC, sensitivity, specificity, and accuracy as measures of model performance.
RESULTS: Using two sets of predictor features, the widely tested MRF and the MSF proposed in our study, we developed and compared three sets of machine learning algorithms: the first set used only MRF as predictors, the second set used only MSF as predictors, and the last set used both MRF and MSF as integrated models. The results showed that the integrated models outperformed the other two sets in terms of AUC, sensitivity, accuracy, and specificity.
CONCLUSIONS: Our study showed that the cognitive accessibility of English health texts is not limited to the word length and sentence length conventionally measured by MRF. We compared machine learning algorithms combining MRF and MSF to explore the cognitive accessibility of health information from syntactic and semantic perspectives. The results showed the strength of the integrated models in terms of statistically increased AUC, sensitivity, and accuracy in predicting health resource accessibility for the target readership, indicating that both MRF and MSF contribute to the comprehension of health information, and that for readers with advanced education, semantic features outweigh syntax and domain knowledge.
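
The four performance measures named in the abstract (AUC, sensitivity, specificity, accuracy) can be computed from binary ratings and classifier scores as in the sketch below. It is a generic illustration and not the study's code: the function names, the 0.5 decision threshold, and the example labels and scores are assumptions. AUC is computed here by pairwise comparison, that is, as the fraction of positive/negative pairs in which the positive item receives the higher score, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for 0/1 labels.

    Assumes both classes occur in y_true, otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical ratings (1 = accessible) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of measure are usually reported together.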


A Survey of Domain Knowledge Elicitation in Applied Machine Learning

Multimodal Technologies and Interaction ◽

10.3390/mti5120073 ◽

2021 ◽

Vol 5 (12) ◽

pp. 73

Author(s):

Daniel Kerrigan ◽

Jessica Hullman ◽

Enrico Bertini

Keyword(s):

Machine Learning ◽

Domain Knowledge ◽

Development Process ◽

Model Development ◽

Knowledge Elicitation ◽

Future Directions ◽

Domain Experts ◽

Applied Machine Learning ◽

Elicitation Process ◽

Model Development Process

Eliciting knowledge from domain experts can play an important role throughout the machine learning process, from correctly specifying the task to evaluating model results. However, knowledge elicitation is also fraught with challenges. In this work, we consider why and how machine learning researchers elicit knowledge from experts in the model development process. We develop a taxonomy to characterize elicitation approaches according to the elicitation goal, elicitation target, elicitation process, and use of elicited knowledge. We analyze the elicitation trends observed in 28 papers with this taxonomy and identify opportunities for adding rigor to these elicitation approaches. We suggest future directions for research in elicitation for machine learning by highlighting avenues for further exploration and drawing on what we can learn from elicitation research in other fields.
