Transformers for Classifying Fourth Amendment Elements and Factors Tests

Author(s):  
Evan Gretok ◽  
David Langerman ◽  
Wesley M. Oliver

Determining whether a court has applied a bright-line or a totality-of-the-circumstances rule in Fourth Amendment cases is a difficult problem even for human lawyers and justices, yet determining the type of test that governs an issue is essential to answering a legal question. Modern natural language processing (NLP) tools, such as transformers, can extract relevant features from unlabeled text. This study demonstrates the effectiveness of the BERT, RoBERTa, and ALBERT transformer models in classifying Fourth Amendment cases as governed by a bright-line or a totality-of-the-circumstances rule. Two approaches are considered: models are trained either on positive language extracted by a domain expert or on the full texts of cases. Transformers attain up to 92.31% accuracy on full texts, further demonstrating the capability of NLP techniques on domain-specific tasks even without handcrafted features.
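The study fine-tunes transformer models; as a minimal self-contained illustration of the underlying task — binary classification of case language by governing test — here is a bag-of-words naive Bayes sketch. The training snippets and labels are invented for illustration and are not from the study's corpus or its actual transformer pipeline.

```python
from collections import Counter
import math

# Toy training snippets (invented for illustration; not from the study).
TRAIN = [
    ("the totality of the circumstances must be weighed", "totality"),
    ("courts consider all factors under the circumstances", "totality"),
    ("a categorical bright line rule governs this search", "bright-line"),
    ("the rule admits no exceptions and applies per se", "bright-line"),
]

def train(examples):
    """Count word frequencies per label for a naive Bayes classifier."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def classify(counts, text):
    """Pick the label whose word distribution best explains the text
    (add-one smoothing, uniform prior)."""
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values()) + len(vocab)
        score = sum(math.log((c[w] + 1) / total) for w in text.split())
        if score > best_score:
            best, best_score = label, score
    return best

model = train(TRAIN)
print(classify(model, "the court weighed the totality of factors"))
```

A transformer replaces the hand-counted word statistics with learned contextual representations, which is what lets the models in the study reach 92.31% accuracy on full case texts.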

2020 ◽  
Author(s):  
Vadim V. Korolev ◽  
Artem Mitrofanov ◽  
Kirill Karpov ◽  
Valery Tkachenko

The main advantage of modern natural language processing methods is the ability to turn an amorphous human-readable task into a strict mathematical form. This makes it possible to extract chemical data and insights from articles and to find new semantic relations. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use cases and applied it to the search for a therapeutic agent for COVID-19 by analyzing the PubMed archive.


2011 ◽  
Vol 474-476 ◽  
pp. 460-465
Author(s):  
Bo Sun ◽  
Sheng Hui Huang ◽  
Xiao Hua Liu

An unknown word is a word that is not included in the subword vocabulary but must still be cut out by the word segmentation program. People's names, place names, and transliterated names are the major unknown words. Unknown Chinese words are a difficult problem in natural language processing and contribute to the low rate of correct segmentation. This paper introduces the finite multi-list method, which uses word fragments' capability to compose a word and their location in the word tree to process unknown Chinese words. The experimental recall is 70.67% and the correct rate is 43.65%. The results show that unknown Chinese word identification based on the finite multi-list method is feasible.
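The abstract does not specify the finite multi-list method in detail; as a minimal sketch of the general setting only, the following groups runs of adjacent out-of-vocabulary fragments left by segmentation into unknown-word candidates. The vocabulary and sentence are invented examples.

```python
# A toy vocabulary; real systems use a large segmentation dictionary.
VOCAB = {"我", "喜欢", "去", "工作"}

def unknown_candidates(fragments):
    """Group runs of consecutive out-of-vocabulary fragments into
    candidate unknown words (e.g. person or place names)."""
    candidates, run = [], []
    for frag in fragments:
        if frag in VOCAB:
            if run:
                candidates.append("".join(run))
                run = []
        else:
            run.append(frag)
    if run:
        candidates.append("".join(run))
    return candidates

# "张" and "伟" are single-character fragments the segmenter could not
# attach to a dictionary word; merged, they form the name "张伟".
print(unknown_candidates(["我", "喜欢", "张", "伟", "去", "工作"]))
```

The paper's method additionally scores candidates by each fragment's word-formation capability and its position in the word tree, which is what pushes recall to the reported 70.67%.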


2018 ◽  
Vol 12 (02) ◽  
pp. 237-260
Author(s):  
Weifeng Xu ◽  
Dianxiang Xu ◽  
Abdulrahman Alatawi ◽  
Omar El Ariss ◽  
Yunkai Liu

The unigram is a fundamental element of the n-gram in natural language processing. However, unigrams collected from a natural language corpus are unsuitable for solving problems in the domain of computer programming languages. In this paper, we analyze the properties of unigrams collected from an ultra-large source code repository. Specifically, we have collected 1.01 billion unigrams from 0.7 million open-source projects hosted at GitHub.com. By analyzing these unigrams, we have discovered statistical properties regarding (1) how developers name variables, methods, and classes, and (2) how developers choose abbreviations. We describe a probabilistic model that relies on these properties to solve a well-known problem in source code analysis: how to expand a given abbreviation to its originally intended word. Our empirical study shows that using the unigrams extracted from the source code repository outperforms using a natural language corpus by 21% when solving domain-specific problems.
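A minimal sketch of the abbreviation-expansion idea, assuming invented unigram counts standing in for the 1.01 billion unigrams mined from GitHub: rank candidate words that contain the abbreviation as a subsequence by their corpus frequency. This is an illustrative simplification, not the paper's full probabilistic model.

```python
# Invented unigram counts (stand-ins for real source-code frequencies).
UNIGRAM_COUNTS = {
    "message": 90_000,
    "messenger": 4_000,
    "manage": 30_000,
    "msg": 15_000,
    "count": 120_000,
    "counter": 50_000,
}

def is_subsequence(abbr, word):
    """True if abbr's characters appear in word, in order."""
    it = iter(word)
    return all(ch in it for ch in abbr)

def expand(abbr):
    """Expand an abbreviation to the most frequent unigram that starts
    with the same letter and contains the abbreviation as a subsequence."""
    candidates = [
        w for w in UNIGRAM_COUNTS
        if w != abbr and w.startswith(abbr[0]) and is_subsequence(abbr, w)
    ]
    return max(candidates, key=UNIGRAM_COUNTS.get, default=abbr)

print(expand("msg"))
print(expand("cnt"))
```

Because identifier frequencies differ sharply between source code and English prose, counts mined from code resolve ties like "msg" → "message" more reliably, which is the source of the reported 21% improvement.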


Computers ◽  
2019 ◽  
Vol 8 (1) ◽  
pp. 22
Author(s):  
Frederik Bäumer ◽  
Joschka Kersting ◽  
Michaela Geierhos

The vision of On-the-Fly (OTF) Computing is to compose and provide software services ad hoc, based on requirement descriptions in natural language. Since non-technical users write their software requirements themselves and in unrestricted natural language, deficits such as inaccuracy and incompleteness occur. These deficits are usually addressed by natural language processing methods, which face special challenges in OTF Computing because maximum automation is the goal. In this paper, we present current automatic approaches for resolving inaccuracy and incompleteness in natural language requirement descriptions and elaborate on open challenges. In particular, we discuss the necessity of domain-specific resources and show why, despite far-reaching automation, an intelligent and guided integration of end users into the compensation process is required. In this context, we present our idea of a chatbot that integrates users into the compensation process depending on the given circumstances.
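The paper's compensation techniques are not detailed in the abstract; purely as a sketch of one common ingredient, the following flags vague wording and trailing verbs with no object in a requirement sentence. The indicator lists are invented; real OTF Computing pipelines rely on much richer domain-specific resources.

```python
import re

# Invented indicator lists for illustration only.
VAGUE_TERMS = {"fast", "user-friendly", "appropriate", "etc", "some"}
OBJECT_VERBS = {"send", "store", "display"}

def check_requirement(text):
    """Flag vague wording and sentence-final verbs lacking an object."""
    tokens = re.findall(r"[a-z-]+", text.lower())
    deficits = [f"vague term: {t}" for t in tokens if t in VAGUE_TERMS]
    for i, t in enumerate(tokens):
        if t in OBJECT_VERBS and i + 1 == len(tokens):
            deficits.append(f"incomplete: '{t}' has no object")
    return deficits

print(check_requirement("The app should be fast and let me send"))
```

A chatbot, as proposed in the paper, would surface such flags back to the user as targeted questions ("Send what, and to whom?") rather than guessing silently.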


2021 ◽  
Vol 113 ◽  
pp. 103665
Author(s):  
Timothy L. Chen ◽  
Max Emerling ◽  
Gunvant R. Chaudhari ◽  
Yeshwant R. Chillakuru ◽  
Youngho Seo ◽  
...  

Author(s):  
Yuji Matsumoto

This article deals with the acquisition of lexical knowledge, which is instrumental in resolving the ambiguities inherent in natural language processing (NLP). Lexical representations are imprecise in nature and mostly simple and superficial; the thesaurus is an apt example. Two primary resources for acquiring lexical knowledge are corpora and machine-readable dictionaries (MRDs). The former are mostly domain-specific and monolingual, while the definitions in an MRD are generally described by a genus term followed by a set of differentiae. Auxiliary technical nuances of the acquisition process are mentioned as well, such as 'lexical collocation' and 'association', which refer to the deliberate co-occurrence of words that together form a new meaning and lose it whenever a synonym replaces either word. The first seminal work on collocation extraction from large text corpora appeared around the early 1990s, using inter-word mutual information to locate collocations. Abundant corpus data is obtainable from the Linguistic Data Consortium (LDC).
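The inter-word mutual information mentioned above can be sketched concretely. Assuming the standard pointwise mutual information formula PMI(x, y) = log2(P(x, y) / (P(x) P(y))) over adjacent word pairs, with a toy corpus invented for illustration:

```python
from collections import Counter
from itertools import islice
import math

# Toy corpus; the early-1990s work used large newswire corpora.
corpus = ("strong tea strong tea strong coffee powerful computer "
          "powerful computer strong tea").split()

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information:
    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, islice(tokens, 1, None)))
    n = len(tokens)
    scores = {}
    for (x, y), c in bigrams.items():
        if c >= min_count:
            scores[(x, y)] = math.log2(
                (c / (n - 1)) / ((unigrams[x] / n) * (unigrams[y] / n)))
    return scores

scores = pmi_bigrams(corpus)
best = max(scores, key=scores.get)
print(best)
```

High PMI marks pairs that co-occur far more often than their individual frequencies predict; note that replacing one member with a synonym (e.g. "mighty computer") would score near zero, which is exactly the synonym-substitution property described above.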


Author(s):  
Jia Zeng ◽  
Christian X. Cruz-Pico ◽  
Turçin Saridogan ◽  
Md Abu Shufean ◽  
Michael Kahle ◽  
...  

PURPOSE Despite advances in molecular therapeutics, few anticancer agents achieve durable responses. Rational combinations of two or more anticancer drugs have the potential to achieve a synergistic effect and overcome drug resistance, enhancing antitumor efficacy. A publicly accessible biomedical literature search engine dedicated to this domain will facilitate knowledge discovery and reduce manual search and review. METHODS We developed RetriLite, an information retrieval and extraction framework that leverages natural language processing and a domain-specific knowledgebase to computationally identify highly relevant papers and extract key information. The modular architecture enables RetriLite to benefit from synergizing information retrieval and natural language processing techniques while remaining flexible to customization. We customized the application and created an informatics pipeline that strategically identifies papers describing the efficacy of combination therapies in clinical or preclinical studies. RESULTS In a small pilot study, RetriLite achieved an F1 score of 0.93. A more extensive validation experiment was conducted to determine agents that have enhanced antitumor efficacy in vitro or in vivo with poly(ADP-ribose) polymerase inhibitors: 95.9% of the papers determined to be relevant by our application were true positives, and the application's feature of distinguishing clinical papers from preclinical papers achieved an accuracy of 97.6%. An interobserver assessment was conducted, which resulted in 100% concordance. The data derived from the informatics pipeline have also been made accessible to the public via a dedicated online search engine with an intuitive user interface. CONCLUSION RetriLite is a framework that can be applied to establish domain-specific information retrieval and extraction systems. The extensive, high-quality metadata tags along with keyword highlighting help information seekers discover knowledge in the combination therapy domain more effectively and efficiently.
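As a quick reference for the metrics reported above, the F1 score is the harmonic mean of precision and recall. The counts below are hypothetical, chosen only to reproduce an F1 of 0.93; they are not the study's actual confusion-matrix numbers.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard retrieval metrics: F1 is the harmonic mean of
    precision (tp / retrieved) and recall (tp / relevant)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 93 relevant papers retrieved, 7 false positives,
# 7 relevant papers missed.
p, r, f1 = precision_recall_f1(tp=93, fp=7, fn=7)
print(round(f1, 2))
```

Reporting F1 alongside the 95.9% precision figure matters because a retrieval system can trade precision against recall; the harmonic mean penalizes sacrificing either one.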

