Transformers for Classifying Fourth Amendment Elements and Factors Tests

Author(s):  
Evan Gretok ◽  
David Langerman ◽  
Wesley M. Oliver

Determining whether a court has applied a bright-line or a totality-of-the-circumstances rule in Fourth Amendment cases is a difficult problem even for human lawyers and justices, yet determining the type of test that governs an issue is essential to answering a legal question. Modern natural language processing (NLP) tools, such as transformers, can extract relevant features from unlabeled text. This study demonstrates the effectiveness of the BERT, RoBERTa, and ALBERT transformer models in classifying Fourth Amendment cases as governed by a bright-line or a totality-of-the-circumstances rule. Two approaches are considered: models are trained either on positive language extracted by a domain expert or on the full texts of cases. Transformers attain up to 92.31% accuracy on full texts, further demonstrating the capability of NLP techniques on domain-specific tasks even without handcrafted features.
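The study fine-tunes transformer models; as a minimal self-contained illustration of the underlying task — binary classification of case language by governing test — here is a bag-of-words naive Bayes sketch. The training snippets and labels are invented for illustration and are not from the study's corpus or its actual transformer pipeline.

```python
from collections import Counter
import math

# Toy training snippets (invented for illustration; not from the study).
TRAIN = [
    ("the totality of the circumstances must be weighed", "totality"),
    ("courts consider all factors under the circumstances", "totality"),
    ("a categorical bright line rule governs this search", "bright-line"),
    ("the rule admits no exceptions and applies per se", "bright-line"),
]

def train(examples):
    """Count word frequencies per label for a naive Bayes classifier."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def classify(counts, text):
    """Pick the label whose word distribution best explains the text
    (add-one smoothing, uniform prior)."""
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values()) + len(vocab)
        score = sum(math.log((c[w] + 1) / total) for w in text.split())
        if score > best_score:
            best, best_score = label, score
    return best

model = train(TRAIN)
print(classify(model, "the court weighed the totality of factors"))
```

A transformer replaces the hand-counted word statistics with learned contextual representations, which is what lets the models in the study reach 92.31% accuracy on full case texts.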

2020 ◽  
Author(s):  
Vadim V. Korolev ◽  
Artem Mitrofanov ◽  
Kirill Karpov ◽  
Valery Tkachenko

The main advantage of modern natural language processing methods is the ability to turn an amorphous human-readable task into a strict mathematical form. This makes it possible to extract chemical data and insights from articles and to find new semantic relations. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use cases and applied it to the search for a therapeutic agent for COVID-19 by analyzing the PubMed archive.


2011 ◽  
Vol 474-476 ◽  
pp. 460-465
Author(s):  
Bo Sun ◽  
Sheng Hui Huang ◽  
Xiao Hua Liu

An unknown word is a word that is not included in the subword vocabulary but must still be cut out by the word segmentation program. People's names, place names, and transliterated names are the major unknown words. Unknown Chinese words are a difficult problem in natural language processing and contribute to the low rate of correct segmentation. This paper introduces the finite multi-list method, which uses word fragments' capability to compose a word and their location in the word tree to process unknown Chinese words. The experimental recall is 70.67% and the correct rate is 43.65%. The results show that unknown Chinese word identification based on the finite multi-list method is feasible.
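The abstract does not specify the finite multi-list method in detail; as a minimal sketch of the general setting only, the following groups runs of adjacent out-of-vocabulary fragments left by segmentation into unknown-word candidates. The vocabulary and sentence are invented examples.

```python
# A toy vocabulary; real systems use a large segmentation dictionary.
VOCAB = {"我", "喜欢", "去", "工作"}

def unknown_candidates(fragments):
    """Group runs of consecutive out-of-vocabulary fragments into
    candidate unknown words (e.g. person or place names)."""
    candidates, run = [], []
    for frag in fragments:
        if frag in VOCAB:
            if run:
                candidates.append("".join(run))
                run = []
        else:
            run.append(frag)
    if run:
        candidates.append("".join(run))
    return candidates

# "张" and "伟" are single-character fragments the segmenter could not
# attach to a dictionary word; merged, they form the name "张伟".
print(unknown_candidates(["我", "喜欢", "张", "伟", "去", "工作"]))
```

The paper's method additionally scores candidates by each fragment's word-formation capability and its position in the word tree, which is what pushes recall to the reported 70.67%.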


2018 ◽  
Vol 12 (02) ◽  
pp. 237-260
Author(s):  
Weifeng Xu ◽  
Dianxiang Xu ◽  
Abdulrahman Alatawi ◽  
Omar El Ariss ◽  
Yunkai Liu

The unigram is a fundamental element of the n-gram in natural language processing. However, unigrams collected from a natural language corpus are unsuitable for solving problems in the domain of computer programming languages. In this paper, we analyze the properties of unigrams collected from an ultra-large source code repository. Specifically, we have collected 1.01 billion unigrams from 0.7 million open-source projects hosted at GitHub.com. By analyzing these unigrams, we have discovered statistical properties regarding (1) how developers name variables, methods, and classes, and (2) how developers choose abbreviations. We describe a probabilistic model that relies on these properties to solve a well-known problem in source code analysis: how to expand a given abbreviation to its originally intended word. Our empirical study shows that using the unigrams extracted from the source code repository outperforms using a natural language corpus by 21% when solving domain-specific problems.
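A minimal sketch of the abbreviation-expansion idea, assuming invented unigram counts standing in for the 1.01 billion unigrams mined from GitHub: rank candidate words that contain the abbreviation as a subsequence by their corpus frequency. This is an illustrative simplification, not the paper's full probabilistic model.

```python
# Invented unigram counts (stand-ins for real source-code frequencies).
UNIGRAM_COUNTS = {
    "message": 90_000,
    "messenger": 4_000,
    "manage": 30_000,
    "msg": 15_000,
    "count": 120_000,
    "counter": 50_000,
}

def is_subsequence(abbr, word):
    """True if abbr's characters appear in word, in order."""
    it = iter(word)
    return all(ch in it for ch in abbr)

def expand(abbr):
    """Expand an abbreviation to the most frequent unigram that starts
    with the same letter and contains the abbreviation as a subsequence."""
    candidates = [
        w for w in UNIGRAM_COUNTS
        if w != abbr and w.startswith(abbr[0]) and is_subsequence(abbr, w)
    ]
    return max(candidates, key=UNIGRAM_COUNTS.get, default=abbr)

print(expand("msg"))
print(expand("cnt"))
```

Because identifier frequencies differ sharply between source code and English prose, counts mined from code resolve ties like "msg" → "message" more reliably, which is the source of the reported 21% improvement.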


Computers ◽  
2019 ◽  
Vol 8 (1) ◽  
pp. 22
Author(s):  
Frederik Bäumer ◽  
Joschka Kersting ◽  
Michaela Geierhos

The vision of On-the-Fly (OTF) Computing is to compose and provide software services ad hoc, based on requirement descriptions in natural language. Since non-technical users write their software requirements themselves and in unrestricted natural language, deficits such as inaccuracy and incompleteness occur. These deficits are usually addressed by natural language processing methods, which face special challenges in OTF Computing because maximum automation is the goal. In this paper, we present current automatic approaches for resolving inaccuracy and incompleteness in natural language requirement descriptions and elaborate on open challenges. In particular, we discuss the necessity of domain-specific resources and show why, despite far-reaching automation, an intelligent and guided integration of end users into the compensation process is required. In this context, we present our idea of a chatbot that integrates users into the compensation process depending on the given circumstances.
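The paper's compensation techniques are not detailed in the abstract; purely as a sketch of one common ingredient, the following flags vague wording and trailing verbs with no object in a requirement sentence. The indicator lists are invented; real OTF Computing pipelines rely on much richer domain-specific resources.

```python
import re

# Invented indicator lists for illustration only.
VAGUE_TERMS = {"fast", "user-friendly", "appropriate", "etc", "some"}
OBJECT_VERBS = {"send", "store", "display"}

def check_requirement(text):
    """Flag vague wording and sentence-final verbs lacking an object."""
    tokens = re.findall(r"[a-z-]+", text.lower())
    deficits = [f"vague term: {t}" for t in tokens if t in VAGUE_TERMS]
    for i, t in enumerate(tokens):
        if t in OBJECT_VERBS and i + 1 == len(tokens):
            deficits.append(f"incomplete: '{t}' has no object")
    return deficits

print(check_requirement("The app should be fast and let me send"))
```

A chatbot, as proposed in the paper, would surface such flags back to the user as targeted questions ("Send what, and to whom?") rather than guessing silently.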


2021 ◽  
Vol 113 ◽  
pp. 103665
Author(s):  
Timothy L. Chen ◽  
Max Emerling ◽  
Gunvant R. Chaudhari ◽  
Yeshwant R. Chillakuru ◽  
Youngho Seo ◽  
...  

Author(s):  
Yuji Matsumoto

This article deals with the acquisition of lexical knowledge, which is instrumental in resolving the ambiguities inherent in natural language processing (NLP). Lexical representations are imprecise in nature and mostly simple and superficial; the thesaurus is an apt example. Two primary resources for acquiring lexical knowledge are corpora and machine-readable dictionaries (MRDs). The former are mostly domain-specific and monolingual, while the definitions in an MRD are generally described by a genus term followed by a set of differentiae. Auxiliary technical nuances of the acquisition process are mentioned as well, such as 'lexical collocation' and 'association', which refer to the deliberate co-occurrence of words that together form a new meaning and lose it whenever a synonym replaces either word. The first seminal work on collocation extraction from large text corpora appeared around the early 1990s, using inter-word mutual information to locate collocations. Abundant corpus data is obtainable from the Linguistic Data Consortium (LDC).
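The inter-word mutual information mentioned above can be sketched concretely. Assuming the standard pointwise mutual information formula PMI(x, y) = log2(P(x, y) / (P(x) P(y))) over adjacent word pairs, with a toy corpus invented for illustration:

```python
from collections import Counter
from itertools import islice
import math

# Toy corpus; the early-1990s work used large newswire corpora.
corpus = ("strong tea strong tea strong coffee powerful computer "
          "powerful computer strong tea").split()

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information:
    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, islice(tokens, 1, None)))
    n = len(tokens)
    scores = {}
    for (x, y), c in bigrams.items():
        if c >= min_count:
            scores[(x, y)] = math.log2(
                (c / (n - 1)) / ((unigrams[x] / n) * (unigrams[y] / n)))
    return scores

scores = pmi_bigrams(corpus)
best = max(scores, key=scores.get)
print(best)
```

High PMI marks pairs that co-occur far more often than their individual frequencies predict; note that replacing one member with a synonym (e.g. "mighty computer") would score near zero, which is exactly the synonym-substitution property described above.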


Author(s):  
Jia Zeng ◽  
Christian X. Cruz-Pico ◽  
Turçin Saridogan ◽  
Md Abu Shufean ◽  
Michael Kahle ◽  
...  

PURPOSE Despite advances in molecular therapeutics, few anticancer agents achieve durable responses. Rational combinations of two or more anticancer drugs have the potential to achieve a synergistic effect and overcome drug resistance, enhancing antitumor efficacy. A publicly accessible biomedical literature search engine dedicated to this domain will facilitate knowledge discovery and reduce manual search and review. METHODS We developed RetriLite, an information retrieval and extraction framework that leverages natural language processing and a domain-specific knowledgebase to computationally identify highly relevant papers and extract key information. The modular architecture enables RetriLite to benefit from synergizing information retrieval and natural language processing techniques while remaining flexible to customization. We customized the application and created an informatics pipeline that strategically identifies papers describing the efficacy of combination therapies in clinical or preclinical studies. RESULTS In a small pilot study, RetriLite achieved an F1 score of 0.93. A more extensive validation experiment was conducted to determine agents that have enhanced antitumor efficacy in vitro or in vivo with poly(ADP-ribose) polymerase inhibitors: 95.9% of the papers determined to be relevant by our application were true positives, and the application's feature of distinguishing clinical papers from preclinical papers achieved an accuracy of 97.6%. An interobserver assessment was conducted, which resulted in 100% concordance. The data derived from the informatics pipeline have also been made accessible to the public via a dedicated online search engine with an intuitive user interface. CONCLUSION RetriLite is a framework that can be applied to establish domain-specific information retrieval and extraction systems. The extensive, high-quality metadata tags along with keyword highlighting help information seekers discover knowledge in the combination therapy domain more effectively and efficiently.
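As a quick reference for the metrics reported above, the F1 score is the harmonic mean of precision and recall. The counts below are hypothetical, chosen only to reproduce an F1 of 0.93; they are not the study's actual confusion-matrix numbers.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard retrieval metrics: F1 is the harmonic mean of
    precision (tp / retrieved) and recall (tp / relevant)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 93 relevant papers retrieved, 7 false positives,
# 7 relevant papers missed.
p, r, f1 = precision_recall_f1(tp=93, fp=7, fn=7)
print(round(f1, 2))
```

Reporting F1 alongside the 95.9% precision figure matters because a retrieval system can trade precision against recall; the harmonic mean penalizes sacrificing either one.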

