Machine Intelligence for Integrated Workover Operations

2021 ◽  
Author(s):  
Saniya Karnik ◽  
Supriya Gupta ◽  
Jason Baihly

Abstract Because of recent advancements in the field of natural language processing (NLP) and machine learning, there is potential to ingest decades of field history and heterogeneous production records. This paper proposes an analytics workflow that leverages artificial intelligence to process thousands of historical workover reports (handwritten and electronic), extract important information, learn patterns in production activity, and train machines to quantify workover impact and derive best practices for field operations. Natural language processing libraries were developed to ingest and catalog gigabytes of field data, identify rich sources of workover information, and extract workover and cost information from unstructured reports. A machine learning (ML) model was developed and trained to predict well intervention categories based on free text describing workovers found in reports. This ML model learned the pattern and context of repeating words pertaining to a workover type (e.g., Artificial Lift, Well Integrity) and classified reports accordingly. Statistical models were built to determine return on investment from workovers and rank them based on production improvement and payout time. Today, 80% of an oilfield expert's time can be spent manually organizing data. When processing decades of historical oilfield production data spread across both structured (production timeseries) and unstructured records (e.g., workover reports), experts often face two major challenges: 1) how to rapidly analyze field data with thousands of historical records, and 2) how to use the rich historical information to generate effective insights to optimize production. In this paper, we analyzed multiple field datasets in a heterogeneous file environment with 20 different file formats (PDF, Excel, and others), 2,000+ files, and production history spanning 50+ years across 2,000+ producing wells. Libraries were developed to extract workover files from complex folder hierarchies through an intelligent automated search. Information from reports was extracted through Python libraries and optical character recognition technology to build a master data source with production history, workover, and cost information. A neural network model was trained to predict the workover class for each report with >85% accuracy. The rich dataset was then used to analyze episodic workover activity by well and compute key performance indicators (KPIs) to identify well candidates for production enhancement. The building blocks included quantifying production upside and calculating return on investment for various workover classes. O&G companies have vast volumes of unstructured data and use less than 1% of it to uncover meaningful insights about field operations. Our workflow describes a methodology to ingest both structured and unstructured documents, capture knowledge, quantify production upside, understand capital spending, and learn best practices in workover operations through an automated process. This process helps optimize the forward operating expense (OPEX) plan, with a focus on cost reduction, and shortens the turnaround time for decision making.
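
A minimal sketch of the report-classification step, assuming TF-IDF features and a small feed-forward network in scikit-learn; the paper does not specify the feature set or architecture, so all names and parameters here are illustrative, not the authors' implementation:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def train_workover_classifier(report_texts, workover_classes):
    """Map free-text workover reports (e.g., OCR output) to categories
    such as "Artificial Lift" or "Well Integrity"."""
    model = make_pipeline(
        TfidfVectorizer(max_features=5000, stop_words="english"),
        MLPClassifier(hidden_layer_sizes=(128,), max_iter=300),
    )
    # 5-fold cross-validated accuracy, the kind of figure behind ">85%"
    scores = cross_val_score(model, report_texts, workover_classes, cv=5)
    model.fit(report_texts, workover_classes)
    return model, scores.mean()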

2021 ◽  
Author(s):  
Saniya Karnik ◽  
Supriya Gupta ◽  
Jason Baihly ◽  
David Saier

Abstract Recent advancements in the field of natural language processing (NLP) and machine learning have created the potential to ingest decades of field history and heterogeneous production records. This paper proposes an analytics workflow that leverages artificial intelligence to process thousands of historical workover reports (handwritten and electronic), extract important information, learn patterns in production activity, and train machines to quantify workover impact and derive best practices for field operations. Natural language processing libraries were developed to ingest and catalog gigabytes of field data, identify rich sources of workover information, and extract workover and cost information from unstructured reports. A clustering-based architecture was developed and trained to categorize documents based on free text describing the activities found in reports. This machine learning model learned the pattern and context of repeating words and was able to cluster documents with similar content together, enabling the user to find a category of documents (e.g., workover intervention reports) instantaneously. Statistical models were built to determine return on investment from workovers and rank them based on production improvement and payout time. Today, 80% of an oilfield expert's time can be spent manually organizing data. When processing decades of historical oilfield production data spread across both structured (production timeseries) and unstructured records (e.g., workover reports), experts often face two major challenges: 1) how to rapidly analyze field data with thousands of historical records, and 2) how to use the rich historical information to generate effective insights and take the proper actions to optimize production. In this paper, we analyzed multiple field datasets in a heterogeneous file environment with 20 different file formats (PDF, Excel, and others), 2,000+ files, and production history spanning 50+ years across 2,000+ producing wells. Libraries were developed to extract files from complex folder hierarchies, and machine learning architectures assisted in finding the workover reports among the myriad documents. Information from reports was extracted through Python libraries and optical character recognition technology to build a master data source with production history, workover, and cost information. The rich dataset was then used to analyze episodic workover activity by well and compute key performance indicators (KPIs) to identify well candidates for production enhancement. The building blocks included quantifying production upside and calculating return on investment for various workover classes. O&G companies have vast volumes of unstructured data and use less than 1% of it to uncover meaningful insights about field operations. Our workflow describes a methodology to ingest both structured and unstructured documents, capture knowledge, quantify production upside, understand capital spending, and learn best practices in workover operations through an automated process. This process helps optimize forward operating expense (OPEX) plans, with a focus on cost reduction, and shortens the turnaround time for decision making.
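
A minimal sketch of the clustering step, assuming TF-IDF features and k-means; the abstract does not name the algorithm, so this is one plausible realization. Inspecting the top-weighted terms per cluster is how a user would spot, say, the workover-report cluster:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_documents(texts, n_clusters=10):
    """Group documents by content and print the top terms per cluster."""
    vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
    X = vectorizer.fit_transform(texts)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    terms = vectorizer.get_feature_names_out()
    for i, centroid in enumerate(km.cluster_centers_):
        top = [terms[j] for j in centroid.argsort()[::-1][:8]]
        print(f"cluster {i}: {', '.join(top)}")  # e.g., "workover, tubing, pump, ..."
    return vectorizer, km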


Author(s):  
Rohan Pandey ◽  
Vaibhav Gautam ◽  
Ridam Pal ◽  
Harsh Bandhey ◽  
Lovedeep Singh Dhingra ◽  
...  

BACKGROUND The COVID-19 pandemic has uncovered the potential of digital misinformation to shape the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable, and effective, and that continuously learn new patterns of misinformation. OBJECTIVE We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation, and natural language processing. WashKaro provides the right information, matched against WHO guidelines through AI, and delivers it in the right format in local languages. METHODS We theorize (i) an NLP-based AI engine that could continuously incorporate user feedback to improve the relevance of information, (ii) bite-sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational, interactive AI engagement with users toward increased health awareness in the community. RESULTS A total of 5,026 people downloaded the app during the study window; of those, 1,545 were active users. Our study shows that 3.4 times more females engaged with the app in Hindi than males, that the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and that the prudence of the integrated AI chatbot “Satya” increased, demonstrating the usefulness of an mHealth platform for mitigating health misinformation. CONCLUSIONS We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigating health misinformation. CLINICALTRIAL Not Applicable
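
The abstract does not detail the matching engine, so as an illustrative sketch only: a simple retrieval step pairing a user message with the closest WHO guideline snippet could use TF-IDF cosine similarity (an assumption here, not the app's documented method):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def best_guideline_match(user_message, guideline_snippets):
    """Return the guideline snippet most similar to the user's message."""
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(guideline_snippets + [user_message])
    sims = cosine_similarity(X[-1], X[:-1]).ravel()
    best = sims.argmax()
    return guideline_snippets[best], float(sims[best])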


2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives Unstructured free-text patient feedback contains rich information, but analysing these data manually would require substantial personnel resources that are not available in most healthcare organisations. The aim was to undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Owing to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics, and indicators of quality were recorded. Results Nineteen articles were included. The majority (80%) of studies applied language analysis techniques to patient feedback from social media sites (unsolicited), followed by structured surveys (solicited). Supervised learning was used most frequently (n=9), followed by unsupervised (n=6) and semisupervised (n=3) approaches. Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included precision, recall, and F-measure, with support vector machine and Naïve Bayes being the best-performing ML classifiers. Conclusion NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have their role, depending on the data source. With the advancement of data analysis tools, these techniques may help healthcare organisations generate insight from their volumes of unstructured free-text data.
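
Since the review singles out support vector machines and Naïve Bayes as the best-performing classifiers, a minimal sketch of how such a comparison is typically run on labelled free-text comments follows; the data and labels are assumed, not drawn from any reviewed study:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate

def compare_feedback_classifiers(comments, labels):
    """Report precision, recall and F-measure for SVM vs. Naive Bayes."""
    scoring = ["precision_macro", "recall_macro", "f1_macro"]
    for name, clf in [("SVM", LinearSVC()), ("Naive Bayes", MultinomialNB())]:
        pipe = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
        res = cross_validate(pipe, comments, labels, scoring=scoring, cv=5)
        print(name, {m: round(float(res[f"test_{m}"].mean()), 3) for m in scoring})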


2019 ◽  
pp. 1-8 ◽  
Author(s):  
Tomasz Oliwa ◽  
Steven B. Maron ◽  
Leah M. Chase ◽  
Samantha Lomnicki ◽  
Daniel V.T. Catenacci ◽  
...  

PURPOSE Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions. PATIENTS AND METHODS Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step. RESULTS We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance. CONCLUSION Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
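
A minimal sketch of the named-entity step, using regular expressions; the accession-number and block formats below are hypothetical stand-ins, since the abstract does not give the institution's actual patterns:

import re

# Hypothetical patterns; real accession and block formats vary by institution.
ACCESSION = re.compile(r"\b[A-Z]{1,3}\d{2}-\d{4,6}\b")            # e.g. "S18-12345"
BLOCK = re.compile(r"\bblocks?\s+([A-Z]?\d+)\b", re.IGNORECASE)   # e.g. "block A3"
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

def extract_specimen_entities(note_text):
    """Pull label (accession), block identifiers, and dates from a note."""
    return {
        "labels": ACCESSION.findall(note_text),
        "blocks": BLOCK.findall(note_text),
        "dates": DATE.findall(note_text),
    }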


2021 ◽  
Vol 11 (7) ◽  
pp. 3184
Author(s):  
Ismael Garrido-Muñoz  ◽  
Arturo Montejo-Ráez  ◽  
Fernando Martínez-Santiago  ◽  
L. Alfonso Ureña-López 

Deep neural networks have become the dominant approach in many machine learning areas, including natural language processing (NLP). Thanks to the availability of large corpora and the capability of deep architectures to shape internal language mechanisms through self-supervised learning (also known as “pre-training”), versatile and high-performing models are released continuously for every new network design. These networks implicitly learn a probability distribution over the words and relations in the training collection used, inheriting whatever flaws, inconsistencies, and biases that collection contains. As pre-trained models have proven very useful for transfer learning, dealing with bias has become a pressing issue in this new scenario. We introduce bias in a formal way and explore how it has been treated in several networks, in terms of both detection and correction. In addition, we identify available resources and propose a strategy for dealing with bias in deep NLP.
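
In the spirit of the template-based detection methods such surveys cover, a minimal probe of a pre-trained masked language model might look as follows; the model choice (bert-base-uncased) and templates are illustrative, not the paper's benchmark:

from transformers import pipeline

# Score paired completions of a masked template; large gaps between the
# "he" and "she" probabilities hint at gendered associations in the model.
fill = pipeline("fill-mask", model="bert-base-uncased")
for template in ["[MASK] is a nurse.", "[MASK] is an engineer."]:
    results = fill(template, targets=["he", "she"])
    print(template, {r["token_str"]: round(r["score"], 4) for r in results})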

