The Rise of Big Data Science: A Survey of Techniques, Methods and Approaches in the Field of Natural Language Processing and Network Theory

Jeffrey Ray; Olayinka Johnny; Marcello Trovati; Stelios Sotiriadis; Nik Bessis

doi:10.3390/bdcc2030022

Big Data and Cognitive Computing ◽

10.3390/bdcc2030022 ◽

2018 ◽

Vol 2 (3) ◽

pp. 22 ◽

Cited By ~ 3

Author(s):

Jeffrey Ray ◽

Olayinka Johnny ◽

Marcello Trovati ◽

Stelios Sotiriadis ◽

Nik Bessis

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Network Theory ◽

Data Science ◽

Theoretical Foundation ◽

Scientific Field ◽

Research Challenges ◽

New Research

The continuous creation of data has posed new research challenges due to its complexity, diversity and volume. Consequently, Big Data has increasingly become a fully recognised scientific field. This article provides an overview of the current research efforts in Big Data science, with particular emphasis on its applications as well as its theoretical foundations.

Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks

AERA Open ◽

10.1177/2332858420940312 ◽

2020 ◽

Vol 6 (3) ◽

pp. 233285842094031

Author(s):

Li Lucy ◽

Dorottya Demszky ◽

Patricia Bromley ◽

Dan Jurafsky

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Race And Ethnicity ◽

Data Science ◽

Black People ◽

History Textbooks ◽

Word Embeddings ◽

Representation Of Women ◽

New Research

Cutting-edge data science techniques can shed new light on fundamental questions in educational research. We apply techniques from natural language processing (lexicons, word embeddings, topic models) to 15 U.S. history textbooks widely used in Texas between 2015 and 2017, studying their depiction of historically marginalized groups. We find that Latinx people are rarely discussed, and the most common famous figures are nearly all White men. Lexicon-based approaches show that Black people are described as performing actions associated with low agency and power. Word embeddings reveal that women tend to be discussed in the contexts of work and the home. Topic modeling highlights the higher prominence of political topics compared with social ones. We also find that more conservative counties tend to purchase textbooks with less representation of women and Black people. Building on a rich tradition of textbook analysis, we release our computational toolkit to support new research directions.
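
The lexicon-based approach mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' released toolkit: the lexicon entries, the scoring scheme, the window size and the function names are all hypothetical, chosen only to show the general idea of scoring the words that follow a target term.

```python
import re

# Hypothetical toy lexicon (word -> agency score); NOT the lexicon used in the study.
TOY_AGENCY_LEXICON = {"organized": 1, "led": 1, "decided": 1,
                      "suffered": -1, "endured": -1}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def agency_score(text, target_terms, lexicon, window=3):
    """Average lexicon score of words found within `window` tokens after a target term."""
    tokens = tokenize(text)
    scores = []
    for i, tok in enumerate(tokens):
        if tok in target_terms:
            # Look only at the few tokens that follow the target term.
            for nxt in tokens[i + 1 : i + 1 + window]:
                if nxt in lexicon:
                    scores.append(lexicon[nxt])
    # Return 0.0 when no lexicon word appears near any target term.
    return sum(scores) / len(scores) if scores else 0.0

sample = "The workers organized a strike. The settlers suffered and endured."
print(agency_score(sample, {"workers"}, TOY_AGENCY_LEXICON))
print(agency_score(sample, {"settlers"}, TOY_AGENCY_LEXICON))
```

A fixed token window is a crude stand-in for the syntactic analysis a real study would likely need, since it cannot tell whether the target term is the subject or the object of the verb.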

Patient-oriented natural language processing: Defining a new paradigm for research and development to facilitate adoption and utilization by medical experts (Preprint)

10.2196/preprints.18471 ◽

2020 ◽

Author(s):

Abeed Sarker ◽

Mohammed Ali Al-Garadi ◽

Yuan-Chi Yang ◽

Jinho Choi ◽

Arshed A Quyyumi ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Research And Development ◽

Language Processing ◽

Data Science ◽

New Paradigm ◽

Heterogeneous Datasets ◽

System Properties ◽

New Research ◽

Specific Symptoms

The capabilities of natural language processing (NLP) methods have expanded significantly in recent years, particularly driven by advances in data science and machine learning. However, the utilization of NLP for patient-oriented clinical research and care (POCRC) is still limited. A primary reason is perhaps that clinical NLP methods are developed, optimized, and evaluated on narrow-focus datasets and tasks (e.g., for the detection of specific symptoms from free texts). Such research and development (R&D) approaches may be described as problem-oriented, and the developed systems only perform well for a given specialized task. As standalone systems, they are also typically not suitable for addressing the needs of POCRC, leaving a gap between the capabilities of clinical NLP methods and the needs of patient-facing medical experts. We believe that to make clinical NLP systems more valuable, future R&D efforts need to follow a new research paradigm, one that explicitly incorporates characteristics that are crucial for POCRC. We present our viewpoint about four interrelated characteristics that are relevant for NLP systems suitable for POCRC, three representing NLP system properties and one associated with the R&D process: (i) generalizability (capability to characterize patients, not clinical problems), (ii) interpretability (ability to explain system decisions), (iii) customizability (flexibility for adaptation to distinct settings, problems and cohorts), and (iv) cross-evaluation (validated performance on heterogeneous datasets). Using the NLP task of clinical concept detection as an example, we detail these characteristics and discuss how they may lead to increased uptake of NLP systems for POCRC.

Ansätze zur quantitativen Inhaltsanalyse [Approaches to Quantitative Content Analysis]

WiSt - Wirtschaftswissenschaftliches Studium ◽

10.15358/0340-1650-2021-2-3-17 ◽

2021 ◽

Vol 50 (2-3) ◽

pp. 17-22

Author(s):

Johannes Brunzel

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing

The article explains how quantitative text analysis can be an essential means of increasing business efficiency. It goes beyond listing the opportunities and risks of using artificial intelligence and Big Data analyses by deriving, in a practice-oriented manner, important developments in quantitative content analysis from the economics literature. The article then divides the most important implementation steps into (1) collection of quantitative text data, (2) generic text analysis, and (3) natural language processing. A main finding is that, although natural language processing approaches offer more advanced and complex insights, the potential of generic text analysis has not yet been exhausted, owing to its flexibility and relative ease of use in a corporate context. In addition, managers face a dichotomous decision as to whether programming-based or commercial solutions are appropriate for carrying out the text analysis.

EOR/IOR Screening with Big Data Analytics and Natural Language Processing for Unstructured Data: A Statistical Approach

10.2118/181117-ms ◽

2016 ◽

Author(s):

Sardar Afra ◽

Mohammadali Tarrahi

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Data Analytics ◽

Statistical Approach ◽

Big Data Analytics ◽

Unstructured Data

Advanced Well Planning Using Natural Language Processing NLP and Data Science Models: Maximizing the Value of Data to Mitigate Costs and Risks in New Wells

10.2118/203280-ms ◽

2020 ◽

Author(s):

John Cumming ◽

Valentina Riggins ◽

Paul Hodson ◽

Barry Walker

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Data Science ◽

Science Models

Big Data and Natural Language Processing for Analysing Railway Safety

Innovative Applications of Big Data in the Railway Industry - Advances in Civil and Industrial Engineering ◽

10.4018/978-1-5225-3176-0.ch011 ◽

2018 ◽

pp. 240-267

Author(s):

Kanza Noor Syeda ◽

Syed Noorulhassan Shirazi ◽

Syed Asad Ali Naqvi ◽

Howard J Parkinson ◽

Gary Bamford

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Machine Intelligence ◽

Data Availability ◽

Accident Data ◽

Data Driven Approach ◽

Advanced Analytics ◽

The UK

Given modern computing power, the explosion in data availability and advances in analytics, there should be opportunities to use a Big Data approach to proactively identify high-risk scenarios on the railway. In this chapter, we consider the need to develop machine intelligence to identify heightened risk on the railway. Having outlined the potential of a new data-driven approach for the railway, we focus the rest of the chapter on Natural Language Processing (NLP) and its potential for analysing accident data. We review and analyse investigation reports of railway accidents in the UK, published by the Rail Accident Investigation Branch (RAIB), aiming to reveal the presence of entities that are informative of causes and failures, such as human, technical and external factors. We give an overview of a framework based on NLP and machine learning for analysing the raw text of RAIB reports, which would help risk and incident analysis experts study the causal relationships between causes and failures, in support of overall safety in the rail industry.

Analysis of the Impact of the US Presidential Election on the US Economy Based on Natural Language Processing and Big Data

Computational and Experimental Simulations in Engineering - Mechanisms and Machine Science ◽

10.1007/978-3-030-67090-0_39 ◽

2021 ◽

pp. 483-494

Author(s):

Mingzhen Li ◽

Xiangdong Liu

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Presidential Election ◽

US Economy ◽

The US ◽

US Presidential Election ◽

The Impact

Big Data Security Issues and Natural Language Processing

2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI) ◽

10.1109/icoei.2019.8862744 ◽

2019 ◽

Author(s):

S Thejaswini ◽

C Indupriya

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Data Security ◽

Security Issues

Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning

Data and Information Management ◽

10.2478/dim-2020-0003 ◽

2020 ◽

Vol 4 (1) ◽

pp. 18-43

Author(s):

Liuqing Li ◽

Jack Geissinger ◽

William A. Ingram ◽

Edward A. Fox

Keyword(s):

Big Data ◽

Natural Language Processing ◽

Graduate Students ◽

Natural Language ◽

Information Management ◽

Language Processing ◽

Problem Based Learning ◽

Text Summarization ◽

Data Sets ◽

Student Teams

Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, leading to a complex and challenging teaching process. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course for both undergraduate and graduate students to teach NLP. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help different teams in summarizing two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem to address learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them along with NLP pipelines in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback, relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching and hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
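
As a rough illustration of the text summarization task described above, the following is a minimal sketch of frequency-based extractive summarization. It is one simple baseline among many and is not claimed to be a method the student teams used; the stopword list, the sentence splitter and the function names are assumptions made for this example.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "it", "that", "for", "on", "with", "as", "by"}

def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, max_sentences=2):
    """Keep the sentences whose content words are most frequent in the whole text."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)

doc = "Floods hit the city. Floods damaged the city bridge. People like tea."
print(summarize(doc, max_sentences=1))
```

Scoring by average rather than total word frequency is a design choice that keeps sentence length from dominating the ranking.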

Business Sentiment Quotient Analysis using Natural Language Processing

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d8721.049420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1350-1352

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Embedding Technique ◽

Computer Scientists ◽

Online Business ◽

New Research ◽

Python Programming ◽

Market Requirement

Online business has opened up several avenues for researchers and computer scientists to initiate new research models. The business activities that customers carry out produce abundant data. Analysing this data can yield useful inferences, which may help a system improve its quality of service and understand current market requirements, business trends, the future needs of society and so on. In this connection, the paper proposes a feature extraction technique named Business Sentiment Quotient (BSQ). BSQ involves the word2vec[1] word embedding technique from Natural Language Processing. A number of business-related tweets are retrieved from Twitter and processed to estimate BSQ using the Python programming language. BSQ may be used for further machine learning activities.
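
The abstract does not define how BSQ is computed, so the sketch below does not attempt to reproduce it. It only illustrates a common way of turning word embeddings into a tweet-level feature: averaging the vectors of the words in a tweet and comparing the result with a reference vector by cosine similarity. The two-dimensional vectors are invented toy values standing in for trained word2vec embeddings, and all names are hypothetical.

```python
import math

# Invented 2-D toy vectors; real word2vec embeddings are learned from a corpus.
TOY_VECTORS = {
    "profit": [0.9, 0.1],
    "growth": [0.8, 0.2],
    "loss": [-0.7, 0.3],
}

def tweet_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

features = tweet_vector(["profit", "growth", "today"], TOY_VECTORS)
print(features)
print(cosine(features, TOY_VECTORS["loss"]))
```

Features of this kind could then be passed to a downstream classifier, in line with the abstract's remark that BSQ may be used for further machine learning.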
