Application of Data Mining Technology on Surveillance Report Data of HIV/AIDS High-Risk Group in Urumqi from 2009 to 2015

Complexity ◽

10.1155/2018/9193248 ◽

2018 ◽

Vol 2018 ◽

pp. 1-17

Author(s):

Dandan Tang ◽

Man Zhang ◽

Jiabo Xu ◽

Xueliang Zhang ◽

Fang Yang ◽

...

Keyword(s):

Data Mining ◽

High Risk ◽

Diagnostic Accuracy ◽

Random Forests ◽

Prediction Models ◽

Confusion Matrix ◽

Risk Groups ◽

Support Vector ◽

Mining Technology ◽

Hiv Aids

Objective. Urumqi is one of the key areas of HIV/AIDS infection in Xinjiang and in China. The AIDS epidemic is spreading from high-risk groups to the general population, and the situation remains very serious. The goal of this study was to use four data mining algorithms to build models for identifying HIV infection and to compare their predictive performance. Method. The sentinel monitoring data for three high-risk groups (injecting drug users (IDU), men who have sex with men (MSM), and female sex workers (FSW)) in Urumqi from 2009 to 2015 included demographic characteristics, sexual behavior, and serological test results. We used age, marital status, education level, and other variables as input variables and HIV infection status as the output variable to establish four prediction models for the three datasets. We also used the confusion matrix, accuracy, sensitivity, specificity, precision, recall, and the area under the receiver operating characteristic (ROC) curve (AUC) to evaluate classification performance, and we analyzed the importance of the predictive variables. Results. The final experimental results show that the random forests algorithm obtained the best results, with a diagnostic accuracy of 94.4821% on the MSM dataset, 97.5136% on the FSW dataset, and 94.6375% on the IDU dataset. The k-nearest neighbors algorithm came out second, with 91.5258% diagnostic accuracy on the MSM dataset, 96.3083% on the FSW dataset, and 90.8287% on the IDU dataset, followed by support vector machine (94.0182%, 98.0369%, and 91.3571%). The decision tree algorithm was the poorest among the four algorithms, with 79.1761% diagnostic accuracy on the MSM dataset, 87.0283% on the FSW dataset, and 74.3879% on the IDU dataset. Conclusions. Data mining technology, as a new method of assisting disease screening and diagnosis, can help medical personnel screen for and diagnose AIDS rapidly from a large amount of information.
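The evaluation measures named in this abstract all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of that relationship, not the authors' code; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Derive the usual summary measures from confusion-matrix counts."""
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_ratio(tp, tp + fn),  # identical to recall
        "specificity": safe_ratio(tn, tn + fp),
        "precision": safe_ratio(tp, tp + fp),
    }


# Hypothetical labels (1 = infected, 0 = not infected); not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```

Sensitivity and recall are the same quantity, which is why a single dictionary entry covers both terms from the abstract.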


Risk Factors for Pneumonia and Death in Adult Patients With Seasonal Influenza and Establishment of Prediction Scores: A Population-Based Study

Open Forum Infectious Diseases ◽

10.1093/ofid/ofab068 ◽

2021 ◽

Vol 8 (3) ◽

Author(s):

Koichi Miyashita ◽

Eiji Nakatani ◽

Hironao Hozumi ◽

Yoko Sato ◽

Yoshiki Miyachi ◽

...

Keyword(s):

High Risk ◽

Seasonal Influenza ◽

Prediction Models ◽

Validation Cohort ◽

Risk Groups ◽

Population Based ◽

Limited Data ◽

Population Based Study ◽

Relative Risks ◽

Primary Care Settings

Background. Seasonal influenza remains a global health problem; however, there are limited data on the specific relative risks for pneumonia and death among outpatients considered to be at high risk for influenza complications. This population-based study aimed to develop prediction models for determining the risk of influenza-related pneumonia and death. Methods. We included patients diagnosed with laboratory-confirmed influenza between 2016 and 2017 (main cohort, n = 25 659), those diagnosed between 2015 and 2016 (validation cohort 1, n = 16 727), and those diagnosed between 2017 and 2018 (validation cohort 2, n = 34 219). Prediction scores were developed based on the incidence and independent predictors of pneumonia and death identified using multivariate analyses, and patients were categorized into low-, medium-, and high-risk groups based on total scores. Results. In the main cohort, age, gender, and certain comorbidities (dementia, congestive heart failure, diabetes, and others) were independent predictors of pneumonia and death. The 28-day pneumonia incidence was 0.5%, 4.1%, and 10.8% in the low-, medium-, and high-risk groups, respectively (c-index, 0.75); the 28-day mortality was 0.05%, 0.7%, and 3.3% in the low-, medium-, and high-risk groups, respectively (c-index, 0.85). In validation cohort 1, c-indices for the models for pneumonia and death were 0.75 and 0.87, respectively. In validation cohort 2, c-indices for the models were 0.74 and 0.87, respectively. Conclusions. We successfully developed and validated simple-to-use risk prediction models, which would promptly provide useful information for treatment decisions in primary care settings.
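The c-index reported here measures how well a score ranks patients. For a binary outcome it can be read as the share of (event, non-event) patient pairs in which the patient with the event received the higher score. The sketch below assumes that binary-outcome simplification; the score cutoffs in `risk_group` and the toy data are hypothetical and are not the thresholds or data used in the study.

```python
def c_index(scores, outcomes):
    """Concordance index for a binary outcome (1 = event, 0 = no event).

    Ties in score count as half-concordant.
    """
    concordant = 0.0
    pairs = 0
    for score_event, outcome_event in zip(scores, outcomes):
        if outcome_event != 1:
            continue
        for score_other, outcome_other in zip(scores, outcomes):
            if outcome_other != 0:
                continue
            pairs += 1
            if score_event > score_other:
                concordant += 1.0
            elif score_event == score_other:
                concordant += 0.5
    if pairs == 0:
        raise ValueError("need at least one event and one non-event")
    return concordant / pairs


def risk_group(total_score, low_max=2, medium_max=5):
    """Map a total score to a risk group; cutoffs are illustrative assumptions."""
    if total_score <= low_max:
        return "low"
    if total_score <= medium_max:
        return "medium"
    return "high"


# Hypothetical total scores and 28-day outcomes for six patients.
scores = [1, 2, 3, 4, 5, 6]
outcomes = [0, 0, 1, 0, 1, 1]
concordance = c_index(scores, outcomes)
```

A value of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect ranking.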


Cardiovascular Disease Prediction using Data Mining Techniques

Oriental journal of computer science and technology ◽

10.13005/ojcst/10.02.38 ◽

2017 ◽

Vol 10 (2) ◽

pp. 520-528 ◽

Cited By ~ 1

Author(s):

Mudasir Kirmani

Keyword(s):

Data Mining ◽

Cardiovascular Disease ◽

Cardiovascular Diseases ◽

High Risk ◽

Learning Algorithm ◽

Heart Diseases ◽

Circulatory System ◽

World Health ◽

Support Vector ◽

Predictive Tool

Cardiovascular disease covers various diseases associated with the heart, lymphatic system, and circulatory system of the human body. The World Health Organisation (WHO) has reported that cardiovascular diseases have a high mortality rate and a high risk of causing various disabilities. The most prevalent causes of cardiovascular diseases are behavioural and dietary habits such as tobacco intake, unhealthy diet and obesity, physical inactivity, ageing, and addiction to drugs and alcohol, to name a few. People with conditions such as hypertension, diabetes, hyperlipidemia, stress, and other ailments are at high risk of cardiovascular diseases. Over time, different techniques implementing a variety of algorithms have been used to predict the prevalence of cardiovascular diseases in general and heart disease in particular. Detection and management of cardiovascular diseases can be supported by computer-based predictive tools from data mining. Data mining based techniques offer scope for better and more reliable prediction and diagnosis of heart diseases. In this study we reviewed various available techniques such as Decision Tree and its variants, Naive Bayes, Neural Networks, Support Vector Machine, Fuzzy Rules, Genetic Algorithms, and Ant Colony Optimization, to name a few. The observations illustrated that it is difficult to name a single best machine learning algorithm for the diagnosis and prognosis of CVD. The study further considers the behaviour, selection, and number of factors required for efficient prediction.


“HIV Is Our Friend”

Legalizing Sex ◽

10.18574/nyu/9781479810024.003.0002 ◽

2020 ◽

pp. 29-52

Author(s):

Chaitanya Lakkimsetti

Keyword(s):

High Risk ◽

Sex Workers ◽

Initial Response ◽

Risk Groups ◽

Leadership Role ◽

Sex Worker ◽

Aids Epidemic ◽

Marginalized Groups ◽

Sexually Marginalized ◽

Hiv Aids

This chapter provides an overview of HIV/AIDS policies as well as how sexually marginalized groups are drawn into biopower programs as “high-risk” groups. In 1983, when HIV/AIDS was first detected among sex workers in India, the state’s initial response was to blame the sex workers themselves as well as to forcefully test them and confine them in prison. However, it proved impossible to incarcerate every sex worker and to stop the spread of the HIV/AIDS epidemic. Instead, I argue, ultimately a consensus formed that supported giving marginalized groups a leadership role in tackling the epidemic. Drawing on ethnographic observations and the HIV/AIDS policy of the National AIDS Control Organization (NACO), this chapter also highlights how these biopower projects deepened the involvement of high-risk groups as they moved from simple prevention to behavioral change. Ultimately, communities became extensions of biopower projects as they implemented these programs at the day-to-day level.


Enhancing Decision Tree with AdaBoost for Predicting Schizophrenia Readmission

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.931-932.1467 ◽

2014 ◽

Vol 931-932 ◽

pp. 1467-1471 ◽

Cited By ~ 1

Author(s):

Jaree Thongkam ◽

Vatinee Sukmak

Keyword(s):

Data Mining ◽

Decision Tree ◽

Random Forests ◽

Prediction Models ◽

Adverse Outcome ◽

Random Tree ◽

Psychiatric Readmission ◽

Using Data ◽

Insight Into ◽

F Measure

A psychiatric readmission is argued to be an adverse outcome because it is costly and occurs when a relapse of the illness is severe. An analysis of systematic models in readmission data can provide useful insight into the quicker and sicker patients with schizophrenia. This research aims to develop and investigate schizophrenia readmission prediction models using data mining techniques including decision tree, Random Tree, Random Forests, AdaBoost, Bagging, and the combinations AdaBoost with decision tree, AdaBoost with Random Tree, AdaBoost with Random Forests, Bagging with decision tree, Bagging with Random Tree, and Bagging with Random Forests. The experimental results showed that AdaBoost with decision tree has the highest precision, recall, and F-measure, up to 98.11%, 98.79%, and 98.41%, respectively.


Cardiovascular Disease Prediction System Using Extra Trees Classifier

10.21203/rs.2.14454/v1 ◽

2019 ◽

Author(s):

Rahman Shafique ◽

Arif Mehmood ◽

Saleem ullah ◽

Gyu Sang Choi

Keyword(s):

Data Mining ◽

Cardiovascular Disease ◽

Health Care ◽

Support Vector Machine ◽

Prediction Models ◽

Support Vector ◽

Prediction System ◽

Classification Techniques ◽

Data Mining Techniques ◽

Tree Classifier

Heart disease, as a cardiovascular disease, is the leading cause of death for both men and women. It is the major cause of morbidity and mortality in present society. Therefore, researchers are working to help health care professionals in the diagnostic process by using data mining techniques. Although the health care industry is rich in data, this data is not properly mined to discover hidden patterns and to make decisions based on those patterns. The major goal of this study is the extraction of hidden patterns by applying numerous data mining techniques that may give remarkable results in detecting the presence of cardiovascular disease. Data mining classification techniques are used to discover these patterns for research in the medical industry. A dataset containing 13 attributes was analyzed for the prediction system. The dataset contains some commonly used medical terms like blood pressure, cholesterol level, chest pain and 11 other attributes used to predict cardiovascular disease. The most common and effective classification techniques used in the mining process, namely Decision Tree (also called Verdict Tree), Extra Trees Classifier, Random Forest, Support Vector Machine, Naive Bayes, and Logistic Regression, are analyzed in this paper. For diagnosing cardiovascular disease and controlling the ratio of deaths from it, the Extra Trees classifier is considered the best approach. We evaluate these prediction models using Accuracy, Precision, Recall, and F1-score. Our experimental results show that the accuracies of the Extra Trees classifier, Logistic Model Tree classifier, support vector machine, and Naive Bayes classifier are 90%, 88%, 87%, and 86%, respectively. Per our experimental analysis, the Extra Trees classifier, with the highest accuracy, is considered the best approach for predicting cardiovascular disease.


Development of a Gene-Based Prediction Model for Recurrence of Colorectal Cancer Using an Ensemble Learning Algorithm

Frontiers in Oncology ◽

10.3389/fonc.2021.631056 ◽

2021 ◽

Vol 11 ◽

Author(s):

Han-Ching Chan ◽

Amrita Chattopadhyay ◽

Eric Y. Chuang ◽

Tzu-Pin Lu

Keyword(s):

Colorectal Cancer ◽

High Risk ◽

Adjuvant Chemotherapy ◽

Prediction Models ◽

Stage I ◽

Training Data ◽

Differentially Expressed ◽

Support Vector ◽

Data Set ◽

Risk Of Recurrence

It is difficult to determine which patients with stage I and II colorectal cancer are at high risk of recurrence, qualifying them to undergo adjuvant chemotherapy. In this study, we aimed to determine a gene signature using gene expression data that could successfully identify high risk of recurrence among stage I and II colorectal cancer patients. First, a synthetic minority oversampling technique was used to address the problem of imbalanced data due to rare recurrence events. We then applied a sequential workflow of three methods (significance analysis of microarrays, logistic regression, and recursive feature elimination) to identify genes differentially expressed between patients with and without recurrence. To stabilize the prediction algorithm, we repeated the above processes on 10 subsets by bagging the training data set and then used support vector machine methods to construct the prediction models. The final predictions were determined by majority voting. The 10 models, using 51 differentially expressed genes, successfully predicted a high risk of recurrence within 3 years in the training data set, with a sensitivity of 91.18%. For the validation data sets, the sensitivity of the prediction with samples from two other countries was 80.00% and 91.67%. These prediction models can potentially function as a tool to decide if adjuvant chemotherapy should be administered after surgery for patients with stage I and II colorectal cancer.
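The bagging and majority-voting step described above can be sketched in a few lines. This is an illustration under loud assumptions, not the authors' pipeline: the base learner here is a simple one-feature threshold rule standing in for the support vector machines used in the study, the oversampling and gene-selection steps are omitted, and the training data are made up.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bagging subset)."""
    return [rng.choice(rows) for _ in rows]


def fit_threshold(rows):
    """Stand-in base learner (NOT an SVM): midpoint between the class means."""
    positives = [x for x, label in rows if label == 1]
    negatives = [x for x, label in rows if label == 0]
    if not positives or not negatives:
        return None  # degenerate bootstrap sample with a single class
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def majority_vote(votes):
    """Return the most common vote; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def bagged_predict(thresholds, x):
    """Combine the per-model predictions by majority voting."""
    return majority_vote([1 if x > t else 0 for t in thresholds])


# Hypothetical (expression value, recurrence label) pairs.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
         (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]

rng = random.Random(0)
thresholds = []
while len(thresholds) < 10:  # ten bagged models, matching the count in the abstract
    model = fit_threshold(bootstrap_sample(train, rng))
    if model is not None:
        thresholds.append(model)
```

With an even number of models a 5-5 split is possible, so a real implementation needs an explicit tie-breaking rule; the first-seen rule used here is an arbitrary choice.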


The Sexual Behavior of Male Sexual Partner of Tranvestite in the Prevention Efforts of HIV/AIDS Transmission

Jurnal Kesehatan Masyarakat ◽

10.15294/kemas.v12i1.6542 ◽

2016 ◽

Vol 12 (1) ◽

Author(s):

Muhammad Azinar ◽

Anggipita Budi Mahardining

Keyword(s):

Sexual Behavior ◽

High Risk ◽

Sex Workers ◽

Risk Groups ◽

Regular Partner ◽

Multiple Sexual Partners ◽

Multiple Partners ◽

Biological And Behavioral Surveillance ◽

Male Sexual Partner ◽

Hiv Aids

Transvestites are one of the high-risk groups for HIV/AIDS. The Integrated Biological and Behavioral Surveillance (IBBS) states that in 2011 the HIV prevalence among waria in Indonesia reached 22%, up from 18.96% in 2009. This is because transvestites usually have multiple partners for both oral and anal sex and rarely use condoms. Similarly, the male regular partners of transvestites also have sex with multiple sexual partners. Therefore, they are also at high risk of spreading HIV/AIDS. The objective of this study is to analyze the sexual behavior of male regular partners of transvestites in efforts to prevent the spread of HIV/AIDS. The study was carried out in 2014 using a qualitative approach. Data were collected through in-depth interviews with 6 male regular partners of transvestites selected by purposive sampling, with peer educators as triangulation informants. The results show that condom use among male regular partners of transvestites is still low and inconsistent when having sex with transvestites, female sex workers, and girlfriends. Their knowledge about HIV/AIDS is poor, and some regular partners of transvestites also felt susceptible to contracting HIV. However, perceived severity and perceived benefits regarding HIV/AIDS among male regular partners of transvestites are good, while perceived barriers and perceived self-efficacy related to access to condoms are low.


Data Mining Technology Application in False Text Information Recognition

Mobile Information Systems ◽

10.1155/2021/4206424 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Jie Wan ◽

Xue Cao ◽

Kun Yao ◽

Donghui Yang ◽

E. Peng ◽

...

Keyword(s):

Data Mining ◽

Classification Model ◽

Support Vector ◽

Svm Classifier ◽

Characteristic Matrix ◽

Mining Technology ◽

Technology Application ◽

Text Information ◽

The Government ◽

Effect Of The Support

False information on the Internet is increasingly recognized as a serious social harm. To recognize false text information, this paper proposes an effective method for mining text features in the field of false drug advertisements. Firstly, the data of false drug advertisements and real drug advertisements were collected from the official websites to build a database of false and real drug advertisements. Secondly, by performing feature extraction on the text of drug advertisements, this work built a characteristic matrix based on the effective features and assigned positive or negative labels to the feature vectors of the matrix according to whether each is a fake medical advertisement or not. Thirdly, this study trained and tested several different classifiers, selected the classification model with the best performance in identifying false drug advertisements, and found the key characteristics that can determine the classification. Finally, the model with the best performance was used to predict new false drug advertisements collected from Sina Weibo. In identifying false drug advertisements, the support vector machine (SVM) classifier established on the feature set after feature selection was the most effective. The findings of this study can provide an effective method for the government to identify and combat false advertisements. This study offers a useful reference for the use of text data mining technology to identify and detect information fraud.


SVM and Cross-validation using R Studio

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1673.1010120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 46-54

Keyword(s):

Data Mining ◽

Missing Values ◽

Cross Validation ◽

Confusion Matrix ◽

Tourism Industry ◽

Support Vector ◽

Full Dataset ◽

The Matrix ◽

Social Media Platforms ◽

Classification Of Images

Data multiplies with each passing day, and it is difficult to extract useful information from such big data. Data mining is used to extract useful information and is applied in nearly all fields, such as healthcare, marketing, and social media platforms. In this paper, data is loaded and preprocessed by dealing with some missing values. The dataset used is from Airbnb, a platform for the lodging and tourism industry. The data is analyzed by plotting correlations using the Spearman method. PCA and the Support Vector Machine classification technique are then applied to the dataset. SVM has various applications, including face detection, text and hypertext categorization, classification of images, and bioinformatics. SVM is appropriate here because it handles high-dimensional input spaces and sparse document vectors and provides regularization parameters. Cross-validation gives a more accurate result. The dataset is divided into folds. The end product is the test set, which is similar to the full dataset. The confusion matrix is evaluated, and a grid approach is followed for building the matrix at various seeds and kernels (RBF, polynomial). The aim of this research is to see which kernel is best for the dataset.
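The fold-splitting idea behind cross-validation can be shown without any modeling library. The paper itself works in R; the sketch below uses Python purely for illustration, and the function name, seed, and fold count are assumptions rather than details taken from the paper.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds, and each fold serves as the
    test set exactly once while the remaining folds form the training set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Ten hypothetical rows split into five folds of two rows each.
splits = list(k_fold_indices(n_samples=10, k=5))
```

In a kernel comparison, each candidate kernel would be trained on every `train` split and scored on the matching `test` split, and the per-fold scores averaged before the kernels are compared.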


The sexual networks of female sex workers and potential HIV transmission risk: an entertainment venue-based study in Shaanxi, China

International Journal of STD & AIDS ◽

10.1177/0956462419886780 ◽

2020 ◽

Vol 31 (5) ◽

pp. 402-409

Author(s):

Huijun Liu ◽

Min Zhao ◽

Ying Wang ◽

Marcus W Feldman ◽

Qunying Xiao

Keyword(s):

High Risk ◽

General Population ◽

Sex Workers ◽

Female Sex Workers ◽

Risk Groups ◽

Sexual Networks ◽

Commercial Sex ◽

Sex Partners ◽

Sexual Network ◽

Hiv Aids

People involved in commercial sex are thought to be at high risk for human immunodeficiency virus/acquired immune deficiency syndrome (HIV/AIDS) transmission. To explore the characteristics of female sex workers’ (FSWs) sexual networks and how FSWs and their sex partners could serve as ‘bridges’ in HIV/AIDS transmission, egocentric sexual networks (where a subject is asked to identify his or her sexual contacts and their relationships) of 66 FSWs in Xi'an city, Shaanxi Province of China, were studied. Convenience sampling was used to collect FSWs’ socio-demographic and sexual behavior data, which we analyzed using social network and descriptive statistical methods. Results show that some egocentric sexual networks were connected by sex partners, and these were integrated into several components of a sexual network. According to centrality indicators, FSWs and their commercial sex partners (especially regular clients) served as key nodes within high-risk groups and as bridges between high-risk groups and the general population. The cluster of high-risk groups with cohesive sub-networks had larger network size (P < 0.001), more complex network structures, and more high-risk members (P < 0.05) than other isolated networks. The sexual network of FSWs was characterized by multiple sexual relations (680), unstable relationships (50.15%), and a high rate of inconsistent condom use with non-commercial sex partners (31.22%). By linking commercial and non-commercial sexual networks, the FSWs and their clients can become effective bridges for HIV/AIDS spread from high-risk groups to the general population.
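How separate egocentric networks merge into components through shared partners, and how a centrality indicator flags likely bridge nodes, can be illustrated on a toy graph. This is a minimal sketch and not the authors' analysis: the edge list and node names are invented, and degree centrality is only one of several centrality indicators such a study might use.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the connected components as a list of node sets (breadth-first)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def degree_centrality(graph):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Hypothetical sexual-contact ties; client_2 links the networks of FSW_A and FSW_B.
edges = [
    ("FSW_A", "client_1"), ("FSW_A", "client_2"),
    ("FSW_B", "client_2"), ("FSW_B", "partner_1"),
    ("FSW_C", "client_3"),
]
graph = build_graph(edges)
components = connected_components(graph)
centrality = degree_centrality(graph)
```

In this toy graph the shared client joins two egocentric networks into one component of five people, while the third network stays isolated.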
