Factors Influencing Matching of Ride-Hailing Service Using Machine Learning Method

Myungsik Do; Wanhee Byun; Doh Kyoum Shin; Hyeryun Jin

doi:10.3390/su11205615

Factors Influencing Matching of Ride-Hailing Service Using Machine Learning Method

Sustainability ◽

10.3390/su11205615 ◽

2019 ◽

Vol 11 (20) ◽

pp. 5615 ◽

Cited By ~ 2

Author(s):

Myungsik Do ◽

Wanhee Byun ◽

Doh Kyoum Shin ◽

Hyeryun Jin

Keyword(s):

Machine Learning ◽

Success Rate ◽

Cross Validation ◽

Average Distance ◽

Machine Learning Method ◽

Land Uses ◽

Taxi Drivers ◽

Factors Influencing ◽

Taxi Service ◽

The City

In Korea it is common to call a taxi through a taxi app, and app-based taxi services were expected to give customers more convenience. However, customers' requests are often denied, because taxi drivers can decide whether or not to take a call. Studies on the factors that determine whether taxi drivers refuse or accept calls are therefore needed. This study investigated why taxi drivers might refuse calls from customers and which factors influence the success of matching within the service. It used origin-destination data for Seoul and Daejeon obtained from T-map Taxis, which were analyzed with a decision tree, a machine learning method, and checked by cross-validation. Results showed that distance, socio-economic features, and land uses affected the matching success rate, and that distance was the most important factor in both Seoul and Daejeon. In Seoul, the matching success rate was lowest for shorter-than-average trips requested at midnight. In Daejeon, the rate was lowest when calls were made for trips either shorter or longer than the average distance. The study shows that matching success for ride-hailing services varies with the distance of the requested trip, and that this relationship depends on the size of the city.
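
A decision tree of the kind mentioned in this abstract grows by repeatedly choosing the feature threshold that best separates the outcomes. The sketch below shows that single step for one feature, trip distance, using Gini impurity. It is a minimal illustration and not the authors' model: the function names `gini` and `best_split` and the trip records are hypothetical, and the abstract does not state which split criterion was used.

```python
# Illustrative only: the trip records below are made up, not T-map Taxis data.
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = call was matched)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(distances, matched):
    """Return (threshold, weighted_gini) for the best single split on distance."""
    pairs = list(zip(distances, matched))
    values = sorted(set(distances))
    # Start from the impurity of the unsplit data; threshold None means "no split".
    best = (None, gini(matched))
    # Candidate thresholds are midpoints between consecutive distinct distances.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [m for d, m in pairs if d <= threshold]
        right = [m for d, m in pairs if d > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical trips: distance in km and whether a driver accepted the call.
trip_km = [0.8, 1.2, 1.5, 2.0, 4.5, 6.0, 7.5, 9.0]
accepted = [0, 0, 0, 1, 1, 1, 1, 1]

threshold, impurity = best_split(trip_km, accepted)
print(threshold, impurity)
```

A full tree would apply the same search recursively to each side of the split and across all features, such as time of day or land use.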

A Machine Learning Method for Identifying Lung Cancer Based on Routine Blood Indices: Qualitative Feasibility Study

JMIR Medical Informatics ◽

10.2196/13476 ◽

2019 ◽

Vol 7 (3) ◽

pp. e13476 ◽

Cited By ~ 3

Author(s):

Jiangpeng Wu ◽

Xiangyi Zan ◽

Liping Gao ◽

Jianhong Zhao ◽

Jing Fan ◽

...

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Cross Validation ◽

Clinical Symptoms ◽

Machine Learning Method ◽

Learning Method ◽

Liquid Biopsies ◽

Blood Indices ◽

Routine Blood ◽

Identification Model

Background: Liquid biopsies based on blood samples have been widely accepted as a diagnostic and monitoring tool for cancers, but extremely high sensitivity is frequently needed because of the very low levels of the specially selected DNA, RNA, or protein biomarkers that are released into blood. Routine blood indices tests, by contrast, are frequently ordered by physicians, as they are easy to perform and cost-effective. In addition, machine learning is broadly accepted for its ability to decipher complicated connections between multiple sets of test data and diseases. Objective: The aim of this study is to discover the potential association between lung cancer and routine blood indices and thereby help clinicians and patients to identify lung cancer based on these routine tests. Methods: The machine learning method known as Random Forest was adopted to build an identification model between routine blood indices and lung cancer that would determine whether they were potentially linked. Ten-fold cross-validation and further tests were used to evaluate the reliability of the identification model. Results: In total, 277 patients with 49 types of routine blood indices were included in this study, including 183 patients with lung cancer and 94 patients without lung cancer. A correlation was found between the combination of 19 types of routine blood indices and lung cancer. Lung cancer patients could be distinguished from other patients, especially those with tuberculosis (which usually has similar clinical symptoms to lung cancer), with a sensitivity, specificity and total accuracy of 96.3%, 94.97% and 95.7%, respectively, in cross-validation. This identification method is called the routine blood indices model for lung cancer, and it promises to help both clinicians and patients identify lung cancer based on routine blood indices. Conclusions: Lung cancer can be identified based on the combination of 19 types of routine blood indices, which implies that artificial intelligence can find connections between a disease and the fundamental indices of blood and could reduce the need for costly, elaborate blood test techniques for this purpose. Combinations of multiple indices obtained from routine blood tests may be connected to other diseases as well.
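
The three figures reported in this abstract (sensitivity, specificity, total accuracy) all derive from the counts of a binary confusion matrix. The sketch below shows the standard definitions. The function name `diagnostic_metrics` and the counts passed to it are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: patients with the disease correctly flagged
    fn: patients with the disease that were missed
    tn: patients without the disease correctly cleared
    fp: patients without the disease wrongly flagged
    """
    sensitivity = tp / (tp + fn)           # share of true cases that are detected
    specificity = tn / (tn + fp)           # share of non-cases that are cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts chosen only to make the arithmetic easy to follow.
sens, spec, acc = diagnostic_metrics(tp=90, fn=10, tn=45, fp=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because accuracy pools both groups, it sits between sensitivity and specificity and is pulled toward whichever group is larger.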

North American Hardwoods Identification Using Machine-Learning

Forests ◽

10.3390/f11030298 ◽

2020 ◽

Vol 11 (3) ◽

pp. 298 ◽

Cited By ~ 2

Author(s):

Dercilio Junior Verly Lopes ◽

Greg W. Burgreen ◽

Edward D. Entsminger

Keyword(s):

Machine Learning ◽

North American ◽

Mobile Application ◽

Cross Validation ◽

Data Augmentation ◽

Technical Note ◽

Machine Learning Method ◽

Training Set ◽

Hardwood Species ◽

Fold Cross Validation

This technical note determines the feasibility of using an InceptionV4_ResNetV2 convolutional neural network (CNN) to correctly identify hardwood species from macroscopic images. The method uses a commodity smartphone fitted with a 14× macro lens for photography. The end-grains of ten different North American hardwood species were photographed to create a dataset of 1869 images. Stratified 5-fold cross-validation was used, in which the number of testing samples varied from 341 to 342. Data augmentation was performed on-the-fly for each training set by rotating, zooming, and flipping images. It was found that the CNN could correctly identify hardwood species based on macroscopic images of their end-grain with an adjusted accuracy of 92.60%. With the current growth of the machine-learning field, this model can be readily deployed in a mobile application for field wood identification.
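
Stratified k-fold cross-validation, as named in this abstract, splits the data so that each fold keeps roughly the same class proportions as the whole dataset. The sketch below is a minimal pure-Python version of that idea. The function name `stratified_folds`, the species labels, and the image counts are hypothetical; shuffling within each class, which a real pipeline would normally do first, is omitted for brevity.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so every class is spread evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0  # running counter so fold sizes stay balanced across classes
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


# Hypothetical dataset: 12 oak, 11 maple and 10 ash images.
labels = ["oak"] * 12 + ["maple"] * 11 + ["ash"] * 10
folds = stratified_folds(labels, k=5)

for number, test_indices in enumerate(folds):
    # Each fold serves once as the test set; the rest form the training set.
    train_indices = [i for f in folds if f is not test_indices for i in f]
    counts = {s: sum(labels[i] == s for i in test_indices) for s in sorted(set(labels))}
    print(number, len(train_indices), len(test_indices), counts)
```

With this assignment, test-fold sizes differ by at most one sample, and so does the count of each class per fold.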

A Machine Learning Method for Identifying Lung Cancer Based on Routine Blood Indices: Qualitative Feasibility Study (Preprint)

10.2196/preprints.13476 ◽

2019 ◽

Author(s):

Jiangpeng Wu ◽

Xiangyi Zan ◽

Liping Gao ◽

Jianhong Zhao ◽

Jing Fan ◽

...

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Cross Validation ◽

Clinical Symptoms ◽

Machine Learning Method ◽

Learning Method ◽

Liquid Biopsies ◽

Blood Indices ◽

Routine Blood ◽

Identification Model

Predicting Risky and Aggressive Driving Behavior among Taxi Drivers: Do Spatio-Temporal Attributes Matter?

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17113937 ◽

2020 ◽

Vol 17 (11) ◽

pp. 3937 ◽

Cited By ~ 6

Author(s):

Muhammad Zahid ◽

Yangzhou Chen ◽

Sikandar Khan ◽

Arshad Jamal ◽

Muhammad Ijaz ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Significant Proportion ◽

Driving Behavior ◽

Evaluation Metrics ◽

Aggressive Driving ◽

Taxi Drivers ◽

Traffic Violations ◽

The City ◽

Stack Model

Risky and aggressive driving maneuvers are considered a significant indicator of traffic accident occurrence, and they also aggravate accident severity. Traffic violations caused by such uncivilized driving behavior are a global issue. Studies in the existing literature have used statistical analysis methods to explore key contributing factors toward aggressive driving and traffic violations. However, such methods are unable to capture latent correlations among predictor variables, and they also suffer from low prediction accuracy. This study aimed to comprehensively investigate different traffic violations using spatial analysis and machine learning methods in the city of Luzhou, China. Violations committed by taxi drivers are the focus of the current study, since they constitute a significant proportion of total violations reported in the city. Georeferenced violation data for the year 2016 were obtained from the traffic police department. A detailed descriptive analysis is presented to summarize key statistics about the various violation types. Results revealed that over-speeding was the most prevalent violation type observed in the study area. Frequency-based nearest neighborhood cluster methods in the ArcMap Geographic Information System (GIS) were used to develop hotspot maps for different violation types, which are vital for prioritizing and conducting treatment alternatives efficiently. Finally, different machine learning (ML) methods, including a decision tree, AdaBoost with a decision tree as the base estimator, and a stack model, were employed to predict and classify each violation type. The proposed methods were compared on evaluation metrics such as accuracy, F-1 measure, specificity, and log loss. Prediction results demonstrated the adequacy and robustness of the proposed ML methods, and a detailed comparative analysis showed that the stack model outperformed the other models on these metrics.

Predicting Disease Related microRNA Based on Similarity and Topology

Cells ◽

10.3390/cells8111405 ◽

2019 ◽

Vol 8 (11) ◽

pp. 1405 ◽

Cited By ~ 2

Author(s):

Zhihua Chen ◽

Xinke Wang ◽

Peng Gao ◽

Hongju Liu ◽

Bosheng Song

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Lung Neoplasm ◽

Area Under The Curve ◽

Usual Method ◽

Machine Learning Method ◽

And Topology ◽

Machine Learning Model ◽

Topology Information

It is known that many diseases are caused by mutations or abnormalities in microRNA (miRNA). The usual method to predict miRNA disease relationships is to build a high-quality similarity network of diseases and miRNAs. All unobserved associations are ranked by their similarity scores, such that a higher score indicates a greater probability of a potential connection. However, this approach does not utilize information within the network. Therefore, in this study, we propose a machine learning method, called STIM, which uses network topology information to predict disease–miRNA associations. In contrast to the conventional approach, STIM constructs features according to information on similarity and topology in networks and then uses a machine learning model to predict potential associations. To verify the reliability and accuracy of our method, we compared STIM to other classical algorithms. The results of fivefold cross validation demonstrated that STIM outperforms many existing methods, particularly in terms of the area under the curve. In addition, the top 30 candidate miRNAs recommended by STIM in a case study of lung neoplasm have been confirmed in previous experiments, which proved the validity of the method.
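
The area under the ROC curve mentioned in this abstract has a simple ranking interpretation: it is the probability that a randomly chosen true association receives a higher score than a randomly chosen non-association. The sketch below computes it by direct pairwise comparison. It is a generic illustration, not the STIM method itself; the function name `auc` and the scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical association scores; label 1 marks a known association.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

print(auc(scores, labels))
```

The pairwise loop is quadratic in the number of candidates, so a rank-based formula would be preferable on large candidate lists.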

Multi-parametric MRI-based radiomics signature for discriminating between clinically significant and insignificant prostate cancer: Cross-validation of a machine learning method

European Journal of Radiology ◽

10.1016/j.ejrad.2019.03.010 ◽

2019 ◽

Vol 115 ◽

pp. 16-21 ◽

Cited By ~ 21

Author(s):

Xiangde Min ◽

Min Li ◽

Di Dong ◽

Zhaoyan Feng ◽

Peipei Zhang ◽

...

Keyword(s):

Prostate Cancer ◽

Machine Learning ◽

Cross Validation ◽

Machine Learning Method ◽

Learning Method ◽

Clinically Significant ◽

Radiomics Signature ◽

Insignificant Prostate Cancer

METHOD SUGGESTING CITY WALKING ROUTES FOR PEDESTRIANS USING AN EXAMPLE OF SAINT-PETERSBURG

Informatization and communication ◽

10.34219/2078-8320-2019-10-3-71-76 ◽

2019 ◽

pp. 71-76

Author(s):

A.E. Semenov

Keyword(s):

Random Forest ◽

Develop Model ◽

Random Forest Algorithm ◽

Saint Petersburg ◽

Pedestrian Navigation ◽

Factors Influencing ◽

The City

A method of pedestrian navigation in cities was investigated, using Saint-Petersburg as an example. The factors that influence people when they choose a route for a walk were determined. Based on these factors, corresponding data were collected and used to develop a model that estimates the attractiveness of a street in the city using the Random Forest algorithm. The results show that routes provided by the method are 14% more attractive and just 6% longer than the shortest ones.

Speech Organ Contour Extraction Using Real-Time MRI and Machine Learning Method

10.21437/interspeech.2019-1593 ◽

2019 ◽

Author(s):

Hironori Takemoto ◽

Tsubasa Goto ◽

Yuya Hagihara ◽

Sayaka Hamanaka ◽

Tatsuya Kitamura ◽

...

Keyword(s):

Machine Learning ◽

Real Time ◽

Machine Learning Method ◽

Learning Method ◽

Contour Extraction

What Makes a City Bikeable? A Study of Intercity and Intracity Patterns of Bicycle Ridership using Mobike Big Data Records

Built Environment ◽

10.2148/benv.46.1.55 ◽

2020 ◽

Vol 46 (1) ◽

pp. 55-75

Author(s):

Ying Long ◽

Jianting Zhao

Keyword(s):

Big Data ◽

Average Distance ◽

Policy Implications ◽

Rating System ◽

City Centre ◽

Average Score ◽

Policy Makers ◽

Usage Frequency ◽

Share Data ◽

The City

This paper examines how mass ridership data can help describe cities from the bikers' perspective. We explore the possibility of using the data to reveal general bikeability patterns in 202 major Chinese cities. This process is conducted by constructing a bikeability rating system, the Mobike Riding Index (MRI), to measure bikeability in terms of usage frequency and the built environment. We first investigated mass ridership data and relevant supporting data; we then established the MRI framework and calculated MRI scores accordingly. This study finds that people tend to ride shared bikes at speeds close to 10 km/h for an average distance of 2 km roughly three times a day. The MRI results show that at the street level, the weekday and weekend MRI distributions are analogous, with an average score of 49.8 (range 0–100). At the township level, high-scoring townships are those close to the city centre; at the city level, the MRI is unevenly distributed, with high-MRI cities along the southern coastline or in the middle inland area. These patterns have policy implications for urban planners and policy-makers. This is the first and largest-scale study to incorporate mobile bike-share data into bikeability measurements, thus laying the groundwork for further research.

Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽

10.2174/1381612825666191107092214 ◽

2020 ◽

Vol 25 (40) ◽

pp. 4296-4302 ◽

Cited By ~ 2

Author(s):

Yuan Zhang ◽

Zhenyan Han ◽

Qian Gao ◽

Xiaoyi Bai ◽

Chi Zhang ◽

...

Keyword(s):

Machine Learning ◽

Inclusion Bodies ◽

Cross Validation ◽

Independent Set ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Validation Test ◽

Excess Number ◽

Fold Cross Validation

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain and results in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane decreases the ability of red blood cells to deform, and massive destruction of these cells in the spleen gives rise to a group of hereditary haemolytic diseases. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using AdaBoost was 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that AdaBoost could be applied to build a learning model for the prediction of inhibitors against K562 cells.
