A New Information-Theoretic Method for Advertisement Conversion Rate Prediction for Large-Scale Sparse Data Based on Deep Learning

Entropy ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. 643 ◽  
Author(s):  
Qianchen Xia ◽  
Jianghua Lv ◽  
Shilong Ma ◽  
Bocheng Gao ◽  
Zhenhua Wang

With the development of online advertising technology, accurately targeting advertisements to user preferences clearly benefits both the market and users. Conversions can be increased by predicting users' purchasing intention through the advertising Conversion Rate (CVR). Given the high-dimensional and sparse characteristics of historical behavior sequences, this paper proposes the LSLM_LSTM model for advertising CVR prediction on large-scale sparse data. The model minimizes the loss using the Adaptive Moment Estimation (Adam) optimization algorithm to automatically mine the nonlinear patterns hidden in the data. Experimental comparison with a variety of typical CVR prediction models shows that the proposed LSLM_LSTM model exploits the time-series characteristics of user behavior sequences more effectively and mines latent relationships hidden in the features, achieving higher accuracy and faster training than models that consider only low- or high-order features.
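The abstract above names Adam as the optimizer used to minimize the loss. A minimal stand-alone sketch of the published Adam update rule follows; the toy quadratic loss is an illustrative stand-in, not the paper's LSLM_LSTM objective.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, parameter step."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (uncentered) estimate
    m_hat = m / (1 - b1 ** t)             # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = (w - 3)^2 as a stand-in loss; gradient is 2(w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(round(w, 3))  # converges toward 3.0
```

The bias-correction terms matter early in training, when the moment estimates are still warming up from zero; without them the first steps would be far too small.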

2020 ◽  
Vol 54 (2) ◽  
pp. 1-2
Author(s):  
Harrie Oosterhuis

Ranking systems form the basis for online search engines and recommendation services. They process large collections of items, for instance web pages or e-commerce products, and present the user with a small ordered selection. The goal of a ranking system is to help a user find the items they are looking for with the least amount of effort. Thus the rankings they produce should place the most relevant or preferred items at the top. Learning to rank is a field within machine learning that covers methods which optimize ranking systems w.r.t. this goal. Traditional supervised learning to rank methods utilize expert judgements to evaluate and learn; however, in many situations such judgements are impossible or infeasible to obtain. As a solution, methods have been introduced that perform learning to rank based on user clicks instead. The difficulty with clicks is that they are affected not only by user preferences but also by which rankings were displayed. Therefore, these methods have to avoid being biased by factors other than user preference. This thesis concerns learning to rank methods based on user clicks and specifically aims to unify the different families of these methods. The first part of the thesis consists of three chapters that look at online learning to rank algorithms, which learn by directly interacting with users. Its first chapter considers large-scale evaluation and shows that existing methods guarantee neither correctness nor user experience; we then introduce a novel method that can guarantee both. The second chapter proposes a novel pairwise method for learning from clicks that contrasts with the previously prevalent dueling-bandit methods. Our experiments show that our pairwise method greatly outperforms the dueling-bandit approach. The third chapter further confirms these findings in an extensive experimental comparison; furthermore, we also show that the theory behind the dueling-bandit approach is unsound w.r.t. deterministic ranking systems.
The second part of the thesis consists of four chapters that look at counterfactual learning to rank algorithms, which learn from historically logged click data. Its first chapter takes the existing approach and makes it applicable to top-k settings where not all items can be displayed at once. It also shows that state-of-the-art supervised learning to rank methods can be applied in the counterfactual scenario. The second chapter introduces a method that combines the robust generalization of feature-based models with the high-performance specialization of tabular models. The third chapter looks at evaluation and introduces a method for finding the optimal logging policy, one that collects click data in a way that minimizes the variance of estimated ranking metrics. By applying this method during the gathering of clicks, one can turn counterfactual evaluation into online evaluation. The fourth chapter proposes a novel counterfactual estimator that accounts for the possibility that the logging policy was updated during the gathering of click data. As a result, it can learn much more efficiently when deployed in an online scenario where interventions can take place. The resulting approach is thus both online and counterfactual; our experimental results show that its performance matches the state of the art in both the online and the counterfactual scenario. As a whole, the second part of this thesis proposes a framework that bridges many gaps between the areas of online, counterfactual, and supervised learning to rank. It takes approaches previously considered independent and unifies them into a single methodology for widely applicable and effective learning to rank from user clicks. Awarded by: University of Amsterdam, Amsterdam, The Netherlands. Supervised by: Maarten de Rijke. Available at: https://hdl.handle.net/11245.1/8ff3aa38-97fb-4d2a-8127-a29a03af4d5c.
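The counterfactual family described above corrects for display bias by weighting clicks with examination propensities. A minimal sketch under the standard position-based examination model follows; the log entries and propensity values are illustrative, not data from the thesis.

```python
from collections import defaultdict

def ips_click_estimate(log):
    """Inverse-propensity-scored relevance estimate from logged clicks.

    Each log entry is (item, clicked, examine_prob): dividing a click by the
    probability its displayed position was examined de-biases the raw count.
    """
    est = defaultdict(float)
    for item, clicked, p in log:
        if clicked:
            est[item] += 1.0 / p   # up-weight clicks from rarely examined positions
    return dict(est)

# Item 'b' was shown lower (examined with prob 0.5), so its click counts double.
log = [("a", 1, 1.0), ("b", 1, 0.5), ("b", 0, 0.5), ("a", 1, 1.0)]
print(ips_click_estimate(log))  # {'a': 2.0, 'b': 2.0}
```

Despite 'a' receiving twice as many raw clicks, the de-biased estimates are equal, which is the core idea behind counterfactual learning to rank from logged data.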


2020 ◽  
Vol 26 (33) ◽  
pp. 4195-4205
Author(s):  
Xiaoyu Ding ◽  
Chen Cui ◽  
Dingyan Wang ◽  
Jihui Zhao ◽  
Mingyue Zheng ◽  
...  

Background: Enhancing a compound's biological activity is the central task of lead optimization in small-molecule drug discovery. However, it is laborious to perform many iterative rounds of compound synthesis and bioactivity testing. To address this issue, there is a strong demand for high-quality in silico bioactivity prediction approaches that prioritize more active compound derivatives and reduce the trial-and-error process. Methods: Two kinds of bioactivity prediction models based on a large-scale structure-activity relationship (SAR) database were constructed. The first is based on the similarity of substituents and realized by matched molecular pair analysis, including the SA, SA_BR, SR, and SR_BR models. The second is based on SAR transferability and realized by matched molecular series analysis, including the Single MMS pair, Full MMS series, and Multi single MMS pairs models. Moreover, we also defined the application domain of the models using a distance-based threshold. Results: Among the seven individual models, the Multi single MMS pairs bioactivity prediction model showed the best performance (R2 = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) produced the lowest prediction accuracy (R2 = 0.798, MAE = 0.446, RMSE = 0.637). The predictive accuracy could be further improved by consensus modeling (R2 = 0.842, MAE = 0.397 and RMSE = 0.563). Conclusion: An accurate bioactivity prediction model was built with a consensus method, which was superior to all individual models. Our model should be a valuable tool for lead optimization.
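The consensus model in the abstract combines the individual models' predictions, scored by R2, MAE, and RMSE. A small sketch of those metrics and a simple averaging consensus follows; the activity values are made-up illustrations, not the study's SAR data.

```python
import math

def r2_mae_rmse(y_true, y_pred):
    """Standard regression metrics: coefficient of determination, MAE, RMSE."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return 1 - ss_res / ss_tot, mae, rmse

# Consensus = average of individual model predictions (illustrative numbers).
y_true  = [6.1, 7.3, 5.8, 8.0]   # e.g. measured pIC50 values
model_a = [6.0, 7.0, 6.0, 7.8]
model_b = [6.4, 7.5, 5.5, 8.1]
consensus = [(a + b) / 2 for a, b in zip(model_a, model_b)]
r2, mae, rmse = r2_mae_rmse(y_true, consensus)
```

Averaging tends to cancel the individual models' uncorrelated errors, which is consistent with the abstract's finding that the consensus outperformed every single model.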


Author(s):  
Alice Mazzucchelli ◽  
Roberto Chierici ◽  
Angelo Di Gregorio ◽  
Claudio Chiacchierini

Abstract Social networks are a driving force of digital transformation and offer firms the opportunity to market products and services to both international consumers and providers, establish durable relationships with them, and improve their own competitiveness. The study analyzes the role played by the use of Facebook for online advertising, building interaction and brand communities, implementing social CRM activities, and conducting market research, as well as serving as a sales channel alternative to physical presence, in firms' international export performance, measured both by managers' perceptions and by the Facebook buy button conversion rate. A survey-based empirical analysis of 105 fashion firms operating worldwide was conducted. The results of multiple regression analyses show that building conversations and brand communities positively affects international export performance, while advertising via Facebook yields mixed results. Comparing firms that have a physical presence with those that do not, the former turned out to benefit especially from in-store advertising and promotions to enhance their Facebook buy button conversion rate, while the latter can improve their performance mainly by adopting outdoor and transit advertising and digital marketing. The research contributes to the existing body of knowledge on social media marketing and international business and, by adopting a firm-level perspective, provides interesting insights for practitioners, since it makes it possible to understand how to develop an effective Facebook strategy to succeed in foreign markets.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Daniel J. Panyard ◽  
Kyeong Mo Kim ◽  
Burcu F. Darst ◽  
Yuetiva K. Deming ◽  
Xiaoyuan Zhong ◽  
...  

Abstract The study of metabolomics and disease has enabled the discovery of new risk factors, diagnostic markers, and drug targets. For neurological and psychiatric phenotypes, the cerebrospinal fluid (CSF) is of particular importance. However, the CSF metabolome is difficult to study on a large scale due to the relative complexity of the procedure needed to collect the fluid. Here, we present a metabolome-wide association study (MWAS), which uses genetic and metabolomic data to impute metabolites into large samples with genome-wide association summary statistics. We conduct a metabolome-wide, genome-wide association analysis with 338 CSF metabolites, identifying 16 genotype-metabolite associations (metabolite quantitative trait loci, or mQTLs). We then build prediction models for all available CSF metabolites and test for associations with 27 neurological and psychiatric phenotypes, identifying 19 significant CSF metabolite-phenotype associations. Our results demonstrate the feasibility of MWAS to study omic data in scarce sample types.
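The imputation step described above predicts a metabolite level from genotypes via trained per-SNP weights. A minimal sketch of that scoring step follows; the SNP identifiers and weights are hypothetical illustrations, not the study's actual mQTL models.

```python
# Hypothetical per-SNP weights from a trained genetic prediction model
# (illustrative numbers only, not the study's fitted coefficients).
weights = {"rs0001": 0.30, "rs0002": -0.15, "rs0003": 0.05}

def impute_metabolite(dosages, weights):
    """Predict a CSF metabolite level as a weighted sum of allele dosages (0/1/2),
    so the metabolite can be 'measured' in samples with genotypes only."""
    return sum(dosages.get(snp, 0) * w for snp, w in weights.items())

person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
level = impute_metabolite(person, weights)
print(round(level, 2))  # 2*0.30 - 1*0.15 + 0*0.05 = 0.45
```

Scoring many individuals this way is what lets an MWAS test imputed metabolite levels against phenotypes in cohorts where CSF was never collected.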


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended this work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods In-house immunopeptidomic data were generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data were downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples using the ImmunoID NeXT Platform. Results We have generated large-scale, high-quality immunopeptidomics data from approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles, and used these data to create our primary models. Briefly, our primary 'binding' algorithm models MHC-peptide binding using peptide and binding-pocket features, while our primary 'presentation' model uses additional features to model antigen processing and presentation.
Both primary models achieve significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data were integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance than a state-of-the-art public algorithm and furthers this objective.
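The evaluation above compares models by precision across recall values. A minimal sketch of precision and recall at a score cutoff for a presentation classifier follows; the scores and labels are toy values, not SHERPA outputs.

```python
def precision_recall(scores, labels, threshold):
    """Precision/recall of a peptide-presentation model at a score cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.9, 0.8, 0.7, 0.4, 0.2]   # predicted presentation scores
labels = [1, 1, 0, 1, 0]             # observed: presented (1) or not (0)
p, r = precision_recall(scores, labels, 0.5)
print(p, r)  # 2/3 precision, 2/3 recall at this cutoff
```

Sweeping the threshold over all score values traces the full precision-recall curve used to compare the primary and secondary models.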


2013 ◽  
Vol 299 ◽  
pp. 130-134
Author(s):  
Li Wei ◽  
Da Zhi Deng

In recent years, China's investment in network management infrastructure has been increasing continuously and information technology has improved steadily. Nevertheless, a variety of network security incidents occur frequently due to the inherent vulnerability of computer network systems, directly affecting national security and social and political stability. With the popularity of computers and the large-scale development of the Internet, network security has become an increasingly prominent theme. Reasonable safeguards against violations of resources and the regulation of Internet user behavior have become the public's expectations for the future Internet. This paper describes a stable method of obtaining a telnet user's account in the development of network management systems based on the telnet protocol.
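Because telnet transmits its login dialogue in plain text, a network management system can recover the account name from a captured session. A much-simplified sketch follows; real telnet streams also carry IAC option-negotiation bytes that would need stripping first, and the prompt format here is an assumption.

```python
import re

def extract_telnet_account(session_text):
    """Pull the account name typed after a 'login:' prompt out of a captured
    telnet session transcript (simplified illustration of the general idea)."""
    match = re.search(r"login:\s*(\S+)", session_text)
    return match.group(1) if match else None

capture = "Ubuntu 20.04\r\nlogin: alice\r\nPassword:\r\n$ "
print(extract_telnet_account(capture))  # alice
```

In a monitoring deployment this parsing would run over reassembled TCP payloads rather than a ready-made transcript.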


2021 ◽  
Vol 42 (Supplement_1) ◽  
pp. S33-S34
Author(s):  
Morgan A Taylor ◽  
Randy D Kearns ◽  
Jeffrey E Carter ◽  
Mark H Ebell ◽  
Curt A Harris

Abstract Introduction A nuclear disaster would generate an unprecedented volume of thermal burn patients from the explosion and subsequent mass fires (Figure 1). Prediction models characterizing outcomes for these patients may better equip healthcare providers and other responders to manage large-scale nuclear events. Logistic regression models have traditionally been employed to develop prediction scores for mortality of all burn patients. However, other healthcare disciplines have increasingly transitioned to machine learning (ML) models, which are automatically generated and continually improved, potentially increasing predictive accuracy. Preliminary research suggests ML models can predict burn patient mortality more accurately than commonly used prediction scores. The purpose of this study is to examine the efficacy of various ML methods in assessing thermal burn patient mortality and length of stay in burn centers. Methods This retrospective study identified patients with fire/flame burn etiologies in the National Burn Repository between 2009 and 2018. Patients were randomly partitioned into a 67%/33% split for training and validation. A random forest model (RF) and an artificial neural network (ANN) were then constructed for each outcome, mortality and length of stay. These models were then compared to logistic regression models and previously developed prediction tools with similar outcomes using a combination of classification and regression metrics. Results During the study period, 82,404 burn patients with a thermal etiology were identified for the analysis. The ANN models are likely to overfit the data, which can be mitigated by ending model training early or adding regularization parameters. Further exploration of the advantages and limitations of these models is forthcoming as metric analyses become available.
Conclusions In this proof-of-concept study, we anticipate that at least one ML model will predict the targeted outcomes of thermal burn patient mortality and length of stay, as judged by the fidelity with which it matches the logistic regression analysis. These advancements can then help disaster preparedness programs account for resource limitations during catastrophic incidents resulting in burn injuries.
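The Methods above start from a random 67%/33% train/validation partition of the patient records. A generic sketch of that step follows (standard library only; this is not the study's NBR pipeline, and the seed is an arbitrary choice for reproducibility).

```python
import random

def partition(records, train_frac=0.67, seed=42):
    """Randomly split records into 67% training / 33% validation sets,
    shuffling a copy so the original order is left untouched."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# 1000 stand-in patient IDs in place of the 82,404 NBR records.
train, valid = partition(list(range(1000)))
print(len(train), len(valid))  # 670 330
```

Fixing the seed makes the split reproducible, which matters when the RF, ANN, and logistic regression models must all be compared on the identical validation set.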


2018 ◽  
Vol 78 (1) ◽  
pp. 91-99 ◽  
Author(s):  
Dahai Yu ◽  
Kelvin P Jordan ◽  
Kym I E Snell ◽  
Richard D Riley ◽  
John Bedson ◽  
...  

Objectives The ability to efficiently and accurately predict future risk of primary total hip and knee replacement (THR/TKR) in earlier stages of osteoarthritis (OA) has potentially important applications. We aimed to develop and validate two models to estimate an individual's risk of primary THR and TKR in patients newly presenting to primary care. Methods We identified two cohorts of patients aged ≥40 years newly consulting with hip pain/OA and knee pain/OA in the Clinical Practice Research Datalink. Candidate predictors were identified by systematic review, a novel hypothesis-free 'Record-Wide Association Study' with replication, and panel consensus. Cox proportional hazards models accounting for the competing risk of death were applied to derive risk algorithms for THR and TKR. Internal–external cross-validation (IECV) was then applied over geographical regions to validate the two models. Results 45 predictors for THR and 53 for TKR were identified, reviewed and selected by the panel. 301 052 and 416 030 patients newly consulting between 1992 and 2015 were identified in the hip and knee cohorts, respectively (median follow-up 6 years). The resultant model C-statistics were 0.73 (0.72, 0.73) for the THR model (20 predictors) and 0.79 (0.78, 0.79) for the TKR model (24 predictors). The IECV C-statistics ranged between 0.70–0.74 (THR model) and 0.76–0.82 (TKR model); the IECV calibration slope ranged between 0.93–1.07 (THR model) and 0.92–1.12 (TKR model). Conclusions Two prediction models with good discrimination and calibration that estimate individuals' risk of THR and TKR have been developed and validated in large-scale, nationally representative data, and are readily automated in electronic patient records.
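Discrimination in the abstract above is reported as a C-statistic. A minimal sketch of Harrell's concordance index for censored survival data follows; the times, event indicators, and risks are toy values, not CPRD data.

```python
def c_statistic(times, events, risks):
    """Harrell's C: among comparable patient pairs, the fraction where the
    higher predicted risk had the earlier event (ties get half credit)."""
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an event before j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

times  = [2, 4, 5, 7]            # years to THR/TKR or censoring
events = [1, 1, 0, 1]            # 1 = replacement occurred, 0 = censored
risks  = [0.9, 0.6, 0.5, 0.2]    # predictions perfectly ordered with event times
print(c_statistic(times, events, risks))  # 1.0
```

A C-statistic of 0.5 means predictions order patients no better than chance, so the reported 0.73 and 0.79 indicate usefully discriminative models.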


2015 ◽  
Vol 15 (6) ◽  
pp. 1407-1423 ◽  
Author(s):  
R. D. Field ◽  
A. C. Spessa ◽  
N. A. Aziz ◽  
A. Camia ◽  
A. Cantin ◽  
...  

Abstract. The Canadian Forest Fire Weather Index (FWI) System is the most widely used fire danger rating system in the world. We have developed a global database of daily FWI System calculations, beginning in 1980, called the Global Fire WEather Database (GFWED), gridded to a spatial resolution of 0.5° latitude by 2/3° longitude. Input weather data were obtained from the NASA Modern Era Retrospective-Analysis for Research and Applications (MERRA), along with two different estimates of daily precipitation from rain gauges over land. FWI System Drought Code (DC) calculations from the gridded data sets were compared to calculations from individual weather station data for a representative set of 48 stations in North, Central and South America, Europe, Russia, Southeast Asia and Australia. Agreement between the gridded and station-based calculations was weakest at low latitudes for the strictly MERRA-based calculations. Strong biases could be seen in either direction: MERRA-based DC over the Mato Grosso in Brazil reached unrealistically high values exceeding DC = 1500 during the dry season, but was too low over Southeast Asia during the dry season. These biases are consistent with those previously identified in MERRA's precipitation, and they reinforce the need to consider alternative sources of precipitation data. GFWED can be used for analyzing historical relationships between fire weather and fire activity at continental and global scales, for identifying large-scale atmosphere–ocean controls on fire weather, and for calibrating FWI-based fire prediction models.
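The Drought Code compared above is a cumulative daily moisture index. A sketch of one daily DC update following the standard Van Wagner (1987) FWI System equations is shown below; the day-length factor and inputs are illustrative, and readers should consult the official equations before operational use.

```python
import math

def drought_code_step(dc_prev, temp_c, rain_mm, lf):
    """One daily Drought Code update (standard Van Wagner 1987 form;
    lf is the month's day-length factor, e.g. 6.4 for July in the north)."""
    dc = dc_prev
    if rain_mm > 2.8:                        # rain phase: rewet the deep layer
        rw = 0.83 * rain_mm - 1.27           # effective rainfall
        q = 800.0 * math.exp(-dc / 400.0)    # moisture equivalent of DC
        dc = max(400.0 * math.log(800.0 / (q + 3.937 * rw)), 0.0)
    v = 0.36 * (temp_c + 2.8) + lf           # potential evapotranspiration
    dc += 0.5 * max(v, 0.0)                  # drying phase
    return dc

dc_next = drought_code_step(200.0, 25.0, 0.0, 6.4)
print(round(dc_next, 2))  # a hot rain-free day raises DC by ~8.2
```

Because the code accumulates day after day, small daily precipitation biases in MERRA compound over a dry season, which is how the Mato Grosso values in the abstract could drift past DC = 1500.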

