An Innovative Machine Learning Approach to Predict the Dietary Fiber Content of Packaged Foods

Nutrients ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 3195
Author(s):  
Tazman Davies ◽  
Jimmy Chun Yu Louie ◽  
Tailane Scapin ◽  
Simone Pettigrew ◽  
Jason HY Wu ◽  
...  

Underconsumption of dietary fiber is prevalent worldwide and is associated with multiple adverse health conditions. Despite the importance of fiber, labeling of fiber content on packaged foods and beverages is voluntary in most countries, making it challenging for consumers and policy makers to monitor fiber consumption. Here, we developed a machine learning approach for automated, systematic prediction of fiber content from nutrient information commonly available on packaged products. An Australian packaged food dataset with known fiber content was divided into training (n = 8986) and test (n = 2455) datasets. A k-nearest neighbors machine learning algorithm explained a greater proportion of the variance in fiber content than an existing manual fiber prediction approach (R2 = 0.84 vs. R2 = 0.68). Our findings highlight the opportunity to use machine learning to efficiently predict the fiber content of packaged products at large scale.
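The core method named in the abstract can be sketched in a few lines. The nutrient features, k = 5, and the synthetic data below are illustrative assumptions, not the study's dataset or configuration:

```python
# Illustrative sketch (not the study's code): k-nearest neighbors regression
# predicting fiber content from nutrients commonly shown on labels.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic nutrition-panel features per 100 g:
# energy (kJ), protein (g), carbohydrate (g), total sugars (g)
X = rng.uniform([200, 0, 0, 0], [2500, 30, 80, 40], size=(500, 4))
# Toy ground truth: fiber loosely tracks carbohydrate net of sugars
y = np.clip(0.15 * (X[:, 2] - X[:, 3]) + rng.normal(0, 0.5, 500), 0, None)

# Scaling matters for KNN: otherwise the energy column dominates distances
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(X[:400], y[:400])          # training split
pred = model.predict(X[400:])        # held-out split
mae = float(np.mean(np.abs(pred - y[400:])))
print(round(mae, 2))
```

Standardizing before the distance computation is the one non-obvious choice here: without it, the kilojoule column would dominate the neighbor search.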

2021 ◽  
Author(s):  
Tazman Davies ◽  
Jimmy Chun Yu Louie ◽  
Rhoda Ndanuko ◽  
Sebastiano Barbieri ◽  
Oscar Perez-Concha ◽  
...  

Abstract
Background: Dietary guidelines recommend limiting the intake of added sugars. However, despite the public health importance, most countries have not mandated the labeling of added sugar content on packaged foods and beverages, making it difficult for consumers to avoid products with added sugar and limiting the ability of policymakers to identify priority products for intervention.
Objective: To develop a machine learning approach for the prediction of added sugar content in packaged products using available nutrient, ingredient, and food category information.
Design: The added sugar prediction algorithm was developed using k-nearest neighbors (KNN) and packaged food information from the US Label Insight dataset (n = 70,522). A synthetic dataset of Australian packaged products (n = 500) was used to assess validity and generalization. Performance metrics included the coefficient of determination (R2), mean absolute error (MAE), and Spearman rank correlation (ρ). To benchmark the KNN approach, it was compared to an existing added sugar prediction approach that relies on a series of manual steps.
Results: Compared to the existing approach, the KNN approach was similarly apt at explaining variation in added sugar content (R2 = 0.96 vs. 0.97, respectively) and at ranking products from highest to lowest in added sugar content (ρ = 0.91 vs. 0.93, respectively), while less apt at minimizing absolute deviations between predicted and true values (MAE = 1.68 g vs. 1.26 g per 100 g or 100 mL, respectively).
Conclusions: KNN can be used to predict added sugar content in packaged products with a high degree of validity. Being automated, KNN can easily be applied to large datasets. Such predicted added sugar levels can be used to monitor the food supply and inform interventions aimed at reducing added sugar intake.
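The three validation metrics named above (R2, MAE, Spearman ρ) can be computed as follows; the predicted and true values are invented toy numbers, not figures from the study:

```python
# Toy computation of the abstract's three metrics on invented
# predicted vs. true added-sugar values (g per 100 g).
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import r2_score, mean_absolute_error

true = np.array([0.0, 2.5, 10.0, 22.0, 35.0, 48.0])
pred = np.array([0.5, 2.0, 11.5, 20.0, 36.5, 45.0])

r2 = r2_score(true, pred)                 # proportion of variance explained
mae = mean_absolute_error(true, pred)     # average absolute deviation (g)
rho = spearmanr(true, pred)[0]            # rank agreement

print(round(r2, 3), round(mae, 2), round(rho, 2))  # MAE is 1.5 g; rho is 1.0
```

Note the division of labor the abstract relies on: ρ only checks ranking, MAE only checks absolute deviation, so a model can do well on one and worse on the other, exactly the pattern reported.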


2019 ◽  
Author(s):  
Anton Levitan ◽  
Andrew N. Gale ◽  
Emma K. Dallon ◽  
Darby W. Kozan ◽  
Kyle W. Cunningham ◽  
...  

Abstract: In vivo transposon mutagenesis, coupled with deep sequencing, enables large-scale genome-wide mutant screens for genes essential in different growth conditions. We analyzed six large-scale studies performed on haploid strains of three yeast species (Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Candida albicans), each mutagenized with two of three different heterologous transposons (AcDs, Hermes, and PiggyBac). Using a machine learning approach, we evaluated the ability of the data to predict gene essentiality. Important data features included sufficient numbers and distribution of independent insertion events. All transposons showed some insertion-site bias, arising from jackpot events and from preferences for specific insertion sequences and for short-distance vs. long-distance insertions. For PiggyBac, a stringent target sequence limited the ability to predict essentiality in genes with few or no target sequences. The machine learning approach also robustly predicted gene function in less well-studied species by leveraging cross-species orthologs. Finally, comparisons of isogenic diploid versus haploid S. cerevisiae isolates identified several genes that are haplo-insufficient, while most essential genes, as expected, were recessive. We provide recommendations for the choice of transposons and the inference of gene essentiality in genome-wide studies of eukaryotic haploid microbes such as yeasts, including species that have been less amenable to classical genetic studies.
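The core idea can be sketched as a classifier over insertion-level features. The simulated features (insertion count, insertions per kb) and the logistic-regression model below are assumptions for illustration; the study's actual model and features may differ:

```python
# Hedged sketch: classify genes as essential vs. non-essential from
# transposon-insertion features on simulated data. Essential genes
# tolerate few independent insertions; non-essential genes accumulate many.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 600
essential = rng.random(n) < 0.2                 # ~20% of genes essential
gene_len_kb = rng.uniform(0.5, 5.0, n)
insertions = np.where(essential,
                      rng.poisson(1.0, n),                 # near-depleted
                      rng.poisson(15.0 * gene_len_kb))     # scales with length
X = np.column_stack([insertions, insertions / gene_len_kb])
y = essential.astype(int)

clf = LogisticRegression(max_iter=1000).fit(X[:450], y[:450])
acc = clf.score(X[450:], y[450:])               # held-out accuracy
print(round(acc, 2))
```

Normalizing insertions by gene length mirrors the abstract's point that both the number and the distribution of independent insertions matter.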


2021 ◽  
Vol 2090 (1) ◽  
pp. 012115
Author(s):  
Eraldo Pereira Marinho

Abstract: We present a machine learning approach for finding the optimal anisotropic SPH kernel, whose compact support is an ellipsoid matching the convex hull of the self-regulating k-nearest neighbors of the smoothing particle (the query).
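The geometric idea can be sketched numerically; this is an assumption-laden toy, not the paper's self-regulating algorithm: take the k nearest neighbors of a query particle and derive an ellipsoidal support from the eigen-decomposition of their covariance:

```python
# Toy sketch: ellipsoidal kernel support from the k nearest neighbors
# of a query particle. The neighbor covariance's eigenvectors give the
# ellipsoid's principal axes; sqrt-eigenvalues give its semi-axis scales.
import numpy as np

rng = np.random.default_rng(2)
# Anisotropic particle cloud: stretched 5x along the x-axis
pts = rng.normal(0, 1, (400, 3)) * np.array([5.0, 1.0, 1.0])
query = np.zeros(3)

k = 32
d = np.linalg.norm(pts - query, axis=1)
nbrs = pts[np.argsort(d)[:k]]                 # k nearest neighbors of the query

cov = np.cov(nbrs - nbrs.mean(axis=0), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # principal axes (ascending)
semi_axes = np.sqrt(eigvals)                  # ellipsoid semi-axis scales

print(np.round(semi_axes, 2))
```

The paper's "self-regulating" neighbor set suggests an iterative scheme (re-selecting neighbors under the adapted metric); the single pass above is only the first step of such an iteration.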


Entropy ◽  
2019 ◽  
Vol 21 (10) ◽  
pp. 1015 ◽  
Author(s):  
Carles Bretó ◽  
Priscila Espinosa ◽  
Penélope Hernández ◽  
Jose M. Pavía

This paper applies a machine learning approach with the aim of providing a single aggregated prediction from a set of individual predictions. Departing from the well-known maximum-entropy inference methodology, we introduce a new factor capturing the distance between the true and the estimated aggregated predictions, which poses a new estimation problem. Algorithms such as ridge, lasso, and elastic net help in developing a methodology to tackle this issue. We carry out a simulation study to evaluate the performance of the procedure and apply it to forecast and measure predictive ability using a dataset of predictions of Spanish gross domestic product.
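A minimal sketch of the aggregation setup under stated assumptions: a toy panel of simulated forecasts, with ridge regression chosen from the regularized estimators the paper mentions (this is not the paper's entropy-based procedure):

```python
# Toy sketch: aggregate a panel of individual forecasts into one
# prediction with ridge regression, learning weights that cancel
# each forecaster's systematic bias.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
T, experts = 80, 5
truth = rng.normal(2.0, 1.0, T)                  # e.g. GDP growth rates (%)
# Each expert = truth + fixed individual bias + idiosyncratic noise
panel = (truth[:, None]
         + rng.normal(0, 0.3, (T, experts))
         + rng.normal(0, 0.5, experts))

agg = Ridge(alpha=1.0).fit(panel[:60], truth[:60])   # learn aggregation weights
pred = agg.predict(panel[60:])                       # aggregated forecast
mae_agg = float(np.mean(np.abs(pred - truth[60:])))
print(round(mae_agg, 3))
```

Because the learned weights average out the idiosyncratic noise and the intercept absorbs shared bias, the aggregate typically outperforms any single forecaster in the panel.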


Author(s):  
B.D. Britt ◽  
T. Glagowski

Abstract: This paper describes current research toward automating the redesign process. In redesign, a working design is altered to meet new problem specifications. This process is complicated by interactions between different parts of the design, and many researchers have addressed these issues. An overview is given of a large design tool under development, the Circuit Designer's Apprentice. This tool integrates various techniques for reengineering existing circuits so that they meet new circuit requirements. The primary focus of the paper is one particular technique used to reengineer circuits when they cannot be transformed to meet the new problem requirements. In these cases, a design plan is automatically generated for the circuit and then replayed to solve all or part of the new problem. This technique is based upon the derivational analogy approach to design reuse. Derivational analogy is a machine learning algorithm in which a design plan is saved at design time so that it can be replayed on a new design problem. Because design plans were not saved for the circuits available to the Circuit Designer's Apprentice, an algorithm was developed that automatically reconstructs a design plan for any circuit. This algorithm, Reconstructive Derivational Analogy, is described in detail, including a quantitative analysis of its implementation.
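The replay idea behind derivational analogy can be sketched with a toy design plan; the steps and the amplifier-sizing rule below are hypothetical stand-ins, far simpler than the Circuit Designer's Apprentice:

```python
# Toy sketch of derivational-analogy replay: a design plan is an ordered
# list of recorded steps, each refining a partial design against a spec.
# Replaying the plan on a new spec reuses the derivation, not just the result.

def choose_topology(design, spec):
    # Hypothetical step: pick an op-amp topology from the sign of the gain
    design["topology"] = "inverting" if spec["gain"] < 0 else "non-inverting"
    return design

def size_resistors(design, spec):
    # Hypothetical step: set the feedback ratio from the required gain
    design["rf_over_rin"] = abs(spec["gain"])
    return design

plan = [choose_topology, size_resistors]   # recorded at original design time

def replay(plan, spec):
    design = {}
    for step in plan:                      # derivational replay on a new problem
        design = step(design, spec)
    return design

new_design = replay(plan, {"gain": -10})
print(new_design)                          # {'topology': 'inverting', 'rf_over_rin': 10}
```

Reconstructive Derivational Analogy's contribution is producing such a `plan` after the fact for circuits whose derivation was never recorded; the replay step itself is as simple as the loop above.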


2021 ◽  
Author(s):  
Diti Roy ◽  
Md. Ashiq Mahmood ◽  
Tamal Joyti Roy

Heart disease is a dominant disease that causes a large number of deaths every year. A 2016 WHO report indicated that at least 17 million people die of heart disease each year. This number is gradually increasing, and WHO estimates that the toll will reach 75 million by 2030. Despite modern technology and health care systems, predicting heart disease remains challenging. Because machine learning algorithms are a vital means of prediction from available datasets, we used a machine learning approach to predict heart disease. We collected data from the UCI repository. In our study, we used Random Forest, ZeroR, Voted Perceptron, and K* classifiers. We obtained the best result with the Random Forest classifier, with an accuracy of 97.69%.
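The best-performing setup named above, a random forest classifier, can be sketched on a synthetic stand-in for the UCI features (the real dataset must be fetched from the UCI repository; the toy accuracy here is unrelated to the reported 97.69%):

```python
# Hedged sketch: random forest classification on synthetic stand-ins for
# typical UCI heart-disease features (age, cholesterol, max heart rate).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 800
age = rng.uniform(30, 75, n)
chol = rng.uniform(150, 320, n)
max_hr = rng.uniform(90, 200, n)
# Toy label: risk rises with age and cholesterol, falls with max heart rate
risk = 0.04 * age + 0.01 * chol - 0.03 * max_hr + rng.normal(0, 0.5, n)
y = (risk > np.median(risk)).astype(int)
X = np.column_stack([age, chol, max_hr])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)                 # held-out accuracy on toy data
print(round(acc, 2))
```

Unlike KNN, the forest needs no feature scaling, one reason it is a common first choice on mixed-scale clinical features like these.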


Author(s):  
Ganesh K. Shinde

Abstract: The most important part of information gathering is understanding how people think. Many opinion resources, such as online review sites and personal blogs, are available. In this paper we focus on Twitter, which allows users to express opinions on a variety of entities. We performed sentiment analysis on tweets using text mining methods, namely a lexicon approach and a machine learning approach. The sentiment analysis proceeds in two steps: first, polarity words are matched against the pool of words predefined in a lexicon dictionary; second, a machine learning algorithm is trained using the polarities assigned in the first step. Keywords: Sentiment analysis, Social Media, Twitter, Lexicon Dictionary, Machine Learning Classifiers, SVM.
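The two-step procedure described above can be sketched as follows; the lexicon and tweets are invented for illustration, and a linear SVM stands in for the classifier (SVM is named in the keywords):

```python
# Toy two-step sentiment pipeline: (1) weak-label tweets with a polarity
# lexicon; (2) train a linear SVM on those lexicon-derived labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

lexicon = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}

tweets = [
    "great phone love the battery", "awful service hate waiting",
    "good camera great screen", "bad update awful battery",
    "love this great app", "hate the bad design",
]

def lexicon_polarity(text):
    # Step 1: weak label from summed lexicon polarities
    score = sum(lexicon.get(w, 0) for w in text.split())
    return 1 if score >= 0 else 0

labels = [lexicon_polarity(t) for t in tweets]

vec = CountVectorizer()
X = vec.fit_transform(tweets)
clf = LinearSVC().fit(X, labels)        # Step 2: train on the weak labels
print(clf.predict(vec.transform(["love this good screen"]))[0])
```

The payoff of step 2 is generalization: the trained classifier can also score words like "screen" that carry no lexicon polarity but co-occur with polar words in training tweets.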


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241239
Author(s):  
Kai On Wong ◽  
Osmar R. Zaïane ◽  
Faith G. Davis ◽  
Yutaka Yasui

Background: Canada is an ethnically diverse country, yet the lack of ethnicity information in many large databases impedes effective population research and interventions. Automated ethnicity classification using machine learning has shown potential to address this data gap, but its performance in Canada is largely unknown. This study developed a large-scale machine learning framework to predict ethnicity using a novel set of name and census location features.
Methods: Using the 1901 census, multiclass and binary classification machine learning pipelines were developed. The 13 ethnic categories examined were Aboriginal (First Nations, Métis, Inuit, and all-combined), Chinese, English, French, Irish, Italian, Japanese, Russian, Scottish, and others. Machine learning algorithms included regularized logistic regression, C-support vector, and naïve Bayes classifiers. Name features consisted of the entire name string, substrings, double-metaphones, and various name-entity patterns, while location features consisted of the entire location string and substrings of province, district, and subdistrict. Predictive performance metrics included sensitivity, specificity, positive predictive value, negative predictive value, F1, area under the receiver operating characteristic curve (AUC), and accuracy.
Results: The census contained 4,812,958 unique individuals. For multiclass classification, the highest performance achieved was 76% F1 and 91% accuracy. For binary classifications of Chinese, French, Italian, Japanese, Russian, and others, F1 ranged from 68% to 95% (median 87%). The lower performance for English, Irish, and Scottish (F1 63–67%) was likely due to their shared cultural and linguistic heritage. Adding census location features to the name-based models strongly improved prediction for the Aboriginal classification (F1 increased from 50% to 84%).
Conclusions: An automated machine learning approach using only name and census location features can predict the ethnicity of Canadians, with performance varying by ethnic category.
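The name-feature idea can be sketched with character-substring features feeding a regularized logistic regression; the names and the two-class setup below are toy stand-ins, not the census data or the study's 13 categories:

```python
# Toy sketch: character-substring (n-gram) features of full name strings,
# one of the name-feature families the study describes, feeding a
# regularized logistic regression.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

names = ["giuseppe rossi", "maria bianchi", "luca ferrari", "anna russo",
         "jean dupont", "marie moreau", "pierre lefevre", "claire dubois"]
labels = ["Italian"] * 4 + ["French"] * 4

# Character 2-3-gram substrings of the entire name string
vec = CountVectorizer(analyzer="char", ngram_range=(2, 3))
X = vec.fit_transform(names)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
pred = clf.predict(vec.transform(["paolo esposito"]))[0]
print(pred)
```

The study's stronger features (double-metaphones, name-entity patterns, location substrings) would be concatenated into the same design matrix; character substrings alone are just the simplest member of that family.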

