Predicting Writing Styles of Online Materials for Children’s Health Education Using Machine-Learning Assisted Selection of Semantic Features (Preprint)

10.2196/30115 ◽  
2021 ◽  
Author(s):  
Wenxiu Xie ◽  
Meng Ji ◽  
Yanmeng Liu ◽  
Tianyong Hao ◽  
Chi-Yin Chow
2016 ◽  
Vol 24 (4) ◽  
pp. 69-80 ◽  
Author(s):  
Li-Ling Liao ◽  
Chieh-Hsing Liu ◽  
Chi-Chia Cheng ◽  
Tzu-Chau Chang

Background: Health literacy is related to health inequality, health behaviors, and health status. Globally, health literacy has primarily focused on adults and has been based on the medical model. It is necessary to understand children’s life experiences as they relate to health; thus, this study attempted to evaluate and describe the health literacy abilities of sixth-graders in Taiwan. Methods: Interviews were conducted with 10 teachers and 11 caregivers, and focus groups were conducted with 32 children. Health literacy abilities corresponding to real-life situations were identified from life skills and the Taiwanese Curriculum Guidelines for health education. Three expert meetings were held to redefine children’s health literacy using a health promotion perspective and confirmed indicators. Results: An operational definition of three aspects of children’s health literacy and 25 abilities was proposed: 11 functional health literacy abilities (e.g. understands the connection between personal health care behaviors and health); seven interactive health literacy abilities (e.g. obtains and understands information from various channels); and seven critical health literacy abilities (e.g. analyzes the relationship between personal needs and diet choices for a balanced diet). These indicators cover 10 health education categories. Conclusions: These findings highlight the importance of understanding Taiwanese children’s health literacy, and the urgency of developing an appropriate measurement tool. The definition and indicators in this study were identified using a child-centered approach focusing on children’s real-life experiences. The result serves as a solid basis for the development of the Taiwan Children’s Health Literacy Scale, and provides information for the decision-making sector on health education.


2014 ◽  
Vol 4 (2) ◽  
Author(s):  
Maria Tan ◽  
Sandy Campbell

Books have long been recognized as resources for health literacy and healing (Fosson & Husband, 1984). Individuals with health conditions or disabilities, or who are dealing with illness, disability or death among friends or loved ones, can find solace and affirmation in fictional works that depict characters coping with similar health conditions. This study asked the question “If we were to select a new collection of children’s health-related fiction in mid-2014, which books would we select and what selection criteria would we apply?” The results of this study are a set of criteria for the selection of current English language literary works with health-related content for the pre-kindergarten to Grade 6 (age 12) audience (http://hdl.handle.net/10402/era.38842), a collection of books that are readily available to Canadian libraries, selected against these criteria (http://hdl.handle.net/10402/era.38843), a special issue of the Deakin Review of Children’s Literature dedicated to juvenile health fiction, and book exhibits in two libraries to accompany the Deakin Review issue.


2021 ◽  
Author(s):  
Meng Ji ◽  
Yanmeng Liu ◽  
Tianyong Hao

BACKGROUND Much of current health information understandability research uses medical readability formulas (MRF) to assess the cognitive difficulty of health education resources. This is based on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, forms the sole barrier to health information access among the public. Our study challenged this by showing that for readers from non-English speaking backgrounds with higher education attainment, semantic features of English health texts rather than medical jargon can explain the lack of cognitive access to health materials among readers with a better understanding of health terms, yet limited exposure to English health education materials. OBJECTIVE Our study explored combined MRF and multidimensional semantic features (MSF) for developing machine learning algorithms to predict the actual level of cognitive accessibility of English health materials on health risks and diseases for specific populations. We compared algorithms to evaluate the cognitive accessibility of specialised health information for non-native English speakers with advanced education levels yet very limited exposure to English health education environments. METHODS We used 108 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from international health organization websites and rated by international tertiary students, we compared machine learning algorithms (decision tree, SVM, discriminant analysis, ensemble tree and logistic regression) after automatic hyperparameter optimization (grid search for the combination of hyperparameters with minimal classification error). We applied 10-fold cross-validation on the whole dataset for model training and testing, and calculated the AUC, sensitivity, specificity, and accuracy as the measures of model performance.
RESULTS Using two sets of predictor features, the widely tested MRF and the MSF proposed in our study, we developed and compared three sets of machine learning algorithms: the first set used MRF only as predictors, the second set used MSF only, and the last set used both MRF and MSF as integrated models. The results showed that the integrated models outperformed the MRF-only and MSF-only models in terms of AUC, sensitivity, accuracy, and specificity. CONCLUSIONS Our study showed that cognitive accessibility of English health texts is not limited to the word length and sentence length conventionally measured by MRF. We compared machine learning algorithms combining MRF and MSF to explore the cognitive accessibility of health information from syntactic and semantic perspectives. The results showed the strength of integrated models in terms of statistically significant increases in AUC, sensitivity, and accuracy when predicting health resource accessibility for the target readership, indicating that both MRF and MSF contribute to the comprehension of health information, and that for readers with advanced education, semantic features outweigh syntax and domain knowledge.
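The evaluation protocol named in this abstract (10-fold cross-validation scored by sensitivity, specificity, and accuracy) can be illustrated with a minimal sketch. This is not the authors' code: the function names `k_fold_indices` and `binary_metrics` are hypothetical, the split is a plain unshuffled interleave, and the abstract does not say whether the original folds were shuffled or stratified.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering range(n_samples).

    Assumption: samples are assigned to folds by interleaving (i, i+k, ...),
    with no shuffling or stratification.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy for labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }
```

With 1000 texts and k = 10, each split trains on 900 texts and tests on the remaining 100; the per-fold metrics would then be averaged across the ten folds.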


2017 ◽  
Vol 10 (4) ◽  
pp. 219-224 ◽  
Author(s):  
Kanae Watanabe ◽  
Annette Dickinson

In New Zealand and Japan, despite health education on food, exercise, and hygiene, children’s health is an important concern in preschools. This study investigated the relationship between children’s health and health education in New Zealand and Japan using a qualitative interpretative descriptive design method and semi-structured interviews with preschool teachers. Major children’s health issues identified by preschool teachers in New Zealand were asthma, allergies, and dental hygiene. Although few preschool children are overweight in New Zealand, it becomes a serious concern in primary school. Identified as a suspected cause of children’s health problems was parents providing their children with sweet and/or unhealthy foods. Preschool teachers want parents to understand and implement health education, and they stated that parents’ education was necessary. In Japan, children’s health problems identified by teachers were allergies, food preferences, and sleep deprivation. The suspected causes included too much convenience, parents’ irregular lifestyles because they were busy, and parents’ depending on preschools to discipline children in ways that should be done at home. The goals for preschool health education were similar in New Zealand and Japan. The goals should be to obtain lifelong health knowledge, an ability to make wise health-related decisions in adulthood, and healthy lifestyle choices for themselves and their families. Some children’s health issues were beyond the scope of the abilities of individual preschools. Therefore, the entire nation and government should work together to cope with children’s health issues and health education.


2021 ◽  
Author(s):  
Wenxiu Xie ◽  
Meng Ji ◽  
Tianyong Hao ◽  
Chi-Yin Chow

UNSTRUCTURED Objective: To determine the linguistic/textual features of English health educational materials for predicting the probabilistic distribution of critical conceptual mistakes in neural machine translations (Google Translate: English to Chinese) of public-oriented online health resources on infectious diseases and viruses. Methods: We collected 200 English source texts on infectious diseases and their human translations to Chinese from HON.Net certified health education websites. Human translations were compared with machine translations (Google Translate) by native Chinese speakers to identify critical conceptual mistakes. To overcome the overfitting issues of machine learning with small, high-dimensional datasets, a Bayesian machine learning classifier (relevance vector machine, RVM) was trained (70%/30% train/test data split; 5-fold cross-validation) on English source texts classified as linked or not linked with machine translation outputs containing critical conceptual mistakes, to identify possible source text features causing clinically significant machine translation errors. We compared the performance of RVM trained on the combined features through separate optimization (CFSO: 21) with RVM trained on the original combined features (OCF: 135; 20 structural and 115 semantic features), the combined features through joint optimization (CFJO: 48), the optimized structural features (OTF: 5), and the optimized semantic features (OSF: 16). In addition, RVM (CFSO) was compared with classifiers using individual standard (currently available) parameters to measure English complexity (Flesch Reading Ease, FRE; Gunning Fog Index, GFI; SMOG Readability Index, SMOG).
Results: The AUC, sensitivity, specificity and accuracy of RVM classifiers trained on different feature sets were: CFSO (AUC: 0.685; sensitivity: 0.73; specificity: 0.63; accuracy: 0.68); OCF (AUC: 0.7; sensitivity: 0.42; specificity: 0.8; accuracy: 0.625); CFJO (AUC: 0.690; sensitivity: 0.54; specificity: 0.73; accuracy: 0.64); OTF (AUC: 0.587; sensitivity: 0.58; specificity: 0.53; accuracy: 0.55); OSF (AUC: 0.679; sensitivity: 0.58; specificity: 0.67; accuracy: 0.625). The best-performing model was RVM trained on the combined features through separate optimisation (CFSO) (16% of the original combined features). RVM (CFSO) outperformed binary classifiers (BCs) using standard English readability tests. The accuracy, sensitivity and specificity of the three BCs were: FRE (accuracy 0.457; sensitivity 0.903; specificity 0.011); GFI (accuracy 0.5735; sensitivity 0.685; specificity 0.462); SMOG (accuracy 0.568; sensitivity 0.674; specificity 0.462). Conclusion: Our study found that machine-generated Chinese medical translation errors were not caused by difficult medical jargon or a lack of readability of source language information. Rather, it was certain English structures (passive voice; sentences starting with conjunctions) and semantic polysemy (different meanings of a word when used in common versus specialized domains) that tended to cause critical conceptual mistakes in neural machine translation (English to Chinese) of health education information on infectious diseases.
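For reference, the three readability baselines named above (FRE, GFI, SMOG) are commonly defined by the formulas sketched below. This is only an illustration of the standard published formulas working from pre-counted quantities; the abstract does not say how words, syllables, or complex words were counted, nor which cut-off scores turned each index into a binary classifier, so neither is modelled here. The function names are hypothetical.

```python
import math


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index; complex_words are words of three or more syllables."""
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG grade; polysyllables are words of three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30.0 / sentences)) + 3.1291
```

All three indices depend only on word, sentence, and syllable counts, which is why they cannot register the structural and semantic features (passive voice, polysemy) that the study links to translation errors.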


2021 ◽  
Author(s):  
Meng Ji ◽  
Yanmeng Liu ◽  
Tianyong Hao

BACKGROUND Much of current health information understandability research uses medical readability formulas to assess the cognitive difficulty of health education resources. This is based on an implicit assumption that medical domain knowledge, represented by uncommon words or jargon, forms the sole barrier to health information access among the public. Our study challenged this by showing that for readers from non-English speaking backgrounds with higher education attainment, semantic features of English health texts, which underpin the knowledge structure of those texts, rather than medical jargon can explain the cognitive accessibility of health materials among readers with a better understanding of English health terms, yet very limited exposure to English-based health education environments and traditions. OBJECTIVE Our study explored multidimensional semantic features for developing machine learning algorithms to predict the perceived level of cognitive accessibility of English health materials on health risks and diseases for young adults enrolled in Australian tertiary institutes. We compared algorithms to evaluate the cognitive accessibility of health information for non-native English speakers with advanced education levels yet very limited exposure to English health education environments. METHODS We used 108 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from Australian and international health organization websites and rated by overseas tertiary students, we compared machine learning algorithms (decision tree, SVM, ensemble tree, logistic regression) after hyperparameter optimization (grid search for the hyperparameter combination with minimal classification error). We applied 10-fold cross-validation on the whole dataset for model training and testing, and calculated the AUC, sensitivity, specificity, and accuracy as the measures of model performance.
RESULTS We developed and compared four machine learning algorithms using multidimensional semantic features as predictors. The results showed that ensemble tree (LogitBoost) outperformed the other algorithms in terms of AUC (0.97), sensitivity (0.966), specificity (0.972) and accuracy (0.969). Decision tree followed closely with AUC (0.924), sensitivity (0.912), specificity (0.9358), and accuracy (0.924), and SVM with AUC (0.8946), sensitivity (0.8952), specificity (0.894), and accuracy (0.8946). Decision tree, ensemble tree, and SVM achieved statistically significant improvements over logistic regression in AUC, specificity, and accuracy. As the best performing algorithm, ensemble tree reached a statistically significant improvement over SVM in AUC, specificity, and accuracy, and a statistically significant improvement over decision tree in sensitivity. CONCLUSIONS Our study showed that cognitive accessibility of English health texts is not limited to word length and sentence length as conventionally measured by medical readability formulas. We compared machine learning algorithms based on semantic features to explore the cognitive accessibility of health information for non-native English speakers. The results showed that the new models reached statistically significant increases in AUC, sensitivity, and accuracy when predicting health resource accessibility for the target readership. Our study illustrated that semantic features such as those related to cognitive abilities, communicative actions and processes, power relationships in healthcare settings, and the lexical familiarity and diversity of health texts are large contributors to the comprehension of health information, and that for readers such as international students, semantic features of health texts outweigh syntax and domain knowledge.
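The AUC values reported here summarise how well a model's scores rank positive texts above negative ones. A minimal sketch of that interpretation, assuming binary 0/1 labels and one real-valued score per text, is given below; `roc_auc` is a hypothetical name, and the pairwise loop is chosen for clarity rather than speed, so it is not a stand-in for whatever toolkit the authors used.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive example receives the higher score.
    Ties count as half a correctly ordered pair."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    ordered = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                ordered += 1.0
            elif p == n:
                ordered += 0.5
    return ordered / (len(pos) * len(neg))
```

Read this way, an AUC of 0.97 means that a randomly chosen positive text outscores a randomly chosen negative text about 97% of the time, while 0.5 corresponds to chance-level ranking.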


2021 ◽  
Author(s):  
Wenxiu Xie ◽  
Meng Ji ◽  
Yanmeng Liu ◽  
Tianyong Hao ◽  
Chi-Yin Chow

BACKGROUND Suitability of health resources for specific readerships represents a critical yet underexplored area of research in health informatics, despite its importance in health literacy and health education. High relevance of health information can improve the suitability and readability of online health educational resources for young readers. It has an important role in developing the health literacy of children with increasing exposure to online health information. Existing research on health resource evaluation is limited to the analysis of morphological and syntactic complexity. Moreover, no empirical instruments exist to evaluate the suitability of online health information for children. OBJECTIVE We aimed to develop algorithms to predict the suitability of online health information for this understudied user group, using a small number of semantic features to provide accurate and convenient tools for automatic prediction of the suitability of online health information for children. METHODS Combining machine learning and linguistic insights, we identified semantic features to predict the suitability of online health information for children, an emerging and large readership of online health information. The selection of natural language features as predictor variables of the algorithms went through initial automatic feature selection using Ridge Classifier, support vector machine, and extreme gradient boosting, followed by revision by linguists and education experts based on effective health information design. We compared algorithms using the automatically selected features (19) and the linguistically enhanced features (20), using the initial features (115) as the baseline.
RESULTS Using 5-fold cross-validation, compared with the baseline (115 features), the Gaussian Naive Bayes model (20 features) achieved statistically higher mean sensitivity (P = 0.0206, 95% CI: -0.016, 0.1929); mean specificity (P = 0.0205, 95% CI: -0.016, 0.199); mean AUC (P = 0.017, 95% CI: -0.007, 0.140); and mean macro F1 (P = 0.0061, 95% CI: 0.016, 0.167). The statistically improved performance of the final model (20 features) stands in contrast with the statistically insignificant changes between the original feature set (115) and the automatically selected features (19): mean sensitivity (P = 0.134, 95% CI: -0.1699, 0.0681); mean specificity (P = 0.1001, 95% CI: -0.1389, 0.4017); mean AUC (P = 0.0082, 95% CI: 0.0059, 0.1126); and mean macro F1 (P = 0.9796, 95% CI: -0.0555, 0.0548). This demonstrates the importance and effectiveness of combining automatic feature selection and expert-based linguistic revision to develop the most effective machine learning algorithms from high-dimensional datasets. CONCLUSIONS Our study developed machine learning algorithms for evaluating health information suitability for children, an important readership that increasingly relies on online health information to develop its health literacy. User-adaptive automatic assessment of online health content holds much promise for distance and remote health education among young readers. Our study leveraged the precision and adaptability of machine learning algorithms and insights from health linguistics to help advance this significant yet understudied area of research.
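Macro F1, one of the metrics reported above, is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The sketch below shows that computation under the assumption of hard class labels per text; the function names are hypothetical and the code is an illustration of the metric's usual definition, not the authors' evaluation script.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 label: Hashable) -> float:
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
    return (2 * tp / denom) if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)
```

In a 5-fold setting, this score would be computed once per fold and the five values averaged to give the mean macro F1 that the abstract compares across feature sets.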

