Statistical literacy and quantitative reasoning: Rethinking the curriculum

2020 ◽  
Author(s):  
Gail Burrill

The importance of statistical literacy/quantitative reasoning has been highlighted for decades; today the need is even more compelling with data science emerging as foundational in many disciplines. Educated students should understand how to make decisions in the presence of uncertainty and how to interpret quantitative information presented to them in the course of their professional and personal activities. Too often, however, students have limited experience in thinking and reasoning based on real data. This paper explores how ideas from data science interface with notions of statistical literacy/quantitative reasoning, considers foundational concepts necessary to enable students to engage with real data sets in the learning process, and identifies potential curricular elements that are important for all students from these perspectives.

Author(s):  
Gail Burrill

Given a world awash with data, today's students will be consumers of statistical information whatever their future. What can we do to make them critical consumers, as articulated by researchers such as Gal and Steen and as suggested in the National Council of Teachers of Mathematics' Catalyzing Change, able to process information, ask the right questions, and make informed decisions? This paper explores what it means to be statistically literate, able to reason with quantitative information in today's world, and why this matters from both a personal and a professional perspective. Examples from several fields illustrate essential core concepts that should be components of the curriculum for all students if we are to have statistically literate citizens capable of thinking and reasoning in quantitative situations. The discussion also addresses some of the challenges we face in making this recommendation a reality.


Author(s):  
Li Chen ◽  
Lala Aicha Coulibaly

Data science and big data analytics remain at the center of computer science and information technology. Students and researchers outside computer science often find real data analytics difficult when it involves programming languages such as Python and Scala, especially when they attempt to use Apache Spark in cloud computing environments (Spark with Scala and PySpark). At the same time, students in information technology can struggle with the mathematical background of data science algorithms. To overcome these difficulties, this chapter provides a practical guideline for different users in this area. The authors cover the main algorithms for data science and machine learning, including principal component analysis (PCA), support vector machines (SVM), k-means, k-nearest neighbors (kNN), regression, neural networks, and decision trees. Each algorithm is briefly described, and related code is presented for both simple and real data sets. Some visualization methods, including 2D and 3D displays, are also presented in this chapter.
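As a flavor of what such a guideline looks like in practice, here is a minimal PySpark sketch (an illustration of the tooling the chapter covers, not code from the chapter) that fits k-means, one of the listed algorithms, on a toy data set; the column names and parameter values are assumptions for the example.

```python
# Minimal k-means with PySpark MLlib (illustrative sketch, toy data).
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("kmeans-demo").getOrCreate()

# Toy data; a real analysis would load CSV/Parquet files instead.
df = spark.createDataFrame(
    [(1.0, 1.1), (0.9, 1.0), (8.0, 8.2), (8.1, 7.9)],
    ["x", "y"],
)

# MLlib estimators expect the features gathered into one vector column.
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

# Fit k-means with k=2; the fixed seed makes the run reproducible.
model = KMeans(k=2, seed=42, featuresCol="features").fit(features)
print(model.clusterCenters())

spark.stop()
```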


2019 ◽  
Vol 18 (2) ◽  
pp. es2 ◽  
Author(s):  
Melissa K. Kjelvik ◽  
Elizabeth H. Schultheis

Data are becoming increasingly important in science and society, and thus data literacy is a vital asset to students as they prepare for careers in and outside science, technology, engineering, and mathematics and go on to lead productive lives. In this paper, we discuss why the strongest learning experiences surrounding data literacy may arise when students are given opportunities to work with authentic data from scientific research. First, we explore the overlap between the fields of quantitative reasoning, data science, and data literacy, specifically focusing on how data literacy results from practicing quantitative reasoning and data science in the context of authentic data. Next, we identify and describe features that influence the complexity of authentic data sets (selection, curation, scope, size, and messiness) and their implications for data-literacy instruction. Finally, we discuss areas for future research with the aim of identifying the impact that authentic data may have on student learning. These include defining desired learning outcomes for data use in the classroom and identifying best teaching practices for developing students' data-literacy abilities.


2020 ◽  
Vol 19 (1) ◽  
pp. 194-205
Author(s):  
KAREN FRANÇOIS ◽  
CARLOS MONTEIRO ◽  
PATRICK ALLO

In contemporary society, massive amounts of data are continuously generated by various means; these collections are called Big Data sets. Big Data has potential and limits that need to be understood by statisticians and consumers of statistics, so developing Big Data literacy that supports the needs of constructive, concerned, and reflective citizens is a challenge. However, the development of the concept of statistical literacy mirrors the current gap between purely technical and socio-political characterizations of Big Data. In this paper, we review the recent history of the concept of statistical literacy and highlight the need to integrate the new challenges and critical issues from data science associated with Big Data, including ethics, epistemology, mathematical justification, and math washing. First published February 2020 at Statistics Education Research Journal Archives.


2017 ◽  
Vol 16 (1) ◽  
pp. 44-49 ◽  
Author(s):  
JOACHIM ENGEL

Data are abundant; quantitative information about the state of society and the wider world surrounds us more than ever. Paradoxically, recent trends in public discourse point towards a post-factual world that seems content to ignore or misrepresent empirical evidence. As statistics educators, we are challenged to promote understanding of statistics about society. In order to re-root public debate in facts instead of emotions and to promote evidence-based policy decisions, statistics education needs to embrace two areas widely neglected in secondary and tertiary education: understanding multivariate phenomena and thinking with and learning from complex data. First published May 2017 at Statistics Education Research Journal Archives.
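To make the call for multivariate thinking concrete, the sketch below (my own illustration with made-up numbers, not from the paper) shows Simpson's paradox: a comparison drawn from aggregated data reverses once a third variable is taken into account.

```python
# Simpson's paradox on made-up admission data (illustrative numbers only).
import pandas as pd

df = pd.DataFrame({
    "dept":     ["A", "A", "B", "B"],
    "group":    ["men", "women", "men", "women"],
    "applied":  [100, 20, 20, 100],
    "admitted": [80, 18, 4, 25],
})
df["rate"] = df["admitted"] / df["applied"]

# Within each department, women are admitted at the higher rate...
print(df[["dept", "group", "rate"]])

# ...yet the aggregate over both departments favors men, because the two
# groups apply in different proportions to departments of different selectivity.
agg = df.groupby("group")[["applied", "admitted"]].sum()
print(agg["admitted"] / agg["applied"])
```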


2021 ◽  
Author(s):  
Jakob Raymaekers ◽  
Peter J. Rousseeuw

Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data, it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center while a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
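For context, the classical maximum-likelihood fit that this robust proposal improves upon is available off the shelf; below is a minimal sketch using scipy (the contaminated sample is my own illustration, and the robust estimator itself is not shown).

```python
# Classical Yeo-Johnson transformation fitted by maximum likelihood (scipy).
# This is the outlier-sensitive baseline, not the paper's robust estimator.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.7, size=200)  # skewed but clean sample
x_contaminated = np.append(x, [50.0, 60.0])       # add a few gross outliers

# With lmbda unspecified, scipy maximizes the profile log-likelihood.
# The outliers pull the estimated parameter away from the value that would
# normalize the central bulk of the data.
xt_clean, lam_clean = stats.yeojohnson(x)
xt_bad, lam_bad = stats.yeojohnson(x_contaminated)
print(f"lambda without outliers: {lam_clean:.3f}, with outliers: {lam_bad:.3f}")
```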


Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 62
Author(s):  
Zhengwei Liu ◽  
Fukang Zhu

Thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is binomial thinning. Inspired by the theory of extended Pascal triangles, a new thinning operator, named extended binomial thinning, is introduced as a generalization of binomial thinning. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersion features of count time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case, and conditional maximum likelihood estimation is also discussed. We also obtain the asymptotic properties of the two-step CLS estimator. Finally, three overdispersed or underdispersed real data sets are considered to illustrate the superior performance of the proposed model.
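For reference, classical binomial thinning is defined by alpha ∘ X = sum of X independent Bernoulli(alpha) variables, and the standard INAR(1) model sets X_t = alpha ∘ X_{t-1} + epsilon_t. The sketch below simulates this special case that the extended operator generalizes (my own illustration, with Poisson innovations).

```python
# Simulate a standard INAR(1) with binomial thinning, the special case that
# extended binomial thinning generalizes. Illustrative sketch only.
import numpy as np

def simulate_inar1(n, alpha, lam, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=int)
    for t in range(1, n):
        # Binomial thinning: alpha o X_{t-1} ~ Binomial(X_{t-1}, alpha),
        # i.e. each of the X_{t-1} counts survives with probability alpha.
        survivors = rng.binomial(x[t - 1], alpha)
        x[t] = survivors + rng.poisson(lam)  # add the Poisson innovation
    return x

series = simulate_inar1(n=500, alpha=0.5, lam=2.0)
# Stationary mean is lam / (1 - alpha) = 4; Poisson innovations give an
# equidispersed marginal, which the extended operator relaxes.
print(series.mean(), series.var())
```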


Econometrics ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 10
Author(s):  
Šárka Hudecová ◽  
Marie Hušková ◽  
Simos G. Meintanis

This article considers goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregression models. The test statistics are based on an L2-type distance between two estimators of the probability generating function of the observations: one entirely nonparametric and the other semiparametric, computed under the corresponding null hypothesis. The asymptotic distribution of the proposed test statistics under both the null hypotheses and alternatives is derived, and consistency is proved. The case of testing bivariate generalized Poisson autoregression and extensions of the methods to dimensions higher than two are also discussed. The finite-sample performance of a parametric bootstrap version of the tests is illustrated via a series of Monte Carlo experiments. The article concludes with applications to real data sets and discussion.
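The fully nonparametric ingredient of such a test is the empirical joint probability generating function, g_hat(u, v) = (1/n) * sum_t u^{X_t} v^{Y_t}; a minimal sketch of computing it (my own illustration, not the authors' code) follows.

```python
# Empirical joint probability generating function for bivariate counts:
# the nonparametric estimator that L2-type tests compare against a
# semiparametric counterpart fitted under the null. Illustrative sketch.
import numpy as np

def empirical_pgf(x, y, u, v):
    """Estimate g(u, v) = E[u**X * v**Y] by the sample average."""
    x = np.asarray(x)
    y = np.asarray(y)
    return np.mean(u ** x * v ** y)

rng = np.random.default_rng(1)
x = rng.poisson(2.0, size=300)
y = rng.poisson(3.0, size=300)
# For independent Poisson(2) and Poisson(3) counts,
# g(u, v) = exp(2(u-1)) * exp(3(v-1)); compare the estimate to the truth.
print(empirical_pgf(x, y, 0.5, 0.5), np.exp(-1.0) * np.exp(-1.5))
```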


Information ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 202
Author(s):  
Louai Alarabi ◽  
Saleh Basalamah ◽  
Abdeltawab Hendawi ◽  
Mohammed Abdalla

The rapid spread of infectious diseases is a major public health problem. Recent developments in fighting these diseases have heightened the need for a contact tracing process. Contact tracing can be considered an ideal method for controlling the transmission of infectious diseases. Contact tracing leads to diagnostic testing, treatment or self-isolation for suspected cases, and treatment for infected persons, which eventually limits the spread of disease. This paper proposes a technique named TraceAll that traces all contacts exposed to an infected patient and produces a list of these contacts to be considered potentially infected. Initially, it treats the infected patient as the querying user and fetches the contacts exposed to that patient. Secondly, it obtains all the trajectories of objects that moved near the querying user. Next, it investigates these trajectories, considering the social distance and exposure period, to identify whether these objects may have become infected. The experimental evaluation of the proposed technique on real data sets illustrates the effectiveness of this solution. Comparative analysis experiments confirm that TraceAll outperforms baseline methods by 40% in the efficiency of answering contact tracing queries.
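As an indication of the kind of query involved, the sketch below (my own naive illustration; TraceAll's indexing and query processing are more sophisticated) flags objects whose trajectories stayed within a social distance of the infected user for at least a given exposure period.

```python
# Naive contact-tracing query over timestamped trajectories (illustrative
# sketch only; not the paper's algorithm).
from collections import defaultdict
from math import hypot

def find_contacts(trajectories, patient_id, social_distance=2.0, exposure_min=15):
    """trajectories: object_id -> list of (t, x, y) sampled at 1-minute ticks."""
    patient = {t: (x, y) for t, x, y in trajectories[patient_id]}
    exposure = defaultdict(int)  # object_id -> minutes spent near the patient
    for obj, points in trajectories.items():
        if obj == patient_id:
            continue
        for t, x, y in points:
            if t in patient:
                px, py = patient[t]
                if hypot(x - px, y - py) <= social_distance:
                    exposure[obj] += 1  # one tick within social distance
    # Potentially infected contacts: exposed for at least the required period.
    return [obj for obj, minutes in exposure.items() if minutes >= exposure_min]
```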


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 474
Author(s):  
Abdulhakim A. Al-Babtain ◽  
Ibrahim Elbatal ◽  
Hazem Al-Mofleh ◽  
Ahmed M. Gemeay ◽  
Ahmed Z. Afify ◽  
...  

In this paper, we introduce a new flexible generator of continuous distributions called the transmuted Burr X-G (TBX-G) family, which extends and increases the flexibility of the Burr X generator. The general statistical properties of the TBX-G family are derived. One special sub-model, the TBX-exponential distribution, is studied in detail. We discuss eight approaches to estimating the TBX-exponential parameters, and numerical simulations are conducted to compare the suggested approaches based on partial and overall ranks. Based on our study, the Anderson–Darling estimators are recommended for estimating the TBX-exponential parameters. Using two skewed real data sets from the engineering sciences, we illustrate the importance and flexibility of the TBX-exponential model compared with existing competing distributions.
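As an indication of what Anderson–Darling estimation involves, the sketch below fits a plain exponential distribution by minimizing the Anderson–Darling statistic A^2 = -n - (1/n) sum_i (2i-1)[ln F(x_(i)) + ln(1 - F(x_(n+1-i)))] over the parameter (my own illustration; the paper applies the idea to the richer TBX-exponential model).

```python
# Minimum Anderson-Darling estimation for an exponential distribution
# (illustrative sketch; not the TBX-exponential model from the paper).
import numpy as np
from scipy.optimize import minimize_scalar

def ad_statistic(rate, x):
    """Anderson-Darling statistic for Exp(rate) evaluated on sample x."""
    x = np.sort(x)
    n = len(x)
    F = 1.0 - np.exp(-rate * x)       # exponential CDF at the order statistics
    F = np.clip(F, 1e-12, 1 - 1e-12)  # guard the logarithms
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(F) + np.log(1 - F[::-1])))

rng = np.random.default_rng(2)
sample = rng.exponential(scale=1 / 3.0, size=500)  # true rate = 3

# The Anderson-Darling estimator minimizes the statistic over the parameter.
res = minimize_scalar(ad_statistic, bounds=(1e-6, 50.0),
                      args=(sample,), method="bounded")
print(f"AD estimate of the rate: {res.x:.3f}")
```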

