Embedding containerized workflows inside data science notebooks enhances reproducibility

2018 ◽  
Author(s):  
Jiaming Hu ◽  
Ling-Hong Hung ◽  
Ka Yee Yeung

Abstract
Data science notebooks, such as Jupyter, combine text documentation with dynamically editable and executable code and have become popular for sharing computational methods. We present nbdocker, an extension that integrates Docker software containers into Jupyter notebooks. nbdocker transforms notebooks into autonomous, self-contained, executable and reproducible modules that can document and disseminate complicated data science workflows containing code written in different languages and executables requiring different software environments.
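
The abstract does not describe nbdocker's own interface, so the sketch below uses the generic Docker SDK for Python (the `docker` package) only to illustrate the underlying idea: running one step of a multi-language workflow inside a pinned container image from a notebook cell. The image name, script path, and mount point are hypothetical.

```python
# Illustration only: nbdocker's API is not given in the abstract, so this uses
# the generic Docker SDK for Python to show the container-backed-cell idea.
import docker

client = docker.from_env()

# Run an R-based step of a polyglot workflow in its own software environment,
# mounting the project directory so inputs and outputs are shared with the
# notebook. Image tag, script, and paths are hypothetical.
logs = client.containers.run(
    image="rocker/r-ver:4.3.1",                # pinned tag -> reproducible environment
    command=["Rscript", "/work/normalize.R"],  # hypothetical analysis script
    volumes={"/home/user/project": {"bind": "/work", "mode": "rw"}},
    remove=True,                               # delete the container when it exits
)
print(logs.decode())
```

Pinning the image tag is what makes the environment reproducible: anyone re-running the notebook pulls the same software stack, regardless of what is installed on the host.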

Author(s):  
James E. Dobson

This book seeks to develop an answer to the major question arising from the adoption of sophisticated data-science approaches within humanities research: are existing humanities methods compatible with computational thinking? Data-based and algorithmically powered methods present both new opportunities and new complications for humanists. This book takes as its founding assumption that the exploration and investigation of texts and data with sophisticated computational tools can serve the interpretative goals of humanists. At the same time, it assumes that these approaches cannot and will not render other existing interpretive frameworks obsolete. Research involving computational methods, the book argues, should be subject to humanistic modes of inquiry that direct questions of power and infrastructure toward the field's assumptions and practices. Arguing for a methodologically and ideologically self-aware critical digital humanities, the author contextualizes the digital humanities within the larger neo-liberalizing shifts of the contemporary university in order to resituate the field within a theoretically informed tradition of humanistic inquiry. Bringing the resources of critical theory to bear on computational methods enables humanists to construct an array of compelling and possible humanistic interpretations along multiple dimensions: from the ideological biases informing many commonly used algorithms to the complications of a historicist text mining, from examining the range of feature selection for sentiment analysis to the fantasies of human subjectless analysis activated by machine learning and artificial intelligence.


2018 ◽  
Vol 1 (1) ◽  
pp. 207-234 ◽  
Author(s):  
Pavel Sinitcyn ◽  
Jan Daniel Rudolph ◽  
Jürgen Cox

Computational proteomics is the data science concerned with the identification and quantification of proteins from high-throughput data and the biological interpretation of their concentration changes, posttranslational modifications, interactions, and subcellular localizations. Today, these data most often originate from mass spectrometry–based shotgun proteomics experiments. In this review, we survey computational methods for the analysis of such proteomics data, focusing on the explanation of the key concepts. Starting with mass spectrometric feature detection, we then cover methods for the identification of peptides. Subsequently, we turn to two highly important topics: protein inference and the control of false discovery rates. We then discuss methods for the quantification of peptides and proteins. A section on downstream data analysis covers exploratory statistics, network analysis, machine learning, and multiomics data integration. Finally, we discuss current developments and provide an outlook on what the near future of computational proteomics might hold.
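
As one concrete instance of the false-discovery-rate control mentioned above, the following sketch implements target-decoy FDR estimation, a standard approach in shotgun proteomics (not specific to any one tool discussed in the review). The peptide-spectrum match (PSM) scores are synthetic and the 1% threshold is illustrative.

```python
# A minimal sketch of target-decoy FDR estimation for PSMs.
# Scores and labels are synthetic illustrations, not real data.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic PSM scores: true (target) matches score higher than decoys on average.
target_scores = rng.normal(loc=3.0, scale=1.0, size=1000)
decoy_scores = rng.normal(loc=0.0, scale=1.0, size=1000)

scores = np.concatenate([target_scores, decoy_scores])
is_decoy = np.concatenate([np.zeros(1000, bool), np.ones(1000, bool)])

# Sort PSMs from best to worst score; at each score threshold, estimate
# FDR ~= (# decoys accepted) / (# targets accepted).
order = np.argsort(scores)[::-1]
decoys = np.cumsum(is_decoy[order])
targets = np.cumsum(~is_decoy[order])
fdr = decoys / np.maximum(targets, 1)

# q-value: the minimal FDR at which a given PSM would still be accepted.
qvalues = np.minimum.accumulate(fdr[::-1])[::-1]
n_accepted = int(np.sum(qvalues <= 0.01))
print(f"PSMs accepted at 1% FDR: {n_accepted}")
```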


2014 ◽  
Vol 28 (31) ◽  
pp. 1430021 ◽  
Author(s):  
J. K. Freericks ◽  
B. K. Nikolić ◽  
O. Frieder

Generating big data pervades much of physics. But some problems, which we call extreme data problems, are too large to be treated within big data science. The nonequilibrium quantum many-body problem on a lattice is just such a problem, where the Hilbert space grows exponentially with system size and rapidly becomes too large to fit on any computer (and can effectively be thought of as an infinite-sized data set). Nevertheless, much progress has been made with computational methods on this problem, which serve as a paradigm for how one can approach and attack extreme data problems. In addition, viewing these physics problems from a computer-science perspective leads to new approaches that can be tried in order to solve these problems more accurately and for longer times. We review a number of these different ideas here.
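
To make the exponential growth concrete, a back-of-the-envelope computation (illustrative arithmetic only, assuming spin-1/2 sites and one complex double per amplitude) shows how quickly even a single many-body state vector outgrows any conceivable memory.

```python
# Illustrative arithmetic: for a lattice of N spin-1/2 sites the Hilbert space
# dimension is 2^N, so storing one state vector needs 2^N complex amplitudes.
for n_sites in (20, 40, 60):
    dim = 2 ** n_sites
    bytes_needed = dim * 16  # one complex double (16 bytes) per amplitude
    print(f"N = {n_sites:2d}: dim = 2^{n_sites} = {dim:.3e}, "
          f"state vector needs {bytes_needed / 1e12:.3e} TB")
```

Already at N = 40 a single state vector needs roughly 18 TB; at N = 60 it needs about 18 exabytes, far beyond any computer, which is why the problem behaves as an effectively infinite-sized data set.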


2020 ◽  
Vol 3 (1) ◽  
pp. 163-190 ◽  
Author(s):  
Amit Singer ◽  
Fred J. Sigworth

Single-particle electron cryomicroscopy (cryo-EM) is an increasingly popular technique for elucidating the three-dimensional (3D) structure of proteins and other biologically significant complexes at near-atomic resolution. It is an imaging method that does not require crystallization and can capture molecules in their native states. In single-particle cryo-EM, the 3D molecular structure needs to be determined from many noisy 2D tomographic projections of individual molecules, whose orientations and positions are unknown. The high level of noise and the unknown pose parameters are two key elements that make reconstruction a challenging computational problem. Even more challenging is the inference of structural variability and flexible motions when the individual molecules being imaged are in different conformational states. This review discusses computational methods for structure determination by single-particle cryo-EM and their guiding principles from statistical inference, machine learning, and signal processing, which also play a significant role in many other data science applications.
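
The image-formation (forward) model described above can be sketched in a few lines: each observed image is a 2D tomographic projection of the 3D density at an unknown orientation, corrupted by heavy noise. The toy volume, rotation angles, and noise level below are invented, and a realistic model would also include the microscope's contrast transfer function.

```python
# A minimal sketch of the cryo-EM forward model: project a 3D density at an
# unknown (hidden) orientation, then add noise. All quantities are synthetic.
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(1)

# Toy 3D density: a solid ball inside a 64^3 grid.
grid = np.indices((64, 64, 64)) - 32
volume = (np.sum(grid**2, axis=0) < 12**2).astype(float)

def project(vol, angles_deg):
    """Rotate the volume by two angles, then integrate along the z axis."""
    v = rotate(vol, angles_deg[0], axes=(0, 1), reshape=False)
    v = rotate(v, angles_deg[1], axes=(0, 2), reshape=False)
    return v.sum(axis=2)  # tomographic projection onto a 2D image

# The two difficulties named in the abstract: the pose is unknown to the
# reconstruction algorithm, and the noise dominates the signal.
pose = rng.uniform(0, 360, size=2)
clean = project(volume, pose)
noisy = clean + rng.normal(scale=5 * clean.std(), size=clean.shape)
print(f"hidden pose (deg): {pose.round(1)}, image shape: {noisy.shape}")
```

Reconstruction must invert this map from many such images: recover the volume (and, ideally, its conformational variability) without ever observing the poses.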


2019 ◽  
Author(s):  
Richard Jean So ◽  
Hoyt Long ◽  
Yuancheng Zhu

This article seeks to bridge two scholarly fields often seen as incommensurable: cultural analytics (also known as "computational criticism") and critical race studies. It does so by discovering generative points of contact between two sets of methods that are also typically viewed as antithetical: data science and critique. Cultural analytics is an emerging field wherein humanist scholars leverage the increasing availability of large digital corpora and the affordances of new computational tools. This allows them to study, for example, semantic and narratological patterns in the English-language novel at the scale of centuries and across tens of thousands of texts. Cultural analytics is a fast-growing field, with scholars taking on an expanding array of topics, including genre and cultural prestige. Yet one topic remains relatively understudied: race and racial difference. The reasons for this elision are not hard to grasp. Computational methods demand the quantification of one's objects of study. It is likely easier to accept measuring a novel's popularity by sales figures, or classifying its genre by diction, than labeling it according to discrete racial identifiers. Such labeling is an affront to critical race studies, which has taken as its very mission the deconstruction of racial categories.
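
To see why such methods demand discrete labels, consider a minimal sketch of the "classifying genre by diction" example mentioned above. The toy texts and the two genre labels are invented; the point is structural: a supervised classifier cannot proceed until every text has been assigned exactly one category, which is precisely the move that becomes fraught with racial identifiers.

```python
# A toy bag-of-words genre classifier: diction in, a single discrete label out.
# Texts and labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "the detective examined the bloodstained ledger at midnight",
    "her heart raced as he whispered a promise beneath the stars",
    "the inspector traced the stolen jewels to the dockside warehouse",
    "their letters spoke of longing, of summers lost and love renewed",
]
labels = ["mystery", "romance", "mystery", "romance"]  # one discrete label per text

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["a cipher hidden in the victim's final letter"]))
```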


2019 ◽  
Vol 1 (1) ◽  
pp. 1-9 ◽
Author(s):  
Simon Lindgren

The ongoing and intensifying datafication of our societies poses huge challenges as well as opportunities for social science to rethink core elements of its research enterprise. Prominently, there is a pressing need to move beyond the long-standing qualitative/quantitative divide. This paper is an argument for developing a critical science of data by bringing together the interpretive theoretical and ethical sensibilities of social science with the predictive and prognostic powers of data science and computational methods. I argue that the renegotiation of theories and research methods that must be made in order for them to be more relevant and useful can be fruitfully understood through the metaphor of hacking social science: developing creative ways of exploiting existing tools in alternative and unexpected ways to solve problems.

