Data Management Challenges of Data-Intensive Scientific Workflows

Author(s):  
Ewa Deelman ◽  
Ann Chervenak

Scientific applications in fields such as astronomy, earthquake science, and gravitational-wave physics have embraced workflow technologies to do large-scale science. Workflows enable researchers to collaboratively design, manage, and obtain results from analyses that involve hundreds of thousands of steps, access terabytes of data, and generate similar amounts of intermediate and final data products. Although workflow systems facilitate the automated generation of data products, many issues remain to be addressed, and they take different forms at different points in the workflow lifecycle. This chapter describes the workflow lifecycle as consisting of a workflow generation phase, where the analysis is defined; a workflow planning phase, where the resources needed for execution are selected; a workflow execution phase, where the actual computations take place; and a phase in which results, metadata, and provenance are stored. The authors discuss the data management issues that arise at each step of this lifecycle, describe challenge problems, and illustrate them in the context of real-life applications. They also discuss the challenges, possible solutions, and open issues faced when mapping and executing large-scale workflows on current cyberinfrastructure.


2020 ◽  
Vol 146 ◽  
pp. 35-51
Author(s):  
Tong Jin ◽  
Fan Zhang ◽  
Qian Sun ◽  
Melissa Romanus ◽  
Hoang Bui ◽  
...  

2013 ◽  
Vol 12 (2) ◽  
pp. 245-264 ◽  
Author(s):  
Claudia Szabo ◽  
Quan Z. Sheng ◽  
Trent Kroeger ◽  
Yihong Zhang ◽  
Jian Yu

2018 ◽  
Vol 29 (2) ◽  
pp. 338-350 ◽  
Author(s):  
Nicholas Hazekamp ◽  
Nathaniel Kremer-Herman ◽  
Benjamin Tovar ◽  
Haiyan Meng ◽  
Olivia Choudhury ◽  
...  

2016 ◽  
Vol 11 (1) ◽  
pp. 156 ◽  
Author(s):  
Wei Jeng ◽  
Liz Lyon

We report on a case study that examines the social science community's capability and institutional support for data management. Fourteen researchers were invited to take part in an in-depth qualitative survey between June 2014 and October 2015. We modify and adopt the Community Capability Model Framework (CCMF) profile tool to ask these scholars to self-assess their current data practices and whether their academic environment provides enough supportive infrastructure for data-related activities. The exemplar disciplines in this report include anthropology, political science, and library and information science. Our findings deepen our understanding of social science disciplines and identify capabilities that are well developed and those that are poorly developed. The participants reported that their institutions have made relatively slow progress on economic support and data science training courses, but acknowledged that they are well informed about and trained in protecting participants' privacy. This result confirms an observation from previous literature that social scientists are concerned with ethical perspectives but lack technical training and support. The results also demonstrate intra- and inter-disciplinary commonalities and differences in researcher perceptions of data-intensive capability, and highlight potential opportunities for the development and delivery of new and impactful research data management support services to social science researchers and faculty.

