ELIXIR Europe on the Road to Sustainable Research Software

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37677 ◽

2019 ◽

Vol 3 ◽

Author(s):

Mateusz Kuzak ◽

Jen Harrow ◽

Paula Martinez ◽

Fotis Psomopoulos ◽

Allegra Via

Keyword(s):

Best Practices ◽

Software Development ◽

Open Source ◽

Life Science ◽

Source Code ◽

Management Plan ◽

Third Party ◽

Research Software ◽

Communication Processes ◽

Training Materials

ELIXIR (ELIXIR Europe 2019a) is an intergovernmental organization that brings together life science resources across Europe. These resources include databases, software tools, training materials, cloud storage, and supercomputers. One of the goals of ELIXIR is to coordinate these resources so that they form a single infrastructure. This infrastructure makes it easier for scientists to find and share data, exchange expertise, and agree on best practices. ELIXIR's activities are divided into five areas, each known as a “platform”: Data, Tools, Interoperability, Compute and Training. The ELIXIR Tools Platform works to improve the discovery, quality and sustainability of software resources. The Software Development Best Practices task of the Tools Platform aims to raise the quality and sustainability of research software by producing, adopting, and promoting information standards and best practices relevant to the software development life cycle. We have published four (4OSS) simple recommendations to encourage best practices in research software (Jiménez et al. 2017) and the Top 10 metrics for recommended life science software practices (Artaza et al. 2016). The 4OSS simple recommendations are as follows: (1) Develop publicly accessible open source code from day one. (2) Make software easy to discover by providing software metadata via a popular community registry. (3) Adopt a license and comply with the licenses of third-party dependencies. (4) Have clear and transparent contribution, governance and communication processes.
In order to encourage researchers and developers to adopt the 4OSS recommendations and build FAIR (Findable, Accessible, Interoperable and Reusable) software, the best practices group, in partnership with the ELIXIR Training platform, The Carpentries (Carpentries 2019, ELIXIR Europe 2019b), and other communities, is creating a collection of training materials (Kuzak et al. 2019). The next step is to adopt, promote, and recognise these information standards and best practices. The group will address this by (i) developing comprehensive guidelines for software curation, (ii) training researchers and developers to adopt software best practices and (iii) improving the usability of Tools Platform products. Additionally, direct outcomes of this task will be a software management plan template, connected to a concise description of the guidelines for open research software, and a white paper on the software development management plan for ELIXIR, which can subsequently be used to produce training materials. We will work with the newly formed ReSA (Research Software Alliance) to facilitate the adoption of this plan by the broader community.


Four simple recommendations to encourage best practices in research software

F1000Research ◽

10.12688/f1000research.11407.1 ◽

2017 ◽

Vol 6 ◽

pp. 876 ◽

Cited By ~ 45

Author(s):

Rafael C. Jiménez ◽

Mateusz Kuzak ◽

Monther Alhamdoosh ◽

Michelle Barker ◽

Bérénice Batut ◽

...

Keyword(s):

Best Practices ◽

Software Development ◽

Open Source ◽

Source Code ◽

Computer Software ◽

Scientific Research ◽

Research Software ◽

New Software Development

Scientific research relies on computer software, yet software is not always developed following practices that ensure its quality and sustainability. This manuscript does not aim to propose new software development best practices, but rather to provide simple recommendations that encourage the adoption of existing best practices. Software development best practices promote better quality software, and better quality software improves the reproducibility and reusability of research. These recommendations are designed around Open Source values, and provide practical suggestions that contribute to making research software and its source code more discoverable, reusable and transparent. This manuscript is aimed at developers, but also at organisations, projects, journals and funders that can increase the quality and sustainability of research software by encouraging the adoption of these recommendations.


Associating Natural Language Comment and Source Code Entities

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6382 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8592-8599

Author(s):

Sheena Panthaplackel ◽

Milos Gligoric ◽

Raymond J. Mooney ◽

Junyi Jessy Li

Keyword(s):

Software Development ◽

Natural Language ◽

Open Source ◽

Source Code ◽

Initial Step ◽

Binary Classifier ◽

Sequence Labeling ◽

Evaluation Dataset ◽

Revision Histories

Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.


Embedding Metadata and Other Semantics in Word Processing Documents

International Journal of Digital Curation ◽

10.2218/ijdc.v4i2.96 ◽

2009 ◽

Vol 4 (2) ◽

pp. 93-106 ◽

Cited By ~ 1

Author(s):

Peter Sefton ◽

Ian Barnes ◽

Ron Ward ◽

Jim Downing

Keyword(s):

Semantic Web ◽

Software Development ◽

Open Source ◽

Academic Writing ◽

Word Processing ◽

Source Code ◽

Data Curation ◽

User Testing ◽

Microsoft Word ◽

Computing Platforms

This paper describes a technique for embedding document metadata, and potentially other semantic references, inline in word processing documents, which the authors have implemented with the help of a software development team. Several assumptions underlie the approach: it must be available across computing platforms and work with both Microsoft Word (because of its user base) and OpenOffice.org (because of its free availability). Further, the application needs to be acceptable to and usable by users, so the initial implementation covers only a small number of features, which will only be extended after user testing. Within these constraints the system provides a mechanism not only for encoding simple metadata, but also for inferring hierarchical relationships between metadata elements from a ‘flat’ word processing file. The paper includes links to open source code implementing the techniques as part of a broader suite of tools for academic writing. This addresses tools and software, semantic web and data curation, and integrating curation into research workflows, and will provide a platform for integrating work on ontologies, vocabularies and folksonomies into word processing tools.


Key Concepts and Definitions of Open Source Communities

Encyclopedia of Networked and Virtual Organizations ◽

10.4018/978-1-59904-885-7.ch099 ◽

2010 ◽

pp. 753-760

Author(s):

Ruben van Wendel de Joode ◽

Sebastian Spaeth

Keyword(s):

Software Development ◽

Open Source ◽

Open Source Software ◽

Online Communities ◽

Source Code ◽

Professional Organizations ◽

Large Numbers ◽

Key Concepts ◽

Open Source Communities ◽

Do So

Most open source software is developed in online communities. These communities are typically referred to as “open source software communities” or “OSS communities.” In OSS communities, the source code, which is the human-readable part of software, is treated as something that is open and that should be downloadable and modifiable by anyone who wishes to do so. The availability of the source code has enabled a practice of decentralized software development in which large numbers of people contribute time and effort. Communities like Linux and Apache, for instance, have been able to connect thousands of individual programmers and professional organizations (although most project communities remain relatively small). These people and organizations are not confined to certain geographical places; on the contrary, they come from literally all continents and they interact and collaborate virtually.


Logging Analysis and Prediction in Open Source Java Project

Research Anthology on Usage and Development of Open Source Software ◽

10.4018/978-1-7998-9158-1.ch038 ◽

2021 ◽

pp. 733-761

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Software Development ◽

Anomaly Detection ◽

Open Source ◽

Large Scale ◽

Source Code ◽

Scale Analysis ◽

Large Scale Analysis ◽

Research Questions

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.


Bionitio: demonstrating and facilitating best practices for bioinformatics command-line software

GigaScience ◽

10.1093/gigascience/giz109 ◽

2019 ◽

Vol 8 (9) ◽

Cited By ~ 3

Author(s):

Peter Georgeson ◽

Anna Syme ◽

Clare Sloggett ◽

Jessica Chung ◽

Harriet Dashnow ◽

...

Keyword(s):

Best Practices ◽

Software Development ◽

Open Source ◽

Open Source Software ◽

Command Line ◽

Use Of Resources ◽

Excellent Starting Point ◽

Starting Point ◽

Bioinformatics Software ◽

Programming Practices

Background: Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. In particular, for beginners, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices. Even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results. Findings: We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization. Conclusions: Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. This also benefits users because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.
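
Several of the key features listed in the abstract above (command-line argument parsing, error handling, progress logging, defined exit status values, a version number) can be illustrated with a minimal Python sketch. This is not Bionitio's generated code: the task (counting FASTA records), the option names, the version string, and the exit status values are all assumptions chosen for illustration.

```python
import argparse
import logging
import sys

# Hypothetical exit status values; a real project would document its own.
EXIT_OK = 0
EXIT_FILE_ERROR = 1


def parse_args(argv):
    """Parse command-line arguments (option names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Count sequence records in FASTA files")
    parser.add_argument("--version", action="version",
                        version="%(prog)s 0.1.0")
    parser.add_argument("--log", metavar="LOG_FILE",
                        help="write progress messages to this file")
    parser.add_argument("fasta_files", nargs="+", metavar="FASTA_FILE")
    return parser.parse_args(argv)


def count_sequences(lines):
    """Count FASTA records: each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def main(argv=None):
    """Run the tool and return a defined exit status."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    if args.log:
        logging.basicConfig(filename=args.log, level=logging.INFO)
    for path in args.fasta_files:
        logging.info("processing %s", path)
        try:
            with open(path) as handle:
                print(path, count_sequences(handle), sep="\t")
        except OSError as error:
            # Report the problem on stderr and stop with a defined status.
            print(f"error: cannot read {path}: {error}", file=sys.stderr)
            return EXIT_FILE_ERROR
    return EXIT_OK

# In a script, the entry point would be: sys.exit(main())
```

Returning the status from main() rather than calling sys.exit() inside it keeps the function easy to exercise from a test suite, which is another of the practices the abstract lists.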


Integrating Projects from Multiple Open Source Code Forges

Database Technologies ◽

10.4018/978-1-60566-058-5.ch141 ◽

2009 ◽

pp. 2301-2312

Author(s):

Megan Squire

Keyword(s):

Software Development ◽

Open Source ◽

Relevant Literature ◽

Source Code ◽

Scoring Systems ◽

Open Source Code ◽

Multiple Code ◽

Future Work

Much of the data about free, libre, and open source (FLOSS) software development comes from studies of code forges or code repositories used for managing projects. This paper presents a method for integrating data about open source projects by way of matching projects (entities) across multiple code forges. After a review of the relevant literature, a few of the methods are chosen and applied to the FLOSS domain, including a comparison of some simple scoring systems for pairwise project matches. Finally, the paper describes limitations of this approach and recommendations for future work.
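
To make the idea of a simple scoring system for pairwise project matches concrete, here is a minimal Python sketch. It is not the method from the paper: the record fields (name, homepage), the 0.7/0.3 weights, and the threshold are hypothetical, and name similarity is computed with the standard library's difflib.SequenceMatcher.

```python
from difflib import SequenceMatcher


def match_score(project_a, project_b):
    """Score a candidate match between two project records, in [0, 1].

    The fields and weights are illustrative assumptions only.
    """
    name_similarity = SequenceMatcher(
        None, project_a["name"].lower(), project_b["name"].lower()
    ).ratio()
    url_a = project_a.get("homepage")
    url_b = project_b.get("homepage")
    same_homepage = 1.0 if url_a and url_a == url_b else 0.0
    return 0.7 * name_similarity + 0.3 * same_homepage


def candidate_matches(forge_a, forge_b, threshold=0.6):
    """Compare every project in one forge with every project in another."""
    pairs = []
    for a in forge_a:
        for b in forge_b:
            score = match_score(a, b)
            if score >= threshold:
                pairs.append((a["name"], b["name"], round(score, 2)))
    # Highest-scoring candidate pairs first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

Comparing every pair is quadratic in the number of projects, so a sketch like this only suits small samples; larger data sets would need some way of narrowing the candidate pairs first.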


Use of Free and Open-Source Software (FOSS) in the U.S. Department of Defense

Terry's Archive Online ◽

10.48034/20030102 ◽

2003 ◽

Vol 2003 (01) ◽

pp. 0102

Author(s):

Terry Bollinger

Keyword(s):

Software Development ◽

Open Source ◽

Open Source Software ◽

Department Of Defense ◽

Low Cost ◽

Source Code ◽

Leading Edge ◽

Cyber Attacks ◽

Software Analysis ◽

The U.S

This report documents the results of a study by The MITRE Corporation on the use of free and open-source software (FOSS) in the U.S. Department of Defense (DoD). FOSS gives users the right to run, copy, distribute, study, change, and improve it as they see fit, without asking permission or making fiscal payments to any external group or person. The study showed that FOSS provides substantial benefits to DoD security, infrastructure support, software development, and research. Given the openness of its source code, the finding that FOSS profoundly benefits security was both counterintuitive and instructive. Banning FOSS in DoD would remove access to exceptionally well-verified infrastructure components such as OpenBSD and to the robust network and software analysis tools needed to detect and respond to cyberattacks. Finally, losing hands-on access to FOSS source code would reduce DoD’s ability to respond rapidly to cyberattacks. In short, banning FOSS would have immediate, broad, and strongly negative impacts on the DoD’s ability to defend the U.S. against cyberattacks. For infrastructure support, the deep historical ties between FOSS and the emergence of the Internet mean that removing FOSS applications would strongly negatively impact the DoD’s ability to support web and Internet-based applications. Software development would be hit especially hard because many leading-edge and broadly used tools are FOSS. Finally, the loss of access to low-cost data processing tools and the inability to share results in the more potent form of executable FOSS software would seriously and negatively impact nearly all forms of scientific and data-driven research.


Logging Analysis and Prediction in Open Source Java Project

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Optimizing Contemporary Application and Processes in Open Source Software ◽

10.4018/978-1-5225-5314-4.ch003 ◽

2018 ◽

pp. 57-85

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Software Development ◽

Anomaly Detection ◽

Open Source ◽

Large Scale ◽

Source Code ◽

Scale Analysis ◽

Large Scale Analysis ◽

Research Questions


The relevance of Open Source to hydroinformatics

Journal of Hydroinformatics ◽

10.2166/hydro.2002.0022 ◽

2002 ◽

Vol 4 (4) ◽

pp. 219-234 ◽

Cited By ~ 14

Author(s):

Hamish Harvey ◽

Dawei Han

Keyword(s):

Operating System ◽

Software Development ◽

Open Source ◽

Open Source Software ◽

Rapid Development ◽

Source Code ◽

Web Server ◽

High Profile ◽

History Of ◽

Closed Approach

Open Source, in which the source code to software is freely shared and improved upon, has recently risen to prominence as an alternative to the more usual closed approach to software development. A number of high profile projects, such as the Linux operating system kernel and the Apache web server, have demonstrated that Open Source can be technically effective, and companies such as Cygnus Solutions (now owned by Red Hat) and Zope Corporation have demonstrated that it is possible to build successful companies around open source software. Open Source could have significant benefits for hydroinformatics, encouraging widespread interoperability and rapid development. In this paper we present a brief history of Open Source, a summary of some reasons for its effectiveness, and we explore how and why Open Source is of particular interest in the field of hydroinformatics. We argue that for technical, scientific and business reasons, Open Source has a lot to offer.
