DINA—Development of open source and open services for natural history collections & research

Author(s): Falko Glöckler, James Macklin, David Shorthouse, Christian Bölling, Satpal Bilkhu, et al.

The DINA Consortium (DINA = “DIgital information system for NAtural history data”, https://dina-project.net) is a framework for like-minded practitioners of natural history collections to collaborate on the development of distributed, open source software that empowers and sustains collections management. Target collections include zoology, botany, mycology, geology, paleontology, and living collections. The DINA software will also permit the compilation of biodiversity inventories and will robustly support both observation and molecular data. The DINA Consortium focuses on an open source software philosophy and on community-driven open development; contributors share their development resources and expertise for the benefit of all participants.

The DINA System is explicitly designed as a loosely coupled set of web-enabled modules. At its core, this modular ecosystem includes strict guidelines for the structure of web application programming interfaces (APIs), which guarantee the interoperability of all components (https://github.com/DINA-Web). Important to the DINA philosophy is that users (e.g., collection managers, curators) be actively engaged in an agile development process. This ensures that the product is pleasing for everyday use, includes efficient yet flexible workflows, and implements best practices in specimen data capture and management.

There are three options for developing a DINA module: create a new module compliant with the specifications (Fig. 1), modify an existing code base to attain compliance (Fig. 2), or wrap a compliant API around existing code that cannot or may not be modified (e.g., modification is infeasible, there are dependencies on other systems, or the code is closed) (Fig. 3). All three of these scenarios have been applied in the modules recently developed: a module for molecular data (SeqDB), modules for multimedia, document and agent data, and a service module for printing labels and reports.

The SeqDB collection management and molecular tracking system (Bilkhu et al. 2017) has evolved through two of these scenarios. Originally, the required architectural changes were going to be added to the existing codebase, but after some time the development team recognised that the technical debt inherent in the project wasn’t worth the effort of modification and refactoring. Instead, a new codebase was created, bringing forward the best parts of the system, oriented around the molecular data model for Sanger sequencing and Next Generation Sequencing (NGS) workflows. In the case of the Multimedia and Document Store module and the Agents module, a brand new codebase was established whose technology choices were aligned with the DINA vision. These two modules have been created from fundamental use cases for collection management and digitization workflows and will continue to evolve as more modules come online and broaden their scope. The DINA Labels & Reporting module is a generic service for transforming data into arbitrary printable layouts based on customizable templates.
In order to use the module in combination with data managed in the collection management software Specify (http://specifysoftware.org) to print labels for collection objects, we wrapped the Specify 7 API in a DINA-compliant API layer called the “DINA Specify Broker”. This allows the easy-to-use, web-based template engine within the DINA Labels & Reports module to be used without changing Specify’s codebase. In our presentation we will explain the DINA development philosophy and outline benefits for different stakeholders who directly or indirectly use collections data and related research data in their daily workflows. We will also highlight opportunities for joining the DINA Consortium and the best ways to engage with members of DINA who share their expertise in natural science, biodiversity informatics and geoinformatics.
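As a rough illustration of the third scenario above (and of what the “DINA Specify Broker” does conceptually), the sketch below wraps an existing, unmodified backend behind a thin DINA-style API layer. The endpoint paths, field names and Specify base URL are hypothetical, and the JSON:API-flavoured response shape is an assumption about the DINA API guidelines, not a statement of them:

```python
# Illustrative sketch of the "wrap" scenario (Fig. 3): a thin broker exposing
# a DINA-style endpoint over an existing Specify 7 instance without touching
# its codebase. Paths and field names are hypothetical, not the real APIs.
import requests
from flask import Flask, jsonify

app = Flask(__name__)
SPECIFY_BASE = "https://specify.example.org/api/specify"  # hypothetical URL

@app.route("/api/v1/collecting-event/<int:event_id>")
def collecting_event(event_id):
    # Fetch the record from the unmodified Specify 7 backend...
    upstream = requests.get(f"{SPECIFY_BASE}/collectingevent/{event_id}/")
    upstream.raise_for_status()
    record = upstream.json()
    # ...and republish it as a JSON:API-style resource object, the document
    # structure assumed here to underlie the DINA API guidelines.
    return jsonify({
        "data": {
            "type": "collecting-event",
            "id": str(event_id),
            "attributes": {
                "startDate": record.get("startdate"),
                "locality": record.get("locality"),
            },
        }
    })
```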

2018, Vol 2, pp. e25646
Author(s): James Macklin, Markus Englund, Falko Glöckler, Mikko Heikkinen, Jana Hoffmann, et al.

The DINA Consortium (“DIgital information system for NAtural history data”, https://dina-project.net, Fig. 1) was formed to provide a framework for like-minded, large natural history collection-holding institutions to collaborate, through a distributed open source development model, on a flexible and sustainable collection management system. Target collections include zoological, botanical, mycological, geological and paleontological collections, living collections, biodiversity inventories, observation records, and molecular data. The DINA system is architected as a loosely coupled set of web-based modules. The conceptual basis for this modular ecosystem is a compilation of comprehensive guidelines for web application programming interfaces (APIs) that guarantee the interoperability of its components. Thus, any DINA component can be modified or even replaced by another component without breaking the rest of the system, as long as it remains DINA-compliant. Furthermore, the modularity enables institutions to host only the components they need. DINA focuses on an open source software philosophy and on community-driven open development, with contributors sharing their development resources and expertise beyond their own institutions. One of the overarching reasons to develop a new collection management system is the need to better model the complex relationships between collection objects (typically specimens) and their derivatives, preparations and storage. We will discuss enhancements made in the DINA data model to better represent these relationships, and the influence this has on the management of these objects and on the sharing of information. Technical details of various components of the DINA system will be shown in other talks in this symposium, followed by a discussion session.
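To make the modelling challenge concrete, here is a minimal, hypothetical sketch of the kinds of relationships such a data model has to capture; none of these class or field names come from the actual DINA schema:

```python
# Hypothetical sketch: a collection object linked to its preparations,
# derivatives and (hierarchical) storage. Illustrative names only.
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StorageUnit:
    name: str                                  # e.g. "Drawer 3"
    parent: Optional["StorageUnit"] = None     # storage is itself nested

@dataclass
class Preparation:
    prep_type: str                             # e.g. "skin", "DNA extract"
    stored_in: Optional[StorageUnit] = None

@dataclass
class CollectionObject:
    catalog_number: str
    derived_from: Optional["CollectionObject"] = None  # e.g. tissue from a skin
    preparations: List[Preparation] = field(default_factory=list)

# A bird specimen with a skin preparation and a derived tissue sample:
drawer = StorageUnit("Drawer 3", parent=StorageUnit("Cabinet 12"))
skin = CollectionObject("B-1001", preparations=[Preparation("skin", drawer)])
tissue = CollectionObject("B-1001-T1", derived_from=skin,
                          preparations=[Preparation("DNA extract")])
```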


Author(s): Tania Walisch, Claude Pepin, Paul Braun

Over the past 20 years, the Luxembourg National Museum for Natural History (LMNH) has built a bio- and geodiversity information system to collate, manage and publish natural heritage observation and specimen data at the national and international level. To date, the system holds over 2 million taxon occurrence records and over 100,000 specimen records. Whenever available, the Museum has chosen public or open source software tools that comply with international biodiversity data standards for recording, managing and publishing data, in order to increase resilience, stay connected with community initiatives and share development costs. A central component of the Museum’s national data hub is Recorder 6, a client-server database application for wildlife recording developed by the National Biodiversity Network in the UK. Today, the Recorder-Lux database contains a large portion of the natural heritage information in Luxembourg and is synchronised daily into a publication database connected via the Integrated Publishing Toolkit (IPT) to the Global Biodiversity Information Facility (GBIF). Moreover, Recorder-Lux data is accessible via the national species mapping portal mdata.mnhn.lu, which has been developed in-house and is aimed at scientists, professionals and decision makers. The Museum has also developed a set of data entry and upload functionalities on its website data.mnhn.lu using the open source software Indicia, a toolkit that provides a ready-made set of services and tools for online wildlife recording. In 2019, we implemented the Atlas of Living Luxembourg (ALL) website all.mnhn.lu, based on the open source Atlas of Living Australia software. ALL is the most comprehensive data portal on natural heritage in Luxembourg, showing specimen data from the Museum’s botany, zoology, paleontology, petrology and mineralogy collections, as well as fungal, animal and plant observations collected from national and international organisations (via GBIF). Data providers range from individual scientific collaborators to professional regional record centres and private consultancies working for public administrations. They use different tools offered by the Museum to enter, manage and transfer their data to the system. Several regional record centres chose the client-server Recorder 6 software to manage and exchange their data, whereas individual scientific collaborators of the Museum enter or upload their data via the online data entry forms available on data.mnhn.lu. For large-scale, long-term, professional biodiversity monitoring and inventories at the national level, specific data entry forms and functionalities have been configured on the Indicia website. Finally, citizens can record species observations via the iNaturalist smartphone app. Owing to the Museum’s long history of conducting field inventories alongside collating and managing natural history collections, the data hub holds observation and collection data in one database. In 2003, the Museum developed the Collection Management and Thesaurus extensions for the Recorder 6 software to catalogue, describe and manage specimens in the Museum collections. These allow field-gathered data to be handled alongside specimen-specific data such as storage location, specimen type and conservation status. In recent years, this has become an essential tool for the increasing effort directed at the digitisation of the Museum’s diverse natural history collections.
Our small database team faces the challenge of integrating an ever-increasing number of records from a variety of datasets, tools and initiatives. To keep the technical and administrative work as simple as possible, we have implemented an open data policy and aim to increase the use of the IPT to connect databases instead of physically importing all data into one central database. To improve data quality, we focus on training experts to work with our Indicia verification tool.


2016, Vol 3 (1), pp. 107-128
Author(s): Syed Nadeem Ahsan, Muhammad Tanvir Afzal, Safdar Zaman, Christian Gütl, Franz Wotawa

During the evolution of any software, effort is spent to fix bugs or to add new features. In software engineering, a history of effort data is required to build an effort estimation model, which estimates the cost and complexity of software; the role of effort data is therefore indispensable for building state-of-the-art effort estimation models. However, most open source software projects do not maintain any effort-related information. Consequently, there is no state-of-the-art effort estimation model for open source software, whereas most existing effort models target commercial software. In this paper we present an approach to building an effort estimation model for open source software. For this purpose we propose mining effort data from the history of developers’ bug-fix activities. Our approach determines the actual time spent fixing a bug and treats it as the estimated effort. Initially, we use the developers’ bug-fix activity data to construct each developer’s activity log-book, which stores the actual time elapsed to fix each bug. Subsequently, the log-book information is used to mine the bug-fix effort data. Furthermore, the bug-fix activity data is used to define three different measures of a developer’s contribution or expertise level. Finally, we use the bug-fix activity data to visualize developers’ collaborations and the source files involved. For our experiment we selected the Mozilla open source project and downloaded 93,607 bug reports from the Mozilla project’s bug tracking system, Bugzilla. We also downloaded the available CVS log data from the Mozilla project repository. In this study we find that, in the case of Mozilla, only 4.9% of developers were involved in fixing 71.5% of the reported bugs.
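To make the mining step concrete, the following sketch reconstructs a developer’s log-book from bug-tracker status changes and derives the elapsed bug-fix time used as estimated effort. The event format and data are invented for illustration; real Bugzilla activity tables differ in detail:

```python
# Hedged sketch of the core idea: effort per bug = time elapsed between a
# developer taking a bug and resolving it, reconstructed from tracker events.
from datetime import datetime
from collections import defaultdict

# (bug_id, developer, new_status, timestamp) events, e.g. parsed from Bugzilla
activity = [
    (101, "alice", "ASSIGNED", datetime(2010, 3, 1, 9, 0)),
    (101, "alice", "RESOLVED", datetime(2010, 3, 2, 17, 30)),
    (102, "bob",   "ASSIGNED", datetime(2010, 3, 1, 10, 0)),
    (102, "bob",   "RESOLVED", datetime(2010, 3, 1, 12, 0)),
]

# Build each developer's "log-book": when work on a bug started and ended.
logbook = defaultdict(dict)
for bug_id, dev, status, ts in activity:
    if status == "ASSIGNED":
        logbook[dev].setdefault(bug_id, {})["start"] = ts
    elif status == "RESOLVED":
        logbook[dev].setdefault(bug_id, {})["end"] = ts

# Mined effort per bug, in hours.
for dev, bugs in logbook.items():
    for bug_id, span in bugs.items():
        if "start" in span and "end" in span:
            hours = (span["end"] - span["start"]).total_seconds() / 3600
            print(f"{dev} spent ~{hours:.1f} h on bug {bug_id}")
```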


2022
Author(s): Georges Labrèche, David Evans, Dominik Marszk, Tom Mladenov, Vasundhara Shiradhonkar, et al.

2011, Vol 3 (2), pp. 43-78
Author(s): M.M. Mahbubul Syeed, Timo Aaltonen, Imed Hammouda, Tarja Systä

Open Source Software (OSS) is currently a widely adopted approach to developing and distributing software. Adopting OSS code requires an understanding of the structure of the code base. For a deeper understanding of maintenance, bug-fixing and development activities, the structure of the developer community also needs to be understood, especially the relations between the code and community structures. This, in turn, is essential for the development and maintenance of software containing OSS code. This paper proposes a method and support tool for exploring the relations between the code base and community structures of OSS projects. The method and proposed tool, Binoculars, rely on generic and reusable query operations, formal definitions of which are given in the paper. The authors demonstrate the applicability of Binoculars with two examples: an analysis of the well-known and active open source project FFmpeg, and of the open source version of the IaaS cloud computing project Eucalyptus.
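The flavour of such query operations can be illustrated with a small sketch that relates code structure (files) to community structure (developers) from commit data; the data and operations below are invented for illustration, not Binoculars’ actual formal definitions:

```python
# Illustrative sketch: map source files to the developers who touched them,
# then derive developer relations through shared files. Invented commit data.
from collections import defaultdict
from itertools import combinations

commits = [
    ("dev_a", ["libavcodec/h264.c", "libavcodec/utils.c"]),
    ("dev_b", ["libavcodec/h264.c"]),
    ("dev_c", ["libavformat/mov.c"]),
]

# Query 1: which developers worked on a given file?
developers_of = defaultdict(set)
for dev, files in commits:
    for f in files:
        developers_of[f].add(dev)

# Query 2: which developers are related through co-changed files?
related = {frozenset(pair)
           for devs in developers_of.values() if len(devs) > 1
           for pair in combinations(sorted(devs), 2)}

print(developers_of["libavcodec/h264.c"])  # contains dev_a and dev_b
print(related)                             # {frozenset({'dev_a', 'dev_b'})}
```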


2009, pp. 1079-1110
Author(s): Kevin Crowston, Barbara Scozzi

Free/Libre open source software (FLOSS, e.g., Linux or Apache) is primarily developed by distributed teams. Developers contribute from around the world and coordinate their activity almost exclusively by means of email and bulletin boards, yet somehow profit from the advantages and evade the challenges of distributed software development. In this article we investigate the structure and the coordination practices adopted by development teams during the bug-fixing process, which is considered one of the main areas of FLOSS project success. In particular, based on a codification of the messages recorded in the bug tracking systems of four projects, we identify the accomplished tasks, the adopted coordination mechanisms, and the roles undertaken by both the FLOSS development team and the FLOSS community. We conclude with suggestions for further research.


2009, pp. 797-828
Author(s): Kevin Crowston, Barbara Scozzi

Free/Libre open source software (FLOSS, e.g., Linux or Apache) is primarily developed by distributed teams. Developers contribute from around the world and coordinate their activity almost exclusively by means of email and bulletin boards, yet somehow profit from the advantages and evade the challenges of distributed software development. In this article we investigate the structure and the coordination practices adopted by development teams during the bug-fixing process, which is considered one of the main areas of FLOSS project success. In particular, based on a codification of the messages recorded in the bug tracking systems of four projects, we identify the accomplished tasks, the adopted coordination mechanisms, and the roles undertaken by both the FLOSS development team and the FLOSS community. We conclude with suggestions for further research.


Proceedings, 2020, Vol 30 (1), pp. 79
Author(s): Ioanna Panagea, Dangol Anuja, Marc Olijslagers, Jan Diels, Guido Wyseure

Agricultural cropping systems and experiments involve complex interactions of processes and various management practices and/or treatments under a wide range of environmental and climatic conditions. Using standardized formats to monitor and document these systems and experiments can help researchers and stakeholders efficiently exchange data, promote interdisciplinary collaborations, and simplify modelling and analysis procedures. Within the monitoring and assessment work package of the SoilCare Horizon 2020 project, an integrated scheme to collect, validate, store, and access cropping system information and experimental data from 16 study sites was created. The aim of the scheme is to make the data readily available in a way that is useful, easy to access and download, and safe, relying only on open source software. The database design considers the data and metadata required to properly and easily monitor, process, and analyse cropping systems and/or agricultural experiments. The scheme allows for the storage of data and metadata regarding the experimental set-up, associated people and institutions, information about field management operations and experimental procedures (clearly separated to make analysis procedures faster), links between system components, and information about the environmental and climatic conditions. Raw data are entered by the users into a structured spreadsheet, and their quality is checked before the data are stored in the database. Providing raw data allows each user to process and analyse them according to their own needs. A desktop import application has been created to upload the information from the spreadsheets to the database; it includes automated error checks of relationship tables, data types, data constraints, etc. The final component of the scheme is the database web application interface, which enables users to access and query the database across the study sites without knowledge of query languages and to download the required data. In this system, PostgreSQL is used for storing the data, pgAdmin 4 for database administration, MongoDB for user management and authentication, Python for the import application, and Angular and Node.js/Express for the web application; the spreadsheets are compatible with LibreOffice Calc. The system is currently being tested with data provided by the SoilCare study sites. Preliminary testing indicated that extended quality control of the spreadsheets by the system’s administrator was required to meet the standards and restrictions of the import application. Initial comments from the users indicate that the database scheme, even if it initially seems complicated, includes all the variables and details required for complete monitoring and modelling of an agricultural cropping system.
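A minimal sketch of the kind of automated checks such an import application might run on each spreadsheet row before writing to the database is shown below; the column names, value ranges and lookup table are hypothetical, not the actual SoilCare schema:

```python
# Hypothetical per-row validation: data types, value constraints, and
# referential integrity against a lookup (relationship) table.
VALID_CROPS = {"wheat", "maize", "barley"}   # stands in for a relationship table

def check_row(row: dict) -> list[str]:
    errors = []
    # Data type check: plot area must parse as a positive number.
    try:
        if float(row["plot_area_m2"]) <= 0:
            errors.append("plot_area_m2 must be positive")
    except (KeyError, ValueError):
        errors.append("plot_area_m2 missing or not numeric")
    # Constraint check: year must fall within the monitoring period.
    try:
        year = int(row.get("year", ""))
    except ValueError:
        year = 0
    if not 2016 <= year <= 2021:
        errors.append(f"year {year} outside monitoring period")
    # Referential check: crop must exist in the lookup table.
    if row.get("crop") not in VALID_CROPS:
        errors.append(f"unknown crop {row.get('crop')!r}")
    return errors

print(check_row({"plot_area_m2": "250", "year": "2019", "crop": "wheat"}))  # []
print(check_row({"plot_area_m2": "-5", "year": "2019", "crop": "rye"}))     # 2 errors
```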


2014, Vol 62, pp. 35-42
Author(s): Blagoj Delipetrev, Andreja Jonoski, Dimitri P. Solomatine

Author(s): Shigeru Yamada, Masakazu Yamaguchi

The software development paradigm of open source software (OSS) projects has spread rapidly in recent years. On the other hand, no effective method of quality management has been established, owing to unique development characteristics such as the absence of a distinct testing phase. In this paper, we assume that the number of fault detections observed on the bug tracking system tends to infinity, and discuss a method of statistical process control (SPC) for OSS projects by applying the logarithmic Poisson execution time model as a software reliability growth model (SRGM) based on a nonhomogeneous Poisson process (NHPP). We then propose a control chart method based on the logarithmic Poisson execution time model for judging the statistical stability of the process and for estimating the additional development time needed to attain the objective software failure intensity, i.e., the target value of the instantaneous fault-detection rate per unit time. We also discuss an optimal software release problem that determines the optimum time to stop OSS development and transfer it to user operation. Further, numerical illustrations for SPC are shown by applying actual fault-count data observed on the bug tracking system.
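For reference, the logarithmic Poisson execution time model (due to Musa and Okumoto) has mean value function μ(t) = (1/θ)·ln(λ₀θt + 1) and failure intensity λ(t) = λ₀ / (λ₀θt + 1); setting λ(t*) equal to the objective intensity yields the additional development time in closed form. The sketch below implements this calculation with hypothetical parameter values; in practice λ₀ and θ would be estimated from the observed fault-count data:

```python
# Additional-development-time estimate under the logarithmic Poisson
# execution time (Musa-Okumoto) model, with failure intensity
# lambda(t) = l0 / (l0*theta*t + 1). Parameter values are hypothetical.
def additional_time(l0: float, theta: float, t_now: float, target: float) -> float:
    """Extra development time until the intensity drops to `target`."""
    t_star = (l0 / target - 1.0) / (l0 * theta)  # solve lambda(t*) = target
    return max(0.0, t_star - t_now)

# e.g. initial intensity 2 faults/day, decay 0.05, 30 days in, target 0.1:
print(additional_time(2.0, 0.05, t_now=30.0, target=0.1))  # -> 160.0 days
```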

