The NIH BD2K center for big data in translational genomics

Benedict Paten; Mark Diekhans; Brian J Druker; Stephen Friend; Justin Guinney; Nadine Gassner; Mitchell Guttman; W James Kent; Patrick Mantey; Adam A Margolin; Matt Massie; Adam M Novak; Frank Nothaft; Lior Pachter; David Patterson; Maciej Smuga-Otto; Joshua M Stuart; Laura Van’t Veer; Barbara Wold; David Haussler

doi:10.1093/jamia/ocv047

Journal of the American Medical Informatics Association ◽ 10.1093/jamia/ocv047 ◽ 2015 ◽ Vol 22 (6) ◽ pp. 1143-1147 ◽ Cited By ~ 15

Author(s): Benedict Paten ◽ Mark Diekhans ◽ Brian J Druker ◽ Stephen Friend ◽ Justin Guinney ◽ ...

Keyword(s): Big Data ◽ Open Source ◽ Rare Diseases ◽ Open Source Software ◽ Global Alliance ◽ Translational Genomics ◽ Share Data ◽ Application Programming ◽ Programming Interfaces ◽ Genome Informatics

Abstract The world’s genomics data will never be stored in a single repository – rather, it will be distributed among many sites in many countries. No one site will have enough data to explain genotype to phenotype relationships in rare diseases; therefore, sites must share data. To accomplish this, the genetics community must forge common standards and protocols to make sharing and computing data among many sites a seamless activity. Through the Global Alliance for Genomics and Health, we are pioneering the development of shared application programming interfaces (APIs) to connect the world’s genome repositories. In parallel, we are developing an open source software stack (ADAM) that uses these APIs. This combination will create a cohesive genome informatics ecosystem. Using containers, we are facilitating the deployment of this software in a diverse array of environments. Through benchmarking efforts and big data driver projects, we are ensuring ADAM’s performance and utility.

A Survey of Open Source Products for Building a SIP Communication Platform

Advances in Multimedia ◽ 10.1155/2011/372591 ◽ 2011 ◽ Vol 2011 ◽ pp. 1-21 ◽ Cited By ~ 4

Author(s): Pavel Segec ◽ Tatiana Kovacikova

Keyword(s): Open Source ◽ Real Time ◽ Open Source Software ◽ Instant Messaging ◽ Ip Networks ◽ Service Development ◽ Communication Platform ◽ Media Services ◽ Application Programming ◽ Programming Interfaces

The Session Initiation Protocol (SIP) is a multimedia signalling protocol that has evolved into a widely adopted communication standard. The integration of SIP into existing IP networks has helped make IP networks a convergence platform for both real-time and non-real-time multimedia communications. This converged platform integrates data, voice, video, presence, messaging, and conference services into a single network that offers new communication experiences for users. The open source community has contributed to SIP adoption through the development of open source software for both SIP clients and servers. In this paper, we provide a survey of open SIP systems that can be built using publicly available software. We identify SIP features for service development and programming, services and applications of a SIP-converged platform, and the most important technologies supporting SIP functionalities. We propose an advanced converged IP communication platform that uses SIP for service delivery. The platform supports audio and video calls, along with media services such as audio conferences, voicemail, presence, and instant messaging. Using SIP Application Programming Interfaces (APIs), the platform allows the deployment of advanced integrated services. The platform is implemented with open source software. Architecture components run on standardized hardware, with no need for special-purpose investments.

Industrial Big Data Platform Based on Open Source Software

Proceedings of the International Conference on Computer Networks and Communication Technology (CNCT 2016) ◽ 10.2991/cnct-16.2017.90 ◽ 2017 ◽ Cited By ~ 1

Author(s): Wen YANG ◽ Syed Naeem Haider ◽ Jian-hong ZOU ◽ Qian-chuan ZHAO

Keyword(s): Big Data ◽ Open Source ◽ Open Source Software ◽ Industrial Big Data ◽ Data Platform

Computing remote sensing big data using local hardware and open-source software packages

Kart og plan ◽ 10.18261/issn.2535-6003-2021-03-04-09 ◽ 2021 ◽ Vol 114 (3-04) ◽ pp. 254-273

Author(s): Misganu Debella-Gilo ◽ Jonathan Rizzi

Keyword(s): Remote Sensing ◽ Big Data ◽ Open Source ◽ Open Source Software ◽ Software Packages

Conquery: an Open Source Application to analyze High Content Healthcare Data (Preprint)

10.2196/preprints.32745 ◽ 2021

Author(s): Fabian Kovacs ◽ Max Thonagel ◽ Marion Ludwig ◽ Alexander Albrecht ◽ Manuel Hegner ◽ ...

Keyword(s): Decision Making ◽ Big Data ◽ Data Analysis ◽ Open Source ◽ Open Source Software ◽ Medical Records ◽ Healthcare Sector ◽ Study Cohort ◽ Decision Making Processes ◽ Analytical Approaches

BACKGROUND Big data in healthcare must be exploited to achieve a substantial increase in efficiency and competitiveness. The analysis of patient-related data in particular has huge potential to improve decision-making processes. However, most analytical approaches used today are highly time- and resource-consuming. OBJECTIVE Conquery is an open-source software tool that provides advanced but intuitive data analysis without the need for specialized statistical training. It aims to simplify big data analysis for novice database users in the medical sector. METHODS Conquery is a document-oriented distributed time-series database and analysis platform. Its main application is the analysis of per-person medical records by non-technical medical professionals. Complex analyses are built in the Conquery frontend by dragging tree nodes into the query editor. Queries are evaluated in a column-oriented fashion by a bespoke distributed query engine for medical records. We present a custom compression scheme that uses both online-calculated and precomputed metadata and data statistics to achieve low response times. RESULTS Conquery allows easy navigation through the hierarchy and enables complex study cohort construction while reducing the demand on time and resources. The Conquery UI and a query output are illustrated by the construction of a relevant clinical cohort. CONCLUSIONS Conquery is an efficient and intuitive open-source software tool for performant and secure data analysis that aims to support decision-making processes in the healthcare sector.

Enabling External Inquiries to an Existing Patient Registry by Using the Open Source Registry System for Rare Diseases: Demonstration of the System Using the European Society for Immunodeficiencies Registry (Preprint)

10.2196/preprints.17420 ◽ 2019

Author(s): Raphael Scheible ◽ Dennis Kadioglu ◽ Stephan Ehl ◽ Marco Blum ◽ Martin Boeker ◽ ...

Keyword(s): Open Source ◽ Rare Diseases ◽ Open Source Software ◽ European Society ◽ Patient Registries ◽ Patient Organizations ◽ Registry System ◽ Decentralized Search ◽ Custom Software ◽ Set Up

BACKGROUND The German Network on Primary Immunodeficiency Diseases (PID-NET) utilizes the European Society for Immunodeficiencies (ESID) registry as a platform for collecting data. In the context of PID-NET data, we show how registries based on custom software can be made interoperable for better collaborative access to precollected data. The Open Source Registry System for Rare Diseases (Open-Source-Registersystem für Seltene Erkrankungen [OSSE], in German) provides patient organizations, physicians, scientists, and other parties with open source software for the creation of patient registries. In addition, the necessary interoperability between different registries based on the OSSE, as well as existing registries, is supported, which allows those registries to be confederated at both the national and international levels. OBJECTIVE Data from the PID-NET registry should be made available in an interoperable manner without losing data sovereignty by extending the existing custom software of the registry using the OSSE registry framework. METHODS This paper describes the following: (1) the installation and configuration of the OSSE bridgehead, (2) an approach using a free toolchain to set up the required interfaces to connect a registry with the OSSE bridgehead, and (3) the decentralized search, which allows the formulation of inquiries that are sent to a selected set of registries of interest. RESULTS PID-NET uses the established and highly customized ESID registry software. By setting up a so-called OSSE bridgehead, PID-NET data are made interoperable according to a federated approach, and centrally formulated inquiries for data can be received. As the first registry to use the OSSE bridgehead, the authors introduce an approach using a free toolchain to efficiently implement and maintain the required interfaces. Finally, to test and demonstrate the system, two inquiries are realized using the graphical query builder. By establishing and interconnecting an OSSE bridgehead with the underlying ESID registry, confederated queries for data can be received and, if desired, the inquirer can be contacted to further discuss any requirements for cooperation. CONCLUSIONS The OSSE offers an infrastructure that provides the possibility of more collaborative and transparent research. The decentralized search functionality integrates registries into one search application while still maintaining data sovereignty. The OSSE bridgehead enables any registry software to be integrated into the OSSE network. The proposed toolchain to set up the required interfaces consists of freely available software components that are well documented. The decentralized search is uncomplicated to use and offers a well-structured, yet still improvable, graphical user interface to formulate queries.

TagNN: A Code Tag Generation Technology for Resource Retrieval from Open-Source Big Data

Wireless Communications and Mobile Computing ◽ 10.1155/2021/9956207 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-11

Author(s): Lingbin Zeng ◽ Xin Guo ◽ Cheng Yang ◽ Yao Lu ◽ Xiao Li

Keyword(s): Big Data ◽ Open Source ◽ Open Source Software ◽ Learning Algorithm ◽ Difficult Problem ◽ Empirical Knowledge ◽ Huge Number ◽ Source Codes ◽ Deep Learning Algorithm ◽ Generation Technology

With the vigorous development of open-source software, a huge number of open-source projects and a wealth of code resources have accumulated as open-source big data. However, effectively and efficiently retrieving the relevant code snippets from such a large amount of open-source big data is an extremely difficult problem. There are usually large gaps between the user’s natural language description and the open-source code snippets. In this paper, we propose a novel code tag generation and code retrieval approach named TagNN, which combines software engineering empirical knowledge and a deep learning algorithm. The experimental results show that our method performs well on code tag generation and code snippet retrieval.

Big data processing using Open Source Software- A Questionnaire on the data science

Scholedge International Journal of Multidisciplinary & Allied Studies ISSN 2394-336X ◽ 10.19085/journal.sijmas030101 ◽ 2016 ◽ Vol 3 (1) ◽ pp. 1

Author(s): Andrew McCullum

Keyword(s): Big Data ◽ Data Processing ◽ World Trade Organization ◽ Central Asia ◽ Open Source ◽ Open Source Software ◽ World Trade ◽ Data Science ◽ Customs Union ◽ The World

In 2015, Central Asia made some important improvements to its environment for cross-border e-commerce: Kazakhstan's accession to the World Trade Organization (WTO) will help business transparency, while the Kyrgyz Republic's membership in the Eurasian Customs Union expands its consumer base. Why e-commerce? There are two reasons. First, e-commerce reduces the cost of distance. Central Asia is the highest trade-cost region in the world: vast distances from major markets make finding buyers challenging, shipping goods slow, and export costs high. Second, e-commerce can draw in populations that are traditionally under-represented in export markets, such as women, small businesses, and rural entrepreneurs.

Open Source Supply Chains

Management Information Systems for Enterprise Applications - Advances in Business Strategy and Competitive Advantage ◽ 10.4018/978-1-4666-0164-2.ch005 ◽ 2012 ◽ pp. 74-102

Keyword(s): Supply Chain ◽ Supply Chains ◽ Open Source ◽ Reference Model ◽ Life Cycle Management ◽ Complex Products ◽ Business Cases ◽ Application Programming ◽ Related Application ◽ Programming Interfaces

This chapter deals with an ambitious Management Information System goal: the creation of open source supply chains. It starts with some basics and background for open (source) supply chains, discusses relevant architectures and modelling work, proceeds to an analysis of real-world business cases and the related application scenarios, and presents an open source reference model. In current e-commerce frameworks, the issue of dynamic supply chain establishment and supply chain life cycle management is still misrepresented and not addressed adequately. Registration, advertisement, and change management for complex products and services rely heavily on proprietary application programming interfaces and protocols as well as emerging and partially competing (pseudo)standards.

What Is Open Source Software (OSS) and What Is Big Data?

Research Anthology on Usage and Development of Open Source Software ◽ 10.4018/978-1-7998-9158-1.ch041 ◽ 2021 ◽ pp. 817-857

Author(s): Richard S. Segall

Keyword(s): Big Data ◽ Open Source ◽ Open Source Software ◽ Fog Computing ◽ Computer Software ◽ Data Sets ◽ Stream Data ◽ Big Data Visualization ◽ Continuous Stream

This chapter discusses what open source software is, its relationship to Big Data, how it differs from other types of software, and its software development cycle. Open source software (OSS) is a type of computer software in which source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose. Big Data are data sets that are so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big Data can be discrete or a continuous data stream and is accessible using many types of computing devices, ranging from supercomputers and personal workstations to mobile devices and tablets. The chapter discusses how fog computing can be performed with cloud computing for visualization of Big Data. It also presents a summary of additional web-based Big Data visualization software.

Predictive edge computing for time series of industrial IoT and large scale critical infrastructure based on open-source software analytic of big data

2017 IEEE International Conference on Big Data (Big Data) ◽ 10.1109/bigdata.2017.8258103 ◽ 2017 ◽ Cited By ~ 6

Author(s): Emmanuel Oyekanlu

Keyword(s): Time Series ◽ Big Data ◽ Open Source ◽ Open Source Software ◽ Large Scale ◽ Critical Infrastructure ◽ Edge Computing ◽ Industrial Iot
