scholarly journals A comprehensive resource for retrieving, visualizing, and integrating functional genomics data

2019 ◽  
Vol 3 (1) ◽  
pp. e201900546
Author(s):  
Matthias Blum ◽  
Pierre-Etienne Cholley ◽  
Valeriya Malysheva ◽  
Samuel Nicaise ◽  
Julien Moehlin ◽  
...  

The enormous amount of freely accessible functional genomics data is an invaluable resource for interrogating the biological function of multiple DNA-interacting players and chromatin modifications by large-scale comparative analyses. However, in practice, interrogating large collections of public data requires major efforts for (i) reprocessing available raw reads, (ii) incorporating quality assessments to exclude artefactual and low-quality data, and (iii) processing data by using high-performance computation. Here, we present qcGenomics, a user-friendly online resource for ultrafast retrieval, visualization, and comparative analysis of tens of thousands of genomics datasets to gain new functional insight from global or focused multidimensional data integration.

2019 ◽  
Author(s):  
Manoj Kumar ◽  
Cameron Thomas Ellis ◽  
Qihong Lu ◽  
Hejia Zhang ◽  
Mihai Capota ◽  
...  

Advanced brain imaging analysis methods, including multivariate pattern analysis (MVPA), functional connectivity, and functional alignment, have become powerful tools in cognitive neuroscience over the past decade. These tools are implemented in custom code and separate packages, often requiring different software and language proficiencies. Although usable by expert researchers, novice users face a steep learning curve. These difficulties stem from the use of new programming languages (e.g., Python), learning how to apply machine-learning methods to high-dimensional fMRI data, and minimal documentation and training materials. Furthermore, most standard fMRI analysis packages (e.g., AFNI, FSL, SPM) focus on preprocessing and univariate analyses, leaving a gap in how to integrate with advanced tools. To address these needs, we developed BrainIAK (brainiak.org), an open-source Python software package that seamlessly integrates several cutting-edge, computationally efficient techniques with other Python packages (e.g., Nilearn, Scikit-learn) for file handling, visualization, and machine learning. To disseminate these powerful tools, we developed user-friendly tutorials (in Jupyter format; https://brainiak.org/tutorials/) for learning BrainIAK and advanced fMRI analysis in Python more generally. These materials cover techniques including: MVPA (pattern classification and representational similarity analysis); parallelized searchlight analysis; background connectivity; full correlation matrix analysis; inter-subject correlation; inter-subject functional connectivity; shared response modeling; event segmentation using hidden Markov models; and real-time fMRI. For long-running jobs or large memory needs we provide detailed guidance on high-performance computing clusters. These notebooks were successfully tested at multiple sites, including as problem sets for courses at Yale and Princeton universities and at various workshops and hackathons. These materials are freely shared, with the hope that they become part of a pool of open-source software and educational materials for large-scale, reproducible fMRI analysis and accelerated discovery.


2021 ◽  
pp. 1-12
Author(s):  
Bilal Tahir ◽  
Muhammad Amir Mehmood

 The confluence of high performance computing algorithms and large scale high-quality data has led to the availability of cutting edge tools in computational linguistics. However, these state-of-the-art tools are available only for the major languages of the world. The preparation of large scale high-quality corpora for low resource language such as Urdu is a challenging task as it requires huge computational and human resources. In this paper, we build and analyze a large scale Urdu language Twitter corpus Anbar. For this purpose, we collect 106.9 million Urdu tweets posted by 1.69 million users during one year (September 2018-August 2019). Our corpus consists of tweets with a rich vocabulary of 3.8 million unique tokens along with 58K hashtags and 62K URLs. Moreover, it contains 75.9 million (71.0%) retweets and 847K geotagged tweets. Furthermore, we examine Anbar using a variety of metrics like temporal frequency of tweets, vocabulary size, geo-location, user characteristics, and entities distribution. To the best of our knowledge, this is the largest repository of Urdu language tweets for the NLP research community which can be used for Natural Language Understanding (NLU), social analytics, and fake news detection.


2020 ◽  
Author(s):  
Yasser Iturria-Medina ◽  
Felix Carbonell ◽  
Atoussa Assadi ◽  
Quadri Adewale ◽  
Ahmed F. Khan ◽  
...  

There is a critical need for a better multiscale and multifactorial understanding of neurological disorders, covering from genes to neuroimaging to clinical factors and treatments effects. Here we present NeuroPM-box, a cross-platform, user-friendly and open-access software for characterizing multiscale and multifactorial brain pathological mechanisms and identifying individual therapeutic needs. The implemented methods have been extensively tested and validated in the neurodegenerative context, but there is not restriction in the kind of disorders that can be analyzed. By using advanced analytic modeling of molecular, neuroimaging and/or cognitive/behavioral data, this framework allows multiple applications, including characterization of: (i) the series of sequential states (e.g. transcriptomic, imaging or clinical alterations) covering decades of disease progression, (ii) intra-brain spreading of pathological factors (e.g. amyloid and tau misfolded proteins), (iii) synergistic interactions between multiple brain biological factors (e.g. direct tau effects on vascular and structural properties), and (iv) biologically-defined patients stratification based on therapeutic needs (i.e. optimum treatments for each patient). All models outputs are biologically interpretable. A 4D-viewer allows visualization of spatiotemporal brain (dis)organization. Originally implemented in MATLAB, NeuroPM-box is compiled as standalone application for Windows, Linux and Mac environments: neuropm-lab.com/software. In a regular workstation, it can analyze over 150 subjects per day, reducing the need for using clusters or High-Performance Computing (HPC) for large-scale datasets. This open-access tool for academic researchers may significantly contribute to a better understanding of complex brain processes and to accelerating the implementation of Precision Medicine (PM) in neurology.


2021 ◽  
Author(s):  
Jin Wang ◽  
Jiacheng Wu ◽  
Mingda Li ◽  
Jiaqi Gu ◽  
Ariyam Das ◽  
...  

AbstractWith an escalating arms race to adopt machine learning (ML) in diverse application domains, there is an urgent need to support declarative machine learning over distributed data platforms. Toward this goal, a new framework is needed where users can specify ML tasks in a manner where programming is decoupled from the underlying algorithmic and system concerns. In this paper, we argue that declarative abstractions based on Datalog are natural fits for machine learning and propose a purely declarative ML framework with a Datalog query interface. We show that using aggregates in recursive Datalog programs entails a concise expression of ML applications, while providing a strictly declarative formal semantics. This is achieved by introducing simple conditions under which the semantics of recursive programs is guaranteed to be equivalent to that of aggregate-stratified ones. We further provide specialized compilation and planning techniques for semi-naive fixpoint computation in the presence of aggregates and optimization strategies that are effective on diverse recursive programs and distributed data platforms. To test and demonstrate these research advances, we have developed a powerful and user-friendly system on top of Apache Spark. Extensive evaluations on large-scale datasets illustrate that this approach will achieve promising performance gains while improving both programming flexibility and ease of development and deployment for ML applications.


Author(s):  
Dehe Wang ◽  
Weiliang Fan ◽  
Xiaolong Guo ◽  
Kai Wu ◽  
Siyu Zhou ◽  
...  

Abstract Malvaceae is a family of flowering plants containing many economically important plant species including cotton, cacao and durian. Recently, the genomes of several Malvaceae species have been decoded, and many omics data were generated for individual species. However, no integrative database of multiple species, enabling users to jointly compare and analyse relevant data, is available for Malvaceae. Thus, we developed a user-friendly database named MaGenDB (http://magen.whu.edu.cn) as a functional genomics hub for the plant community. We collected the genomes of 13 Malvaceae species, and comprehensively annotated genes from different perspectives including functional RNA/protein element, gene ontology, KEGG orthology, and gene family. We processed 374 sets of diverse omics data with the ENCODE pipelines and integrated them into a customised genome browser, and designed multiple dynamic charts to present gene/RNA/protein-level knowledge such as dynamic expression profiles and functional elements. We also implemented a smart search system for efficiently mining genes. In addition, we constructed a functional comparison system to help comparative analysis between genes on multiple features in one species or across closely related species. This database and associated tools will allow users to quickly retrieve large-scale functional information for biological discovery.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4234 ◽  
Author(s):  
T. Jeffrey Cole ◽  
Michael S. Brewer

Background The recent proliferation of large amounts of biodiversity transcriptomic data has resulted in an ever-expanding need for scalable and user-friendly tools capable of answering large scale molecular evolution questions. FUSTr identifies gene families involved in the process of adaptation. This is a tool that finds genes in transcriptomic datasets under strong positive selection that automatically detects isoform designation patterns in transcriptome assemblies to maximize phylogenetic independence in downstream analysis. Results When applied to previously studied spider transcriptomic data as well as simulated data, FUSTr successfully grouped coding sequences into proper gene families as well as correctly identified those under strong positive selection in relatively little time. Conclusions FUSTr provides a useful tool for novice bioinformaticians to characterize the molecular evolution of organisms throughout the tree of life using large transcriptomic biodiversity datasets and can utilize multi-processor high-performance computational facilities.


Author(s):  
C.K. Wu ◽  
P. Chang ◽  
N. Godinho

Recently, the use of refractory metal silicides as low resistivity, high temperature and high oxidation resistance gate materials in large scale integrated circuits (LSI) has become an important approach in advanced MOS process development (1). This research is a systematic study on the structure and properties of molybdenum silicide thin film and its applicability to high performance LSI fabrication.


2020 ◽  
Author(s):  
James McDonagh ◽  
William Swope ◽  
Richard L. Anderson ◽  
Michael Johnston ◽  
David J. Bray

Digitization offers significant opportunities for the formulated product industry to transform the way it works and develop new methods of business. R&D is one area of operation that is challenging to take advantage of these technologies due to its high level of domain specialisation and creativity but the benefits could be significant. Recent developments of base level technologies such as artificial intelligence (AI)/machine learning (ML), robotics and high performance computing (HPC), to name a few, present disruptive and transformative technologies which could offer new insights, discovery methods and enhanced chemical control when combined in a digital ecosystem of connectivity, distributive services and decentralisation. At the fundamental level, research in these technologies has shown that new physical and chemical insights can be gained, which in turn can augment experimental R&D approaches through physics-based chemical simulation, data driven models and hybrid approaches. In all of these cases, high quality data is required to build and validate models in addition to the skills and expertise to exploit such methods. In this article we give an overview of some of the digital technology demonstrators we have developed for formulated product R&D. We discuss the challenges in building and deploying these demonstrators.<br>


Author(s):  
В.В. ГОРДЕЕВ ◽  
В.Е. ХАЗАНОВ

При выборе типа доильной установки и ее размера необходимо учитывать максимальное планируемое поголовье дойных коров и размер технологической группы, кратность и время одного доения, продолжительность рабочей смены дояров. Анализ технико-экономических показателей наиболее распространенных на сегодняшний день типов доильных установок одинакового технического уровня свидетельствует, что наилучшие удельные показатели имеет установка типа «Карусель» (1), а установка типа «Елочка» (2) требует более высоких затрат труда и средств. Установка «Параллель» (3) занимает промежуточное положение. Из анализа пропускной способности и количества необходимых операторов: установка 2 рекомендована для ферм с поголовьем дойного стада до 600 голов, 3 — не более 1200 дойных коров, 1 — более 1200 дойных коров. «Карусель» — наиболее рациональный, высокопроизводительный, легко автоматизируемый и, следовательно, перспективный способ доения в залах, особенно для крупных молочных ферм. The choice of the proper type and size of milking installations needs to take into account the maximum planned number of dairy cows, the size of a technological group, the number of milkings per day, and the duration of one milking and the operator's working shift. The analysis of technical and economic indicators of currently most common types of milking machines of the same technical level revealed that the Carousel installation had the best specific indicators while the Herringbone installation featured higher labour inputs and cash costs. The Parallel installation was found somewhere in between. In terms of the throughput and the required number of operators Herringbone is recommended for farms with up to 600 dairy cows, Parallel — below 1200 dairy cows, Carousel — above 1200 dairy cows. Carousel was found the most practical, high-performance, easily automated and, therefore, promising milking system for milking parlours, especially on the large-scale dairy farms.


Sign in / Sign up

Export Citation Format

Share Document