Troubleshooting unstable molecules in chemical space

2021 · Vol 12 (15) · pp. 5566-5573
Author(s): Salini Senthil, Sabyasachi Chakraborty, Raghunathan Ramakrishnan

A high-throughput workflow for connectivity-preserving geometry optimization minimizes unintended structural rearrangements during quantum chemistry big data generation.
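The core check in such a workflow can be illustrated with a minimal sketch (not the authors' code): re-perceive the bonds from the relaxed coordinates and compare them with the intended bonding graph. This assumes a recent RDKit build that ships rdDetermineBonds, and that the reference molecule and the XYZ block share the same atom ordering.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdDetermineBonds

def connectivity_preserved(reference: Chem.Mol, optimized_xyz: str) -> bool:
    """True if bonds perceived from the optimized geometry match the reference graph."""
    opt = Chem.MolFromXYZBlock(optimized_xyz)    # atoms and coordinates only
    rdDetermineBonds.DetermineConnectivity(opt)  # perceive bonds from 3D positions
    ref_adj = Chem.GetAdjacencyMatrix(reference)
    opt_adj = Chem.GetAdjacencyMatrix(opt)
    return ref_adj.shape == opt_adj.shape and np.array_equal(ref_adj, opt_adj)
```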

2015 · Vol 2015 · pp. 1-8
Author(s): Andreas Friedrich, Erhan Kenar, Oliver Kohlbacher, Sven Nahnsen

Big data bioinformatics aims at drawing biological conclusions from huge and complex biological datasets. Added value from the analysis of big data, however, is only possible if the data are accompanied by accurate metadata annotation. Particularly in high-throughput experiments, intelligent approaches are needed to keep track of the experimental design, including the conditions that are studied as well as information that might be relevant for failure analysis or future follow-up experiments. In addition to the management of this information, means for an integrated design and interfaces for structured data annotation are urgently needed by researchers. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface allowing the collection of arbitrary metadata. To exchange and edit information, we provide a spreadsheet-based, human-readable format. Subsequently, sample sheets with identifiers and meta-information for data generation facilities can be created. Data files created after measurement of the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model.
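The factor-based idea can be sketched concisely: a full-factorial expansion of factor levels into a flat, spreadsheet-compatible sample sheet, one row per condition and replicate. This is a minimal illustration, not the system's implementation; the factor names, levels, and TSV layout are hypothetical.

```python
import csv
from itertools import product

# Hypothetical experimental factors and their levels
factors = {
    "genotype": ["wild-type", "knockout"],
    "treatment": ["control", "drug"],
    "timepoint_h": [0, 24],
}
replicates = 3

with open("sample_sheet.tsv", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(["sample_id", *factors, "replicate"])  # header from factor names
    for i, (levels, rep) in enumerate(
        product(product(*factors.values()), range(1, replicates + 1)), start=1
    ):
        # One row per factor-level combination and replicate (2*2*2*3 = 24 rows)
        writer.writerow([f"S{i:03d}", *levels, rep])
```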


2020 · Vol 39 (5) · pp. 397-421
Author(s): Charlene Andraos, Il Je Yu, Mary Gulumian

Despite several studies addressing nanoparticle (NP) interference with conventional toxicity assay systems, researchers still rely heavily on these assays, particularly in high-throughput screening (HTS) applications, to generate "big" data for predictive toxicity approaches. Moreover, researchers often neglect to investigate the different interference mechanisms, which evidently depend on the type of assay system implemented. The approaches in the literature appear inadequate, as they often address only one type of interference mechanism to the exclusion of others. For example, interference of NPs that have entered cells would require intracellular assessment of their interference with fluorescent dyes, which has so far been neglected. The present study investigated the mechanisms of interference of gold and silver NPs in assay systems implemented in HTS, including optical interference as well as adsorption or catalysis. The conventional assays selected cover all optical read-out systems, that is, absorbance (XTT toxicity assay), fluorescence (CytoTox-ONE Homogeneous membrane integrity assay), and luminescence (CellTiter-Glo luminescent assay). Furthermore, this study demonstrated NP quenching of fluorescent dyes also used in HTS (2′,7′-dichlorofluorescein, propidium iodide, and 5,5′,6,6′-tetrachloro-1,1′,3,3′-tetraethylbenzimidazolylcarbocyanine iodide). To conclude, NP interference is not a novel concept; however, ignoring it in HTS may jeopardize attempts at predictive toxicology. It should be mandatory to report the assessment of all interference mechanisms within HTS, as well as to confirm results with label-free methodologies, to ensure reliable big data generation for predictive toxicology.
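One such interference check can be expressed numerically: compare a cell-free nanoparticle-plus-reagent control against the reagent-only blank and flag read-outs that shift beyond a tolerance. A minimal sketch, not the study's analysis code; the signal values and tolerance are hypothetical.

```python
def flags_interference(np_reagent_signal: float,
                       reagent_blank_signal: float,
                       tolerance: float = 0.10) -> bool:
    """True if nanoparticles shift the cell-free read-out by more than the tolerance."""
    if reagent_blank_signal == 0:
        raise ValueError("blank signal must be non-zero")
    relative_shift = abs(np_reagent_signal - reagent_blank_signal) / reagent_blank_signal
    return relative_shift > tolerance

# e.g. NP quenching of a fluorescent dye: blank reads 1000 RFU,
# NP + dye reads 620 RFU -> 38% shift -> interference flagged
print(flags_interference(620.0, 1000.0))  # True
```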


Author(s): Xabier Rodríguez-Martínez, Enrique Pascual-San-José, Mariano Campoy-Quiles

This review article presents the state of the art in high-throughput computational and experimental screening routines with applications in organic solar cells, including materials discovery, device optimization, and machine-learning algorithms.


2021 · Vol 9 (9) · pp. 3324-3333
Author(s): Ke Zhao, Ömer H. Omar, Tahereh Nematiaram, Daniele Padula, Alessandro Troisi

125 potential TADF candidates are identified through quantum chemistry calculations on 700 molecules drawn from a database of 40 000 molecular semiconductors. Most of them are new, and some do not belong to the class of donor–acceptor molecules.
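The funnel-style selection can be sketched with the standard TADF design criterion, a small singlet-triplet gap. This is a minimal illustration, not the paper's screening code; the gap threshold and input data are hypothetical, and the paper's actual criteria may differ.

```python
# Hypothetical screening results: (molecule_id, E_S1_eV, E_T1_eV)
candidates = [
    ("mol-0001", 3.10, 2.95),
    ("mol-0002", 2.80, 2.20),
    ("mol-0003", 3.40, 3.32),
]

MAX_GAP_EV = 0.20  # hypothetical DeltaE_ST cut-off for efficient reverse ISC

# Keep only molecules whose singlet-triplet gap is below the threshold
tadf_hits = [
    mol_id for mol_id, e_s1, e_t1 in candidates
    if (e_s1 - e_t1) <= MAX_GAP_EV
]
print(tadf_hits)  # ['mol-0001', 'mol-0003']
```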


2021 · Vol 8 (1)
Author(s): Ikbal Taleb, Mohamed Adel Serhani, Chafik Bouhaddioui, Rachida Dssouli

Big Data is an essential research area for governments, institutions, and private agencies to support their analytics decisions. Big Data concerns every aspect of data: how it is collected, processed, and analyzed to generate value-added, data-driven insights and decisions. Degradation in data quality may have unpredictable consequences, as confidence in the data and its source is lost. In the Big Data context, data characteristics such as volume, multiple heterogeneous data sources, and fast data generation increase the risk of quality degradation and require efficient mechanisms to check data worthiness. However, ensuring Big Data Quality (BDQ) is a very costly and time-consuming process, since excessive computing resources are required. Maintaining quality throughout the Big Data lifecycle requires quality profiling and verification before any processing decision. A BDQ management framework for enhancing pre-processing activities while strengthening data control is proposed. The framework uses a new concept, the Big Data Quality Profile, which captures the quality outline, requirements, attributes, dimensions, scores, and rules. Using the framework's profiling and sampling components, a fast and efficient data quality estimation is initiated before and after an intermediate pre-processing phase. The exploratory profiling component plays the initial role in quality profiling; it uses a set of predefined quality metrics to evaluate important data quality dimensions, and it generates quality rules by applying various pre-processing activities and their related functions. These rules feed the Data Quality Profile and result in quality scores for the selected quality attributes. The framework implementation and dataflow management across the various quality management processes are discussed, and ongoing work on framework evaluation and deployment to support quality evaluation decisions concludes the paper.
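The profiling-before-processing idea can be illustrated with a toy quality profile that scores a data sample on two common dimensions, completeness and validity, before committing to full pre-processing. A minimal sketch, not the proposed framework; the field, validity rule, and acceptance threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class QualityProfile:
    completeness: float  # fraction of non-missing values
    validity: float      # fraction of values passing a quality rule

    def acceptable(self, threshold: float = 0.9) -> bool:
        # Accept the sample for full processing only if every dimension passes
        return min(self.completeness, self.validity) >= threshold

def profile_ages(sample: list) -> QualityProfile:
    present = [v for v in sample if v is not None]
    valid = [v for v in present if 0 <= v <= 120]  # hypothetical rule: plausible age
    n = len(sample) or 1
    return QualityProfile(len(present) / n, len(valid) / n)

profile = profile_ages([34, 29, None, 151, 47])
print(profile, profile.acceptable())  # completeness 0.8, validity 0.6 -> False
```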


Author(s): Pijush Kanti Dutta Pramanik, Saurabh Pal, Moutan Mukhopadhyay

Like other fields, the healthcare sector has been greatly impacted by big data. A huge volume of healthcare data and other related data is continually generated from diverse sources. Tapping and analysing these data suitably would open up new avenues and opportunities for healthcare services. In view of that, this paper presents a systematic overview of big data and big data analytics as applicable to modern-day healthcare. Acknowledging the massive upsurge in healthcare data generation, various 'V's specific to healthcare big data are identified. Different types of data analytics applicable to healthcare are discussed. Along with presenting the technological backbone of healthcare big data and analytics, the advantages and challenges of healthcare big data are meticulously explained. A brief report on the present and future market of healthcare big data and analytics is also presented, and several applications and use cases are discussed in detail.


Author(s): M. Mazhar Rathore, Anand Paul, Awais Ahmad, Gwanggil Jeon

Recently, rapid population growth in urban regions has increased the demand for services and infrastructure. These needs can be met with the use of Internet of Things (IoT) devices, such as sensors, actuators, smartphones, and smart systems, leading from Smart City concepts towards next-generation Super City planning. However, as thousands of IoT devices interconnect and communicate with each other over the Internet to establish smart systems, a huge amount of data, termed Big Data, is generated. Integrating IoT services and processing Big Data efficiently for decision making in a future Super City is a challenging task. Therefore, to meet such requirements, this paper presents an IoT-based system for next-generation Super City planning using Big Data analytics. The authors propose a complete system that includes various types of IoT-based smart systems for data generation, such as smart homes, vehicular networking, weather and water systems, smart parking, and surveillance objects. The proposed architecture comprises four tiers: (1) a Bottom Tier, (2) Intermediate Tier-1, (3) Intermediate Tier-2, and (4) a Top Tier, which handle data generation and collection, communication, data administration and processing, and data interpretation, respectively. The system implementation model is presented from data generation and collection through to decision making. The proposed system is implemented using the Hadoop ecosystem with MapReduce programming. The throughput and processing-time results show that the proposed Super City planning system is efficient and scalable.
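As a flavour of the processing tiers, the sketch below shows a Hadoop Streaming style mapper/reducer pair (two separate scripts in practice) that averages sensor readings per device. This is a minimal illustration, not the authors' implementation; the tab-separated "device_id, reading" input format is hypothetical.

```python
import sys

def mapper() -> None:
    # mapper.py: emit key<TAB>value pairs; Hadoop sorts them by key
    for line in sys.stdin:
        device_id, reading = line.rstrip("\n").split("\t")
        print(f"{device_id}\t{reading}")

def reducer() -> None:
    # reducer.py: input arrives grouped by key; average the readings per device
    current, total, count = None, 0.0, 0
    for line in sys.stdin:
        device_id, reading = line.rstrip("\n").split("\t")
        if device_id != current and current is not None:
            print(f"{current}\t{total / count:.2f}")
            total, count = 0.0, 0
        current = device_id
        total += float(reading)
        count += 1
    if current is not None:
        print(f"{current}\t{total / count:.2f}")
```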


2021
Author(s): Adarsh Kalikadien, Evgeny A. Pidko, Vivek Sinha

Local chemical space exploration of an experimentally synthesized material can be done by making slight structural variations of that material. Generating many molecular structures of reasonable quality that resemble an existing, purposeful (chemical) material is needed for high-throughput screening in material design. Large databases of geometries and chemical properties of transition metal complexes are not readily available, although these complexes are widely used in homogeneous catalysis. A Python-based workflow, ChemSpaX, aimed at automating local chemical space exploration for any type of molecule, is introduced, and its overall computational workflow is explained in detail. ChemSpaX uses 3D information to place functional groups on an input structure. For example, the input structure can be a catalyst for which one wants to use high-throughput screening to investigate whether the catalytic activity can be improved. The newly placed substituents are optimized using a computationally cheap force-field method; higher-level optimizations using xTB or DFT instead of a force field are also possible in the current workflow. In representative applications, it is shown that the structures generated by ChemSpaX are of reasonable quality for use in high-throughput screening. These applications include investigating various adducts on functionalized Mn-based pincer complexes, hydrogenation of Ru-based pincer complexes, functionalization of cobalt porphyrin complexes, and functionalization of a bipyridyl-functionalized cobalt porphyrin trapped in an M2L4-type cage complex. Descriptors that can be used in data-driven design and discovery of catalysts, such as the Gibbs free energy of reaction and the HOMO-LUMO gap, were selected and studied in more detail for the selected use cases. The relatively fast GFN2-xTB method was used to calculate these descriptors, and a comparison was made against DFT-calculated descriptors. ChemSpaX is open source and aims to bolster the efforts of the scientific community towards data-driven material discovery.
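The core operation ChemSpaX automates, placing a substituent at a chosen site and relaxing the result with a cheap force field, can be illustrated with plain RDKit. This sketch is not the ChemSpaX API; the scaffold, the Br placeholder marking the substitution site, and the substituent list are hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

scaffold = Chem.MolFromSmiles("c1ccc(Br)cc1")  # Br marks the site to vary (hypothetical)
site = Chem.MolFromSmarts("[Br]")              # substructure query for that site
substituents = {"methyl": "C", "amino": "N", "trifluoromethyl": "C(F)(F)F"}

for name, smiles in substituents.items():
    group = Chem.MolFromSmiles(smiles)
    # Swap the placeholder for the substituent, then rebuild valences
    product = Chem.ReplaceSubstructs(scaffold, site, group, replaceAll=True)[0]
    Chem.SanitizeMol(product)
    product = Chem.AddHs(product)
    AllChem.EmbedMolecule(product, randomSeed=1)  # generate a 3D geometry
    AllChem.MMFFOptimizeMolecule(product)         # cheap force-field relaxation
    print(name, Chem.MolToSmiles(Chem.RemoveHs(product)))
```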


2020
Author(s): Anna M. Sozanska, Charles Fletcher, Dóra Bihary, Shamith A. Samarajiwa

More than three decades ago, the microarray revolution brought high-throughput data generation capability to biology and medicine. Subsequently, the emergence of massively parallel sequencing technologies led to many big-data initiatives such as the Human Genome Project and the Encyclopedia of DNA Elements (ENCODE) project. These, in combination with cheaper, faster massively parallel DNA sequencing, have democratised multi-omic (genomic, transcriptomic, translatomic, and epigenomic) data generation, leading to a data deluge in biomedicine. While some of these datasets are trapped in inaccessible silos, the vast majority are stored in public data resources and controlled-access data repositories, enabling their wider use (or misuse). Currently, most peer-reviewed publications require the deposition of the dataset associated with a study in one of these public data repositories. However, clunky, difficult-to-use interfaces and subpar or incomplete annotation prevent the discovery, searching, and filtering of these multi-omic data and hinder their re-purposing for other use cases. In addition, the proliferation of a multitude of different data repositories, with partially redundant storage of similar data, is yet another obstacle to their continued usefulness. Similarly, interfaces where annotation is spread across multiple web pages, accession identifiers with ambiguous or multiple interpretations, and a lack of good curation make these datasets difficult to use. We have produced SpiderSeqR, an R package whose main features include integration between the NCBI GEO and SRA databases, enabling a unified search of SRA and GEO datasets and associated annotations, conversion between database accessions, convenient filtering of results, and saving past queries for future use. All of the above features aim to promote data reuse, facilitating new discoveries and maximising the potential of existing datasets.
Availability: https://github.com/ss-lab-cancerunit/SpiderSeqR
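SpiderSeqR itself is an R package; as a rough Python illustration of the kind of cross-database query it automates, the sketch below searches GEO through NCBI Entrez with Biopython and prints the matching accessions. The search term and contact e-mail are placeholders, and the summary fields are as returned by NCBI's esummary for the gds database.

```python
from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder; NCBI asks for a contact address

# Search GEO DataSets (db="gds") with a hypothetical query
handle = Entrez.esearch(db="gds",
                        term="chip-seq[All Fields] AND human[Organism]",
                        retmax=5)
record = Entrez.read(handle)
handle.close()

# Fetch and print the summary (accession + title) for each hit
for uid in record["IdList"]:
    summary = Entrez.read(Entrez.esummary(db="gds", id=uid))
    doc = summary[0]
    print(doc["Accession"], doc.get("title", ""))
```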

