Fishing With (Proto)Net—A Principled Approach to Protein Target Selection

Michal Linial

doi:10.1002/cfg.328

Comparative and Functional Genomics ◽

10.1002/cfg.328 ◽

2003 ◽

Vol 4 (5) ◽

pp. 542-548

Author(s):

Michal Linial

Keyword(s):

Structural Genomics ◽

Target Selection ◽

Structural Diversity ◽

Structural Data ◽

Agglomerative Clustering ◽

Homologous Proteins ◽

Hierarchical Agglomerative Clustering ◽

Global Classification ◽

Protein Space

Structural genomics strives to represent the entire protein space. The first step towards this goal is to rationally select proteins whose structures have not been determined but that represent an as yet unknown structural superfamily or fold. Once such a structure is solved, it can be used as a template for modelling homologous proteins. This will aid in unveiling the structural diversity of the protein space. Currently, no reliable method for accurate 3D structural prediction is available when no sequence or structure homologue is available. Here we present a systematic methodology for selecting target proteins whose structure is likely to adopt a new, as yet unknown superfamily or fold. Our method takes advantage of a global classification of the sequence space as presented by ProtoNet-3D, which is a hierarchical agglomerative clustering of the proteins of interest (the proteins in Swiss-Prot) along with all solved structures (taken from the PDB). By navigating the scaffold of ProtoNet-3D, we obtain a prioritized list of proteins that are not yet structurally solved, along with the probability of each protein belonging to a new superfamily or fold. The sorted list has been self-validated against real structural data that were not available when the predictions were made. The practical application of our computational–statistical method to determine novel superfamilies for structural genomics projects is also discussed.

Landings profiles and potential métiers in Greek set longliners

ICES Journal of Marine Science ◽

10.1093/icesjms/fsp279 ◽

2009 ◽

Vol 67 (4) ◽

pp. 646-656 ◽

Cited By ~ 15

Author(s):

Stelios Katsanevakis ◽

Christos D. Maravelias ◽

Laurie T. Kell

Keyword(s):

Aegean Sea ◽

Sea Bream ◽

Ionian Sea ◽

Agglomerative Clustering ◽

Small Vessels ◽

Step Procedure ◽

Diplodus Sargus ◽

Hierarchical Agglomerative Clustering ◽

Pagrus Pagrus

Katsanevakis, S., Maravelias, C. D., and Kell, L. T. 2010. Landings profiles and potential métiers in Greek set longliners. – ICES Journal of Marine Science, 67: 646–656. A very large number (>14 000) of generally small vessels operate as longliners in Greek seas. The aim of this study was to identify potential set longline métiers, based on a large sample of landings records from all over Greece. Landings data from set longliners between 2002 and 2006, collected from several ports in the Aegean and East Ionian Sea, were used. The landings profiles were grouped using a two-step procedure, the first involving factorial analysis of the log-transformed landing profiles, and the second a classification of the factorial coordinates (hierarchical agglomerative clustering). In all, 13 métiers were identified in the Aegean Sea and 7 in the Ionian Sea. The most important métiers identified were those targeting white sea bream (Diplodus sargus), hake (Merluccius merluccius), common sea bream (Pagrus pagrus), and common pandora (Pagellus erythrinus), and mixed métiers. Varying spatial (within the Aegean and Ionian Seas) and seasonal patterns were evident for the métiers identified, indicating that fisher motivation to engage in a specific métier varies both spatially and temporally.

Application of Hierarchical Agglomerative Clustering (HAC) for Systemic Classification of Pop-Up Housing (PUH) Environments

Applied Sciences ◽

10.3390/app112311122 ◽

2021 ◽

Vol 11 (23) ◽

pp. 11122

Author(s):

Thomas Märzinger ◽

Jan Kotík ◽

Christoph Pfeifer

Keyword(s):

Urban Planning ◽

Hierarchical Clustering ◽

Clustering Algorithms ◽

Field Studies ◽

Assessment Tools ◽

Agglomerative Clustering ◽

Research Project ◽

Minkowski Metric ◽

Hierarchical Agglomerative Clustering

This paper is the result of the first-phase, interdisciplinary work of a multidisciplinary research project (“Urban pop-up housing environments and their potential as local innovation systems”) consisting of energy engineers and waste managers, landscape architects and spatial planners, innovation researchers and technology assessors. The project aims at globally analyzing and describing existing pop-up housings (PUH), and at developing modeling and assessment tools for sustainable, energy-efficient and socially innovative temporary housing solutions (THS), especially for sustainable and resilient urban structures. The present paper presents an effective application of hierarchical agglomerative clustering (HAC) for analyses of large datasets typically derived from field studies. As can be shown, the method, although well known and successfully established in (soft) computing science, can also be used very constructively as a potential urban planning tool. The main aim of the underlying multidisciplinary research project was to deeply analyze and structure THS and PUE. Multiple aspects are to be considered when it comes to the characterization and classification of such environments. A thorough (global) web survey of PUH and an analysis of scientific literature concerning descriptive work on PUH and THS have been performed. Moreover, out of several different approaches and methods tested for classifying PUH, hierarchical clustering algorithms functioned well when properly selected metrics and cut-off criteria were applied. To be specific, the ‘Minkowski’ metric and the ‘Calinski-Harabasz’ criterion, as clustering indices, have shown the best overall results in clustering the inhomogeneous data concerning PUH. Several additional algorithms/functions derived from the field of hierarchical clustering have also been tested to exploit their potential in interpreting and graphically analyzing particular structures and dependencies in the resulting clusters. Hereby, (math.) the significance ‘S’ and (math.) proportion ‘P’ have been concluded to yield the best interpretable and comprehensible results when it comes to analyzing the given set (objects n = 85) of researched PUH objects together with their properties (n > 190). The resulting easily readable graphs clearly demonstrate the applicability and usability of hierarchical clustering and its derivative algorithms for scientifically profound building classification tasks in urban planning by effectively managing huge inhomogeneous building datasets.
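
To make the pairing of a Minkowski metric with a Calinski-Harabasz cut-off concrete, the following is a minimal sketch and not the authors' implementation. It assumes a naive average-linkage merge rule, an arbitrary Minkowski order p = 1.5, and six hypothetical 2D points; none of these choices come from the paper. The Calinski-Harabasz index is computed in its conventional form, the ratio of between-cluster to within-cluster squared Euclidean dispersion, each divided by its degrees of freedom.

```python
from itertools import combinations


def minkowski(a, b, p=2.0):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def average_linkage(points, n_clusters, p=2.0):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise Minkowski distance."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(minkowski(points[a], points[b], p)
                        for a in clusters[i] for b in clusters[j])
            mean = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or mean < best[0]:
                best = (mean, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def calinski_harabasz(points, clusters):
    """CH = [B / (k - 1)] / [W / (n - k)], where B and W are the between-
    and within-cluster sums of squared Euclidean deviations.
    Requires 2 <= k < n and a non-zero within-cluster dispersion."""
    n, k, dim = len(points), len(clusters), len(points[0])
    overall = [sum(pt[d] for pt in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters:
        centroid = [sum(points[m][d] for m in members) / len(members)
                    for d in range(dim)]
        between += len(members) * sum((c - o) ** 2
                                      for c, o in zip(centroid, overall))
        within += sum((points[m][d] - centroid[d]) ** 2
                      for m in members for d in range(dim))
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Cut the hierarchy at several candidate cluster counts and keep the best.
scores = {k: calinski_harabasz(points, average_linkage(points, k, p=1.5))
          for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The metric governs which clusters are merged, while the index only scores the resulting partitions, so the two can be varied independently when comparing configurations.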

Hierarchical Agglomerative Clustering approach for Automated Attribute Classification of the Health Care Domain from User Generated Reviews on Web 2.0

2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON) ◽

10.1109/gucon48875.2020.9231122 ◽

2020 ◽

Author(s):

Saroj Kushwaha ◽

Sanjoy Das

Keyword(s):

Health Care ◽

Web 2.0 ◽

Agglomerative Clustering ◽

Hierarchical Agglomerative Clustering ◽

Attribute Classification ◽

Clustering Approach

Clustering Techniques for Secondary Substations Siting

Energies ◽

10.3390/en14041028 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1028

Author(s):

Silvia Corigliano ◽

Federico Rosato ◽

Carla Ortiz Dominguez ◽

Marco Merlo

Keyword(s):

Rural Areas ◽

Urban Areas ◽

Universal Access ◽

Distribution Networks ◽

Industrialized Countries ◽

Agglomerative Clustering ◽

Clustering Techniques ◽

Hierarchical Agglomerative Clustering ◽

Efficient Planning ◽

Target Set

The scientific community is active in developing new models and methods to help reach the ambitious target set by UN SDG 7: universal access to electricity by 2030. Efficient planning of distribution networks is a complex and multivariate task, which is usually split into multiple subproblems to reduce the number of variables. The present work addresses the problem of optimal secondary substation siting, by means of different clustering techniques. In contrast with the majority of approaches found in the literature, which are devoted to the planning of MV grids in already electrified urban areas, this work focuses on greenfield planning in rural areas. The K-means algorithm, hierarchical agglomerative clustering, and a method based on optimal weighted tree partitioning are adapted to the problem and run on two real case studies, with different population densities. The algorithms are compared in terms of different indicators useful to assess the feasibility of the solutions found. The algorithms have proven to be effective in addressing some of the crucial aspects of substation siting and to constitute relevant improvements to the classic K-means approach found in the literature. However, it is found that it is very challenging to reconcile an acceptable geographical span of the area served by a single substation with a substation power high enough to justify the installation when the load density is very low. In other words, well-known standards adopted in industrialized countries do not fit with developing countries’ requirements.

Identifying organ dysfunction trajectory-based subphenotypes in critically ill patients with COVID-19

Scientific Reports ◽

10.1038/s41598-021-95431-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chang Su ◽

Zhenxing Xu ◽

Katherine Hoffman ◽

Parag Goyal ◽

Monika M. Safford ◽

...

Keyword(s):

New York ◽

Respiratory Failure ◽

Sofa Score ◽

Severity Of Illness ◽

Agglomerative Clustering ◽

Baseline Severity ◽

Organ Systems ◽

Hierarchical Agglomerative Clustering ◽

Dynamic Time ◽

Post Intubation

COVID-19-associated respiratory failure offers the unprecedented opportunity to evaluate the differential host response to a uniform pathogenic insult. Understanding whether there are distinct subphenotypes of severe COVID-19 may offer insight into its pathophysiology. Sequential Organ Failure Assessment (SOFA) score is an objective and comprehensive measurement that measures dysfunction severity of six organ systems, i.e., cardiovascular, central nervous system, coagulation, liver, renal, and respiration. Our aim was to identify and characterize distinct subphenotypes of COVID-19 critical illness defined by the post-intubation trajectory of SOFA score. Intubated COVID-19 patients at two hospitals in New York City were leveraged as development and validation cohorts. Patients were grouped into mild, intermediate, and severe strata by their baseline post-intubation SOFA. Hierarchical agglomerative clustering was performed within each stratum to detect subphenotypes based on similarities amongst SOFA score trajectories evaluated by Dynamic Time Warping. Distinct worsening and recovering subphenotypes were identified within each stratum, which had distinct 7-day post-intubation SOFA progression trends. Patients in the worsening subphenotypes had a higher mortality than those in the recovering subphenotypes within each stratum (mild stratum, 29.7% vs. 10.3%, p = 0.033; intermediate stratum, 29.3% vs. 8.0%, p = 0.002; severe stratum, 53.7% vs. 22.2%, p < 0.001). Pathophysiologic biomarkers associated with progression were distinct at each stratum, including findings suggestive of inflammation in low baseline severity of illness versus hemophagocytic lymphohistiocytosis in higher baseline severity of illness. The findings suggest that there are clear worsening and recovering subphenotypes of COVID-19 respiratory failure after intubation, which are more predictive of outcomes than baseline severity of illness. Distinct progression biomarkers at differential baseline severity of illness suggest a heterogeneous pathobiology in the progression of COVID-19 respiratory failure.
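
The combination of Dynamic Time Warping with hierarchical agglomerative clustering described above can be illustrated with a small sketch. It is not the study's pipeline: the four 7-day trajectories are hypothetical values, the absolute-difference step cost and the average-linkage rule are assumptions, and the stratification by baseline severity is omitted.

```python
from itertools import combinations


def dtw_distance(s, t):
    """Classic dynamic time warping with an absolute-difference step cost.
    cost[i][j] is the cheapest alignment of s[:i] with t[:j]."""
    inf = float("inf")
    n, m = len(s), len(t)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(s[i - 1] - t[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat t[j-1]
                                    cost[i][j - 1],      # repeat s[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def cluster_trajectories(series, n_clusters):
    """Average-linkage agglomerative clustering on pairwise DTW distances."""
    # Precompute the symmetric distance table once.
    dist = {(a, b): dtw_distance(series[a], series[b])
            for a, b in combinations(range(len(series)), 2)}

    def d(a, b):
        return dist[(a, b)] if a < b else dist[(b, a)]

    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            mean = (sum(d(a, b) for a in clusters[i] for b in clusters[j])
                    / (len(clusters[i]) * len(clusters[j])))
            if best is None or mean < best[0]:
                best = (mean, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical 7-day score trajectories (illustrative values only).
trajectories = [
    [6, 7, 8, 9, 10, 11, 12],  # worsening
    [6, 6, 8, 9, 11, 12, 12],  # worsening, slightly delayed
    [6, 5, 4, 4, 3, 2, 2],     # recovering
    [6, 6, 5, 3, 3, 2, 1],     # recovering, slightly delayed
]
groups = cluster_trajectories(trajectories, n_clusters=2)
print(groups)
```

DTW tolerates small shifts in timing between trajectories of the same shape, which is why the delayed variants stay close to their counterparts while the worsening and recovering courses remain far apart.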

Global classification of NPGCs

Nature Reviews Drug Discovery ◽

10.1038/nrd4500 ◽

2014 ◽

Vol 13 (12) ◽

pp. 888-888

Author(s):

Sarah Crunkhorn

Keyword(s):

Global Classification

An inshore–offshore sorting system revealed from global classification of ocean litter

Nature Sustainability ◽

10.1038/s41893-021-00720-8 ◽

2021 ◽

Vol 4 (6) ◽

pp. 484-493

Author(s):

Carmen Morales-Caselles ◽

Josué Viejo ◽

Elisa Martí ◽

Daniel González-Fernández ◽

Hannah Pragnell-Raasch ◽

...

Keyword(s):

Global Classification ◽

Sorting System

Three-Dimensional Structures of Carbohydrates and Where to Find Them

International Journal of Molecular Sciences ◽

10.3390/ijms21207702 ◽

2020 ◽

Vol 21 (20) ◽

pp. 7702 ◽

Cited By ~ 1

Author(s):

Sofya I. Scherbinina ◽

Philip V. Toukach

Keyword(s):

Experimental Data ◽

Molecular Modeling ◽

Computational Methods ◽

Three Dimensional ◽

Structural Diversity ◽

Structural Data ◽

Structural Features ◽

Data Validation ◽

Data Generation ◽

Efficient Treatment

Analysis and systematization of accumulated data on carbohydrate structural diversity is a subject of great interest for structural glycobiology. Despite being a challenging task, development of computational methods for efficient treatment and management of spatial (3D) structural features of carbohydrates breaks new ground in modern glycoscience. This review is dedicated to approaches of chemo- and glyco-informatics towards 3D structural data generation, deposition and processing in regard to carbohydrates and their derivatives. Databases, molecular modeling and experimental data validation services, and structure visualization facilities developed over the last five years are reviewed.

Toward a global classification of mast cell activation diseases

Journal of Allergy and Clinical Immunology ◽

10.1016/j.jaci.2010.12.1113 ◽

2011 ◽

Vol 127 (5) ◽

pp. 1311 ◽

Cited By ~ 5

Author(s):

Gerhard J. Molderings ◽

Jürgen Homann ◽

Martin Raithel ◽

Thomas Frieling

Keyword(s):

Mast Cell ◽

Cell Activation ◽

Mast Cell Activation ◽

Global Classification

Target Selection for Structural Genomics: A Single Genome Approach

OMICS A Journal of Integrative Biology ◽

10.1089/153623102321112773 ◽

2002 ◽

Vol 6 (4) ◽

pp. 349-362 ◽

Cited By ~ 3

Author(s):

Igor V. Grigoriev ◽

In-Geol Choi

Keyword(s):

Structural Genomics ◽

Target Selection ◽

Single Genome ◽

Selection For
