Recovering rearranged cancer chromosomes from karyotype graphs

Mapping Intimacies ◽

10.1101/831057 ◽

2019 ◽

Author(s):

Sergey Aganezov ◽

Ilya Zban ◽

Vitaly Aksenov ◽

Nikita Alexeev ◽

Michael C. Schatz

Keyword(s):

Large Scale ◽

Evolutionary Process ◽

Copy Number Variations ◽

Graph Representation ◽

Covering Problem ◽

Cancer Dataset ◽

Decomposition Problem ◽

Genomic Changes ◽

Circular Structure ◽

Cancer Genomes

Abstract Many cancer genomes are extensively rearranged with highly aberrant chromosomal karyotypes. Structural and copy number variations in cancer genomes can be determined via abnormal mapping of sequenced reads to the reference genome. Recently it became possible to reconcile both of these types of large-scale variations into a karyotype graph representation of the rearranged cancer genomes. Such a representation, however, does not directly describe the linear and/or circular structure of the underlying rearranged cancer chromosomes, thus limiting possible analysis of the somatic evolutionary process of cancer genomes as well as of the functional genomic changes brought by large-scale genome rearrangements. Here we address the aforementioned limitation by introducing a novel methodological framework for recovering rearranged cancer chromosomes from karyotype graphs. For a cancer karyotype graph we formulate an Eulerian Decomposition Problem (EDP) of finding a collection of linear and/or circular rearranged cancer chromosomes that are determined by the graph. We derive and prove computational complexities for several variations of the EDP. We then demonstrate that Eulerian decomposition of the cancer karyotype graphs is not always unique and present the Consistent Contig Covering Problem (CCCP) of recovering unambiguous cancer contigs from the cancer karyotype graph, and describe a novel algorithm, CCR, capable of solving CCCP in polynomial time. We apply CCR to a prostate cancer dataset and demonstrate that it is capable of consistently recovering large cancer contigs even when the underlying cancer genomes are highly rearranged. CCR can recover rearranged cancer contigs from karyotype graphs, thereby addressing an existing limitation in inferring chromosomal structures of rearranged cancer genomes and advancing our understanding of both patient/cancer-specific and overall genetic instability in cancer.


Recovering rearranged cancer chromosomes from karyotype graphs

BMC Bioinformatics ◽

10.1186/s12859-019-3208-4 ◽

2019 ◽

Vol 20 (S20) ◽

Cited By ~ 2

Author(s):

Sergey Aganezov ◽

Ilya Zban ◽

Vitaly Aksenov ◽

Nikita Alexeev ◽

Michael C. Schatz

Keyword(s):

Large Scale ◽

Evolutionary Process ◽

Copy Number Variations ◽

Graph Representation ◽

Covering Problem ◽

Cancer Dataset ◽

Decomposition Problem ◽

Genomic Changes ◽

Circular Structure ◽

Cancer Genomes

Abstract Background: Many cancer genomes are extensively rearranged with highly aberrant chromosomal karyotypes. Structural and copy number variations in cancer genomes can be determined via abnormal mapping of sequenced reads to the reference genome. Recently it became possible to reconcile both of these types of large-scale variations into a karyotype graph representation of the rearranged cancer genomes. Such a representation, however, does not directly describe the linear and/or circular structure of the underlying rearranged cancer chromosomes, thus limiting possible analysis of the somatic evolutionary process of cancer genomes as well as of the functional genomic changes brought by large-scale genome rearrangements. Results: Here we address the aforementioned limitation by introducing a novel methodological framework for recovering rearranged cancer chromosomes from karyotype graphs. For a cancer karyotype graph we formulate an Eulerian Decomposition Problem (EDP) of finding a collection of linear and/or circular rearranged cancer chromosomes that are determined by the graph. We derive and prove computational complexities for several variations of the EDP. We then demonstrate that Eulerian decomposition of the cancer karyotype graphs is not always unique and present the Consistent Contig Covering Problem (CCCP) of recovering unambiguous cancer contigs from the cancer karyotype graph, and describe a novel algorithm, CCR, capable of solving CCCP in polynomial time. We apply CCR to a prostate cancer dataset and demonstrate that it is capable of consistently recovering large cancer contigs even when the underlying cancer genomes are highly rearranged. Conclusions: CCR can recover rearranged cancer contigs from karyotype graphs, thereby addressing an existing limitation in inferring chromosomal structures of rearranged cancer genomes and advancing our understanding of both patient/cancer-specific and overall genetic instability in cancer.
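
The non-uniqueness of Eulerian decompositions mentioned in the abstract can be illustrated on a toy example. The sketch below is not the paper's karyotype graph model or the CCR algorithm; it assumes a plain undirected multigraph with hypothetical vertex labels, and the helper `eulerian_trails` is introduced here only for illustration. It enumerates every traversal that uses each edge exactly once and shows that the same graph admits more than one.

```python
from collections import defaultdict


def eulerian_trails(edges, start):
    """Return the distinct vertex sequences of trails from `start` that use every edge once."""
    # Index edges so that parallel edges are tracked separately.
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((idx, v))
        adj[v].append((idx, u))

    used = [False] * len(edges)
    found = set()

    def extend(path):
        # A trail over all edges visits len(edges) + 1 vertices.
        if len(path) == len(edges) + 1:
            found.add(tuple(path))
            return
        for idx, nxt in adj[path[-1]]:
            if not used[idx]:
                used[idx] = True
                path.append(nxt)
                extend(path)
                path.pop()
                used[idx] = False

    extend([start])
    return sorted(found)


# Hypothetical toy graph: a linear path A-B-E with two duplicated
# detours (B-C-B and B-D-B) attached at the same vertex B.
toy_edges = [("A", "B"), ("B", "C"), ("C", "B"), ("B", "D"), ("D", "B"), ("B", "E")]
trails = eulerian_trails(toy_edges, "A")
for trail in trails:
    print("-".join(trail))
```

Both printed traversals explain the same edge multiset, so the graph alone cannot say whether the C detour precedes the D detour. Under these assumptions, only the pieces shared by every traversal (for example the entry A-B and the exit B-E) are unambiguous, which is the intuition behind recovering contigs rather than whole chromosomes.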


A Novel Method to Predict Drug-Target Interactions Based on Large-Scale Graph Representation Learning

Cancers ◽

10.3390/cancers13092111 ◽

2021 ◽

Vol 13 (9) ◽

pp. 2111

Author(s):

Bo-Wei Zhao ◽

Zhu-Hong You ◽

Lun Hu ◽

Zhen-Hao Guo ◽

Lei Wang ◽

...

Keyword(s):

Drug Target ◽

Large Scale ◽

Computational Models ◽

Structural Information ◽

Characteristic Curve ◽

Representation Learning ◽

Graph Representation ◽

Convolutional Network ◽

Novel Method

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates quickly. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture both the local and the global structural information of the graph. Specifically, the first-order neighbor information of nodes is aggregated by a graph convolutional network (GCN), while the high-order neighbor information of nodes is learned by the graph embedding method DeepWalk. Finally, the two kinds of features are fed into a random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. The proposed model is also expected to bring new inspiration and provide novel perspectives to relevant researchers.
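
As a rough illustration of how DeepWalk gathers high-order neighbor information, the sketch below generates the truncated uniform random walks that DeepWalk feeds to a skip-gram model. It covers only the walk-sampling step, not the embedding training, the GCN branch, or the random forest used by LGDTI; the toy graph, the node names, and the function `random_walks` are assumptions made for this example.

```python
import random


def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Sample truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:
                    break  # isolated node: the walk cannot continue
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical toy drug-target graph as an adjacency list.
graph = {
    "drug1": ["targetA", "targetB"],
    "drug2": ["targetB"],
    "targetA": ["drug1"],
    "targetB": ["drug1", "drug2"],
}
walks = random_walks(graph, walk_length=5, walks_per_node=2)
print(len(walks), "walks, for example:", walks[0])
```

Nodes that co-occur within a few steps of the same walk (here, for instance, `drug2` and `targetA` via `targetB` and `drug1`) end up with similar embeddings after skip-gram training, which is how information beyond first-order neighbors enters the features.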


Genomic alterations caused by HPV integration in a cohort of Chinese endocervical adenocarcinomas

Cancer Gene Therapy ◽

10.1038/s41417-020-00283-4 ◽

2021 ◽

Author(s):

Wenhui Li ◽

Wanjun Lei ◽

Xiaopei Chao ◽

Xiaochen Song ◽

Yalan Bi ◽

...

Keyword(s):

Copy Number ◽

Somatic Mutations ◽

Hpv Infection ◽

Copy Number Variations ◽

Cervical Adenocarcinoma ◽

Structural Variations ◽

Driver Genes ◽

Genetic Changes ◽

Genomic Changes ◽

Whole Exome

Abstract The association between human papillomavirus (HPV) integration and relevant genomic changes in uterine cervical adenocarcinoma is poorly understood. This study aimed to depict the genomic mutational landscape in a cohort of 20 patients. HPV+ and HPV− groups were defined as patients with and without HPV integration in the host genome. The genetic changes between these two groups were described and compared by whole-genome sequencing (WGS) and whole-exome sequencing (WES). WGS identified 2916 copy number variations and 743 structural variations. WES identified 6113 somatic mutations, with a mutational burden of 2.4 mutations/Mb. Six genes were predicted as driver genes: PIK3CA, KRAS, TRAPPC12, NDN, GOLGA6L4 and BAIAP3. PIK3CA, NDN, GOLGA6L4, and BAIAP3 were recognized as significantly mutated genes (SMGs). HPV was detected in 95% (19/20) of patients with cervical adenocarcinoma, 7 of whom (36.8%) had HPV integration (HPV+ group). In total, 1036 genes with somatic mutations were confirmed in the HPV+ group, while 289 genes with somatic mutations were confirmed in the group without HPV integration (HPV− group); only 2.1% were shared between the two groups. In the HPV+ group, GOLGA6L4 and BAIAP3 were confirmed as SMGs, while PIK3CA, NDN, KRAS, FUT1, and GOLGA6L64 were identified in the HPV− group. ZDHHC3, PKD1P1, and TGIF2 showed copy number amplifications after HPV integration. In addition, the HPV+ group had significantly more neoantigens. HPV integration rather than HPV infection results in different genomic changes in cervical adenocarcinoma.


Composite bulges – II. Classical bulges and nuclear discs in barred galaxies: the contrasting cases of NGC 4608 and NGC 4643

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/stab126 ◽

2021 ◽

Vol 502 (2) ◽

pp. 2446-2473

Author(s):

Peter Erwin ◽

Anil Seth ◽

Victor P Debattista ◽

Marja Seidel ◽

Kianusch Mehrgan ◽

...

Keyword(s):

Large Scale ◽

Stellar Kinematics ◽

Barred Galaxies ◽

Circular Structure ◽

Model Predictions ◽

The Galaxy ◽

Central Regions ◽

Total Light ◽

Kinematic Analyses ◽

Stellar Kinematic

ABSTRACT We present detailed morphological, photometric, and stellar-kinematic analyses of the central regions of two massive, early-type barred galaxies with nearly identical large-scale morphologies. Both have large, strong bars with prominent inner photometric excesses that we associate with boxy/peanut-shaped (B/P) bulges; the latter constitute ∼30 per cent of the galaxy light. Inside its B/P bulge, NGC 4608 has a compact, almost circular structure (half-light radius Re ≈ 310 pc, Sérsic n = 2.2) we identify as a classical bulge, amounting to 12.1 per cent of the total light, along with a nuclear star cluster (Re ∼ 4 pc). NGC 4643, in contrast, has a nuclear disc with an unusual broken-exponential surface-brightness profile (13.2 per cent of the light), and a very small spheroidal component (Re ≈ 35 pc, n = 1.6; 0.5 per cent of the light). IFU stellar kinematics support this picture, with NGC 4608’s classical bulge slowly rotating and dominated by high velocity dispersion, while NGC 4643’s nuclear disc shows a drop to lower dispersion, rapid rotation, V–h3 anticorrelation, and elevated h4. Both galaxies show at least some evidence for V–h3 correlation in the bar (outside the respective classical bulge and nuclear disc), in agreement with model predictions. Standard two-component (bulge/disc) decompositions yield B/T ∼ 0.5–0.7 (and bulge n > 2) for both galaxies. This overestimates the true ‘spheroid’ components by factors of 4 (NGC 4608) and over 100 (NGC 4643), illustrating the perils of naive bulge-disc decompositions applied to massive barred galaxies.


A large-scale survey of genetic copy number variations among Han Chinese residing in Taiwan

BMC Genetics ◽

10.1186/1471-2156-9-92 ◽

2008 ◽

Vol 9 (1) ◽

pp. 92 ◽

Cited By ~ 17

Author(s):

Chien-Hsing Lin ◽

Ling-Hui Li ◽

Sheng-Feng Ho ◽

Tzu-Po Chuang ◽

Jer-Yuarn Wu ◽

...

Keyword(s):

Copy Number ◽

Large Scale ◽

Copy Number Variations ◽

Han Chinese ◽

Large Scale Survey


Abstract 367: Extreme High-Density Lipoprotein Cholesterol Genetics: An Assortment of Large and Small Polygenic Effects

Arteriosclerosis Thrombosis and Vascular Biology ◽

10.1161/atvb.37.suppl_1.367 ◽

2017 ◽

Vol 37 (suppl_1) ◽

Author(s):

Jacqueline S Dron ◽

Jian Wang ◽

Cécile Low-Kam ◽

Sumeet A Khetarpal ◽

John F Robinson ◽

...

Keyword(s):

Large Scale ◽

Genetic Basis ◽

Rare Variants ◽

Association Studies ◽

Density Lipoprotein ◽

Copy Number Variations ◽

Genome Wide Association Studies ◽

Common Variants ◽

Targeted Next Generation Sequencing ◽

Common Genetic Variants

Rationale: Although HDL-C levels are known to have a complex genetic basis, most studies have focused solely on identifying rare variants with large phenotypic effects to explain extreme HDL-C phenotypes. Objective: Here we concurrently evaluate the contribution of both rare and common genetic variants, as well as large-scale copy number variations (CNVs), towards extreme HDL-C concentrations. Methods: In clinically ascertained patients with low (N = 136) and high (N = 119) HDL-C profiles, we applied our targeted next-generation sequencing panel (LipidSeq™) to sequence genes involved in HDL metabolism, which were subsequently screened for rare variants and CNVs. We also developed a novel polygenic trait score (PTS) to assess patients’ genetic accumulations of common variants that have been shown by genome-wide association studies to associate primarily with HDL-C levels. Two additional cohorts of patients with extremely low and high HDL-C (total N = 1,746 and N = 1,139, respectively) were used for PTS validation. Results: In the discovery cohort, 32.4% of low HDL-C patients carried rare variants or CNVs in primary (ABCA1, APOA1, LCAT) and secondary (LPL, LMF1, GPD1, APOE) HDL-C–altering genes. Additionally, 13.4% of high HDL-C patients carried rare variants or CNVs in primary (SCARB1, CETP, LIPC, LIPG) and secondary (APOC3, ANGPTL4) HDL-C–altering genes. For polygenic effects, patients with abnormal HDL-C profiles but without rare variants or CNVs were ~2-fold more likely to have an extreme PTS compared to normolipidemic individuals, indicating an increased frequency of common HDL-C–associated variants in these patients. Similar results in the two validation cohorts demonstrate that this novel PTS successfully quantifies common variant accumulation, further characterizing the polygenic basis for extreme HDL-C phenotypes.
Conclusions: Patients with extreme HDL-C levels have various combinations of rare variants, common variants, or CNVs driving their phenotypes. Fully characterizing the genetic basis of HDL-C levels must extend to encompass multiple types of genetic determinants—not just rare variants—to further our understanding of this complex, controversial quantitative trait.


Host defense mechanisms induce genome instability leading to rapid evolution in an opportunistic fungal pathogen

Infection and Immunity ◽

10.1128/iai.00328-21 ◽

2021 ◽

Author(s):

Amanda Smith ◽

Levi Morran ◽

Meleah A. Hickman

Keyword(s):

Fungal Pathogen ◽

Defense Mechanisms ◽

Large Scale ◽

Genome Instability ◽

Rapid Evolution ◽

Genomic Changes ◽

Opportunistic Fungal Pathogen ◽

Immunocompromised Hosts ◽

Stressful Environments

The ability to generate genetic variation facilitates rapid adaptation in stressful environments. The opportunistic fungal pathogen Candida albicans frequently undergoes large-scale genomic changes, including aneuploidy and loss of heterozygosity (LOH), following exposure to host environments. However, the specific host factors inducing C. albicans genome instability remain largely unknown. Here, we leveraged the genetic tractability of nematode hosts to investigate whether innate immune components, including antimicrobial peptides (AMPs) and reactive oxygen species (ROS), induced host-associated C. albicans genome instability. C. albicans associated with immunocompetent hosts carried multiple large-scale genomic changes including LOH, whole-chromosome aneuploidies, and segmental aneuploidies. In contrast, C. albicans associated with immunocompromised hosts deficient in AMP or ROS production had reduced LOH frequencies and fewer, if any, additional genomic changes. To evaluate whether extensive host-induced genomic changes had long-term consequences for C. albicans adaptation, we experimentally evolved C. albicans in either immunocompetent or immunocompromised hosts and selected for increased virulence. C. albicans evolved in immunocompetent hosts rapidly increased in virulence, whereas C. albicans evolved in immunocompromised hosts did not. Taken together, this work suggests that host-produced ROS and AMPs induce genotypic plasticity in C. albicans, which facilitates rapid evolution.


Multi-modal transportation recommendation with unified route representation learning

Proceedings of the VLDB Endowment ◽

10.14778/3430915.3430924 ◽

2020 ◽

Vol 14 (3) ◽

pp. 342-350

Author(s):

Hao Liu ◽

Jindong Han ◽

Yanjie Fu ◽

Jingbo Zhou ◽

Xinjiang Lu ◽

...

Keyword(s):

Large Scale ◽

Transportation Networks ◽

Representation Learning ◽

Transportation Systems ◽

Graph Representation ◽

Dynamic Graph ◽

Arbitrary Length ◽

Task Learning ◽

Semantic Coherence ◽

Spatio Temporal

Multi-modal transportation recommendation aims to provide the most appropriate travel route with various transportation modes according to certain criteria. After analyzing large-scale navigation data, we find that route representations exhibit two patterns: spatio-temporal autocorrelations within transportation networks and the semantic coherence of route sequences. However, few studies consider both patterns when developing multi-modal transportation systems. To this end, in this paper, we study multi-modal transportation recommendation with unified route representation learning by exploiting both spatio-temporal dependencies in transportation networks and the semantic coherence of historical routes. Specifically, we propose to unify dynamic graph representation learning and hierarchical multi-task learning for multi-modal transportation recommendations. Along this line, we first transform the multi-modal transportation network into time-dependent multi-view transportation graphs and propose a spatio-temporal graph neural network module to capture the spatial and temporal autocorrelation. Then, we introduce a coherent-aware attentive route representation learning module to project arbitrary-length routes into fixed-length representation vectors, with explicit modeling of route coherence from historical routes. Moreover, we develop a hierarchical multi-task learning module to differentiate route representations for different transport modes, guided by the final recommendation feedback as well as by multiple auxiliary tasks placed in different network layers. Extensive experimental results on two large-scale real-world datasets demonstrate that the proposed system outperforms eight baselines.


Common adaptive strategies underlie within-host evolution of bacterial pathogens

Molecular Biology and Evolution ◽

10.1093/molbev/msaa278 ◽

2020 ◽

Author(s):

Yair E Gatt ◽

Hanah Margalit

Keyword(s):

Bacterial Infections ◽

Large Scale ◽

Bacterial Species ◽

Adaptive Strategies ◽

Single Species ◽

Adaptive Strategy ◽

Host Immune System ◽

Glycerol Phosphate ◽

Genomic Changes ◽

Virulence Attenuation

Abstract Within-host adaptation is a hallmark of chronic bacterial infections, involving substantial genomic changes. Recent large-scale genomic data from prolonged infections allow the examination of adaptive strategies employed by different pathogens and open the door to investigating whether they converge towards similar strategies. Here, we compiled extensive data of whole-genome sequences of bacterial isolates belonging to miscellaneous species sampled at sequential time points during clinical infections. Analysis of these data revealed that different species share some common adaptive strategies, achieved by mutating various genes. While the same genes were often mutated in several strains within a species, different genes related to the same pathway, structure or function were changed in other species utilizing the same adaptive strategy (e.g. mutating flagellar genes). Strategies exploited by various bacterial species were often predicted to be driven by the host immune system, a powerful selective pressure that is not species-specific. Remarkably, we find adaptive strategies identified previously within single species to be ubiquitous. Two striking examples are shifts from siderophore-based to heme-based iron scavenging (previously shown for Pseudomonas aeruginosa), and changes in glycerol-phosphate metabolism (previously shown to decrease sensitivity to antibiotics in Mycobacterium tuberculosis). Virulence factors were often adaptively affected in different species, indicating shifts from acute to chronic virulence and virulence attenuation during infection. Our study presents a global view of common within-host adaptive strategies employed by different bacterial species and provides a rich resource for further studying these processes.


Solving Set Cover and Dominating Set via Maximum Satisfiability

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i02.5517 ◽

2020 ◽

Vol 34 (02) ◽

pp. 1569-1576 ◽

Cited By ~ 1

Author(s):

Zhendong Lei ◽

Shaowei Cai

Keyword(s):

Local Search ◽

Large Scale ◽

Dominating Set ◽

Scoring Function ◽

Initial Solution ◽

Set Covering ◽

Set Cover ◽

Covering Problem ◽

Greedy Heuristic ◽

Maximum Satisfiability

The Set Covering Problem (SCP) and Dominating Set Problem (DSP) are NP-hard and have many real-world applications. SCP and DSP can be encoded into Maximum Satisfiability (MaxSAT) naturally, and the resulting instances share a special structure. In this paper, we develop an efficient local search solver for MaxSAT instances of this kind. Our algorithm contains three phases: construction, local search and recovery. In the construction phase, we simplify the instance by three reduction rules and construct an initial solution by a greedy heuristic. The initial solution is improved during the local search phase, which exploits the features of such instances in the scoring function and the variable selection heuristic. Finally, the corresponding solution of the original instance is recovered in the recovery phase. Experimental results on a broad range of large-scale instances of SCP and DSP show that our algorithm significantly outperforms state-of-the-art solvers for SCP, DSP and MaxSAT.
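
For readers unfamiliar with greedy construction for set cover, the sketch below shows the textbook greedy heuristic: repeatedly pick the subset that covers the most still-uncovered elements. This is an assumption about what a greedy initial solution might look like, not the authors' solver; it omits their reduction rules, the MaxSAT encoding, and the local search and recovery phases. The function `greedy_set_cover` and the toy instance are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets chosen greedily until the universe is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical toy instance.
universe = range(1, 8)
subsets = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6, 7}, {1, 7}]
cover = greedy_set_cover(universe, subsets)
print(cover)  # indices into `subsets`
```

A greedy cover like this is generally not optimal, which is why a solver of the kind described would treat it only as a starting point for local search.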
