Reanalysis of global proteomic and phosphoproteomic data identified a large number of glycopeptides

Mapping Intimacies ◽

10.1101/233247 ◽

2017 ◽

Cited By ~ 3

Author(s):

Yingwei Hu ◽

Punit Shah ◽

David J. Clark ◽

Minghui Ao ◽

Hui Zhang

Keyword(s):

Breast Cancer ◽

Protein Function ◽

Large Scale ◽

Vital Role ◽

Protein Glycosylation ◽

Data Sets ◽

Data Set ◽

Breast Cancer Xenograft ◽

Proteomic Data ◽

Large Scale Analysis

ABSTRACTProtein glycosylation plays fundamental roles in many cellular processes, and previous reports have shown dysregulation to be associated with several human diseases, including diabetes, cancer, and neurodegenerative disorders. Despite the vital role of glycosylation for proper protein function, the analysis of glycoproteins has been lagged behind to other protein modifications. In this study, we describe the re-analysis of global proteomic data from breast cancer xenograft tissues using recently developed software package GPQuest 2.0, revealing a large number of previously unidentifiedN-linked glycopeptides. More importantly, we found that using immobilized metal affinity chromatography (IMAC) technology for the enrichment of phosphopeptides had co-enriched a substantial number of sialoglycopeptides, allowing for a large-scale analysis of sialoglycopeptides in conjunction with the analysis of phosphopeptides. Collectively, combined MS/MS analyses of global proteomic and phosphoproteomic datasets resulted in the identification of 6,724 N-linked glycopeptides from 617 glycoproteins derived from two breast cancer xenograft tissues. Next, we utilized GPQuest for the re-analysis of global and phosphoproteomic data generated from 108 human breast cancer tissues that were previously analyzed by Clinical Proteomic Analysis Consortium (CPTAC). Reanalysis of the CPTAC dataset resulted in the identification of 2,683 glycopeptides from the global proteomic data set and 4,554 glycopeptides from phosphoproteomic data set, respectively. Together, 11,292 N-linked glycopeptides corresponding to 1,731 N-linked glycosites from 883 human glycoproteins were identified from the two data sets. This analysis revealed an extensive number of glycopeptides hidden in the global and enriched in IMAC-based phosphopeptide-enriched proteomic data, information which would have remained unknown from the original study otherwise. The reanalysis described herein can be readily applied to identify glycopeptides from already existing data sets, providing insight into many important facets of protein glycosylation in different biological, physiological, and pathological processes.

Download Full-text

Galaxy spin direction distribution in HST and SDSS show similar large-scale asymmetry

Publications of the Astronomical Society of Australia ◽

10.1017/pasa.2020.46 ◽

2020 ◽

Vol 37 ◽

Author(s):

Lior Shamir

Keyword(s):

Large Scale ◽

Spiral Galaxies ◽

Hubble Space Telescope ◽

Gravitational Interaction ◽

Large Data ◽

Sloan Digital Sky Survey ◽

Data Sets ◽

Dipole Axis ◽

Data Set ◽

The Asymmetry

Abstract Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey. The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\rm o},\delta=47^{\rm o})$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$ , identified at $(\alpha=71^{\rm o},\delta=61^{\rm o})$ .

Download Full-text

The Midlatitude Continental Convective Clouds Experiment (MC3E) sounding network: operations, processing and analysis

Atmospheric Measurement Techniques ◽

10.5194/amt-8-421-2015 ◽

2015 ◽

Vol 8 (1) ◽

pp. 421-434 ◽

Cited By ~ 18

Author(s):

M. P. Jensen ◽

T. Toto ◽

D. Troyan ◽

P. E. Ciesielski ◽

D. Holdridge ◽

...

Keyword(s):

Large Scale ◽

Scale Model ◽

Data Sets ◽

Central Plains ◽

Data Set ◽

Convective Systems ◽

Convective Clouds ◽

Quality Checks ◽

Network Operations ◽

The Impact

Abstract. The Midlatitude Continental Convective Clouds Experiment (MC3E) took place during the spring of 2011 centered in north-central Oklahoma, USA. The main goal of this field campaign was to capture the dynamical and microphysical characteristics of precipitating convective systems in the US Central Plains. A major component of the campaign was a six-site radiosonde array designed to capture the large-scale variability of the atmospheric state with the intent of deriving model forcing data sets. Over the course of the 46-day MC3E campaign, a total of 1362 radiosondes were launched from the enhanced sonde network. This manuscript provides details on the instrumentation used as part of the sounding array, the data processing activities including quality checks and humidity bias corrections and an analysis of the impacts of bias correction and algorithm assumptions on the determination of convective levels and indices. It is found that corrections for known radiosonde humidity biases and assumptions regarding the characteristics of the surface convective parcel result in significant differences in the derived values of convective levels and indices in many soundings. In addition, the impact of including the humidity corrections and quality controls on the thermodynamic profiles that are used in the derivation of a large-scale model forcing data set are investigated. The results show a significant impact on the derived large-scale vertical velocity field illustrating the importance of addressing these humidity biases.

Download Full-text

A fast methodology for large-scale focusing inversion of gravity and magnetic data using the structured model matrix and the 2-D fast Fourier transform

Geophysical Journal International ◽

10.1093/gji/ggaa372 ◽

2020 ◽

Vol 223 (2) ◽

pp. 1378-1397

Author(s):

Rosemary A Renaut ◽

Jarom D Hogue ◽

Saeed Vatankhah ◽

Shuang Liu

Keyword(s):

Fourier Transform ◽

Fast Fourier Transform ◽

Linear Systems ◽

Large Scale ◽

Surface Measurement ◽

Magnetic Data ◽

Uniform Grid ◽

Data Sets ◽

Inversion Algorithm ◽

Data Set

SUMMARY We discuss the focusing inversion of potential field data for the recovery of sparse subsurface structures from surface measurement data on a uniform grid. For the uniform grid, the model sensitivity matrices have a block Toeplitz Toeplitz block structure for each block of columns related to a fixed depth layer of the subsurface. Then, all forward operations with the sensitivity matrix, or its transpose, are performed using the 2-D fast Fourier transform. Simulations are provided to show that the implementation of the focusing inversion algorithm using the fast Fourier transform is efficient, and that the algorithm can be realized on standard desktop computers with sufficient memory for storage of volumes up to size n ≈ 106. The linear systems of equations arising in the focusing inversion algorithm are solved using either Golub–Kahan bidiagonalization or randomized singular value decomposition algorithms. These two algorithms are contrasted for their efficiency when used to solve large-scale problems with respect to the sizes of the projected subspaces adopted for the solutions of the linear systems. The results confirm earlier studies that the randomized algorithms are to be preferred for the inversion of gravity data, and for data sets of size m it is sufficient to use projected spaces of size approximately m/8. For the inversion of magnetic data sets, we show that it is more efficient to use the Golub–Kahan bidiagonalization, and that it is again sufficient to use projected spaces of size approximately m/8. Simulations support the presented conclusions and are verified for the inversion of a magnetic data set obtained over the Wuskwatim Lake region in Manitoba, Canada.

Download Full-text

Six years of total ozone column measurements from SCIAMACHY nadir observations

Atmospheric Measurement Techniques ◽

10.5194/amt-2-87-2009 ◽

2009 ◽

Vol 2 (1) ◽

pp. 87-98 ◽

Cited By ~ 39

Author(s):

C. Lerot ◽

M. Van Roozendael ◽

J. van Geffen ◽

J. van Gent ◽

C. Fayt ◽

...

Keyword(s):

Cross Sections ◽

Total Ozone ◽

Large Scale ◽

European Space Agency ◽

Data Sets ◽

Data Set ◽

Ozone Data ◽

Space Agency ◽

German Aerospace ◽

The Impact

Abstract. Total O3 columns have been retrieved from six years of SCIAMACHY nadir UV radiance measurements using SDOAS, an adaptation of the GDOAS algorithm previously developed at BIRA-IASB for the GOME instrument. GDOAS and SDOAS have been implemented by the German Aerospace Center (DLR) in the version 4 of the GOME Data Processor (GDP) and in version 3 of the SCIAMACHY Ground Processor (SGP), respectively. The processors are being run at the DLR processing centre on behalf of the European Space Agency (ESA). We first focus on the description of the SDOAS algorithm with particular attention to the impact of uncertainties on the reference O3 absorption cross-sections. Second, the resulting SCIAMACHY total ozone data set is globally evaluated through large-scale comparisons with results from GOME and OMI as well as with ground-based correlative measurements. The various total ozone data sets are found to agree within 2% on average. However, a negative trend of 0.2–0.4%/year has been identified in the SCIAMACHY O3 columns; this probably originates from instrumental degradation effects that have not yet been fully characterized.

Download Full-text

Collecting public RGB-D datasets for human daily activity recognition

International Journal of Advanced Robotic Systems ◽

10.1177/1729881417709079 ◽

2017 ◽

Vol 14 (4) ◽

pp. 172988141770907 ◽

Cited By ~ 2

Author(s):

Hanbo Wu ◽

Xin Ma ◽

Zhimeng Zhang ◽

Haibo Wang ◽

Yibin Li

Keyword(s):

Activity Recognition ◽

Daily Activity ◽

Visual Cues ◽

Large Scale ◽

Hot Spot ◽

Feature Representation ◽

Data Sets ◽

Activity Data ◽

Data Set ◽

Depth Motion Maps

Human daily activity recognition has been a hot spot in the field of computer vision for many decades. Despite best efforts, activity recognition in naturally uncontrolled settings remains a challenging problem. Recently, by being able to perceive depth and visual cues simultaneously, RGB-D cameras greatly boost the performance of activity recognition. However, due to some practical difficulties, the publicly available RGB-D data sets are not sufficiently large for benchmarking when considering the diversity of their activities, subjects, and background. This severely affects the applicability of complicated learning-based recognition approaches. To address the issue, this article provides a large-scale RGB-D activity data set by merging five public RGB-D data sets that differ from each other on many aspects such as length of actions, nationality of subjects, or camera angles. This data set comprises 4528 samples depicting 7 action categories (up to 46 subcategories) performed by 74 subjects. To verify the challengeness of the data set, three feature representation methods are evaluated, which are depth motion maps, spatiotemporal depth cuboid similarity feature, and curvature space scale. Results show that the merged large-scale data set is more realistic and challenging and therefore more suitable for benchmarking.

Download Full-text

Characterising RDF data sets

Journal of Information Science ◽

10.1177/0165551516677945 ◽

2017 ◽

Vol 44 (2) ◽

pp. 203-229 ◽

Cited By ~ 6

Author(s):

Javier D Fernández ◽

Miguel A Martínez-Prieto ◽

Pablo de la Fuente Redondo ◽

Claudio Gutiérrez

Keyword(s):

Data Structures ◽

Large Scale ◽

Open Data ◽

Structural Features ◽

Data Sets ◽

Data Set ◽

Wide Range ◽

Rdf Data ◽

Description Framework ◽

Resource Description

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.

Download Full-text

Application of the ESMO-Magnitude of Clinical Benefit Scale (V.1.1) to the field of early breast cancer therapies

ESMO Open ◽

10.1136/esmoopen-2020-000743 ◽

2020 ◽

Vol 5 (5) ◽

pp. e000743 ◽

Cited By ~ 2

Author(s):

Shani Paluch-Shimon ◽

Nathan I Cherny ◽

Elisabeth G E de Vries ◽

Urania Dafni ◽

Martine J Piccart ◽

...

Keyword(s):

Breast Cancer ◽

Early Breast Cancer ◽

Clinical Benefit ◽

Large Scale ◽

Disease Free Survival ◽

Cancer Therapies ◽

Data Set ◽

Clinical Scenarios ◽

Meta Analyses

Click here to listen to the PodcastBackgroundThe European Society for Medical Oncology-Magnitude of Clinical Benefit Scale (ESMO-MCBS) is a validated value scale for solid tumour anticancer treatments. Form 1 of the ESMO-MCBS, used to grade therapies with curative intent including adjuvant therapies, has only been evaluated for a limited number of studies. This is the first large-scale field testing in early breast cancer to assess the applicability of the scale to this data set and the reasonableness of derived scores and to identify any shortcomings to be addressed in future modifications of the scale.MethodRepresentative key studies and meta-analyses of the major modalities of adjuvant systemic therapy of breast cancer were identified for each of the major clinical scenarios (HER2-positive, HER2-negative, endocrine-responsive) and were graded with form 1 of the ESMO-MCBS. These generated scores were reviewed by a panel of experts for reasonableness. Shortcomings and issues related to the application of the scale and interpretation of results were identified and critically evaluated.ResultsSixty-five studies were eligible for evaluation: 59 individual studies and 6 meta-analyses. These studies incorporated 101 therapeutic comparisons, 61 of which were scorable. Review of the generated scores indicated that, with few exceptions, they generally reflected contemporary standards of practice. Six shortcomings were identified related to grading based on disease-free survival (DFS), lack of information regarding acute and long-term toxicity and an inability to grade single-arm de-escalation scales.ConclusionsForm 1 of the ESMO-MCBS is a robust tool for the evaluation of the magnitude of benefit studies in early breast cancer. The scale can be further improved by addressing issues related to grading based on DFS, annotating grades with information regarding acute and long-term toxicity and developing an approach to grade single-arm de-escalation studies.

Download Full-text

Variability in the Power Spectrum of Solar Five-Minute Oscillations

International Astronomical Union Colloquium ◽

10.1017/s0252921100095749 ◽

1983 ◽

Vol 66 ◽

pp. 411-425

Author(s):

Frank Hill ◽

Juri Toomre ◽

Laurence J. November

Keyword(s):

Large Scale ◽

Solar Rotation ◽

Temporal Frequency ◽

Power Spectra ◽

Giant Cells ◽

Spectral Lines ◽

Data Sets ◽

Data Set ◽

Temporal Sampling ◽

Velocity Changes

AbstractTwo-dimensional power spectra of solar five-minute oscillations display prominent ridge structures in (k, ω) space, where k is the horizontal wavenumber and ω is the temporal frequency. The positions of these ridges in k and ω can be used to probe temperature and velocity structures in the subphotosphere. We have been carrying out a continuing program of observations of five-minute oscillations with the diode array instrument on the vacuum tower telescope at Sacramento Peak Observatory (SPO). We have sought to establish whether power spectra taken on separate days show shifts in ridge locations; these may arise from different velocity and temperature patterns having been brought into our sampling region by solar rotation. Power spectra have been obtained for six days of observations of Doppler velocities using the Mg I λ5173 and Fe I λ5434 spectral lines. Each data set covers 8 to 11 hr in time and samples a region 256″ × 1024″ in spatial extent, with a spatial resolution of 2″ and temporal sampling of 65 s. We have detected shifts in ridge locations between certain data sets which are statistically significant. The character of these displacements when analyzed in terms of eastward and westward propagating waves implies that changes have occurred in both temperature and horizontal velocity fields underlying our observing window. We estimate the magnitude of the velocity changes to be on the order of 100 m s -1; we may be detecting the effects of large-scale convection akin to giant cells.

Download Full-text

Large-scale analysis of high-speed atomic force microscopy data sets using adaptive image processing

Beilstein Journal of Nanotechnology ◽

10.3762/bjnano.3.84 ◽

2012 ◽

Vol 3 ◽

pp. 747-758 ◽

Cited By ~ 8

Author(s):

Blake W Erickson ◽

Séverine Coquoz ◽

Jonathan D Adams ◽

Daniel J Burns ◽

Georg E Fantner

Keyword(s):

High Speed ◽

Large Scale ◽

Data Sets ◽

Force Microscopy ◽

Topographic Features ◽

Atomic Force ◽

Large Scale Analysis ◽

Adaptive Image Processing ◽

True Representation ◽

Microscopy Data

Modern high-speed atomic force microscopes generate significant quantities of data in a short amount of time. Each image in the sequence has to be processed quickly and accurately in order to obtain a true representation of the sample and its changes over time. This paper presents an automated, adaptive algorithm for the required processing of AFM images. The algorithm adaptively corrects for both common one-dimensional distortions as well as the most common two-dimensional distortions. This method uses an iterative thresholded processing algorithm for rapid and accurate separation of background and surface topography. This separation prevents artificial bias from topographic features and ensures the best possible coherence between the different images in a sequence. This method is equally applicable to all channels of AFM data, and can process images in seconds.

Download Full-text

Dynamic spatio-temporal generation of large-scale synthetic gridded precipitation: with improved spatial coherence of extremes

Stochastic Environmental Research and Risk Assessment ◽

10.1007/s00477-019-01724-9 ◽

2019 ◽

Vol 34 (9) ◽

pp. 1369-1383 ◽

Cited By ~ 1

Author(s):

Dirk Diederen ◽

Ye Liu

Keyword(s):

Large Scale ◽

Spatial Coherence ◽

Original Data ◽

Return Level ◽

Data Sets ◽

Large Set ◽

Precipitation Data ◽

Data Set ◽

Spatio Temporal ◽

Synthetic Precipitation

Abstract With the ongoing development of distributed hydrological models, flood risk analysis calls for synthetic, gridded precipitation data sets. The availability of large, coherent, gridded re-analysis data sets in combination with the increase in computational power, accommodates the development of new methodology to generate such synthetic data. We tracked moving precipitation fields and classified them using self-organising maps. For each class, we fitted a multivariate mixture model and generated a large set of synthetic, coherent descriptors, which we used to reconstruct moving synthetic precipitation fields. We introduced randomness in the original data set by replacing the observed precipitation fields in the original data set with the synthetic precipitation fields. The output is a continuous, gridded, hourly precipitation data set of a much longer duration, containing physically plausible and spatio-temporally coherent precipitation events. The proposed methodology implicitly provides an important improvement in the spatial coherence of precipitation extremes. We investigate the issue of unrealistic, sudden changes on the grid and demonstrate how a dynamic spatio-temporal generator can provide spatial smoothness in the probability distribution parameters and hence in the return level estimates.

Download Full-text