entire sequence Latest Research Papers

Simple epidemic models with segmentation can be better than complex ones

PLoS ONE ◽

10.1371/journal.pone.0262244 ◽

2022 ◽

Vol 17 (1) ◽

pp. e0262244

Author(s):

Geon Lee ◽

Se-eun Yoon ◽

Kijung Shin

Keyword(s):

Dynamic Systems ◽

Epidemic Model ◽

Minimum Description Length ◽

Complex Model ◽

Epidemic Models ◽

Minimum Description Length Principle ◽

Entire Sequence ◽

Number Of Segments ◽

Better Than

Given a sequence of epidemic events, can a single epidemic model capture its dynamics during the entire period? How should we divide the sequence into segments to better capture the dynamics? Throughout human history, infectious diseases (e.g., the Black Death and COVID-19) have been serious threats. Consequently, understanding and forecasting the evolving patterns of epidemic events are critical for prevention and decision making. To this end, epidemic models based on ordinary differential equations (ODEs), which effectively describe dynamic systems in many fields, have been employed. However, a single epidemic model is not enough to capture long-term dynamics of epidemic events especially when the dynamics heavily depend on external factors (e.g., lockdown and the capability to perform tests). In this work, we demonstrate that properly dividing the event sequence regarding COVID-19 (specifically, the numbers of active cases, recoveries, and deaths) into multiple segments and fitting a simple epidemic model to each segment leads to a better fit with fewer parameters than fitting a complex model to the entire sequence. Moreover, we propose a methodology for balancing the number of segments and the complexity of epidemic models, based on the Minimum Description Length principle. Our methodology is (a) Automatic: not requiring any user-defined parameters, (b) Model-agnostic: applicable to any ODE-based epidemic models, and (c) Effective: effectively describing and forecasting the spread of COVID-19 in 70 countries.
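
The trade-off described in this abstract, more segments against more model parameters, can be illustrated with a small two-part cost. The sketch below is not the authors' method: it assumes a two-parameter exponential-growth model per segment in place of an ODE-based epidemic model, and the constants `PARAM_BITS` and `NOISE_FLOOR` and the function names are hypothetical choices made only for illustration. It picks the segmentation that minimizes (bits for parameters) + (bits for residuals) by dynamic programming.

```python
import math

PARAM_BITS = 32.0   # assumed cost in bits of one segment's two parameters
NOISE_FLOOR = 1e-6  # assumed variance floor so a perfect fit has finite cost


def segment_cost(log_y, lo, hi):
    """Two-part cost of fitting log(y) = a + b*t to log_y[lo:hi]."""
    n = hi - lo
    t_mean = (n - 1) / 2.0
    y_mean = sum(log_y[lo:hi]) / n
    var_t = sum((t - t_mean) ** 2 for t in range(n))
    cov = sum((t - t_mean) * (log_y[lo + t] - y_mean) for t in range(n))
    slope = cov / var_t
    intercept = y_mean - slope * t_mean
    sse = sum((log_y[lo + t] - (intercept + slope * t)) ** 2 for t in range(n))
    # Gaussian code length of the residuals, up to a constant per data point
    data_bits = 0.5 * n * math.log2(sse / n + NOISE_FLOOR)
    return PARAM_BITS + data_bits


def segment(counts, min_len=3):
    """Return [(start, end), ...] minimizing the summed two-part cost."""
    log_y = [math.log(c) for c in counts]
    n = len(log_y)
    best = [math.inf] * (n + 1)
    prev = [0] * (n + 1)
    best[0] = 0.0
    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            if best[i] == math.inf:
                continue
            cost = best[i] + segment_cost(log_y, i, j)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk the stored predecessors back from the end of the sequence
    bounds = []
    j = n
    while j > 0:
        bounds.append((prev[j], j))
        j = prev[j]
    return bounds[::-1]
```

Because every extra segment pays `PARAM_BITS` again, a sequence that one simple model already explains stays in one piece, while a sequence whose growth rate changes abruptly is split at the change.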

Non-autoregressive neural machine translation with auxiliary representation fusion

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211105 ◽

2021 ◽

pp. 1-11

Author(s):

Quan Du ◽

Kai Feng ◽

Chen Xu ◽

Tong Xiao ◽

Jingbo Zhu

Keyword(s):

Machine Translation ◽

Experimental Results ◽

The Other ◽

Generation Process ◽

Neural Machine Translation ◽

Trade Off ◽

Translation Quality ◽

Other Hand ◽

Entire Sequence ◽

Translation Accuracy

Recently, many efforts have been devoted to speeding up neural machine translation models. Among them, the non-autoregressive translation (NAT) model is promising because it removes the sequential dependence on previously generated tokens and parallelizes the generation of the entire sequence. On the other hand, the autoregressive translation (AT) model generally achieves higher translation accuracy than its NAT counterpart. A natural idea is therefore to fuse the AT and NAT models to seek a trade-off between inference speed and translation quality. This paper proposes ARF-NAT (NAT with auxiliary representation fusion), which brings the merits of a shallow AT model to an NAT model. Three functions are designed to fuse the auxiliary representation into the decoder of the NAT model. Experimental results show that ARF-NAT outperforms the NAT baseline by 5.26 BLEU points on the WMT’14 German-English task, with a significant speedup (7.58 times) over several strong AT baselines.

Stream Data Cleaning under Speed and Acceleration Constraints

ACM Transactions on Database Systems ◽

10.1145/3465740 ◽

2021 ◽

Vol 46 (3) ◽

pp. 1-44

Author(s):

Shaoxu Song ◽

Fei Gao ◽

Aoqian Zhang ◽

Jianmin Wang ◽

Philip S. Yu

Keyword(s):

Stock Prices ◽

Linear Time ◽

Data Cleaning ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Global Optimum ◽

Local Optimum ◽

Stream Data ◽

Sensor Reading ◽

Entire Sequence

Stream data are often dirty, for example, owing to unreliable sensor readings or erroneous extraction of stock prices. Most stream data cleaning approaches employ a smoothing filter, which may seriously alter the data without preserving the original information. We argue that cleaning should avoid changing data that are originally correct/clean, a.k.a. the minimum modification rule in data cleaning. To capture the knowledge about what is clean, we consider the (widely existing) constraints on the speed and acceleration of data changes, such as fuel consumption per hour, the daily limit of stock prices, or the top speed and acceleration of a car. Guided by these semantic constraints, in this article we propose a constraint-based approach for cleaning stream data. Notably, existing data repair techniques clean (a sequence of) data as a whole and fail to support stream computation. To this end, we have to relax the global optimum over the entire sequence to the local optimum in a window. In contrast to the NP-hardness commonly observed in general data repairing problems, our major contributions include (1) a polynomial-time algorithm for the global optimum, (2) a linear-time algorithm for the local optimum under an efficient median-based solution, and (3) experiments on real datasets demonstrating that our method achieves significantly lower L1 error than existing approaches such as smoothing.
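
To make the idea of a speed constraint concrete, here is a minimal sketch. It is not the paper's median-based algorithm: it is a simple greedy repair that, for each arriving point, clamps the value into the range reachable from the previously repaired point. The function name `clean_stream` and the parameters `s_min` and `s_max` (the lowest and highest allowed rate of change per time unit) are hypothetical.

```python
def clean_stream(points, s_min, s_max):
    """Greedy speed-constraint repair.

    points: iterable of (timestamp, value) with strictly increasing timestamps.
    Each value is moved as little as possible so that the change from the
    previously repaired point stays within [s_min, s_max] per time unit.
    """
    repaired = []
    for t, x in points:
        if repaired:
            t_prev, x_prev = repaired[-1]
            dt = t - t_prev
            lo = x_prev + s_min * dt  # lowest value reachable from previous point
            hi = x_prev + s_max * dt  # highest value reachable from previous point
            x = min(max(x, lo), hi)   # values already in range are left untouched
        repaired.append((t, x))
    return repaired
```

Points that already satisfy the constraint are returned unchanged, which is the minimum modification behaviour the abstract argues for. A greedy pass like this trusts the previous repaired point completely, so a spike at the very start of the stream would be propagated rather than corrected.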

Magma Migration at Shallower Levels and Lava Fountains Sequence as Revealed by Borehole Dilatometers on Etna Volcano

Frontiers in Earth Science ◽

10.3389/feart.2021.740505 ◽

2021 ◽

Vol 9 ◽

Author(s):

Alessandro Bonaccorso ◽

Luigi Carleo ◽

Gilda Currenti ◽

Antonino Sicali

Keyword(s):

High Precision ◽

Small Strain ◽

Occurrence Rate ◽

Etna Volcano ◽

Plumbing System ◽

Source Modeling ◽

Main Challenge ◽

Entire Sequence ◽

Lava Fountains ◽

Magma Migration

A main challenge in open conduit volcanoes is to detect and interpret the ultra-small strain (<10⁻⁶) associated with minor but critical eruptions such as lava fountains. Two years after the flank eruption of December 2018, Etna generated a violent and spectacular eruptive sequence of lava fountains. There were 23 episodes from December 13, 2020 to March 31, 2021, 17 of which occurred in the brief period from February 16 to March 31, with an intensified occurrence rate. The high-precision borehole dilatometer network recorded significant strain changes in the forerunning phase of December 2020, accompanying the final magma migration at the shallower levels, and also during the single lava fountains and during the entire sequence. The source modeling provided further information on the shallow plumbing system. Moreover, the strain signals gave useful information on both the explosive efficiency of the lava fountain sequence and the estimate of erupted volume. The high-precision borehole dilatometers prove to be a strategic and very useful tool for detecting and interpreting ultra-small strain changes associated with explosive eruptions, such as lava fountains, in open conduit volcanoes.

Cas9-derived peptides presented by MHC Class II that elicit proliferation of CD4+ T-cells

Nature Communications ◽

10.1038/s41467-021-25414-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Vijaya L. Simhadri ◽

Louis Hopkins ◽

Joseph R. McGill ◽

Brian R. Duke ◽

Swati Mukherjee ◽

...

Keyword(s):

T Cells ◽

Mhc Class Ii ◽

Mononuclear Cells ◽

Human Peripheral Blood ◽

Class Ii ◽

Data Sets ◽

Peripheral Blood Mononuclear ◽

Mhc Ii ◽

The North ◽

Entire Sequence

CRISPR–Cas9 mediated genome editing offers unprecedented opportunities for treating human diseases. There are several reports that demonstrate pre-existing immune responses to Cas9, which may have implications for clinical development of CRISPR–Cas9 mediated gene therapy. Here we use 209 overlapping peptides that span the entire sequence of Staphylococcus aureus Cas9 (SaCas9) and human peripheral blood mononuclear cells (PBMCs) from a cohort of donors with a distribution of Major Histocompatibility Complex (MHC) alleles comparable to that in the North American (NA) population to identify the immunodominant regions of the SaCas9 protein. We also use an MHC Associated Peptide Proteomics (MAPPs) assay to identify SaCas9 peptides presented by MHC Class II (MHC-II) proteins on dendritic cells. Using these two data sets we identify 22 SaCas9 peptides that are both presented by MHC-II proteins and stimulate CD4+ T-cells.

ResidueFinder: extracting individual residue mentions from protein literature

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00243-3 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Ton E Becker ◽

Eric Jakobsson

Keyword(s):

Amino Acids ◽

Full Text ◽

Protein Function ◽

Regular Expression ◽

Computationally Efficient ◽

Expression Library ◽

Trade Offs ◽

Entire Sequence ◽

Individual Residue ◽

Efficient Program

Background: The revolution in molecular biology has shown how protein function and structure are based on specific sequences of amino acids. Thus, an important feature in many papers is the mention of the significance of individual amino acids in the context of the entire sequence of the protein. MutationFinder is a widely used program for finding mentions of specific mutations in texts. We report on augmenting the positive attributes of MutationFinder with a more inclusive regular expression list to create ResidueFinder, which finds mentions of native amino acids as well as mutations. We also consider parameter options for both ResidueFinder and MutationFinder to explore trade-offs between precision, recall, and computational efficiency. We test our methods and software on full text as well as abstracts. Results: We find there is a much greater variety of formats for mentioning residues in the full text of papers than in abstracts alone. Failure to take these multiple formats into account results in many false negatives. Since MutationFinder, like several other programs, was primarily tested on abstracts, we found it necessary to build an expanded regular expression list to achieve acceptable recall in full-text searches. We also discovered a number of artifacts arising from PDF-to-text conversion, which we addressed by adding elements to the regular expression library. Taking those factors into account resulted in high recall on randomly selected primary research articles. We also developed a streamlined regular expression (called “cut”) which enables a several hundredfold speedup in both MutationFinder and ResidueFinder with only a modest compromise of recall. All regular expressions were tested using expanded F-measure statistics, i.e., we compute Fβ for various values of β, where the larger the value of β the more recall is weighted, and the smaller the value of β the more precision is weighted. Conclusions: ResidueFinder is a simple, effective, and efficient program for finding individual residue mentions in primary literature starting with text files, implemented in Python, and available on SourceForge.net. The most computationally efficient versions of ResidueFinder could enable creation and maintenance of a database of residue mentions encompassing all articles in PubMed.
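
As a rough illustration of the two ideas in this abstract, regular-expression matching of residue mentions and scoring with Fβ, consider the sketch below. The pattern is a hypothetical, deliberately narrow one (three-letter amino acid code followed by a position, with an optional hyphen); it is not taken from ResidueFinder's or MutationFinder's regular expression library, which the abstract says covers many more formats. The `f_beta` function uses the standard definition Fβ = (1 + β²)·P·R / (β²·P + R).

```python
import re

# Three-letter codes of the 20 standard amino acids
THREE_LETTER = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
                "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# Hypothetical pattern: e.g. "Asp102" or "Ser-195"
RESIDUE = re.compile(r"\b(" + THREE_LETTER + r")-?(\d+)\b")


def find_residues(text):
    """Return (amino_acid, position) pairs for each mention matched in text."""
    return [(m.group(1), int(m.group(2))) for m in RESIDUE.finditer(text)]


def f_beta(precision, recall, beta):
    """F-measure that weights recall more as beta grows."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For a tool with high recall but modest precision, Fβ with β = 2 comes out higher than with β = 1, which matches the abstract's description of larger β weighting recall more heavily.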

The Aspergillus niger Major Allergen (Asp n 3) DNA-Specific Sequence Is a Reliable Marker to Identify Early Fungal Contamination and Postharvest Damage in Mangifera indica Fruit

Frontiers in Microbiology ◽

10.3389/fmicb.2021.663323 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jorge Martínez ◽

Ander Nevado ◽

Ester Suñén ◽

Marta Gabriel ◽

Ainara Vélez-del-Burgo ◽

...

Keyword(s):

Aspergillus Niger ◽

Mangifera Indica ◽

Major Allergen ◽

Fungal Species ◽

Specific Marker ◽

Specific Sequence ◽

Entire Sequence ◽

Conserved Region ◽

Reliable Marker ◽

Species Specific

The aim of this work was to study the value of the main allergen Asp n 3 of Aspergillus niger as a molecular marker of allergenicity and pathogenicity with the potential to be used in the identification of A. niger as a contaminant and cause of spoilage of Mangifera indica. Real-time polymerase chain reaction (RT-PCR) was used for the amplification of the Asp n 3 gene. Two pairs of primers were designed: one for the amplification of the entire sequence and another for the amplification of the most conserved region of this peroxisomal protein. The presence of A. niger was demonstrated by the early detection of the gene coding for the allergenic protein Asp n 3, which could be considered a species-specific marker. The use of primers designed based on the conserved region of the Asp n 3 encoding gene allowed us to identify the presence of the closely related fungal species Aspergillus fumigatus by detecting the Asp n 3 homologous protein, which can be cross-reactive. The use of conserved segments of the Asp n 3 gene or its entire sequence allows us to detect phylogenetically closely related species within the Aspergillaceae family or to identify species-specific contaminating fungi.

Embeddings from protein language models predict conservation and variant effects

10.21203/rs.3.rs-584804/v1 ◽

2021 ◽

Author(s):

Céline Marquet ◽

Michael Heinzinger ◽

Tobias Olenyi ◽

Christian Dallago ◽

Michael Bernhofer ◽

...

Keyword(s):

Protein Function ◽

Language Models ◽

Single Amino Acid ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Human Proteins ◽

Entire Sequence ◽

Embedding Methods ◽

Better Than

The emergence of SARS-CoV-2 variants stressed the demand for tools that help interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (LMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or marked amino acids from the context of entire sequence regions. Here, we explored how to benefit from learned protein LM representations (embeddings) to predict SAV effects. Although we have failed so far to predict SAV effects directly from embeddings, this input alone predicted residue conservation almost as accurately from single sequences as using multiple sequence alignments (MSAs), with a two-state per-residue accuracy (conserved/not) of Q2=80% (embeddings) vs. 81% (ConSeq). Considering all SAVs at all residue positions predicted as conserved to affect function reached 68.6% (Q2: effect/neutral; for PMD) without optimization, compared to an expert solution such as SNAP2 (Q2=69.8%). Combining predicted conservation with BLOSUM62 to obtain variant-specific binary predictions, DMS experiments of four human proteins were predicted better than by SNAP2, and better than by applying the same simplistic approach to conservation taken from ConSeq. Thus, embedding methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. This allowed prediction of SAV effects for the entire human proteome (~20k proteins) within 17 minutes on a single GPU.

Structure-based survey of ligand binding in the human insulin receptor

10.22541/au.162245427.70273837/v1 ◽

2021 ◽

Author(s):

Judith Klein-Seetharaman ◽

Whitney Vizgaudis ◽

Lokender Kumar

Keyword(s):

Ligand Binding ◽

Insulin Receptor ◽

Human Insulin ◽

Nutrient Balance ◽

Docking Study ◽

Human Insulin Receptor ◽

Receptor Pharmacology ◽

Entire Sequence ◽

Treatment Of Diabetes ◽

Structural Insights

The insulin receptor is a membrane protein responsible for regulation of nutrient balance and therefore an attractive target in the treatment of diabetes and metabolic syndrome. Pharmacology of the insulin receptor involves two distinct mechanisms: (1) activation of the receptor by insulin mimetics that bind in the extracellular domain and (2) inhibition of the receptor tyrosine kinase enzymatic activity in the cytoplasmic domain. While a complete structural picture of the full-length receptor, comprising the entire sequence covering the extracellular, transmembrane, juxtamembrane and cytoplasmic domains, is still elusive, recent progress through cryoelectron microscopy has made it possible to describe the initial insulin ligand binding events in atomistic detail. We utilize this opportunity to obtain structural insights into the pharmacology of the insulin receptor. To this end, we conducted a comprehensive docking study of known ligands to the new structures of the receptor. Through this approach, we provide an in-depth, structure-based review of human insulin receptor pharmacology in light of the new structures.

Adapting the Selective Exposure Perspective to Algorithmically Governed Platforms: The Case of Google Search

Communication Research ◽

10.1177/00936502211012154 ◽

2021 ◽

pp. 009365022110121

Author(s):

Laura Slechten ◽

Cédric Courtois ◽

Lennert Coenen ◽

Bieke Zaman

Keyword(s):

Selection Process ◽

Selective Exposure ◽

Information Need ◽

Prior Beliefs ◽

Information Selection ◽

Search Result ◽

Structural Impact ◽

Entire Sequence ◽

Available Information ◽

Google Search

Experimental research on selective exposure on online platforms is generally limited by a narrow focus on specific parts of the information selection process, rather than integrating the entire sequence of user-platform interactions. The current study, focusing on online search, incorporates the entire process that stretches from formulating an initial query to finally satisfying an information need. As such, it comprehensively covers how both users and platforms exercise agency by enabling and constraining each other in progressively narrowing down the available information. During a tailored online experiment, participants are asked to search for social and political information in a fully tracked, manipulated Google Search environment. Although the results show a structural impact of varying search result rankings, users still appear to be able to tailor their information exposure to maintain their prior beliefs, hence defying that algorithmic impact. This corroborates the need to conceptually and methodologically expand online selective exposure research.
