Le système de renseignements C.F.C.

W. Pleines; L. Letourneau

doi:10.5558/tfc46039-1

The Forestry Chronicle ◽

10.5558/tfc46039-1 ◽

1970 ◽

Vol 46 (1) ◽

pp. 39-43

Author(s):

W. Pleines ◽

L. Letourneau

Keyword(s):

Maintenance Phase ◽

Data Bank ◽

Relevant Information ◽

General Purpose ◽

Computer Applications ◽

Permanent Plots ◽

Enormous Amount ◽

Forest Experiment Station ◽

Tree Data ◽

Reporting Phase

Forest surveys based on permanent plots have peculiarities unusual in other computer applications. The enormous amount of information (30,000 plots, 1,000,000 trees) must be checked and corrected. Relevant information must be selected from the data bank for statistical computations. Because the information needed for decision-making changes, the computer programs must be flexible. This article explains how this was done. In a temporary phase, all card data were "converted" to standard codes and format and written on magnetic tapes. In the file maintenance phase, the data bank is checked and corrected: volumes are computed, plots are checked by accumulating tree data, etc. The file creation phase builds a unit record from plot and tree information. Stratification data can also be merged into the new file. The reporting phase consists of modified versions of programs of the Northeastern Forest Experiment Station in the U.S. (Wilson and Peters, 1967). TABLE computes statistics by strata and condenses them in matrices; OUTPUT prints them in the desired form. This computer system is a harmonious combination of special and general purpose programs. CIP experiences in developing these programs may help other foresters; hence, more exchange of information about data processing is desired.
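
The file creation and reporting phases described in this abstract can be illustrated with a small modern sketch. This is not the original system, which ran on card and tape data; the record fields (`plot_id`, `stratum`, `volume`) and the sample values are assumptions made purely for illustration, and `table` and `output` only borrow the names of the programs mentioned above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical plot and tree records; field names and values are illustrative only.
plots = [
    {"plot_id": 1, "stratum": "spruce"},
    {"plot_id": 2, "stratum": "spruce"},
    {"plot_id": 3, "stratum": "hardwood"},
]
trees = [
    {"plot_id": 1, "volume": 0.40},
    {"plot_id": 1, "volume": 0.60},
    {"plot_id": 2, "volume": 0.50},
    {"plot_id": 3, "volume": 1.25},
    {"plot_id": 3, "volume": 0.75},
]


def build_unit_records(plots, trees):
    """File creation: one record per plot, with tree data accumulated onto it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tree in trees:
        totals[tree["plot_id"]] += tree["volume"]
        counts[tree["plot_id"]] += 1
    return [
        {**plot, "n_trees": counts[plot["plot_id"]], "volume": totals[plot["plot_id"]]}
        for plot in plots
    ]


def table(records):
    """Reporting: compute statistics by stratum and condense them into a nested dict."""
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec["stratum"]].append(rec["volume"])
    return {
        stratum: {"n_plots": len(volumes), "mean_volume": mean(volumes)}
        for stratum, volumes in by_stratum.items()
    }


def output(matrix):
    """Reporting: format the condensed statistics as printable lines."""
    return [
        f"{stratum:<10} plots={row['n_plots']} mean_volume={row['mean_volume']:.2f}"
        for stratum, row in sorted(matrix.items())
    ]
```

Calling `output(table(build_unit_records(plots, trees)))` yields one line per stratum. Keeping the statistics step separate from the printing step mirrors the split between TABLE and OUTPUT, so the report layout can change without touching the computations.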

Extension of the sasCIF format and its applications for data processing and deposition

Journal of Applied Crystallography ◽

10.1107/s1600576715024942 ◽

2016 ◽

Vol 49 (1) ◽

pp. 302-310 ◽

Cited By ~ 8

Author(s):

Michael Kachala ◽

John Westbrook ◽

Dmitri Svergun

Keyword(s):

Data Analysis ◽

Data Processing ◽

Data Exchange ◽

Hybrid Methods ◽

Data Bank ◽

Relevant Information ◽

Experimental Information ◽

Biological Data ◽

Task Forces ◽

Software Modules

Recent advances in small-angle scattering (SAS) experimental facilities and data analysis methods have prompted a dramatic increase in the number of users and of projects conducted, causing an upsurge in the number of objects studied, experimental data available and structural models generated. To organize the data and models and make them accessible to the community, the Task Forces on SAS and hybrid methods for the International Union of Crystallography and the Worldwide Protein Data Bank envisage developing a federated approach to SAS data and model archiving. Within the framework of this approach, the existing databases may exchange information and provide independent but synchronized entries to users. At present, ways of exchanging information between the various SAS databases are not established, leading to possible duplication and incompatibility of entries, and limiting the opportunities for data-driven research for SAS users. In this work, a solution is developed to resolve these issues and provide a universal exchange format for the community, based on the use of the widely adopted crystallographic information framework (CIF). The previous version of the sasCIF format, implemented as an extension of the core CIF dictionary, has been available since 2000 to facilitate SAS data exchange between laboratories. The sasCIF format has now been extended to describe comprehensively the necessary experimental information, results and models, including relevant metadata for SAS data analysis and for deposition into a database. Processing tools for these files (sasCIFtools) have been developed, and these are available both as standalone open-source programs and integrated into the SAS Biological Data Bank, allowing the export and import of data entries as sasCIF files. Software modules to save the relevant information directly from beamline data-processing pipelines in sasCIF format are also developed. 
This update of sasCIF and the relevant tools are an important step in the standardization of the way SAS data are presented and exchanged, to make the results easily accessible to users and to promote further the application of SAS in the structural biology community.

Improving Results Aggregation Strategies in Distributed Information Retrieval

International Journal of Engineering Research in Africa ◽

10.4028/www.scientific.net/jera.17.94 ◽

2015 ◽

Vol 17 ◽

pp. 94-104

Author(s):

Benjamin Ghansah ◽

Sheng Li Wu ◽

Nathaniel Ekow Ghansah

Keyword(s):

Information Retrieval ◽

User Satisfaction ◽

Information Needs ◽

Relevant Information ◽

General Purpose ◽

Distributed Information ◽

Distributed Information Retrieval ◽

Result Diversification ◽

Result Merging ◽

Ranked List

The top-ranked documents from various information sources that are merged into a unified ranked list may cover the same piece of relevant information and thus fail to satisfy different user needs. Result diversification (RD) addresses this problem by diversifying results to cover more information needs. Recently, RD has attracted much attention as a means of increasing user satisfaction in general purpose search engines. A myriad of approaches have been proposed in related work for the diversification problem. However, no concrete study of search result diversification has been done in a Distributed Information Retrieval (DIR) setting. In this paper, we survey, classify and propose a theoretical framework that aims at improving diversification at the result merging phase of a DIR environment.
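
To make diversification at the merging stage concrete, here is a minimal sketch using Maximal Marginal Relevance (MMR), a well-known greedy diversification heuristic. It is offered only as an illustration and is not the framework proposed in the paper. It assumes that relevance scores from the different sources have already been normalized to a common scale and uses word-level Jaccard overlap as a stand-in similarity measure; the function names and the parameter `lam` are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two texts (a simple stand-in measure)."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def mmr_merge(candidates, k, lam=0.5):
    """Greedily pick k results, trading relevance against redundancy.

    candidates: list of (text, relevance) pairs pooled from several sources,
    with relevance assumed to be comparable across sources.
    lam: 1.0 means pure relevance ranking; lower values favor diversity.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:

        def score(item):
            text, relevance = item
            redundancy = max(
                (jaccard(text, chosen_text) for chosen_text, _ in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with the pooled results `("jaguar car review", 0.9)`, `("jaguar car price", 0.85)` and `("jaguar animal habitat", 0.6)`, the second pick at `lam=0.5` is the animal document rather than the second car document, because the latter overlaps heavily with the result already selected.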

A portable method for acquiring information extraction patterns without annotated corpora

Natural Language Engineering ◽

10.1017/s1351324902003042 ◽

2003 ◽

Vol 9 (2) ◽

pp. 151-179 ◽

Cited By ~ 2

Author(s):

NEUS CATALÀ ◽

NÚRIA CASTELL ◽

MARIO MARTÍN

Keyword(s):

Information Extraction ◽

Learning Algorithm ◽

Relevant Information ◽

General Purpose ◽

Human Intervention ◽

Lexical Knowledge ◽

Distinctive Features ◽

Domain Specific ◽

Building Information ◽

Automatic Acquisition

The main issue when building Information Extraction (IE) systems is how to obtain the knowledge needed to identify relevant information in a document. Most approaches require expert human intervention in many steps of the acquisition process. In this paper we describe ESSENCE, a new method for acquiring IE patterns that significantly reduces the need for human intervention. The method is based on ELA, a specifically designed learning algorithm for acquiring IE patterns without tagged examples. The distinctive features of ESSENCE and ELA are that (1) they permit the automatic acquisition of IE patterns from unrestricted and untagged text representative of the domain, due to (2) their ability to identify regularities around semantically relevant concept-words for the IE task by (3) using non-domain-specific lexical knowledge tools such as WordNet, and (4) restricting the human intervention to defining the task, and validating and typifying the set of IE patterns obtained. Since ESSENCE does not require a corpus annotated with the type of information to be extracted and it uses a general purpose ontology and widely applied syntactic tools, it reduces the expert effort required to build an IE system and therefore also reduces the effort of porting the method to any domain. The results of the application of ESSENCE to the acquisition of IE patterns in an MUC-like task are shown.

Evaluation of variability in high resolution protein structures by global distance scoring

10.1101/202028 ◽

2017 ◽

Author(s):

Risa Anzai ◽

Yoshiki Asami ◽

Waka Inoue ◽

Hina Ueno ◽

Koya Yamada ◽

...

Keyword(s):

High Resolution ◽

Global Analysis ◽

Protein Structures ◽

Data Bank ◽

Relevant Information ◽

Systematic Analysis ◽

Structure Variation ◽

Model Calculations ◽

Biologically Relevant ◽

Global Comparison

Systematic analysis of statistical and dynamical properties of proteins is critical to understanding cellular events. Extraction of biologically relevant information from a set of high-resolution structures is important because it can provide mechanistic details behind the functional properties of protein families, enabling rational comparison between families. Most current structure comparisons are pairwise, which hampers the global analysis of the growing contents of the Protein Data Bank. Additionally, pairing of protein structures introduces uncertainty with respect to reproducibility because it is frequently accompanied by other settings for superimposition. This study introduces intramolecular distance scoring for the analysis of human proteins, for each of which at least several high-resolution structures are available. We show that the results can be used comprehensively to survey advances in the atomic-level exploration of each protein and protein family. This method, and the interpretation based on model calculations, provide new criteria for understanding specific and non-specific structure variation in a protein, enabling global comparison of the dynamics among a vast variety of proteins from different species.
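
The abstract does not define the scoring function, so the following is only a sketch of the general idea behind distance-based comparison: intramolecular distances are unchanged by rotation and translation, so several structures of the same protein can be compared without superimposing them. The choice of the population standard deviation as the per-pair measure of variability, and both function names, are assumptions for illustration.

```python
from itertools import combinations
from math import dist
from statistics import pstdev


def distance_profile(coords):
    """All pairwise intramolecular distances of one structure, in a fixed pair order."""
    return [dist(a, b) for a, b in combinations(coords, 2)]


def distance_variability(structures):
    """For each atom pair, the spread of its distance across all structures.

    structures: list of coordinate lists; every structure must list the same
    atoms in the same order. No superposition step is required.
    """
    profiles = [distance_profile(coords) for coords in structures]
    return [pstdev(values) for values in zip(*profiles)]
```

A rigidly shifted copy of a structure yields zero variability for every pair, while a real conformational change shows up only in the pairs whose distances differ.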

A non-spatial account of place and grid cells based on clustering models of concept learning

10.1101/421842 ◽

2018 ◽

Author(s):

Robert M. Mok ◽

Bradley C. Love

Keyword(s):

Medial Temporal Lobe ◽

Conceptual Knowledge ◽

Learning Algorithm ◽

Grid Cell ◽

Relevant Information ◽

Neural Circuitry ◽

General Purpose ◽

Clustering Model ◽

Higher Dimensional ◽

Processing Steps

One view is that conceptual knowledge is organized using the circuitry in the medial temporal lobe (MTL) that supports spatial processing and navigation. In contrast, we find that a domain-general learning algorithm explains key findings in both spatial and conceptual domains. When the clustering model is applied to spatial navigation tasks, so-called place and grid cell-like representations emerge because of the relatively uniform distribution of possible inputs in these tasks. The same mechanism applied to conceptual tasks, where the overall space can be higher-dimensional and sampling sparser, leads to representations more aligned with human conceptual knowledge. Although the types of memory supported by the MTL are superficially dissimilar, the information processing steps appear shared. Our account suggests that the MTL uses a general-purpose algorithm to learn and organize context-relevant information in a useful format, rather than relying on navigation-specific neural circuitry.
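
As a rough illustration of clustering over uniformly sampled spatial inputs, the sketch below runs plain online k-means (winner-take-all updates) on random positions in a unit square. It is a generic stand-in, not the authors' clustering model, whose update rule and parameters are not given in the abstract; the function name, learning rate, and step count are all assumptions.

```python
import random


def online_clusters(n_clusters=10, n_steps=5000, lr=0.05, seed=0):
    """Online k-means on positions sampled uniformly from the unit square.

    At each step the nearest cluster center moves a fraction `lr` of the way
    toward the sampled position; all other centers stay where they are.
    """
    rng = random.Random(seed)
    centers = [(rng.random(), rng.random()) for _ in range(n_clusters)]
    for _ in range(n_steps):
        x, y = rng.random(), rng.random()
        winner = min(
            range(n_clusters),
            key=lambda j: (centers[j][0] - x) ** 2 + (centers[j][1] - y) ** 2,
        )
        cx, cy = centers[winner]
        centers[winner] = (cx + lr * (x - cx), cy + lr * (y - cy))
    return centers
```

Each center ends up responding to a localized region of the square, which is the loose sense in which a cluster resembles a place field here; whether the centers settle into a regular, grid-like arrangement depends on details this sketch does not attempt to reproduce.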

The Performance Improvement of the Low-Cost Ultrasonic Range Finder (HC-SR04) Using Newton’s Polynomial Interpolation Algorithm

JURNAL INFOTEL ◽

10.20895/infotel.v11i4.456 ◽

2019 ◽

Vol 11 (4) ◽

Author(s):

Gutama Indra Gandha ◽

Dedi Nurcipto

Keyword(s):

Polynomial Interpolation ◽

Low Cost ◽

General Purpose ◽

Computer Applications ◽

Range Finder ◽

Interpolation Algorithm ◽

Measuring Process ◽

Underwater Environment ◽

Industrial Grade ◽

Accuracy Level

Ultrasonic range finder sensors are widely used in many applications, such as computer applications, general purpose applications, medical applications, automotive applications and industrial grade applications. The ultrasonic range finder sensor has many advantages: it is easy to use, fast in the measuring process, allows non-contact measurement and is suitable for air and underwater environments. However, the ultrasonic range finder exhibits deviation, especially for low-cost sensors, which directly affects the accuracy of the measurement results. The HC-SR04 is categorized as a low-cost ultrasonic range finder sensor and has a significant error level. This research aims to improve the accuracy of this low-cost ultrasonic sensor. Newton’s polynomial interpolation algorithm was used to reduce the error during the measurement process, and its implementation succeeded in improving the sensor accuracy. An MSE of 29.96 was obtained without Newton’s polynomial interpolation. Implementing the algorithm increased the accuracy of the sensor by 55.54%, as shown by the decrease of the MSE to 13.32.
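
Newton's polynomial interpolation builds a polynomial through known points using divided differences. The sketch below shows the standard algorithm together with an MSE helper. How the authors applied it to the HC-SR04 is not detailed in the abstract, so using it as a calibration curve that maps raw readings to reference distances is an assumption, and the function names are hypothetical.

```python
def divided_differences(xs, ys):
    """Return the Newton coefficients f[x0], f[x0,x1], ..., f[x0..xn]."""
    coef = list(ys)
    n = len(xs)
    for level in range(1, n):
        # Update in place from the end so lower-order values are still available.
        for i in range(n - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the Newton-form polynomial at x using nested multiplication."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result


def mse(predicted, actual):
    """Mean squared error between two equally long sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
```

In a calibration setting, `xs` would hold raw sensor readings at a few reference positions and `ys` the corresponding true distances; `newton_eval` then corrects a new reading, and `mse` compares corrected and uncorrected readings against ground truth. The nodes in `xs` must be distinct, otherwise the divided differences divide by zero.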

A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures

10.1101/431635 ◽

2018 ◽

Cited By ~ 2

Author(s):

Jianfu Zhou ◽

Alexandra E. Panaitiu ◽

Gevorg Grigoryan

Keyword(s):

Protein Design ◽

Structure Prediction ◽

Fluorescent Protein ◽

Protein Structures ◽

Building Blocks ◽

Data Bank ◽

General Purpose ◽

Design Framework ◽

Target Structure ◽

Sequence Structure

The ability to routinely design functional proteins, in a targeted manner, would have enormous implications for biomedical research and therapeutic development. Computational protein design (CPD) offers the potential to fulfill this need, and though recent years have brought considerable progress in the field, major limitations remain. Current state-of-the-art approaches to CPD aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a robust general solution to CPD has yet to be found. Here we propose a fundamentally novel design framework—one based on identifying and applying patterns of sequence-structure compatibility found in known proteins, rather than approximating them from models of inter-atomic interactions. Specifically, we systematically decompose the target structure to be designed into structural building blocks we call TERMs (tertiary motifs) and use rapid structure search against the Protein Data Bank (PDB) to identify sequence patterns associated with each TERM from known protein structures that contain it. These results are then combined to produce a sequence-level pseudo-energy model that can score any sequence for compatibility with the target structure. This model can then be used to extract the optimal-scoring sequence via combinatorial optimization or otherwise sample the sequence space predicted to be well compatible with folding to the target.
Here we carry out extensive computational analyses, showing that our method, which we dub dTERMen (design with TERM energies): 1) produces native-like sequences given native crystallographic or NMR backbones, 2) produces sequence-structure compatibility scores that correlate with thermodynamic stability, 3) is able to predict experimental success of designed sequences generated with other methods, and 4) designs sequences that are found to fold to the desired target by structure prediction more frequently than sequences designed with an atomistic method. As an experimental validation of dTERMen, we perform a total surface redesign of Red Fluorescent Protein mCherry, marking a total of 64 residues as variable. The single sequence identified as optimal by dTERMen harbors 48 mutations relative to mCherry, but nevertheless folds, is monomeric in solution, exhibits stability to chemical denaturation similar to that of mCherry, and even preserves the fluorescence property. Our results strongly argue that the PDB is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. This is highly significant, given that the structural database will only continue to grow, and signals the possibility of a whole host of novel data-driven CPD methods. Because such methods are likely to have orthogonal strengths relative to existing techniques, they could represent an important step towards removing remaining barriers to robust CPD.

Responses of Conversational Agents to Health and Lifestyle Prompts: Investigation of Appropriateness and Presentation Structures (Preprint)

10.2196/preprints.15823 ◽

2019 ◽

Cited By ~ 2

Author(s):

Ahmet Baki Kocaballi ◽

Juan C Quiroz ◽

Dana Rezazadegan ◽

Shlomo Berkovsky ◽

Farah Magrabi ◽

...

Keyword(s):

Web Search ◽

Information Sources ◽

Relevant Information ◽

Spoken Language ◽

General Purpose ◽

Conversational Agents ◽

Natural Language Interfaces ◽

Safety Critical ◽

Safety Risks ◽

Potential Safety

BACKGROUND: Conversational agents (CAs) are systems that mimic human conversations using text or spoken language. Widely used examples include voice-activated systems such as Apple Siri, Google Assistant, Amazon Alexa, and Microsoft Cortana. The use of CAs in health care has been on the rise, but concerns about their potential safety risks often remain understudied. OBJECTIVE: This study aimed to analyze how commonly available, general-purpose CAs on smartphones and smart speakers respond to health and lifestyle prompts (questions and open-ended statements) by examining their responses in terms of content and structure alike. METHODS: We followed a piloted script to present health- and lifestyle-related prompts to 8 CAs. The CAs’ responses were assessed for their appropriateness on the basis of the prompt type: responses to safety-critical prompts were deemed appropriate if they included a referral to a health professional or service, whereas responses to lifestyle prompts were deemed appropriate if they provided relevant information to address the problem prompted. The response structure was also examined according to information sources (Web search–based or precoded), response content style (informative and/or directive), confirmation of prompt recognition, and empathy. RESULTS: The 8 studied CAs provided in total 240 responses to 30 prompts. They collectively responded appropriately to 41% (46/112) of the safety-critical and 39% (37/96) of the lifestyle prompts. The ratio of appropriate responses deteriorated when safety-critical prompts were rephrased or when the agent used a voice-only interface. The appropriate responses included mostly directive content and empathy statements for the safety-critical prompts and a mix of informative and directive content for the lifestyle prompts.
CONCLUSIONS: Our results suggest that the commonly available, general-purpose CAs on smartphones and smart speakers with unconstrained natural language interfaces are limited in their ability to advise on both the safety-critical health prompts and lifestyle prompts. Our study also identified some response structures the CAs employed to present their appropriate responses. Further investigation is needed to establish guidelines for designing suitable response structures for different prompt types.

The Digital Library and the Archiving System for Educational Institutes

Pakistan Journal of Information Management and Libraries ◽

10.47657/2018201453 ◽

2018 ◽

pp. 94-117

Author(s):

Atta ur Rahman ◽

Fahd Abdulsalam Alhaidari

Keyword(s):

Digital Library ◽

Search Engines ◽

State Of The Art ◽

Data Bank ◽

Relevant Information ◽

Query Languages ◽

Meaningful Information ◽

Educational Domain ◽

Primary Representation ◽

Information Repository

At present, several formats exist through which data is distributed among online stakeholders. One example is XML, which, like other such formats, is helpful for traditional inquiry methods and for forming the foundation of query languages such as SPARQL and SQL. Information about primary representation demands a broader assistance for the languages where every piece of data from any resource can substantiate the original queries for searching. Such models are useful for XML-based retrieval since several cooperative XML search engines have already been developed. These search engines perform semantic investigation of XML files with data surrounded by the important fields. Therefore, XML files are used to store and index data intended for competent retrieval. In this research, an attempt is made to fill this gap of customized representation and retrieval with a focus on the educational domain. An institute's repository of books, e-books, journals, articles and research theses has been used to retrieve results. A system has been proposed and developed to store the contents of the institute's Data Bank as objects of the Digital Library. A structured method has been proposed to organize all the data, and a system has been developed which extracts meaningful information from the Data Bank. The information repository is established, and the entire data is represented in terms of a unit called a Digital Object in the Digital Library. Each unit is represented by recording some quantitative data about it, referred to as ‘Metadata’. The search is focused on extracting meaningful information from the repository by applying filtration strategies to get relevant information best matched with the query terms. Finally, a partitioning- and parallelism-focused architecture to archive the information for sharing, back-up and collaboration is also proposed.
Comparison of the proposed scheme with state of the art schemes is provided in terms of computational complexity and recall measurement.
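
The idea of storing repository items as XML digital objects with metadata and then filtering them against query terms can be sketched as follows. The element and attribute names (`library`, `object`, `title`, `keywords`, `type`), the sample catalog, and the term-overlap scoring are all assumptions for illustration; the paper's actual schema and filtration strategies are not described in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: each <object> is one digital object with its metadata.
CATALOG_XML = """
<library>
  <object id="b1" type="book">
    <title>Digital Library Basics</title>
    <keywords>metadata archiving</keywords>
  </object>
  <object id="t1" type="thesis">
    <title>XML Retrieval Methods</title>
    <keywords>xml search indexing</keywords>
  </object>
  <object id="a1" type="article">
    <title>Search Engines for XML</title>
    <keywords>xml retrieval</keywords>
  </object>
</library>
"""


def search(catalog_xml, query, object_type=None):
    """Return ids of objects matching the query, best match first.

    An object's score is the number of distinct query terms found in its
    title or keywords; objects scoring zero are filtered out.
    """
    root = ET.fromstring(catalog_xml)
    terms = set(query.lower().split())
    hits = []
    for obj in root.iter("object"):
        if object_type and obj.get("type") != object_type:
            continue
        text = f"{obj.findtext('title', '')} {obj.findtext('keywords', '')}".lower()
        score = len(terms & set(text.split()))
        if score:
            hits.append((score, obj.get("id")))
    # Sort by descending score, then by id so ties are ordered deterministically.
    return [obj_id for score, obj_id in sorted(hits, key=lambda h: (-h[0], h[1]))]
```

For instance, `search(CATALOG_XML, "xml retrieval")` returns the article and the thesis, while passing `object_type="thesis"` restricts the result to the thesis alone.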

A general-purpose protein design framework based on mining sequence–structure relationships in known protein structures

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1908723117 ◽

2019 ◽

Vol 117 (2) ◽

pp. 1059-1068 ◽

Cited By ~ 9

Author(s):

Jianfu Zhou ◽

Alexandra E. Panaitiu ◽

Gevorg Grigoryan

Keyword(s):

Protein Design ◽

Experimental Validation ◽

Protein Structures ◽

Data Bank ◽

General Purpose ◽

Design Framework ◽

Sequence Structure ◽

Current State ◽

Computational Analyses ◽

Physical Principles

Current state-of-the-art approaches to computational protein design (CPD) aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a reliable general solution to CPD has yet to be found. Here, we propose a design framework—one based on identifying and applying patterns of sequence–structure compatibility found in known proteins, rather than approximating them from models of interatomic interactions. We carry out extensive computational analyses and an experimental validation for our method. Our results strongly argue that the Protein Data Bank is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. Because our method is likely to have orthogonal strengths relative to existing techniques, it could represent an important step toward removing remaining barriers to robust CPD.
