Scanning Compressed Full Text Files

Author(s):  
Carolyn R. Watters ◽  
Matthew Young-Lai

From the 1994 CAIS Conference: The Information Industry in Transition McGill University, Montreal, Quebec. May 25 - 27, 1994. In this paper we discuss an application of compression, not with the overall goal of reducing disk space, but with the goal of extending the applicability of full text scan procedures to larger text files for use in on-line search environments. This paper presents an alternative to inverted file generation for access to full text data files of medium size, 50-200 Mbytes, for which the cost of generating a full inverted file is not warranted. Full scan techniques, which are often useful in interactive situations for small files, become unacceptably slow for interactive sessions with files above 10 Mbytes, so the use of compression to reduce the quantity of data scanned is an attractive alternative. Furthermore, an index that reduces search time further, to a very acceptable 1-2 seconds, can be generated as a byproduct of the compression.
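
The idea of scanning compressed text can be sketched as follows. This is only an illustration under a strong assumption: a simple word-level dictionary code, which the abstract does not specify. The names `compress` and `scan` are hypothetical. The vocabulary built during compression doubles as a small index, since a term absent from it cannot occur in the file.

```python
def compress(text):
    """Replace each word with an integer code (assumed word-level scheme)."""
    vocab, codes = {}, []
    for word in text.lower().split():
        # A new word gets the next free code; a known word reuses its code.
        codes.append(vocab.setdefault(word, len(vocab)))
    return vocab, codes


def scan(vocab, codes, term):
    """Return the word positions of `term` by scanning codes, not raw text."""
    code = vocab.get(term.lower())
    if code is None:
        # The vocabulary acts as an index: unknown terms need no scan at all.
        return []
    return [i for i, c in enumerate(codes) if c == code]


vocab, codes = compress("the cat saw the dog")
print(scan(vocab, codes, "the"))
```

The scan compares small integers instead of character strings, which is one way a compressed representation can reduce the amount of data examined per query.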

Author(s):  
Carol A. Keene

From the 1994 CAIS Conference: The Information Industry in Transition McGill University, Montreal, Quebec. May 25 - 27, 1994. Presented with the task of locating needed information in on-line, full-text documentation, users must express queries in the language of the retrieval system. Many of these query languages are based on Boolean logic or restricted natural language syntax, and users find it difficult to express information needs. Experiments conducted at the University of Colorado asked participants to enter English queries to locate information needed to solve problems ranging from very specific to very general ones. No restrictions were placed upon grammar or vocabulary. The collected queries were very short, telegraphic in style, used few verbs, and contained frequently occurring terms from stored vocabulary. There were no statistically significant differences in query contents based upon a participant's knowledge of the topic or English communication skills.


1972 ◽  
Vol 11 (03) ◽  
pp. 152-162 ◽  
Author(s):  
P. GAYNON ◽  
R. L. WONG

With the objective of providing easier access to pathology specimens, slides and kodachromes with linkage to x-ray and the remainder of the patient’s medical records, an automated natural language parsing routine, based on dictionary look-up, was written for Surgical Pathology document-pairs, each consisting of a Request for Examination (authored by clinicians) and its corresponding report (authored by pathologists). These documents were input to the system in free-text English without manual editing or coding. Two types of indices were prepared. The first was an »inverted« file, available for on-line retrieval, for display of the content of the document-pairs, frequency counts of cases or listing of cases in table format. Retrievable items are the patient’s and specimen’s identification data, date of operation, name of clinician and pathologist, etc. The English content of the operative procedure, clinical findings and pathologic diagnoses can be retrieved through logical combination of key words. The second type of index was a catalog. Three catalog files (»operation«, »clinical«, and »pathology«) were prepared by alphabetization of lines formed by the rotation of phrases, headed by keywords. These keywords were automatically selected and standardized by the parsing routine, and the phrases were extracted from each sentence of each input document. Over 2,500 document-pairs have been entered and are currently being utilized for the purpose of medical education.
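
The catalog construction (rotating each phrase so that a keyword heads the line, then alphabetizing) might be sketched as below. This is a minimal illustration, not the authors' routine: keyword selection here is a hypothetical stopword filter standing in for the dictionary look-up, and the sample phrase is invented.

```python
# Hypothetical stopword list; the original system selected keywords
# through dictionary look-up, which is not reproduced here.
STOPWORDS = {"of", "the", "and", "with", "in", "a"}


def rotated_index(phrases):
    """Build alphabetized lines, each a phrase rotated to start at a keyword."""
    lines = []
    for phrase in phrases:
        words = phrase.split()
        for i, word in enumerate(words):
            if word.lower() in STOPWORDS:
                continue  # only keywords may head a catalog line
            lines.append(" ".join(words[i:] + words[:i]))
    return sorted(lines, key=str.lower)


for line in rotated_index(["carcinoma of the breast"]):
    print(line)
```

Each phrase therefore appears once per keyword it contains, so a reader can find it under any of its significant terms.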


Author(s):  
Andy Large ◽  
Jamshid Behesti ◽  
Alain Breuleux ◽  
Andre Renaud

From the 1994 CAIS Conference: The Information Industry in Transition McGill University, Montreal, Quebec. May 25 - 27, 1994. Multimedia products are now widely available on a variety of platforms, and there is a widespread assumption that the addition of still images, animation and sound to text will enhance any information product. The research reported in this paper investigates such claims for multimedia in an educational context and for a specific user group: grade-six primary school students. The students' ability to recall, make inferences from, and comprehend articles presented to them in print, as text on screen, and in multimedia format has been measured. The findings to date suggest that the impact of multimedia is subtle, and that generalisations about the effectiveness of multimedia, at least with children in an educational context, should be employed cautiously. The long-term goal is to identify design criteria which can be employed in the production of multimedia products for schools.


2018 ◽  
Vol 7 (2.4) ◽  
pp. 46 ◽  
Author(s):  
Shubhanshi Singhal ◽  
Akanksha Kaushik ◽  
Pooja Sharma

Due to the drastic growth of digital data, data deduplication has become a standard component of modern backup systems. It reduces data redundancy, saves storage space, and simplifies the management of data chunks. The process is performed in three steps: chunking, fingerprinting, and indexing of fingerprints. In chunking, data files are divided into chunks, and the chunk boundaries are decided by the value of a divisor. For each chunk, a unique identifying value, known as a fingerprint, is computed using a hash signature (e.g., MD-5, SHA-1, SHA-256). Finally, these fingerprints are stored in an index to detect redundant chunks, that is, chunks having the same fingerprint value. The chunk size is an important factor that should be optimal for better performance of the deduplication system. The genetic algorithm (GA) is gaining popularity and can be applied to find the best value of the divisor. Indexing also enhances the performance of the system by reducing the search time. Binary search tree (BST) based indexing has a time complexity of O(log n), which is the minimum among searching algorithms. A new model is proposed that uses GA to find the value of the divisor; this is the first attempt to apply GA in the field of data deduplication. The second improvement in the proposed system is that a BST is used to index the fingerprints. The performance of the proposed system is evaluated on VMDK, Linux, and Quanto datasets, and a good improvement in deduplication ratio is achieved.
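
The three steps (chunking, fingerprinting, indexing) can be sketched as below. This is an illustration under stated assumptions, not the proposed system: the boundary test is a toy windowed byte sum rather than a production rolling hash, the `divisor`, `window` and `min_size` defaults are arbitrary, the GA search for the divisor is omitted, and the BST is unbalanced, so its logarithmic search cost holds only on average.

```python
import hashlib


def chunk(data, divisor=64, window=16, min_size=32):
    """Cut a chunk where the windowed byte sum hits a fixed remainder."""
    chunks, start = [], 0
    for i in range(len(data)):
        if i + 1 - start < min_size:
            continue  # enforce a minimum chunk size
        h = sum(data[i + 1 - window:i + 1])  # toy stand-in for a rolling hash
        if h % divisor == divisor - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


class FingerprintIndex:
    """Unbalanced binary search tree keyed by fingerprint."""

    def __init__(self):
        self.root = None

    def add(self, key):
        """Insert key; return True if it was already present."""
        if self.root is None:
            self.root = Node(key)
            return False
        node = self.root
        while True:
            if key == node.key:
                return True
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key))
                return False
            node = child


def deduplicate(data, divisor=64):
    """Return (chunks actually stored, total number of chunks)."""
    index, stored = FingerprintIndex(), []
    chunks = chunk(data, divisor)
    for c in chunks:
        fingerprint = hashlib.sha1(c).hexdigest()
        if not index.add(fingerprint):
            stored.append(c)  # first time this content is seen
    return stored, len(chunks)


stored, total = deduplicate(bytes(range(256)) * 8)
print(len(stored), "of", total, "chunks stored")
```

A larger divisor makes the boundary condition rarer and so yields larger chunks on average, which is why the choice of divisor affects the deduplication ratio.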


1996 ◽  
Vol 33 (1) ◽  
pp. 147-157 ◽  
Author(s):  
Henrik A. Thomsen ◽  
Kenneth Kisbye

State-of-the-art on-line meters for determination of ammonium, nitrate and phosphate are presented. The on-line meters employ different measuring principles and are available in many different designs, differing with respect to size, calibration and cleaning principle, user-friendliness, response time, and reagent and sample consumption. A study of Danish experiences on several plants has been conducted. The list price of an on-line meter is between USD 8,000 and USD 35,000. To this should be added the cost of sample preparation, design, installation and running-in. The yearly operating costs for one meter are in the range of USD 200-2,500, and the manpower consumption is in the range of 1-5 hours/month. The accuracy obtained is only slightly smaller than the accuracy of collaborative laboratory analyses, which is sufficient for most control purposes.


Mathematics ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1522 ◽  
Author(s):  
Ricardo F. Díaz ◽  
Blanca Sanchez-Robles

Increases in the cost of research, specialization and reductions in public expenditure in health are changing the economic environment for the pharmaceutical industry. Gains in productivity and efficiency are increasingly important in order for firms to succeed in this environment. We analyze empirically the performance of efficiency in the pharmaceutical industry over the period 2010–2018. We work with microdata from a large sample of European firms of different characteristics regarding size, main activity, country of origin and other idiosyncratic features. We compute efficiency scores for the firms in the sample on a yearly basis by means of non-parametric data envelopment analysis (DEA) techniques. Basic results show a moderate average level of efficiency for the firms which encompass the sample. Efficiency is higher for companies which engage in manufacturing and distribution than for firms focusing on research and development (R&D) activities. Large firms display higher levels of efficiency than medium-size and small firms. Our estimates point to a decreasing pattern of average efficiency over the years 2010–2018. Furthermore, we explore the potential correlation of efficiency with particular aspects of the firms’ performance. Profit margins and financial solvency are positively correlated with efficiency, whereas employee costs display a negative correlation. Institutional aspects of the countries of origin also influence efficiency levels.
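
As a rough illustration of what an efficiency score measures, consider the strongly simplified case of one input and one output under constant returns to scale, where the DEA score reduces to each firm's output/input ratio divided by the best ratio in the sample. The study itself works with richer models that require linear programming; the function name and the sample figures below are hypothetical.

```python
def dea_scores(firms):
    """Score (input, output) pairs against the best observed ratio.

    Valid only for the single-input, single-output,
    constant-returns-to-scale case assumed in this sketch.
    """
    ratios = [output / inp for inp, output in firms]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical firms: (input used, output produced)
print(dea_scores([(2.0, 4.0), (4.0, 4.0), (1.0, 1.0)]))
```

A score of 1 marks a firm on the efficient frontier; lower scores give the fraction of the best observed productivity that a firm attains.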


Author(s):  
Sergio Balderrama ◽  
Gabriela Peña ◽  
Francesco Lombardi ◽  
Nicolo Stevanato ◽  
Andreas Sahlberg ◽  
...  

2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Alen Brkic ◽  
Andreas P. Diamantopoulos ◽  
Espen Andre Haavardsholm ◽  
Bjørg Tilde Svanes Fevang ◽  
Lene Kristin Brekke ◽  
...  

Background: In Norway, an annual tender system for the prescription of biologic and targeted synthetic disease-modifying antirheumatic drugs (b/tsDMARDs) has been used since 2007. This study aimed to explore annual b/tsDMARDs costs and disease outcomes in Norwegian rheumatoid arthritis (RA) patients between 2010 and 2019 under the influence of the tender system. Methods: RA patients monitored in ordinary clinical practice were recruited from 10 Norwegian centers. Data files from each center for each year were collected to explore demographics, disease outcomes, and the prescribed treatment. The cost of b/tsDMARDs was calculated based on the drug price given in the annual tender process. Results: The number of registered RA patients increased from 4909 in 2010 to 9335 in 2019. The percentage of patients receiving a b/tsDMARD was 39% in 2010 and 45% in 2019. The proportion of b/tsDMARDs treated patients achieving DAS28 remission increased from 42 to 67%. The estimated mean annual cost to treat a patient on b/tsDMARDs fell by 47%, from 13.1 thousand euros (EUR) in 2010 to 6.9 thousand EUR in 2019. The mean annual cost to treat b/tsDMARDs naïve patients was reduced by 75% (13.0 thousand EUR in 2010 and 3.2 thousand EUR in 2019). Conclusions: In the period 2010–2019, b/tsDMARD treatment costs for Norwegian RA patients were significantly reduced, whereas DAS28 remission rates increased. Our data may indicate that the health authorities’ intention to reduce treatment costs by implementing a tender system has been successful.
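
The reported cost reductions follow directly from the quoted mean annual costs. A small check, using only the figures given in the abstract (the helper name is hypothetical):

```python
def percent_reduction(before, after):
    """Relative fall from `before` to `after`, as a whole percentage."""
    return round(100 * (before - after) / before)


# Mean annual cost per patient, in thousand EUR, 2010 versus 2019
print(percent_reduction(13.1, 6.9))  # patients on b/tsDMARDs
print(percent_reduction(13.0, 3.2))  # b/tsDMARDs naive patients
```

The two calls give 47 and 75, matching the percentages stated in the results.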


Author(s):  
Jiao Ma ◽  
Colin G. Drury ◽  
Ann M. Bisantz

Training has been a consistently effective intervention in improving inspection performance. For example, existing inspection training in the aircraft maintenance domain is mainly a combination of classroom and on-the-job training (OJT). Computer-based training (CBT) has been promoted ever since it was introduced to this domain. In this study we investigate how effectively feedback training can be combined with CBT to improve visual inspection performance. Specifically, we examine the potential positive impacts of performance and process feedback in CBT, given in an on-line manner, on a trainee's performance and process assessment in a visual inspection task. The CBT system for inspection we used was adopted from the ASSIST program (Chen, Gramopadhye and Melloy, 2000). In our computer simulation of a familiar situation, participants were asked to search certain areas inside of a car in order to detect certain targets (dropped coins) with the aid of computerized tools (e.g., a magnifying glass, a flashlight), and fill out an inspection report based upon detection. A significant test effect was found across performance measures. Type of feedback training was found to be significant for search time. Performance measures were significantly correlated with target difficulty level; on-line performance feedback was significantly more efficient in improving performance measures than conventional delayed performance feedback; feedback training did affect process assessment measures.


1971 ◽  
Vol 4 (9) ◽  
pp. T151-T157 ◽  
Author(s):  
P D Roberts

The paper describes a digital simulation study of the application of a non-linear controller to the regulation of a single stage neutralisation process. In the controller, the proportional gain increases with amplitude of controller error signal. The performance of the non-linear controller is compared with that of a conventional linear controller and with the performance obtained by employing a linear controller with a linearisation network designed to compensate for the non-linear characteristic of the neutralisation curve. Although the performance of the non-linear controller is inferior to that obtained by employing a perfect linearisation network, its performance is still considerably superior to that obtained by using a conventional linear controller when operating at a symmetrical point on the neutralisation curve. In contrast to the linearisation network technique, the non-linear controller contains only one extra parameter and can be readily tuned on-line without prior knowledge of the neutralisation curve. Hence, it can be considered as an attractive alternative for the control of neutralisation processes.
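
A controller whose proportional gain grows with the amplitude of the error can be sketched as below. The specific gain law, `kp * (1 + beta * |e|)`, is an assumption chosen for illustration; the abstract states only that the gain increases with error amplitude and that one extra parameter is needed, here called `beta`.

```python
def nonlinear_p_output(error, kp=1.0, beta=0.5):
    """Proportional action with error-dependent gain (assumed form).

    With beta = 0 this reduces to a conventional linear
    proportional controller with gain kp.
    """
    gain = kp * (1.0 + beta * abs(error))
    return gain * error


for e in (0.1, 1.0, 2.0):
    print(e, nonlinear_p_output(e))
```

Small errors near the steep part of the neutralisation curve see a low gain, while large excursions see a higher gain, which is the qualitative behaviour the abstract describes.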

