Indexing Highly Repetitive String Collections, Part I

2021 ◽  
Vol 54 (2) ◽  
pp. 1-31
Author(s):  
Gonzalo Navarro

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through applications like bioinformatics, the string collections experienced a growth that outperforms Moore’s Law and challenges our ability to handle them even in compressed form. It turns out, fortunately, that many of these rapidly growing string collections are highly repetitive, so that their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed to properly exploit it. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this survey, formed by two parts, we cover the algorithmic developments that have led to these data structures. In this first part, we describe the distinct compression paradigms that have been used to exploit repetitiveness, and the algorithmic techniques that provide direct access to the compressed strings. In the quest for an ideal measure of repetitiveness, we uncover a fascinating web of relations between those measures, as well as the limits up to which the data can be recovered, and up to which direct access to the compressed data can be provided. This is the basic aspect of indexability, which is covered in the second part of this survey.

2021 ◽  
Vol 54 (2) ◽  
pp. 1-32
Author(s):  
Gonzalo Navarro

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through applications like bioinformatics, the string collections experienced a growth that outperforms Moore’s Law and challenges our ability to handle them even in compressed form. It turns out, fortunately, that many of these rapidly growing string collections are highly repetitive, so that their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed to properly exploit it. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this survey, formed by two parts, we cover the algorithmic developments that have led to these data structures. In this second part, we describe the fundamental algorithmic ideas and data structures that form the base of all the existing indexes, and the various concrete structures that have been proposed, comparing them both in theoretical and practical aspects, and uncovering some new combinations. We conclude with the current challenges in this fascinating field.


2018 ◽  
Vol 89 ◽  
pp. 82-93
Author(s):  
Pedro Correia ◽  
Luís Paquete ◽  
José Rui Figueira

Author(s):  
Ana María Gil Antón

This paper addresses, in summary form, one of the most significant problems of the twenty-first century arising from the Internet: social networks, which have become established channels of everyday relationships and interaction, not only for the new generations of adolescents and young people but for society as a whole. Although the use of new information and communication technologies offers great opportunities and advantages, it can also place us in a risk society, since it entails multiple dangers. Particularly significant among them is the possible violation of the fundamental rights to privacy, honor, one’s own image, and the protection of personal data, whether considered individually or jointly; these risks are heightened among young people and adolescents as indiscriminate users. To these are added further risks arising from criminal conduct, such as so-called cyberbullying.


Author(s):  
Steffen Paeper ◽  
Bryce Brown ◽  
Thomas Beuker

A new generation of geometry sensor for ILI tools has been developed. This sensor provides highly accurate geometry data of the internal pipe contour. The technology combines the benefits of touchless distance measurement with the advantages of a mechanical caliper arm. This complementary interaction allows accurate data to be measured under demanding operational conditions. The geometry sensor technology can be combined with a navigation unit and the high-resolution MFL inspection technology on so-called multi-purpose ILI tools. Merging different inspection tasks on a single tool is an economical way to create and add to an ILI database for integrity management. Field experience with this new technology will be discussed, based on more than 500 miles of inspected pipeline. Most inspections were performed in the US and Canada. The operational performance of the sensors justified the new design.


2002 ◽  
Vol 124 (2) ◽  
pp. 126-133 ◽  
Author(s):  
Eduardo Zarza ◽  
Loreto Valenzuela ◽  
Javier León ◽  
H.-Dieter Weyers ◽  
Martin Eickhoff ◽  
...  

The DISS (DIrect Solar Steam) project is a complete R+TD program aimed at developing a new generation of solar thermal power plants with direct steam generation (DSG) in the absorber tubes of parabolic trough collectors. During the first phase of the project (1996-1998), a life-size test facility was implemented at the Plataforma Solar de Almería (PSA) to investigate the basic DSG processes under real solar conditions and evaluate the unanswered technical questions concerning this new technology. This paper updates DISS project status and explains O&M-related experience (e.g., main problems faced and solutions applied) with the PSA DISS test facility since January 1999.


1988 ◽  
Vol 13 (2) ◽  
pp. 27-31 ◽  
Author(s):  
James C. Boyles

New technology is giving researchers greater independence in their use of bibliographic databases. Art librarians should promote ‘end-user services’ which provide library users with direct access to online databases, although there are a number of problems which are liable to detract from the efficiency and thoroughness associated with computer-assisted searching.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 25949-25963
Author(s):  
Carlos Quijada Fuentes ◽  
Miguel R. Penabad ◽  
Susana Ladra ◽  
Gilberto Gutierrez Retamal

Author(s):  
Soumen Chakrabarti ◽  
Sasidhar Kasturi ◽  
Bharath Balakrishnan ◽  
Ganesh Ramakrishnan ◽  
Rohit Saraf
