Design of an Enhanced Web Archiving System for Preserving Content Integrity with Blockchain

Hyun Cheon Hwang; Jin Gon Shon; Ji Su Park

doi:10.3390/electronics9081255

Electronics ◽

10.3390/electronics9081255 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1255

Author(s):

Hyun Cheon Hwang ◽

Jin Gon Shon ◽

Ji Su Park

Keyword(s):

Reference Model ◽

Web Content ◽

Web Archiving ◽

ISO Standard ◽

Blockchain Technology ◽

Open Archival Information System ◽

Archive System ◽

Web Archive ◽

The Web

Web archiving is a long-established approach to preserving web content for the future, and its importance is increasing with the explosive growth of web content. The reference model for an open archival information system (OAIS) provides guidance for long-term archiving systems, and most organizations that archive web content follow it. In addition, the web archive (WARC) ISO standard defines a format for archiving web content. However, there is no way to secure content integrity, and it is hard to identify the original. Because of these limitations, a web archive system is vulnerable to disputes over content integrity. In this paper, we propose the blockchain-linked (BCLinked) web archiving system, which uses blockchain technology and an extended WARC field to store web content integrity metadata in a blockchain. Furthermore, we designed the BCLinked web archiving system and confirmed through an experiment that the proposed system secures content integrity.
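The idea of linking a WARC record to integrity metadata held in a blockchain can be sketched roughly as follows. This is an illustrative sketch only, not the authors' design: the abstract does not name the hash function, the blockchain platform, or the extended field, so SHA-256, the in-memory `ToyLedger` hash chain, and the field name `X-BCLinked-Block-Hash` are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def block_hash_of(body: dict) -> str:
    # Hash a canonical JSON encoding so the result is deterministic.
    return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))


@dataclass
class ToyLedger:
    """In-memory hash chain standing in for a real blockchain (assumption)."""
    blocks: list = field(default_factory=list)

    def append(self, payload_digest: str, target_uri: str) -> str:
        prev_hash = self.blocks[-1]["block_hash"] if self.blocks else GENESIS
        body = {"prev_hash": prev_hash,
                "payload_digest": payload_digest,
                "target_uri": target_uri}
        block = dict(body, block_hash=block_hash_of(body))
        self.blocks.append(block)
        return block["block_hash"]

    def find(self, block_hash: str):
        return next((b for b in self.blocks if b["block_hash"] == block_hash), None)

    def is_consistent(self) -> bool:
        # Every block must point at its predecessor and match its own hash.
        prev = GENESIS
        for block in self.blocks:
            body = {k: block[k] for k in ("prev_hash", "payload_digest", "target_uri")}
            if block["prev_hash"] != prev or block_hash_of(body) != block["block_hash"]:
                return False
            prev = block["block_hash"]
        return True


def archive(ledger: ToyLedger, target_uri: str, payload: bytes) -> dict:
    """Build a simplified WARC-like record and register its digest in the ledger."""
    digest = sha256_hex(payload)
    headers = {
        "WARC-Type": "response",
        "WARC-Target-URI": target_uri,
        "WARC-Payload-Digest": "sha256:" + digest,
        # Hypothetical extended field pointing at the ledger entry.
        "X-BCLinked-Block-Hash": ledger.append(digest, target_uri),
    }
    return {"headers": headers, "payload": payload}


def verify(ledger: ToyLedger, record: dict) -> bool:
    """Check the archived payload against the digest stored in the ledger."""
    block = ledger.find(record["headers"]["X-BCLinked-Block-Hash"])
    if block is None or not ledger.is_consistent():
        return False
    return block["payload_digest"] == sha256_hex(record["payload"])
```

In this sketch, a record whose payload is altered after archiving fails `verify`, because the digest recomputed from the payload no longer matches the one committed to the ledger. A real deployment would replace `ToyLedger` with an actual blockchain client.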

Climate change and web archives: an Ibero-American study based on the Portuguese and Brazilian contexts

Records Management Journal ◽

10.1108/rmj-11-2020-0039 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Moisés Rockembach ◽

Anabela Serrano

Keyword(s):

Climate Changes ◽

Digital Preservation ◽

Web Content ◽

Web Archiving ◽

Content Type ◽

Internet Archive ◽

Digital Heritage ◽

Object Of Study ◽

Web Archive ◽

The Web

Purpose: This investigation analyzes information on the web and its preservation as digital heritage, taking as its object of study information about events related to climate change and the environment in Portugal and Brazil, and thus contributes an applied case of web preservation in the Ibero-American context.

Design/methodology/approach: This is a theoretical and applied investigation that uses mixed methods, collecting and analyzing quantitative and qualitative data from three data sources: the Internet Archive and the public collections of Archive-It, the Portuguese web archive, and supplementary collections formed by the research group on web archiving and digital preservation in Brazil.

Findings: Web archiving initiatives started in 1996; over the years, however, the collections have become more specialized, moving from nationally relevant themes to thematic niches. The theme of climate change had an impact on scientific and mainstream discussions in the 2000s, and in the 2010s it became a focus of digital preservation of web content, as demonstrated in this study. Failing to preserve data can lead to a rapid loss of this information owing to the ephemerality of the web.

Originality/value: This paper shows the relevance of preserving web content on climate change, and demonstrates which information on climate change on the web is currently preserved and which would still need to be preserved.

Spacio-Temporal Analysis Using the Web Archive System Based on Ajax

Digital Libraries: Universal and Ubiquitous Access to Information - Lecture Notes in Computer Science ◽

10.1007/978-3-540-89533-6_34 ◽

2008 ◽

pp. 317-320

Author(s):

Suguru Yoshioka ◽

Masumi Morii ◽

Shintaro Matsushima ◽

Seiichi Tani

Keyword(s):

Temporal Analysis ◽

Archive System ◽

Web Archive ◽

The Web

The historic context of web archiving and the web archive

The Historical Web and Digital Humanities ◽

10.4324/9781315231662-2 ◽

2019 ◽

pp. 13-28

Author(s):

Kees Teszelszky

Keyword(s):

Web Archiving ◽

Web Archive ◽

The Web

A Supplementary Tool for Web-archiving Using Blockchain Technology

The African Journal of Information and Communication ◽

10.23962/10539/29194 ◽

2020 ◽

pp. 1-14

Author(s):

John E. De Villiers ◽

André P. Calitz

Keyword(s):

World Wide ◽

Uniform Resource Locator ◽

Web Archiving ◽

Initial Development ◽

Blockchain Technology ◽

Smart Contract ◽

Formidable Challenge ◽

The World ◽

Potential Tool ◽

The Web

The usefulness of a uniform resource locator (URL) on the World Wide Web is reliant on the resource being hosted at the same URL in perpetuity. When URLs are altered or removed, this results in the resource, such as an image or document, being inaccessible. While web-archiving projects seek to prevent such a loss of online resources, providing complete backups of the web remains a formidable challenge. This article outlines the initial development and testing of a decentralised application (DApp), provisionally named Repudiation Chain, as a potential tool to help address these challenges presented by shifting URLs and uncertain web-archiving. Repudiation Chain seeks to make use of a blockchain smart contract mechanism in order to allow individual users to contribute to web-archiving. Repudiation Chain aims to offer unalterable assurance that a specific file and its URL existed at a given point in time—by generating a compact, non-reversible representation of the file at the time of its non-repudiation. If widely adopted, such a tool could contribute to decentralisation and democratisation of web-archiving.
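The "compact, non-reversible representation of the file" described above is what a cryptographic hash provides. The following is a minimal sketch of the kind of record such a tool might commit to a smart contract. It is not the Repudiation Chain implementation: the abstract does not name a hash function or record layout, so SHA-256 and the helper names `make_existence_record` and `matches` are assumptions, and the smart contract itself is not modelled.

```python
import hashlib
from datetime import datetime, timezone


def make_existence_record(url: str, file_bytes: bytes, observed_at: datetime) -> dict:
    """Summarise 'this file was at this URL at this time' as fixed-size digests."""
    file_digest = hashlib.sha256(file_bytes).hexdigest()
    timestamp = observed_at.astimezone(timezone.utc).isoformat()
    # Commit to URL, file digest and timestamp together, so that none of
    # them can be changed later without changing the commitment.
    commitment = hashlib.sha256(
        "\n".join([url, file_digest, timestamp]).encode("utf-8")
    ).hexdigest()
    return {"url": url, "file_digest": file_digest,
            "observed_at": timestamp, "commitment": commitment}


def matches(record: dict, url: str, file_bytes: bytes) -> bool:
    """Check whether a file presented later is the one the record refers to."""
    return (record["url"] == url
            and record["file_digest"] == hashlib.sha256(file_bytes).hexdigest())
```

Only the digests need to be stored on-chain. The file itself stays off-chain, and anyone holding a copy can recompute the digest to show that it matches the recorded one.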

Expressing Needs of Digital Audio-Visual Applications in Different Communities of Practice for Long-Term Preservation

Digital Curation ◽

10.4018/978-1-5225-6921-3.ch011 ◽

2018 ◽

pp. 234-258

Author(s):

Naresh Kumar ◽

Vittore Casarosa

Keyword(s):

Communities Of Practice ◽

Reference Model ◽

Analysis Data ◽

Research Approach ◽

Digital Audio ◽

Open Archival Information System ◽

Automatic Matching ◽

Structure Of Knowledge ◽

Long Term Preservation

Lack of awareness of preservation tools and applications is a major issue today. To address it, the European Commission initiated a research project, Presto4U, which aimed to enable semi-automatic matching of preservation tools with audio-visual needs. To express the audio-visual needs formally, the project mapped a knowledge schema. The knowledge schema was a first cut and needed evaluation of its ability to represent the needs of different communities of practice, the classes and their associations, and the requirements of the audio-visual community through the properties of its classes. This evaluative study was conducted using a qualitative research approach based on interviews and a questionnaire. The Open Archival Information System reference model is used as the theoretical framework. Fourteen members of three communities of practice in Europe provided their needs for analysis. The data were analysed in six stages. The study found that the knowledge schema is useful for expressing the needs of communities of practice, but that collected data should fit easily into the structure of the knowledge schema.

Arquivamento da Web no contexto das Humanidades Digitais: da produção a preservação da informação digital | Web archiving in the context of digital humanities: from production to preservation of digital information

Liinc em Revista ◽

10.18617/liinc.v15i1.4578 ◽

2019 ◽

Vol 15 (1) ◽

Author(s):

Moisés Rockembach

Keyword(s):

Sense Of Community ◽

Digital Humanities ◽

Digital Content ◽

Digital Information ◽

Web Content ◽

Web Archiving ◽

Research And Practice ◽

Online Access ◽

Short Lifecycle

RESUMO (translated from Portuguese): This work sought to draw out the relationships between the digital humanities, an emerging field of inter- and transdisciplinary studies, and the field of web archiving, which consists of policies, methodologies and technologies involving the selection, capture, storage, preservation and provision of web content for access and retrospective use. Given that web content has a relatively short lifecycle owing to rapid technological obsolescence and difficulties with long-term storage, and that several efforts are required for digital preservation, it was observed that the sense of community, one of the characteristics of the digital humanities, can be an important factor in the better development of web archives, nationally and in various other countries, since most countries in the Southern Hemisphere do not have web archives and therefore risk losing their web content and digital memory. Keywords: Web Archiving; Digital Humanities; Digital Preservation; Research and Practice Communities.

ABSTRACT: This work aims to bring out the existing relationships between Digital Humanities, as an emerging field of inter- and transdisciplinary studies, and the field of web archiving, which consists of policies, methodologies and technologies that involve the selection, capture, storage, preservation and availability of web content for access and retrospective use. Web content has a relatively short lifecycle, since content producers generally do not care about keeping online access for the long term, and several efforts are required to successfully preserve digital content. Given this, it has been observed that the sense of community, one of the characteristics of Digital Humanities, can be an important factor in the better development of archives, both nationally and for other countries in the Southern Hemisphere, since most of these countries do not have web archives and therefore risk losing their web content and digital memory. Keywords: Web Archiving; Digital Humanities; Digital Preservation; Research and Practice Communities.

How can we improve our web collection? An evaluation of webarchiving at the KB National Library of the Netherlands (2007–2017)

Alexandria The Journal of National and International Library and Information Issues ◽

10.1177/0955749017725930 ◽

2017 ◽

Vol 27 (2) ◽

pp. 94-107 ◽

Cited By ~ 2

Author(s):

Barbara Sierman ◽

Kees Teszelszky

Keyword(s):

The Netherlands ◽

Selection Criteria ◽

National Library ◽

Open Archival Information System ◽

Legal Restrictions ◽

Long Term Preservation ◽

Initial Selection ◽

The Web ◽

Selection Of

In 2007, the Koninklijke Bibliotheek, the Dutch National Library (KB-NL), started the project ‘webarchiving’ based on a selection of Dutch websites. The initial selection of 1000 websites has since grown to over 12,000 selected websites, crawled at different intervals. Although legal restrictions currently limit use to the KB-NL reading room, it is important that the KB-NL includes the requirements of (future) users in its approach to creating a web collection. With respect to the long-term preservation of the collection, we also need to incorporate the requirements for long-term archiving in our approach, as described in the Open Archival Information System (OAIS) model, ISO 14721:2012. This article describes the results of a research project on webarchiving and the web collection of archived sites in the KB-NL, investigating the following questions. What is webarchiving in the Netherlands? What are the selection criteria of KB-NL, and how do they relate to what the contemporary user can find on the Dutch web? How does the choice of harvesting tools influence the final archived website? Do we know enough about the value of the web collection and its potential use by researchers, and how can we improve this value? The article describes the outcomes of the research and the conclusions and advice that can be drawn from it, and it is hoped that it will inspire broader discussions about the essence of creating web collections for long-term preservation as part of cultural heritage.

Collecting and preserving the Ukraine conflict (2014-2015): a web archive at University of California, Berkeley

Collection Building ◽

10.1108/cb-04-2016-0006 ◽

2016 ◽

Vol 35 (3) ◽

pp. 64-72 ◽

Cited By ~ 1

Author(s):

Liladhar R. Pendse

Keyword(s):

World Wide Web ◽

Russian Federation ◽

World Wide ◽

The Internet ◽

Web Archiving ◽

Content Type ◽

The World ◽

The Russian Federation ◽

Web Archive ◽

The Web

Purpose: The purpose of this paper is to highlight web archiving as a tool for collection development in a research-level academic library. The paper highlights a web-archiving project that dealt with the contemporary Ukraine conflict. As the conflict in Ukraine drags on, the need to collect and preserve information from web-based resources with different ideological orientations acquires special importance. The demise of the Soviet Union in 1991 and the emergence of independent republics were heralded by some as a peaceful transition to “free-market” style economies. This transition was nevertheless nuanced and not seamless. Besides incomplete market liberalization and rent-seeking behaviors of different sorts, it was also accompanied by almost ubiquitous use of and access to the internet and internet communication technologies. Now, 24 years later, the ongoing conflict in Ukraine also appears to be unfolding on the World Wide Web. With the Russian annexation of Crimea and its unification with the Russian Federation, the governmental and non-governmental websites of the Ukrainian Crimea suddenly came to represent a sort of “endangered archive”.

Design/methodology/approach: The main purpose of this project was to make the information contained in Ukrainian and Russian websites available in a web archive to a wider body of scholars and students over a longer period of time. The author does not take any ideological stance on the legal status of Crimea or on the ongoing conflict in Ukraine. Several projects are currently devoted to the preservation of these websites. This article also surveys the landscape of these projects and highlights the ongoing web-archiving project entitled “the Ukraine Crisis: 2014-2015” at the UC Berkeley Library.

Findings: UC Berkeley’s Ukraine Conflict Archive was made available to the public in March 2015, after enough materials had been archived. The initial purpose of the archive was to selectively harvest and archive those websites that were bound to either disappear or change significantly during the evolution of Crimea’s accession to Russia. However, in the aftermath of the Crimean conflict, the ensuing military conflict in Ukraine forced a reevaluation of the web-archiving strategy. The project was never envisioned as a competitor to the Ukraine Conflict project. Instead, it was meant to capture complementary data that other similar projects could have missed. This web archive has been made public to provide a glimpse of what was happening and what is happening in Ukraine.

Research limitations/implications: The impetus for archiving the selected Ukrainian websites came from the changing geopolitical realities of Crimea. The daily changes to the websites and the loss of information contained within them are among the many problems faced by users of these websites. In some cases, the likelihood of these websites disappearing is relatively high. This in turn was followed by the author’s desire to preserve information about daily life in Ukraine’s east in light of the unfolding violent armed conflict.

Originality/value: A close survey of currently published Library and Information Science articles on the Ukraine conflict found no articles dedicated to archiving the Crimean and Ukrainian situations.

Archiwizacja Webu w Europie – narodowe archiwa Sieci

Archeion ◽

10.4467/26581264arc.20.016.12973 ◽

2020 ◽

pp. 445-465

Author(s):

Bartłomiej Konopa

Keyword(s):

Web Archiving ◽

Web Resources ◽

It Implementation ◽

Web Archives ◽

The World ◽

Web Archive ◽

The Web

Web archiving in Europe: national Web archives. Web archiving, that is, activities aimed at collecting and preserving Web resources, has been carried out for almost 25 years. During this time, many projects have been created to fulfill that task, as well as several organizations, such as the International Internet Preservation Consortium, that support its implementation. The article presents the development of activities in this area and then the conclusions of an analysis of how selected European national Web archives function, based on publicly available materials about them. The analysis was intended to examine how the Web is currently archived in this part of the world. Three main issues were considered: gathering, describing and providing access to the resources of the former WWW. The first covers the scope of archiving, namely determining which materials are subject to it, as well as the gathering strategies used for this purpose, which shape the archival collections. The second concerns the metadata and other elements used to convey information about what was collected during that process. The last covers the scope of access to archival WWW resources, existing restrictions and their causes, and the tools used for access. During the research, the author also became interested in the software used in individual projects. The results show that a model of the Web archive has been developed and that the activities of the analyzed initiatives in Europe are very similar.

Towards a COST MOBILISE Guideline for Long Term Preservation and Archiving of Data Constructs from Scientific Collections Facilities

Biodiversity Information Science and Standards ◽

10.3897/biss.5.73901 ◽

2021 ◽

Vol 5 ◽

Author(s):

Dagmar Triebel ◽

Dragan Ivanovic ◽

Gila Kahila Bar-Gal ◽

Sven Bingert ◽

Tanja Weibulat

Keyword(s):

Data Storage ◽

Reference Model ◽

Supplementary Information ◽

Cost Models ◽

Digital Object ◽

Open Archival Information System ◽

Data Products ◽

Scientific Collections ◽

Long Term Preservation

COST (European Cooperation in Science and Technology) is a funding organisation for research and innovation networks. One of the objectives of the COST Action called “Mobilising Data, Policies and Experts in Scientific Collections” (MOBILISE) is to work on documents for expert training with broad involvement of professionals from the participating European countries. The guideline presented here in its general concept will address principles, strategies and standards for long-term preservation and archiving of data constructs (data packages, data products) as addressed by and under the control of the scientific collections community. The document is being developed as part of the MOBILISE Action and is targeted primarily at scientific staff at natural scientific collection facilities, as well as management bodies of collections such as museums and herbaria, and information technology personnel less familiar with data archiving principles and routines.

The challenges of big data storage and (distributed, cloud-based) storage solutions, as well as those of data mirroring, backing up, synchronisation and publication in productive data environments, are well addressed by documents, guidelines and online platforms, e.g., in the DiSSCo knowledge base (see Hardisty et al. (2020)) and as part of concepts of the European Open Science Cloud (EOSC). Archival processes and the resulting data constructs, however, are often left out of consideration. This is a large gap, because archival issues are not only simple technical ones, as addressed by the term “bit preservation”, but also involve a number of logical, functional, normative, administrative and semantic issues, as addressed by the term “functional long-term archiving”.

The main target digital object types addressed by this COST MOBILISE Guideline are data constructs called Digital or Digital Extended Specimens and data products, with the persistent identifier assignment lying under the authority of scientific collections facilities. Such digital objects are specified according to the Digital Object Architecture (DOA; see Wittenburg et al. 2018) and similar abstract models introduced by Harjes et al. (2020) and Lannom et al. (2020). The scientific collection-specific types are defined following evolving concepts in the context of the Consortium of European Taxonomic Facilities (CETAF), the research infrastructure DiSSCo (Distributed System of Scientific Collections), and the Biodiversity Information Standards (TDWG). Archival processes are described following the OAIS (Open Archival Information System) reference model. The archived objects should be reusable in the sense of the FAIR (Findable, Accessible, Interoperable, and Reusable) guiding principles. Organisations such as national (digital) archives, computing or professional (domain-specific) data centers, and libraries might offer specific archiving services and act as partner organisations of scientific collections facilities.

The guideline consists of key messages that address the collection community, especially the staff and leadership of taxonomic facilities. Aspects relevant to several groups of stakeholders are discussed, as well as cost models. The guideline does not recommend specific solutions for archiving software and workflows. Supplementary information is delivered via a wiki-based platform for the COST MOBILISE Archiving Working Group WG4.
