Indexing Blocks to Reduce Space and Time Requirements for Searching Large Data Files

Author(s):  
Tzuhsien Wu ◽  
Hao Shyng ◽  
Jerry Chou ◽  
Bin Dong ◽  
Kesheng Wu
Author(s):  
Bill Trevillion

Abstract Radian Corporation has developed extensive data display capabilities to analyze vibration and acoustic data from structures and rotating equipment. The Machinery Interactive Display and Analysis System (MIDAS) displays data collected through the acquisition functions of MIDAS. The graphics capabilities include displaying spectra in three-dimensional waterfall and in X-Y formats. Both types of plots can relate vibrations to time, equipment speed, or process parameters. Using menu-driven parameter selection, data can be displayed in formats that are the most useful for analysis. The system runs on a popular mini-computer, and it can be used with a great variety of graphics terminals, workstations, and printer/plotters. The software was designed and written for interactive display and plotting. Automatic plotting of large data files is facilitated by a batch plotting mode. The user can define display formats for the analysis of noise and vibration problems in the electric utility, chemical processing, paper, and automotive industries. This paper describes the history and development of graphics capabilities of the MIDAS system. The system, as illustrated in the examples, has proven efficient and economical for displaying large quantities of data.


2016 ◽  
Vol 13 (1) ◽  
pp. 181-186 ◽  
Author(s):  
Dawna M. Drum ◽  
Andrew Pulvermacher

ABSTRACT Modern organizations are inundated with data, and they often struggle to organize it in an efficient and effective manner in order to get the most value from the data. The context of this case is, thus, situated in current business practice. Students are given large data files that were extracted from an enterprise system. They must use Microsoft Access and Excel to summarize and organize the data to create a dynamic profit and loss statement. Basic skills in Excel and general accounting knowledge are assumed, while Access knowledge is not assumed. The Teaching Notes provide solutions and are organized to allow instructors to provide minimal guidance or fully annotated directions.


1997 ◽  
Vol 3 (S2) ◽  
pp. 931-932 ◽  
Author(s):  
Ian M. Anderson ◽  
Jim Bentley

Recent developments in instrumentation and computing power have greatly improved the potential for quantitative imaging and analysis. For example, products are now commercially available that allow the practical acquisition of spectrum images, where an EELS or EDS spectrum can be acquired from a sequence of positions on the specimen. However, such data files typically contain megabytes of information and may be difficult to manipulate and analyze conveniently or systematically. A number of techniques are being explored for the purpose of analyzing these large data sets. Multivariate statistical analysis (MSA) provides a method for analyzing the raw data set as a whole. The basis of the MSA method has been outlined by Trebbia and Bonnet. MSA has a number of strengths relative to other methods of analysis. First, it is broadly applicable to any series of spectra or images. Applications include characterization of grain boundary segregation (position-), of channeling-enhanced microanalysis (orientation-), or of beam damage (time-variation of spectra).


2019 ◽  
Vol 16 (9) ◽  
pp. 3824-3829
Author(s):  
Deepak Ahlawat ◽  
Deepali Gupta

Technological advances have produced a great surge in data. The main sources of this data are social websites and other internet sites. Large data files are combined to create a big data architecture, and managing data files at such volume is not easy; modern techniques have therefore been developed to manage bulk data. To arrange and utilize such big data, the Hadoop Distributed File System (HDFS) architecture from Hadoop was presented in the early stage of 2015. This architecture is used when traditional methods are insufficient to manage the data. In this paper, the concepts and frames of Big Data are studied, and a novel clustering algorithm based on K-means and cosine-based similarity clustering is developed to manage large amounts of data. The developed algorithm is evaluated using the precision and recall parameters. The results obtained indicate that it successfully manages the big data issue.


1990 ◽  
Vol 73 (7) ◽  
pp. 1945-1955 ◽  
Author(s):  
V. Ducrocq ◽  
D. Boichard ◽  
B. Bonaiti ◽  
A. Barbat ◽  
M. Briend

2021 ◽  
Author(s):  
Matthias Schneider ◽  
Benjamin Ertl ◽  
Christopher J. Diekmann ◽  
Farahnaz Khosrawi ◽  
Andreas Weber ◽  
...  

Abstract. IASI (Infrared Atmospheric Sounding Interferometer) is the core instrument of the currently three Metop (Meteorological operational) satellites of EUMETSAT (European Organization for the Exploitation of Meteorological Satellites). The MUSICA IASI processing has been developed in the framework of the European Research Council project MUSICA (MUlti-platform remote Sensing of Isotopologues for investigating the Cycle of Atmospheric water). The processor performs an optimal estimation of the vertical distributions of water vapour (H2O), the ratio between two water vapour isotopologues (the HDO / H2O ratio), nitrous oxide (N2O), methane (CH4), and nitric acid (HNO3), and works with IASI radiances measured under cloud-free conditions in the spectral window between 1190 and 1400 cm−1. The retrieval of the trace gas profiles is performed on a logarithmic scale, which allows the constraint and the analytic treatment of ln[HDO] – ln[H2O] as proxy for the HDO / H2O ratio. Currently, the MUSICA IASI processing has been applied to all IASI measurements available between October 2014 and April 2020, so more than 1.4 billion individual retrievals have been performed. Here we describe the MUSICA IASI full retrieval product data set. The data set is made available in form of netcdf data files that are compliant with version 1.7 of the CF (Climate and Forecast) metadata convention. For each orbit an individual standard output data file is provided. These files contain for each individual retrieval information on the a priori usage and constraint, the retrieved atmospheric trace gas and temperature profiles, profiles of the leading error components, information on vertical representativeness in form of the averaging kernels as well as averaging kernel metrics, which are more handy than the full kernels. We discuss data filtering options and give examples of the high horizontal and continuous temporal coverage of the MUSICA IASI data products. 
The standard output data files provide comprehensive information for each individual retrieval, resulting in a rather large data volume (about 25 TB for the more than five years of data with global daily coverage). This apparent drawback of large data files and data volume is counterbalanced by multiple possibilities of data reusability, which are briefly discussed. In an extended output data file the same variables as in the standard output data files are provided in addition to Jacobians for many different uncertainty sources and Gain matrices (because of these additional variables it is called the extended output). It is limited to 74 observations over a polar, mid-latitudinal and tropical site. We use this additional Jacobian and Gain data for assessing the typical impact of different uncertainty sources, like surface emissivity or spectroscopic parameters, and different cloud types on the retrieval results. We offer two data packages with DOI for free download via the repository RADAR4KIT. The first data package has a data volume of about 17.5 GB and is linked to https://doi.org/10.35097/408 (Schneider et al., 2021b). It contains example standard output data files for all MUSICA IASI retrievals made for a single day (more than 0.6 million). Furthermore, it includes a ReadMe.pdf file with a description of how to access the total data set (the 25 TB) or parts of it. This data package is for users interested in the typical global daily data coverage and in information about how to download the large data volumes of global daily data for longer periods. The second data package is linked to https://doi.org/10.35097/412 (Schneider et al., 2021a) and contains the extended output data file. Because it provides data for only 74 example retrievals, its data volume is only 73 MB, and it is thus recommended for users who want a quick look at the data.

