Real-World Data Difficulty Estimation with the Use of Entropy

Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1621
Author(s):  
Przemysław Juszczuk ◽  
Jan Kozak ◽  
Grzegorz Dziczkowski ◽  
Szymon Głowania ◽  
Tomasz Jach ◽  
...  

In the era of the Internet of Things and big data, we are faced with the management of a flood of information. The complexity and amount of data presented to the decision-maker are enormous, and existing methods often fail to derive nonredundant information quickly. Thus, the selection of the most satisfactory set of solutions is often a struggle. This article investigates the possibilities of using the entropy measure as an indicator of data difficulty. To do so, we focus on real-world data covering various fields related to markets (the real estate market and financial markets), sports data, fake news data, and more. The problem is twofold. First, since we deal with unprocessed, inconsistent data, it is necessary to perform additional preprocessing. Second, we use the entropy-based measure to capture the nonredundant, noncorrelated core information from the data. Research is conducted using well-known algorithms from the classification domain to investigate the quality of solutions derived based on initial preprocessing and the information indicated by the entropy measure. Finally, the best 25% of attributes (as ranked by the entropy measure) are selected, the whole classification procedure is performed once again, and the results are compared.
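The abstract does not say which entropy-based measure the authors use, so the following is only a minimal sketch of the general idea, assuming discrete attributes and using information gain (the drop in Shannon entropy of the class label after splitting on an attribute) as the ranking score. The function names, the toy table, and the labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on a discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * shannon_entropy(g) for g in groups.values())
    return shannon_entropy(labels) - conditional


def top_fraction_by_gain(table, labels, fraction=0.25):
    """Names of the top `fraction` of attributes, ranked by information gain."""
    ranked = sorted(
        table, key=lambda name: information_gain(table[name], labels), reverse=True
    )
    keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:keep]


# Hypothetical toy data: four discrete attributes and a binary class label.
table = {
    "a": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the label, so maximally informative
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
    "c": [1, 1, 1, 1, 1, 1, 1, 1],  # constant, so it carries no information
    "d": [0, 0, 1, 1, 1, 1, 0, 0],
}
labels = [0, 0, 1, 1, 0, 1, 0, 1]
selected = top_fraction_by_gain(table, labels)
```

With four attributes, a 25% cut keeps a single attribute. A classifier would then be retrained on the selected columns only and its quality compared with the full-attribute run, as the abstract describes.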

Author(s):  
Monika Siejka

One of the main tasks of real estate management in a municipality is making decisions about the location of investments on a local scale. These decisions should follow the principle of sustainable development, as Poland's membership in the European Union requires: as an EU member, Poland is obliged to implement the rules in force in the Member States. Any investment affects, directly or indirectly, the economic development of the municipality and therefore has a significant impact on the local real estate market. Investments that have a negative impact on the environment can reduce the activity of the local real estate market, whereas projects that support the economic development of the region and improve quality of life increase it. This work examines the dynamics of changes in the local real estate market in the municipality of Skrzyszow in the Malopolska province in Poland, in connection with the construction of the reservoir.


Author(s):  
Deepali Virmani ◽  
Nikita Jain ◽  
Ketan Parikh ◽  
Shefali Upadhyaya ◽  
Abhishek Srivastav

This article discusses how relevant data can be organized, linked with other data, and grouped into clusters. Clustering is the process of organizing a given set of objects into a set of disjoint groups called clusters. There are a number of clustering algorithms, such as k-means, k-medoids, and normalized k-means. The focus remains on the efficiency and accuracy of these algorithms, on the time clustering takes, and on reducing the overlap between clusters. K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The k-means algorithm partitions data into K clusters, with the initial centroids chosen at random; because it operates on numeric values, it cannot be used directly to cluster real-world data containing categorical values. Poor selection of initial centroids can result in poor clustering. This article proposes a variant of k-means that selects the initial centres and normalizes the data, resulting in better clustering, reduced overlap, and less time required for clustering.
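The abstract names two modifications, normalizing the data and selecting the initial centres, but does not describe how the centres are chosen. The sketch below therefore makes two assumptions that are not from the article: min-max scaling for the normalization step, and a farthest-point rule as a stand-in for the centre selection. It illustrates the general shape of such a variant and is not the authors' algorithm.

```python
import math


def min_max_normalize(points):
    """Scale every feature to [0, 1]; constant features map to 0."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def farthest_point_init(points, k):
    """Assumed stand-in initializer: start at the first point, then repeatedly
    add the point that is farthest from its nearest already-chosen centre."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm on normalized data with deterministic initial centres."""
    data = min_max_normalize(points)
    centers = farthest_point_init(data, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in data
        ]
        if new_assignment == assignment:  # converged: no point changed cluster
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(data, assignment) if a == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers


# Hypothetical toy data whose two features live on very different scales.
points = [(1.0, 100.0), (1.2, 110.0), (0.8, 90.0),
          (5.0, 900.0), (5.2, 950.0), (4.8, 1000.0)]
cluster_ids, final_centers = k_means(points, 2)
```

Normalizing first keeps the large-scale second feature from dominating the Euclidean distance, and a deterministic initializer removes the run-to-run variation that random centroids introduce.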


2015 ◽  
Vol 15 (1) ◽  
pp. 162-173
Author(s):  
Sebastian G. Kokot

The observation of price movements on the real estate market is an extremely difficult task, as we have to face problems belonging to two spheres. The first is the specific nature of real estate as a marketable object and of the real estate market itself. The second is the character and quality of data on real estate transaction prices. In this article the author, based on an empirical study, attempts to prove that even in a single segment of a local real estate market, prices in individual sub-segments can fluctuate with different intensity. The range of price movements can be so vast that it seems pointless to apply a single averaged price index to the whole segment, even though that is what analysts usually do.
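The abstract does not state how the price indices in the study are constructed. As a rough illustration of the argument only, the sketch below assumes a simple index (mean current price over mean base-period price, times 100) and uses invented prices for two hypothetical sub-segments. None of the figures come from the article.

```python
from statistics import mean


def price_index(base_prices, current_prices):
    """Ratio of mean current price to mean base-period price, times 100."""
    return 100.0 * mean(current_prices) / mean(base_prices)


# Hypothetical prices per square metre for two sub-segments of one segment.
base = {
    "small_flats": [5000, 5200, 4800],
    "large_flats": [4000, 4100, 3900],
}
current = {
    "small_flats": [6000, 6240, 5760],
    "large_flats": [3800, 3895, 3705],
}

# One index per sub-segment, and one pooled index for the whole segment.
per_subsegment = {name: price_index(base[name], current[name]) for name in base}
pooled = price_index(
    [p for prices in base.values() for p in prices],
    [p for prices in current.values() for p in prices],
)
```

In this toy case one sub-segment rises while the other falls, and the pooled index lands between the two, describing neither of them well. That is the kind of masking the abstract warns about.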


Author(s):  
Giovanni Corrao ◽  
Giovanni Alquati ◽  
Giovanni Apolone ◽  
Andrea Ardizzoni ◽  
Giuliano Buzzetti ◽  
...  

The current COVID pandemic crisis has made it even clearer that solving several of the questions public health must face requires access to good-quality data. This paper discusses the value and potential of health data and the critical issues that currently hinder access to them. In particular, the paper (i) focuses on the definition of “real-world data”; (ii) proposes a review of the real-world data availability in our country; (iii) discusses its potential, with particular focus on the possibility of improving knowledge on the quality of care provided by the health system; (iv) emphasizes that the availability of data alone is not sufficient to increase our knowledge, underlining that innovative analysis methods (e.g., artificial intelligence techniques) must be framed within the paradigm of clinical research; and (v) addresses some ethical issues related to their use. The proposal is to form an alliance between organizations interested in promoting research aimed at collecting scientifically solid evidence to support the clinical governance of public health.


2019 ◽  
Vol 30 ◽  
pp. v744-v745
Author(s):  
T. Kosmidis ◽  
B. Athanasakou ◽  
P.A. Kosmidis

2013 ◽  
Vol 16 (7) ◽  
pp. A511
Author(s):  
S. Purwins ◽  
C. Spehr ◽  
M. Augustin ◽  
M.A. Radtke ◽  
K. Reich ◽  
...  

Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 3062-3062
Author(s):  
Yan Li ◽  
Junping Zhang ◽  
Hui Wei ◽  
Ying Wang ◽  
Bingcheng Liu ◽  
...  

Introduction: Most of our current knowledge of acute leukemia epidemiology and prognosis is based on data from clinical trials. Because of the selection bias of clinical trials, these data might differ from the general leukemia population in a real-life setting. The National Clinical Research Center for Blood Disease established a comprehensive database built from electronic health records (EHR) to facilitate research on hematologic cancers, i.e., acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), and acute promyelocytic leukemia (APL). The aim of the database is to gain insight into the epidemiology of these cancers, to evaluate treatment responses, and to compare results between geographical regions of China. Furthermore, drawing on the center's status as a national research center, the database is expected to identify prognostic and predictive factors for outcome in order to improve the quality of treatment and patient care.

Methods: Development of the database was initiated in 2001. Standard data elements were established to capture the key clinical variables. For individual patients, data from EHRs were extracted, integrated, and quality checked. The implementation of the database helped clinical professionals identify eligible patients, establish research projects, conduct retrospective analyses, and follow up patient outcomes. Continued efforts were made to improve the construction and quality of the database over two decades. We performed a 10-year real-world data review in the database to evaluate the quality of the recorded data and to describe the clinical and cytogenetic characteristics and survival of acute leukemia patients. The completeness of the collected variables was acceptable for statistical analysis. In total, 3,404 patients (1,895 males and 1,509 females) who were diagnosed and treated between Jan. 1, 2010 and Dec. 31, 2020 were enrolled. A substantial proportion (>60%) of patients were residents of the northern and northeast regions of China. Demographic and baseline characteristics also included age, age class, baseline blood test, transplantation, and research participation. Molecular mutations such as nucleophosmin-1 (NPM1), FMS-related tyrosine kinase 3 (FLT3), and CCAAT/enhancer-binding protein alpha (CEBPA), among others, were included in the screening panels. We explored the treatment remission rate and prognosis of different chromosomal karyotype groups among AML patients.

Results: The AML, ALL, and APL subgroups comprised 2,345, 769, and 290 patients, respectively. Routine blood test results reflected the clinical characteristics of each subgroup (Tbl. 1). In the AML group, the frequencies of NPM1, FLT3-ITD, KIT, and CEBPA double mutations were 17.9%, 13.2%, 8.7%, and 10.1%, respectively (Tbl. 2). Among ALL cases, 640 (83.2%) were B-ALL and 129 (16.8%) were T-ALL. Among B-ALL, 256 cases (33.3%) were Ph positive. The 10-year overall survival analysis showed that AML patients had better outcomes than the ALL group (Fig. 1). In this database, 1,780 AML cases (excluding APL) were included in the cytogenetic analysis. Survival in our real-world data was stratified by cytogenetic risk group according to the ELN2017 and MRC risk classifications, respectively (Fig. 2A-B). Remarkably, we found two rare but recurrent abnormalities: 16 cases with t(7;11)(p15;p15) and 12 cases with t(16;21)(p11;q22/q24;q22). These cases showed high relapse and mortality rates. Compared with the normal karyotype group, survival in both subentities was worse, and transplantation might be recommended in the CR1 phase (Fig. 2C). We therefore suggest that these two subtypes be regarded as belonging to the worse risk group, although neither is mentioned in the current guidelines. The incidence of t(8;21) in our database was 17.9% (Fig. 3). In exploring the impact of additional chromosomal abnormalities on the prognosis of t(8;21), we found that overall survival was worse in patients with additional trisomy 4 than in those without it (Fig. 2D), a finding rarely mentioned in previous reports.

Conclusion: The real-world database is of great importance for defining the comprehensive features of AML, APL, and ALL in the clinical setting. The results contribute substantially to our knowledge of acute leukemia and identify the prognosis of rare chromosomal karyotypes in AML.

Figure 1.

Disclosures: Wang: AbbVie: Consultancy; Astellas Pharma, Inc.: Research Funding.

