Community Clustering Algorithm in Complex Networks Based on Microcommunity Fusion

2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Jin Qi ◽  
Fei Jiang ◽  
Xiaojun Wang ◽  
Bin Xu ◽  
Yanfei Sun

With further research in recent years on the physical meaning and digital features of community structure in complex networks, improving the effectiveness and efficiency of community mining algorithms has become an important subject in this area. This paper puts forward the concept of the microcommunity and obtains final community mining results by fusing different microcommunities. It starts with the basic definition of the network community and applies expansion to microcommunity clustering, which provides the prerequisites for microcommunity fusion. Analysis of test results on network data sets shows that the proposed algorithm is more efficient and achieves higher solution quality than other similar algorithms.

Anomaly detection is a major problem faced by many industries, including network intrusion detection and the medical sciences; fields such as astronomy also have difficulty finding effective anomaly detection methods, and several techniques have been applied to the problem. Clustering is a technique employed by many researchers, and the most commonly used clustering algorithm is DBSCAN (density-based spatial clustering of applications with noise), a well-known algorithm in data mining and machine learning. Because of its high computational complexity, the dimensionality of the data points must be decreased. PCA is therefore used to reduce the dimensionality and produce a new data set, which then undergoes DBSCAN. Test results show that the methodology is precise and can be adopted. The combination of PCA and DBSCAN was verified, and the resulting analysis shows that a speedup of 25% was achieved while quality remained at 80% when the dimensionality of the data set was reduced by half.
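The PCA-then-DBSCAN pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synthetic data, `eps`, `min_samples`, and the choice of halving the dimensionality from 10 to 5 are all assumptions for the example.

```python
# Sketch of the PCA -> DBSCAN anomaly-detection pipeline (illustrative).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # 200 points in 10 dimensions
X[:5] += 8                       # a few far-away points to act as anomalies

# Halve the dimensionality before clustering, as in the study.
X_reduced = PCA(n_components=5).fit_transform(X)

# DBSCAN labels low-density points as -1 (noise), i.e. candidate anomalies.
labels = DBSCAN(eps=2.5, min_samples=5).fit_predict(X_reduced)
anomalies = np.where(labels == -1)[0]
print(f"{len(anomalies)} candidate anomalies flagged")
```

The speedup reported in the abstract comes from DBSCAN's distance computations running on 5-dimensional rather than 10-dimensional points.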


2021 ◽  
pp. 016555152110184
Author(s):  
Gunjan Chandwani ◽  
Anil Ahlawat ◽  
Gaurav Dubey

Document retrieval plays an important role in knowledge management, as it enables us to discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing is done to remove unnecessary and redundant words from the documents. Then, the documents are indexed by the cluster-based inverted indexing algorithm, which is developed by integrating the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. After the documents are indexed, query matching is performed for the user queries using the Bhattacharyya distance. Finally, query optimisation is done with the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed on the WebKB and Twenty Newsgroups data sets. The analysis shows that the proposed algorithm offers high performance, with a precision of 1, recall of 0.70 and F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storage and retrieval of information.
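The query-matching step above uses the Bhattacharyya distance; a minimal sketch of that step alone is shown below, treating a query and a document as normalised term-frequency distributions over a shared vocabulary. The vocabulary and counts are made-up illustrations, and the piFCM indexing stage is omitted.

```python
# Bhattacharyya distance between two discrete distributions (query matching).
import math

def bhattacharyya_distance(p, q):
    """D_B = -ln(sum_i sqrt(p_i * q_i)); 0 means identical distributions."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(bc)

def normalise(counts):
    total = sum(counts)
    return [c / total for c in counts]

doc   = normalise([4, 3, 2, 1])   # term frequencies over a shared vocabulary
query = normalise([2, 2, 1, 0])

print(round(bhattacharyya_distance(doc, query), 4))
```

Documents would be ranked by increasing distance from the query distribution; the smaller the distance, the better the match.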


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Juliette Servante ◽  
Gill Swallow ◽  
Jim G. Thornton ◽  
Bethan Myers ◽  
Sandhya Munireddy ◽  
...  

Abstract Background As pregnancy is a physiological prothrombotic state, pregnant women may be at increased risk of developing coagulopathic and/or thromboembolic complications associated with COVID-19. Methods Two biomedical databases were searched between September 2019 and June 2020 for case reports and series of pregnant women with a diagnosis of COVID-19 based either on a positive swab or on high clinical suspicion where no swab had been performed. Additional registry cases known to the authors were included. Steps were taken to minimise duplicate patients. Information on coagulopathy, based on abnormal coagulation test results or clinical evidence of disseminated intravascular coagulation (DIC), and on arterial or venous thrombosis, was extracted using a standard form. Where available, detailed laboratory results and information on maternal outcomes were analysed. Results One thousand and sixty-three women met the inclusion criteria, of whom three (0.28%, 95% CI 0.0 to 0.6%) had arterial and/or venous thrombosis, seven (0.66%, 95% CI 0.17 to 1.1%) had DIC, and a further three (0.28%, 95% CI 0.0 to 0.6%) had coagulopathy without meeting the definition of DIC. Five hundred and thirty-seven women (56%) were reported as having given birth and 426 (40%) as having an ongoing pregnancy. There were 17 (1.6%, 95% CI 0.85 to 2.3%) maternal deaths, in two of which DIC was reported as a factor. Conclusions Our data suggest that coagulopathy and thromboembolism are both increased in pregnancies affected by COVID-19. Detection of the former may be useful in the identification of women at risk of deterioration.
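The percentages and intervals in the abstract are consistent with a simple normal-approximation confidence interval for a proportion, truncated at zero. The sketch below reproduces the thrombosis figure (3 of 1063) under that assumption; the abstract does not state the authors' exact method.

```python
# Normal-approximation 95% CI for a proportion, truncated at zero (a sketch;
# the exact interval method used by the authors is an assumption here).
import math

def proportion_ci(k, n, z=1.96):
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), p + half

p, lo, hi = proportion_ci(3, 1063)   # 3 of 1063 women with thrombosis
print(f"{100*p:.2f}% (95% CI {100*lo:.1f} to {100*hi:.1f})")
```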


2021 ◽  
Vol 7 ◽  
pp. 237802312110244
Author(s):  
Katrin Auspurg ◽  
Josef Brüderl

In 2018, Silberzahn, Uhlmann, Nosek, and colleagues published an article in which 29 teams analyzed the same research question with the same data: Are soccer referees more likely to give red cards to players with dark skin tone than light skin tone? The results obtained by the teams differed extensively. Many concluded from this widely noted exercise that the social sciences are not rigorous enough to provide definitive answers. In this article, we investigate why results diverged so much. We argue that the main reason was an unclear research question: Teams differed in their interpretation of the research question and therefore used diverse research designs and model specifications. We show by reanalyzing the data that with a clear research question, a precise definition of the parameter of interest, and theory-guided causal reasoning, results vary only within a narrow range. The broad conclusion of our reanalysis is that social science research needs to be more precise in its “estimands” to become credible.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Sumedh Yadav ◽  
Mathis Bode

Abstract A scalable graphical method is presented for selecting and partitioning datasets for the training phase of a classification task. The heuristic requires a clustering algorithm whose computational cost is in reasonable proportion to the task itself. This step is followed by the construction of an information graph of the underlying classification patterns using approximate nearest neighbour methods. The presented method consists of two approaches, one for reducing a given training set and another for partitioning the selected/reduced set. The heuristic targets large datasets, since the primary goal is a significant reduction in training run-time without compromising prediction accuracy. Test results show that both approaches significantly speed up the training task compared with state-of-the-art shrinking heuristics available in LIBSVM. Furthermore, the approaches closely match, and sometimes outperform, those heuristics in prediction accuracy. A network design is also presented for a partitioning-based distributed training formulation. Additional speed-up in training run-time is observed compared with the serial implementation of the approaches.
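The first of the two approaches, reducing a training set via clustering, can be sketched as below. This is illustrative only: the paper's heuristic also builds an information graph with approximate nearest neighbours, which is omitted here, and the cluster count and representatives-per-cluster values are assumptions.

```python
# Shrink a training set by clustering it and keeping only the points
# nearest each cluster centre (a sketch of the reduction idea).
import numpy as np
from sklearn.cluster import KMeans

def reduce_training_set(X, n_clusters=10, keep_per_cluster=5):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    keep = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # keep the points closest to the centroid as representatives
        d = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        keep.extend(members[np.argsort(d)[:keep_per_cluster]])
    return np.array(sorted(keep))

X = np.random.default_rng(1).normal(size=(500, 8))
idx = reduce_training_set(X)
print(f"reduced {len(X)} points to {len(idx)}")
```

Training an SVM on `X[idx]` instead of `X` is where the run-time saving comes from; the accuracy question is whether the representatives preserve the decision boundary.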


2000 ◽  
Vol 83 (6) ◽  
pp. 1429-1434
Author(s):  
Robert J Blodgett ◽  
Anthony D Hitchins

Abstract A typical qualitative microbiological method performance (collaborative) study gathers a data set of responses about a test for the presence or absence of a target microbe. We developed 2 models that estimate false-positive and false-negative rates. One model assumes a constant probability that the tests will indicate the target microbe is present for any positive concentration in the test portion. The other model assumes that this probability follows a logistic curve. Test results from several method performance studies illustrate these estimates.
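The two detection-probability models described above can be sketched numerically. The function names and parameter values below are illustrative assumptions, not the authors' fitted estimates: one model returns a constant detection probability for any positive concentration, the other follows a logistic curve in log concentration.

```python
# Two models for the probability that a qualitative test reports 'present'.
import math

def constant_model(conc, p_detect=0.9, p_false_pos=0.05):
    """Constant detection probability for any positive concentration."""
    return p_detect if conc > 0 else p_false_pos

def logistic_model(conc, slope=2.0, c50=1.0, p_false_pos=0.05):
    """Detection probability rises along a logistic curve with log concentration."""
    if conc <= 0:
        return p_false_pos
    return 1.0 / (1.0 + math.exp(-slope * (math.log(conc) - math.log(c50))))

for c in (0.0, 0.5, 1.0, 4.0):
    print(c, constant_model(c), round(logistic_model(c), 3))
```

At zero concentration both models return the false-positive rate; at positive concentrations the constant model's complement (here 0.1) is the false-negative rate, whereas under the logistic model the false-negative rate shrinks as concentration grows.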


2021 ◽  
Vol 34 (1) ◽  
Author(s):  
Liang Sun ◽  
Zhizheng Ye ◽  
Fuwei Lu ◽  
Rongjiang Cui ◽  
Chuanyu Wu

Abstract Isomorphism detection is fundamental to the synthesis and innovative design of kinematic chains (KCs). The detection can be performed accurately by using the similarity of KCs. However, there are very few works on isomorphism detection based on the properties of similar vertices. In this paper, an ameliorated multi-order adjacent vertex assignment sequence (AMAVS) method is proposed to seek out similar vertices and identify the isomorphism of planar KCs. First, the specific definition of the AMAVS is described. Through the calculation of the AMAVS, an adjacent vertex value sequence reflecting the uniqueness of the topology features is established. Based on this value sequence, all possible similar vertices, their corresponding relations, and isomorphism discrimination can be realised. By checking the topological graphs of KCs with different numbers of links, the effectiveness and efficiency of the proposed method are verified. Finally, the method is employed to detect the similar vertices and isomorphism of all the 9-link 2-DOF (degree of freedom) planar KCs.
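The idea of deriving a vertex value sequence from multi-order adjacency can be sketched with a Weisfeiler-Lehman-style refinement: start from vertex degrees (order 1) and repeatedly fold in neighbours' values (higher orders), then compare the sorted sequences of two graphs as a fast isomorphism screen. This is a sketch in the spirit of the method, not the authors' exact AMAVS.

```python
# Iteratively refine a per-vertex value from neighbours' values, then
# compare sorted value sequences as an isomorphism screen (illustrative).
def refine_sequence(adj, rounds=3):
    """adj: {vertex: set(neighbours)}. Returns a sorted tuple of vertex values."""
    labels = {v: len(adj[v]) for v in adj}          # order 1: vertex degree
    for _ in range(rounds):                          # fold in higher orders
        labels = {v: hash((labels[v], tuple(sorted(labels[u] for u in adj[v]))))
                  for v in adj}
    return tuple(sorted(labels.values()))

# Two isomorphic 4-cycles with different vertex names
g1 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
g2 = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
# A non-isomorphic graph: a 4-vertex path
g3 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

print(refine_sequence(g1) == refine_sequence(g2))   # isomorphic pair agrees
print(refine_sequence(g1) == refine_sequence(g3))   # non-isomorphic pair differs
```

Matching sequences are necessary but not sufficient for isomorphism, which is why such sequences are used to narrow down candidate similar vertices before exact discrimination.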


2014 ◽  
Vol 687-691 ◽  
pp. 1350-1353
Author(s):  
Li Li Fu ◽  
Yong Li Liu ◽  
Li Jing Hao

Spectral clustering is a kind of clustering algorithm based on spectral graph theory. As spectral clustering has a deep theoretical foundation as well as an advantage in dealing with non-convex distributions, it has received much attention in the machine learning and data mining areas. The algorithm is easy to implement and outperforms traditional clustering algorithms such as the K-means algorithm. This paper aims to give some intuition on spectral clustering. We describe different graph partition criteria, the definition of spectral clustering, the clustering steps, etc. Finally, in order to address the disadvantages of spectral clustering, some improvements are introduced briefly.
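The clustering steps mentioned above can be sketched as a minimal normalised spectral clustering (Ng-Jordan-Weiss style): build a similarity graph, form the normalised Laplacian, embed the points with its smallest eigenvectors, and run K-means on the embedding. The Gaussian similarity, `sigma`, and the two-blob test data are illustrative assumptions.

```python
# Minimal normalised spectral clustering: similarity graph -> Laplacian
# -> eigenvectors -> k-means on the spectral embedding.
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k=2, sigma=1.0):
    # Fully connected Gaussian similarity graph
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalised Laplacian L = I - D^{-1/2} W D^{-1/2}
    d = W.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # The k smallest eigenvectors give the spectral embedding
    _, vecs = np.linalg.eigh(L)                       # ascending eigenvalues
    U = vecs[:, :k]
    U /= np.linalg.norm(U, axis=1, keepdims=True)     # row-normalise
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

# Two well-separated blobs: spectral clustering recovers them
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
labels = spectral_clustering(X)
print(len(set(labels)))
```

The advantage over K-means on non-convex distributions comes from clustering in the eigenvector embedding, where graph-connected groups become linearly separable.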


Genetics ◽  
2001 ◽  
Vol 159 (2) ◽  
pp. 699-713
Author(s):  
Noah A Rosenberg ◽  
Terry Burke ◽  
Kari Elo ◽  
Marcus W Feldman ◽  
Paul J Freidlin ◽  
...  

Abstract We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly to the same cluster. The clustering success rate, measuring the fraction of individuals that were properly inferred to belong to their correct breeds, was consistently ~98%. When markers of highest expected heterozygosity were used, genotypes that included at least 8–10 highly variable markers from among the 27 markers genotyped also achieved >95% clustering success. When 12–15 highly variable markers and only 15–20 of the 30 individuals per breed were used, clustering success was at least 90%. We suggest that in species for which population structure is of interest, databases of multilocus genotypes at highly variable markers should be compiled. These genotypes could then be used as training samples for genetic cluster analysis and to facilitate assignments of individuals of unknown origin to populations. The clustering algorithm has potential applications in defining the within-species genetic units that are useful in problems of conservation.


2017 ◽  
Vol 55 (8) ◽  
pp. 1112-1114 ◽  
Author(s):  
Giuseppe Lippi ◽  
Gianfranco Cervellin ◽  
Mario Plebani

Abstract The management of laboratory data from unsuitable (hemolyzed) samples remains an almost unresolved dilemma. Whether or not laboratory test results obtained by measuring unsuitable specimens should be made available to clinicians has been the subject of fierce debate over the past decades. Recently, an intriguing alternative to suppressing test results and recollecting the specimen has been put forward, entailing the definition and implementation of specific algorithms that would allow reporting a preanalytically altered laboratory value with a specific comment about its uncertainty of measurement. This approach carries some advantages, namely the timely communication of potentially life-threatening laboratory values, but also some drawbacks. These especially include the challenging definition of validated performance specifications for hemolyzed samples, the need to produce reliable data with the lowest possible uncertainty, the short turnaround time for repeating most laboratory tests, the risk that the comments may be overlooked in short-stay and frequently overcrowded units (e.g. the emergency department), as well as the many clinical advantages of direct communication with the physician in charge of the patient. Although the debate remains open, we continue to support the suggestion that suppressing data from unsuitable (hemolyzed) samples and promptly notifying the clinicians about the need to recollect the samples remains the most (clinically and analytically) safe practice.

