Approximating Expensive Distance Metrics

2021, Vol 1 (1), pp. 22-27
Author(s): Elliott Pryor, Nathan Stouffer

Computing the distance between a point a and a point b is typically considered easy. However, computing a distance can sometimes take significant computation time; we call such metrics expensive distance metrics. Suppose we have an expensive distance metric and need to compute the distances between many points. This paper explores a method to reduce the number of queries to the distance metric and examines its effect on clustering. The authors find that total run time can be reduced while inducing only small inaccuracies in the clustering output.
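One generic way to cut the number of metric queries can be sketched as follows. This is not the authors' method, which the abstract does not describe; it is a landmark (pivot) approach that assumes the metric satisfies the triangle inequality. The names `CountingMetric` and `landmark_estimates`, the sample points, and the choice of landmarks are all hypothetical.

```python
import math
from itertools import combinations


class CountingMetric:
    """Wraps a distance function and counts how often it is called."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.fn(a, b)


def landmark_estimates(points, landmarks, metric):
    """Estimate all pairwise distances from point-to-landmark distances only.

    For a true metric, every landmark l gives the bounds
    |d(a, l) - d(b, l)| <= d(a, b) <= d(a, l) + d(b, l).
    The midpoint of the tightest bounds is used as the estimate.
    """
    # The only calls to the expensive metric: len(points) * len(landmarks).
    to_lm = [[metric(p, l) for l in landmarks] for p in points]
    estimates = {}
    for i, j in combinations(range(len(points)), 2):
        lower = max(abs(x - y) for x, y in zip(to_lm[i], to_lm[j]))
        upper = min(x + y for x, y in zip(to_lm[i], to_lm[j]))
        estimates[(i, j)] = (lower + upper) / 2
    return estimates


# Euclidean distance stands in for the expensive metric (assumption).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
metric = CountingMetric(math.dist)
estimates = landmark_estimates(points, [points[0], points[3]], metric)
print(metric.calls, "metric queries instead of", len(points) * (len(points) - 1) // 2)
```

With n points and k landmarks the sketch issues n * k queries rather than n(n-1)/2, so the saving grows with n. The estimates could then be passed to a clustering routine in place of the exact distances.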


Author(s): Jia Xu

In most embedded, real-time applications, processes need to satisfy various important constraints and dependencies, such as release times, offsets, precedence relations, and exclusion relations. Embedded, real-time systems with high assurance requirements often must execute many different types of processes with such constraints and dependencies. Some of the processes may be periodic and some of them may be asynchronous. Some of the processes may have hard deadlines and some of them may have soft deadlines. For some of the processes, especially the hard real-time processes, complete knowledge about their characteristics can and must be acquired before run-time. For other processes, prior knowledge of their worst-case computation time and their data requirements may not be available. It is important for many embedded real-time systems to be able to simultaneously satisfy as many important constraints and dependencies as possible for as many different types of processes as possible. In this paper, we discuss what types of important constraints and dependencies can be satisfied among what types of processes. We also present a method which guarantees that every process, whether periodic or asynchronous and whether it has a hard or a soft deadline, will be completed before predetermined time limits, provided that its characteristics are known before run-time, while simultaneously satisfying many important constraints and dependencies with other processes.


2013, Vol 25 (06), pp. 1350053
Author(s): Valiallah Saba, Saeed Setayeshi

Among the motion detection and correction approaches applied during scanning procedures, data-processing methods are the most frequently proposed solution for detecting and correcting patient motion. Different distance metrics have been used to detect patient motion using the information contained in the projections. Unfortunately, the commonly used metrics perform poorly for small motions, even though detecting motions of 1 pixel or smaller is very important for diagnostic accuracy. In this work, a new distance metric, the normalized prediction of projection data algorithm (NPPDA), is developed based on the linear prediction filter. The performance of the NPPDA is quantitatively evaluated and compared with the usual distance metrics in several experimental studies. The newly developed distance metric achieves a high detection rate.


Author(s): Jia Xu

Many embedded systems applications have hard timing requirements where real-time processes with precedence and exclusion relations must be completed before specified deadlines. This requires that the worst-case computation times of the real-time processes be estimated with sufficient precision during system design, which sometimes can be difficult in practice. If the actual computation time of a real-time process during run-time exceeds the estimated worst-case computation time, an overrun will occur, which may not only cause the real-time process to miss its own deadline, but also cause a cascade of other real-time processes to miss their deadlines, possibly resulting in total system failure. However, if the actual computation time of a real-time process during run-time is less than the estimated worst-case computation time, an underrun will occur, which may result in under-utilization of system resources. This paper describes a method for handling underruns and overruns when scheduling a set of real-time processes with precedence and exclusion relations using a pre-run-time schedule. The technique effectively tracks and utilizes unused processor time resources to reduce the chances of missing real-time process deadlines, thereby providing the capability to significantly increase both system utilization and system robustness in the presence of inaccurate estimates of the worst-case computation times of real-time processes.


1997, Vol 7 (4), pp. 421-440
Author(s): GAD AHARONI, AMNON BARAK, AMIR RONEN

Execution of functional programs on distributed-memory multiprocessors gives rise to the problem of evaluating expressions that are shared between several Processing Elements (PEs). One of the main difficulties of solving this problem is that, for a given shared expression, it is not known in advance whether realizing the sharing is more cost-effective than duplicating its evaluation. Realizing the sharing requires coordination between the sharing PEs to ensure that the shared expression is evaluated only once. This coordination involves relatively high communication costs, and is therefore only worthwhile when the shared expressions require much computation time to evaluate. In contrast, when the shared expression is not computation-intensive, it is more cost-effective to duplicate the evaluation, and thus avoid the communication overhead costs. This dilemma of deciding whether to duplicate the work or to realize the sharing stems from the unknown computation time that is required to evaluate a shared expression. This computation time is difficult to estimate due to the unknown run-time evolution of loops and recursion that may be part of the expression. This paper presents an on-line (run-time) algorithm that decides which of the expressions that are shared between several PEs should be evaluated only once, and which expressions should be evaluated locally by each sharing PE. By applying competitive considerations, the algorithm manages to exploit sharing of computation-intensive expressions, while it duplicates the evaluation of expressions that require little time to compute. The algorithm accomplishes this goal even though it has no a priori knowledge of the amount of computation that is required to evaluate the shared expression. We show that this algorithm is competitive with a hypothetical optimal off-line algorithm, which does have such knowledge, and we prove that the algorithm is deadlock-free. Furthermore, this algorithm does not require any programmer intervention, it has low overhead, and it is designed to run on a wide variety of distributed systems.
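The duplicate-or-share dilemma resembles the classic ski-rental problem, and a toy break-even rule can illustrate what "competitive" means here. The sketch below is not the paper's algorithm; it assumes a simplified cost model in which duplicating costs `work` time units, sharing costs a fixed `share_cost`, and the PE does not know `work` in advance. Both function names and the cost model are hypothetical.

```python
def online_cost(work, share_cost):
    """Cost paid by a PE that follows the break-even rule.

    The PE evaluates the expression locally until its effort reaches
    share_cost. If the evaluation has not finished by then, it abandons
    the local work and pays share_cost to realize the sharing instead.
    """
    if work <= share_cost:
        return work  # finished locally before reaching the threshold
    return share_cost + share_cost  # wasted local effort plus the sharing cost


def offline_cost(work, share_cost):
    """Cost of an optimal off-line decision that knows `work` in advance."""
    return min(work, share_cost)


# The break-even rule never pays more than twice the off-line optimum.
for work in (3, 10, 50):
    print(work, online_cost(work, 10), offline_cost(work, 10))
```

Under this simplified model the on-line rule is 2-competitive: cheap expressions are simply duplicated, and expensive ones end up shared after a bounded amount of wasted effort.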


2018, Vol 47 (3), pp. 489-507
Author(s): Alexis Comber, Khanh Chi, Man Q Huy, Quan Nguyen, Binbin Lu, ...

This paper explores the impact of different distance metrics on collinearity in local regression models such as geographically weighted regression. Using a case study of house price data collected in Hà Nội, Vietnam, and by fully varying both power and rotation parameters to create different Minkowski distances, the analysis shows that local collinearity can be both negatively and positively affected by distance metric choice. The Minkowski distance that maximised collinearity in a geographically weighted regression approximated a Manhattan distance (power = 0.70) with a rotation of 30°, and that which minimised collinearity was parameterised with power = 0.05 and a rotation of 70°. The results indicate that distance metric choice can provide a useful extra tuning component to address local collinearity issues in spatially varying coefficient modelling and that understanding the interaction of distance metric and collinearity can provide insight into the nature and structure of the data relationships. The discussion considers, first, the exploration and selection of different distance metrics to minimise collinearity as an alternative to localised ridge regression, lasso and elastic net approaches. Second, it discusses how distance metric choice could extend the methods that additionally optimise local model fit (lasso and elastic net) by selecting a distance metric that further helps minimise local collinearity. Third, it identifies the need to investigate the relationship between kernel bandwidth, distance metrics and collinearity as an area of further work.
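A Minkowski distance with power and rotation parameters can be sketched as below. This is a minimal illustration for 2-D coordinates, not the authors' implementation; the function name `rotated_minkowski` and the rotation convention (rotating the coordinate difference by the given angle before applying the power) are assumptions.

```python
import math


def rotated_minkowski(a, b, power, rotation_deg=0.0):
    """Minkowski distance between two 2-D points after an axis rotation.

    power = 1 gives the Manhattan distance and power = 2 the Euclidean
    distance. The rotation direction used here is an assumption.
    """
    theta = math.radians(rotation_deg)
    dx, dy = b[0] - a[0], b[1] - a[1]
    # Rotate the coordinate difference vector by theta.
    u = dx * math.cos(theta) - dy * math.sin(theta)
    v = dx * math.sin(theta) + dy * math.cos(theta)
    return (abs(u) ** power + abs(v) ** power) ** (1.0 / power)


a, b = (0.0, 0.0), (3.0, 4.0)
print(rotated_minkowski(a, b, power=1))                     # Manhattan
print(rotated_minkowski(a, b, power=2, rotation_deg=30))    # Euclidean, rotation has no effect
print(rotated_minkowski(a, b, power=0.70, rotation_deg=30)) # parameters quoted in the abstract
```

Only the Euclidean case (power = 2) is unaffected by rotation, which is why rotation becomes a meaningful tuning parameter for other powers. For power below 1 the function no longer satisfies the triangle inequality, so it is a dissimilarity measure rather than a metric in the strict sense.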


2020
Author(s): Daniel B Hier, Jonathan Kopel, Steven U Brint, Donald C Wunsch II, Gayla R Olbricht, ...

Abstract Background: When patient distances are calculated based on phenotype, signs and symptoms are often converted to concepts from an ontological hierarchy. There is controversy as to whether patient distance metrics that consider the semantic similarity between concepts can outperform standard patient distance metrics that are agnostic to concept similarity. The choice of distance metric often dominates the performance of classification or clustering algorithms. Our objective was to determine if semantically augmented distance metrics would outperform standard metrics on machine learning tasks. Methods: We converted the neurological signs and symptoms from 382 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated inter-patient distances by four different metrics (cosine distance, a semantically augmented cosine distance, Jaccard distance, and a semantically augmented bipartite distance). Semantic augmentation for two of the metrics depended on concept similarities from a hierarchical neuro-ontology. For machine learning algorithms, we used the patient diagnosis as the ground truth label and patient signs and symptoms as the machine learning features. We assessed classification accuracy for four classifiers and cluster quality for two clustering algorithms for each of the distance metrics. Results: Inter-patient distances were smaller when the distance metric was semantically augmented. Classification accuracy and cluster quality were not significantly different by distance metric. Conclusion: Using patient diagnoses as labels and patient signs and symptoms as features, we did not find improved classification accuracy or improved cluster quality with semantically augmented distance metrics. Semantic augmentation reduced inter-patient distances but did not improve machine learning performance.
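The two standard (non-augmented) metrics named in the abstract can be illustrated on sets of concepts. The sketch below assumes each patient is represented as a set of concept labels, with cosine distance computed over binary indicator vectors; the patient data and function names are hypothetical, and the semantically augmented variants are not reproduced because the abstract does not define them.

```python
import math


def jaccard_distance(a, b):
    """1 minus the ratio of shared concepts to all concepts in either set."""
    union = a | b
    if not union:
        return 0.0  # convention assumed here: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


def cosine_distance(a, b):
    """Cosine distance between the binary indicator vectors of two sets."""
    if not a or not b:
        return 1.0  # convention assumed here: an empty set shares nothing
    return 1.0 - len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical patients described by neurological findings.
patient_a = {"ataxia", "tremor", "dysarthria"}
patient_b = {"tremor", "dysarthria", "diplopia"}

print(jaccard_distance(patient_a, patient_b))  # 2 shared of 4 distinct concepts
print(cosine_distance(patient_a, patient_b))   # 2 shared, set sizes 3 and 3
```

Both metrics treat concepts as either identical or unrelated, which is the "agnostic to concept similarity" behaviour the abstract contrasts with semantic augmentation.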


2020
Author(s): Daniel B Hier, Jonathan Kopel, Steven U Brint, Donald C Wunsch II, Gayla R Olbricht, ...

Abstract Background: Patient distances can be calculated based on signs and symptoms derived from an ontological hierarchy. There is controversy as to whether patient distance metrics that consider the semantic similarity between concepts can outperform standard patient distance metrics that are agnostic to concept similarity. The choice of distance metric can dominate the performance of classification or clustering algorithms. Our objective was to determine if semantically augmented distance metrics would outperform standard metrics on machine learning tasks. Methods: We converted the neurological findings from 382 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated patient distances by four different metrics (cosine distance, a semantically augmented cosine distance, Jaccard distance, and a semantically augmented bipartite distance). Semantic augmentation for two of the metrics depended on concept similarities from a hierarchical neuro-ontology. For machine learning algorithms, we used the patient diagnosis as the ground truth label and patient findings as machine learning features. We assessed classification accuracy for four classifiers and cluster quality for two clustering algorithms for each of the distance metrics. Results: Inter-patient distances were smaller when the distance metric was semantically augmented. Classification accuracy and cluster quality were not significantly different by distance metric. Conclusion: Although semantic augmentation reduced inter-patient distances, we did not find improved classification accuracy or improved cluster quality with semantically augmented patient distance metrics.


2020
Author(s): Daniel B Hier, Jonathan Kopel, Steven U Brint, Donald C Wunsch II, Gayla R Olbricht, ...

Abstract Background: Patient distances can be calculated based on signs and symptoms derived from an ontological hierarchy. There is controversy as to whether patient distance metrics that consider the semantic similarity between concepts can outperform standard patient distance metrics that are agnostic to concept similarity. The choice of distance metric can dominate the performance of classification or clustering algorithms. Our objective was to determine if semantically augmented distance metrics would outperform standard metrics on machine learning tasks. Methods: We converted the neurological findings from 382 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated patient distances by four different metrics (cosine distance, a semantically augmented cosine distance, Jaccard distance, and a semantically augmented bipartite distance). Semantic augmentation for two of the metrics depended on concept similarities from a hierarchical neuro-ontology. For machine learning algorithms, we used the patient diagnosis as the ground truth label and patient findings as machine learning features. We assessed classification accuracy for four classifiers and cluster quality for two clustering algorithms for each of the distance metrics. Results: Inter-patient distances were smaller when the distance metric was semantically augmented. Classification accuracy and cluster quality were not significantly different by distance metric. Conclusion: Although semantic augmentation reduced inter-patient distances, we did not find improved classification accuracy or improved cluster quality with semantically augmented patient distance metrics when applied to a dataset of neurology patients. Further work is needed to assess the utility of semantically augmented patient distances.


Author(s): Jia Xu

Utilizing non-zero offsets when scheduling real-time periodic processes significantly increases the chances of satisfying all the timing constraints in a real-time system. In this paper, a method is presented that enables the utilization of non-zero offsets in the pre-run-time scheduling of asynchronous and periodic processes with release times, deadlines, precedence and exclusion relations on either a uniprocessor or a multiprocessor in real-time embedded systems. This paper also identifies, for the first time, the set of general conditions that a periodic process $newp_i$ with release time $r_{newp_i}$, computation time $c_{newp_i}$, deadline $d_{newp_i}$, period $prd_{newp_i}$, and permitted range of offset $o_{newp_i}$ must satisfy in order to satisfy the timing constraints of any given asynchronous process $a_i$ with computation time $c_{a_i}$, deadline $d_{a_i}$, minimum time between two consecutive requests $min_{a_i}$, and earliest time $l_{a_i}$ at which asynchronous process $a_i$ can make a request for execution. A method based on these general conditions for converting asynchronous processes with earliest request times into new periodic processes with offset constraints is also introduced.

