Validation for 2D/3D registration I: A new gold standard data set

2011 ◽  
Vol 38 (3) ◽  
pp. 1481-1490 ◽  
Author(s):  
S. A. Pawiro ◽  
P. Markelj ◽  
F. Pernuš ◽  
C. Gendrin ◽  
M. Figl ◽  
...  
2011 ◽  
Vol 38 (3) ◽  
pp. 1491-1502 ◽  
Author(s):  
Christelle Gendrin ◽  
Primož Markelj ◽  
Supriyanto Ardjo Pawiro ◽  
Jakob Spoerk ◽  
Christoph Bloch ◽  
...  


2020 ◽  
Vol 4 (Supplement_2) ◽  
pp. 1167-1167
Author(s):  
Keisuke Ejima ◽  
Roger Zoh ◽  
Carmen Tekwe ◽  
David Allison ◽  
Andrew Brown

Abstract Objectives A gold standard method to measure energy intake (EI) is doubly labeled water (DLW), but it is expensive and not feasible for large studies. EI from self-report (EISR) is prone to bias, but is still widely used due to convenience; however, estimated associations between EISR and outcomes are biased in many cases. Double sampling with multiple imputation (MI) involves obtaining gold standard (e.g., EIDLW) measurements on a random subsample, and proxy data (e.g., EISR) on the whole sample, and recovering missing gold standard information using MI. However, it is not known what proportion of missingness in EIDLW is acceptable to obtain unbiased estimates of associations between EI and outcomes. Methods We used body weight as an example outcome from the CALERIE Study (N = 218). We performed two regressions on the complete dataset: EIDLW as a predictor and body weight (kg) as an outcome to estimate the ‘true’ coefficient (denoted βDLW), or using EISR as the predictor (βSR). Random subsets of EIDLW were deleted (10% to 90% of full data in 10% increments) to simulate obtaining EIDLW data on only a subset of participants. Regressions were performed using the subset EIDLW data using two different approaches: complete case analysis of only the subset (βDLWsub) and MI informed by EISR on the full data set (βMI). Bias was estimated as the difference between βDLW and βSR, between βDLW and βDLWsub for each EIDLW subset, and between βDLW and βMI for each subset. Resampling was repeated 100 times to assess the uncertainty of the bias. Results Bias of EISR was substantial (∼50%). Bias of βDLWsub was not significantly different from zero for all proportions of missing EIDLW; 95% CIs increased as proportion of missingness increased (as expected). Bias for βMI was not significantly different from zero for missingness of EIDLW up to 80%. βMI was significantly negatively biased toward βSR when the proportion of missingness was 90%. 
95% CIs of βMI estimates were narrower than those of βDLWsub for all amounts of missingness. Conclusions Unbiased, more precise estimates of the association between EI and body weight were obtained using MI with up to 80% of EIDLW missing. Collecting gold standard data on subsets may allow for unbiased estimates using self-report data, which are feasible to collect in larger samples. Funding Sources NIH R25HL124208. JSPS KAKENHI 18K18146. Meiji Yasuda Foundation of Health and Welfare 2019.
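The double-sampling design described in this abstract can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not the study's analysis: the data-generating values, the function names, and the single-fit imputation model (which omits the parameter draws and variance pooling of a full multiple-imputation procedure) are all assumptions made for illustration.

```python
import random
from statistics import fmean

def cov(a, b):
    """Sum of centered cross-products of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))

def slope(x, y):
    """OLS slope of y regressed on x (intercept included)."""
    return cov(x, y) / cov(x, x)

def fit_imputer(x, w, y):
    """OLS fit of x ~ w + y; returns (intercept, b_w, b_y, residual_sd)."""
    sww, syy, swy = cov(w, w), cov(y, y), cov(w, y)
    swx, syx = cov(w, x), cov(y, x)
    det = sww * syy - swy ** 2
    b_w = (swx * syy - syx * swy) / det
    b_y = (syx * sww - swx * swy) / det
    b_0 = fmean(x) - b_w * fmean(w) - b_y * fmean(y)
    rss = sum((xi - b_0 - b_w * wi - b_y * yi) ** 2
              for xi, wi, yi in zip(x, w, y))
    return b_0, b_w, b_y, (rss / (len(x) - 3)) ** 0.5

def simulate(n=218, missing=0.5, n_imputations=20, seed=0):
    rng = random.Random(seed)
    # Synthetic stand-ins (assumed values, not CALERIE data).
    ei_true = [rng.gauss(2500, 400) for _ in range(n)]        # gold standard EI
    ei_self = [0.6 * x + rng.gauss(0, 300) for x in ei_true]  # biased proxy
    weight = [0.02 * x + rng.gauss(0, 5) for x in ei_true]    # outcome (kg)

    # Randomly delete a fraction of the gold standard measurements.
    keep = [rng.random() >= missing for _ in range(n)]
    obs = [i for i in range(n) if keep[i]]
    x_obs = [ei_true[i] for i in obs]
    w_obs = [ei_self[i] for i in obs]
    y_obs = [weight[i] for i in obs]

    # The imputation model uses both the proxy and the outcome.
    b_0, b_w, b_y, sd = fit_imputer(x_obs, w_obs, y_obs)
    betas = []
    for _ in range(n_imputations):
        filled = [ei_true[i] if keep[i]
                  else b_0 + b_w * ei_self[i] + b_y * weight[i] + rng.gauss(0, sd)
                  for i in range(n)]
        betas.append(slope(filled, weight))

    return {
        "full": slope(ei_true, weight),        # analogue of beta_DLW
        "self_report": slope(ei_self, weight), # analogue of beta_SR
        "complete_case": slope(x_obs, y_obs),  # analogue of beta_DLWsub
        "mi": fmean(betas),                    # analogue of beta_MI
    }
```

Calling `simulate(missing=m)` for `m` from 0.1 to 0.9 and with several seeds gives a rough analogue of the resampling loop described in the Methods; bias can then be read off as the difference between each estimate and the `"full"` value.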


2020 ◽  
pp. 383-391 ◽  
Author(s):  
Yalun Li ◽  
Yung-Hung Luo ◽  
Jason A. Wampfler ◽  
Samuel M. Rubinstein ◽  
Firat Tiryaki ◽  
...  

PURPOSE Electronic health records (EHRs) are created primarily for nonresearch purposes; thus, the amounts of data are enormous, and the data are crude, heterogeneous, incomplete, and largely unstructured, presenting challenges to effective analyses for timely, reliable results. Particularly, research dealing with clinical notes relevant to patient care and outcome is seldom conducted, due to the complexity of data extraction and accurate annotation in the past. RECIST is a set of widely accepted research criteria to evaluate tumor response in patients undergoing antineoplastic therapy. The aim for this study was to identify textual sources for RECIST information in EHRs and to develop a corpus of pharmacotherapy and response entities for development of natural language processing tools. METHODS We focused on pharmacotherapies and patient responses, using 55,120 medical notes (n = 72 types) in Mayo Clinic’s EHRs from 622 randomly selected patients who signed authorization for research. Using the Multidocument Annotation Environment tool, we applied and evaluated predefined keywords, and time interval and note-type filters for identifying RECIST information and established a gold standard data set for patient outcome research. RESULTS Key words reduced clinical notes to 37,406, and using four note types within 12 months postdiagnosis further reduced the number of notes to 5,005 that were manually annotated, which covered 97.9% of all cases (n = 609 of 622). The resulting data set of 609 cases (n = 503 for training and n = 106 for validation purpose), contains 736 fully annotated, deidentified clinical notes, with pharmacotherapies and four response end points: complete response, partial response, stable disease, and progressive disease. This resource is readily expandable to specific drugs, regimens, and most solid tumors. 
CONCLUSION We have established a gold standard data set to accommodate development of biomedical informatics tools in accelerating research into antineoplastic therapeutic response.
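The note-reduction steps in this abstract (keyword match, note-type filter, time window after diagnosis) can be sketched as a simple filter. This is a hedged illustration only: the `Note` fields, the function names, the example keywords and note types, and the use of 365 days for "12 months" are assumptions, since the abstract does not list the actual keywords or note types.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Note:
    patient_id: str
    note_type: str
    written: date
    text: str

def select_notes(notes, diagnosis_dates, keywords, note_types, window_days=365):
    """Keep notes that mention a keyword, have an allowed note type,
    and were written within window_days after the patient's diagnosis."""
    kept = []
    for note in notes:
        diagnosed = diagnosis_dates.get(note.patient_id)
        if diagnosed is None:
            continue  # no diagnosis date, so the time window cannot be applied
        days_after = (note.written - diagnosed).days
        has_keyword = any(k.lower() in note.text.lower() for k in keywords)
        if has_keyword and note.note_type in note_types and 0 <= days_after <= window_days:
            kept.append(note)
    return kept

def coverage(kept, all_patient_ids):
    """Fraction of patients who still have at least one note after filtering."""
    return len({n.patient_id for n in kept}) / len(set(all_patient_ids))
```

With the figures reported in the abstract, a coverage of 609 of 622 patients corresponds to `609 / 622`, or about 97.9%.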


10.2196/28219 ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. e28219
Author(s):  
Qi Jia ◽  
Dezheng Zhang ◽  
Haifeng Xu ◽  
Yonghong Xie

Background Traditional Chinese medicine (TCM) clinical records contain patients' symptoms, diagnoses, and the treatments subsequently prescribed by doctors. These records are important resources for research and analysis of TCM diagnostic knowledge. However, most TCM clinical records are unstructured text. Therefore, a method to automatically extract medical entities from TCM clinical records is indispensable. Objective Training a medical entity extraction model requires a large annotated corpus. Annotation is very costly, and gold-standard data sets for supervised learning methods are lacking. Therefore, we used distantly supervised named entity recognition (NER) to address this challenge. Methods We propose a span-level distantly supervised NER approach to extract TCM medical entities. It uses a pretrained language model and a simple multilayer neural network as a classifier to detect and classify entities. We also designed a negative sampling strategy for the span-level model. The strategy randomly selects negative samples in every epoch and periodically filters out possible false-negative samples, which reduces their adverse influence. Results We compared our method with baseline methods on a gold-standard data set to illustrate its effectiveness. The F1 score of our method is 77.34, and it remarkably outperforms the other baselines. Conclusions We developed a distantly supervised NER approach to extract medical entities from TCM clinical records. We evaluated our approach on a TCM clinical record data set. Our experimental results indicate that the proposed approach achieves better performance than the other baselines.
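The per-epoch negative sampling idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the maximum span length, and especially the rule for flagging possible false negatives (a confidence threshold on hypothetical model scores) are invented here, because the abstract does not say how the filtering is done.

```python
import random

def enumerate_spans(n_tokens, max_len):
    """All (start, end) spans, end exclusive, of at most max_len tokens."""
    return [(i, j)
            for i in range(n_tokens)
            for j in range(i + 1, min(i + max_len, n_tokens) + 1)]

def sample_negatives(n_tokens, positives, suspected, k, max_len=4, rng=random):
    """Draw up to k negative spans for one epoch.

    positives: spans labeled as entities by distant supervision.
    suspected: spans flagged as possible false negatives; never sampled.
    """
    excluded = set(positives) | set(suspected)
    candidates = [s for s in enumerate_spans(n_tokens, max_len)
                  if s not in excluded]
    return rng.sample(candidates, min(k, len(candidates)))

def update_suspected(scores, threshold=0.9):
    """Assumed filtering rule: flag unlabeled spans that the current model
    scores as entities with probability >= threshold.

    scores: dict mapping span -> predicted entity probability
            (hypothetical model output).
    """
    return {span for span, p in scores.items() if p >= threshold}
```

In a training loop, `sample_negatives` would be called every epoch so the model sees a fresh random subset of unlabeled spans, while `update_suspected` would be called every few epochs to refresh the exclusion set.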


Author(s):  
Yanyi Chu ◽  
Xiaoqi Shan ◽  
Dennis R. Salahub ◽  
Yi Xiong ◽  
Dong-Qing Wei

Abstract Identifying drug-target interactions (DTIs) is an important step in drug discovery and drug repositioning. To reduce the heavy cost of experiments, machine learning has been applied to this field, and many computational methods have been developed, especially binary classification methods. However, there is still much room for improvement in the performance of current methods. Multi-label learning can reduce the difficulties faced by binary classification while achieving high predictive performance, but it has not been explored extensively. Its key challenge is the exponentially sized output space, which considering label correlations can help to address. We therefore facilitate multi-label classification for DTI prediction by introducing community detection methods, in an approach named DTI-MLCD. In addition, we updated the gold standard data set that was proposed in 2008 and is still in use today. DTI-MLCD was applied to the gold standard data set before and after the update and outperformed other classical machine learning methods and other benchmark methods, which confirms its effectiveness. The data and code for this study can be found at https://github.com/a96123155/DTI-MLCD.


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 37
Author(s):  
Shixun Wang ◽  
Qiang Chen

Boosting, an ensemble learning approach, has made great progress, but most boosting methods operate on a single modality. For this reason, a simple multiclass boosting framework that uses local similarity as a weak learner is extended here to multimodal multiclass boosting. First, with local similarity as the weak learner, the loss function is used to find the basic loss, and the logarithmic data points are binarized. Then, we find the optimal local similarity and the corresponding loss; the smaller of this loss and the basic loss is the best so far. Second, the local similarity of the two points is calculated, and the loss is then calculated from it. Finally, text and images are retrieved from each other, and the accuracy of text retrieval and of image retrieval is obtained. The multimodal multiclass boosting framework with local similarity as the weak learner is evaluated on the standard data set and compared with other state-of-the-art methods, and the experimental results show the empirical proficiency of this method.


2015 ◽  
Vol 15 (1) ◽  
pp. 253-272 ◽  
Author(s):  
M. R. Canagaratna ◽  
J. L. Jimenez ◽  
J. H. Kroll ◽  
Q. Chen ◽  
S. H. Kessler ◽  
...  

Abstract. Elemental compositions of organic aerosol (OA) particles provide useful constraints on OA sources, chemical evolution, and effects. The Aerodyne high-resolution time-of-flight aerosol mass spectrometer (HR-ToF-AMS) is widely used to measure OA elemental composition. This study evaluates AMS measurements of atomic oxygen-to-carbon (O : C), hydrogen-to-carbon (H : C), and organic mass-to-organic carbon (OM : OC) ratios, and of carbon oxidation state (OS C) for a vastly expanded laboratory data set of multifunctional oxidized OA standards. For the expanded standard data set, the method introduced by Aiken et al. (2008), which uses experimentally measured ion intensities at all ions to determine elemental ratios (referred to here as "Aiken-Explicit"), reproduces known O : C and H : C ratio values within 20% (average absolute value of relative errors) and 12%, respectively. The more commonly used method, which uses empirically estimated H2O+ and CO+ ion intensities to avoid gas phase air interferences at these ions (referred to here as "Aiken-Ambient"), reproduces O : C and H : C of multifunctional oxidized species within 28 and 14% of known values. The values from the latter method are systematically biased low, however, with larger biases observed for alcohols and simple diacids. A detailed examination of the H2O+, CO+, and CO2+ fragments in the high-resolution mass spectra of the standard compounds indicates that the Aiken-Ambient method underestimates the CO+ and especially H2O+ produced from many oxidized species. Combined AMS–vacuum ultraviolet (VUV) ionization measurements indicate that these ions are produced by dehydration and decarboxylation on the AMS vaporizer (usually operated at 600 °C). Thermal decomposition is observed to be efficient at vaporizer temperatures down to 200 °C. These results are used together to develop an "Improved-Ambient" elemental analysis method for AMS spectra measured in air. 
The Improved-Ambient method uses specific ion fragments as markers to correct for molecular functionality-dependent systematic biases and reproduces known O : C (H : C) ratios of individual oxidized standards within 28% (13%) of the known molecular values. The error in Improved-Ambient O : C (H : C) values is smaller for theoretical standard mixtures of the oxidized organic standards, which are more representative of the complex mix of species present in ambient OA. For ambient OA, the Improved-Ambient method produces O : C (H : C) values that are 27% (11%) larger than previously published Aiken-Ambient values; a corresponding increase of 9% is observed for OM : OC values. These results imply that ambient OA has a higher relative oxygen content than previously estimated. The OS C values calculated for ambient OA by the two methods agree well, however (average relative difference of 0.06 OS C units). This indicates that OS C is a more robust metric of oxidation than O : C, likely since OS C is not affected by hydration or dehydration, either in the atmosphere or during analysis.
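The closing claim, that carbon oxidation state is unaffected by hydration or dehydration while O : C is not, can be checked with a little arithmetic. The sketch below assumes the widely used approximation that average carbon oxidation state is about 2 × O:C − H:C for compounds containing only C, H, and O; that formula is not stated in the abstract, and the function names, the CHO-only mass ratio, and the succinic acid example are illustrative choices.

```python
def ratios(c, h, o):
    """Return (O:C, H:C) atomic ratios for a formula with c, h, o atoms."""
    return o / c, h / c

def oxidation_state(o_to_c, h_to_c):
    """Approximate average carbon oxidation state: 2 * O:C - H:C.
    Assumes only C, H, and O are present (no N or S)."""
    return 2 * o_to_c - h_to_c

def om_to_oc(o_to_c, h_to_c):
    """Organic mass to organic carbon mass ratio for a C/H/O-only composition,
    using standard atomic masses."""
    return (12.011 + 1.008 * h_to_c + 15.999 * o_to_c) / 12.011

# Succinic acid, C4H6O4, and the same formula with one H2O added (C4H8O5).
dry = ratios(4, 6, 4)       # O:C = 1.0,  H:C = 1.5
hydrated = ratios(4, 8, 5)  # O:C = 1.25, H:C = 2.0

# O:C changes on hydration, but 2 * O:C - H:C does not:
# adding H2O raises 2*O by 2 and H by 2, so the difference cancels.
os_dry = oxidation_state(*dry)
os_hydrated = oxidation_state(*hydrated)
```

Both `os_dry` and `os_hydrated` come out the same, whereas O : C and the OM : OC ratio from `om_to_oc` both increase on hydration, which is consistent with the abstract's argument that carbon oxidation state is the more robust metric.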

