Development of a Medical-text Parsing Algorithm Based on Character Adjacent Probability Distribution for Japanese Radiology Reports

2008 ◽  
Vol 47 (06) ◽  
pp. 513-521 ◽  
Author(s):  
S. Terae ◽  
M. Uesugi ◽  
K. Ogasawara ◽  
T. Sakurai ◽  
N. Nishimoto

Summary Objectives: The objectives of this study were to investigate the transitional probability distribution of medical term boundaries between characters and to develop a parsing algorithm specifically for medical texts. Methods: Medical terms in Japanese computed tomography (CT) reports were identified using the ChaSen morphological analysis system. MeSH-based medical terms (51,385 entries), obtained from the metathesaurus in the Unified Medical Language System (UMLS, 2005AA), were added as a medical dictionary for ChaSen. A radiographer corrected the set of results containing 300 parsed CT reports. In addition, two radiologists checked the medical term parsing of 200 CT sentences. Results: We obtained modified inter-annotator agreement scores for the text corrected by the radiologists. We retrieved the transitional probability as the conditional probability of a uni-gram, bi-gram, and tri-gram. The highest transitional probability P(C_i | C_{i-2} C_{i-1}) was 1.00. For an example of anatomical location, the term “pulmonary hilum” was parsed as a tri-gram. Conclusions: Retrieval of transitional probability will improve the accuracy of parsing compound medical terms.
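The tri-gram conditional probability mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain maximum-likelihood counting over raw character sequences; the function name `trigram_transition_probs` and the three-string toy corpus are hypothetical, and the sketch is not the authors' ChaSen-based pipeline.

```python
from collections import Counter

def trigram_transition_probs(texts):
    """Estimate P(c_i | c_{i-2} c_{i-1}) by relative frequency.

    Hypothetical helper: counts each two-character context and the
    character that follows it, then divides the two counts.
    """
    context_counts = Counter()
    trigram_counts = Counter()
    for text in texts:
        for i in range(2, len(text)):
            context = text[i - 2:i]          # the two preceding characters
            context_counts[context] += 1
            trigram_counts[(context, text[i])] += 1
    return {
        (context, char): n / context_counts[context]
        for (context, char), n in trigram_counts.items()
    }

# Toy corpus (an assumption, not data from the paper).
corpus = ["肺門部", "肺門リンパ節", "肺野"]
probs = trigram_transition_probs(corpus)
print(probs[("肺門", "部")])
```

A probability of 1.00 for some context, as reported in the abstract, would mean that the context was always followed by the same character in the counted data.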

2019 ◽  
Vol 44 (1) ◽  
Author(s):  
Julian Varghese ◽  
Michael Fujarski ◽  
Martin Dugas

Abstract StudyPortal was implemented as the first multilingual search platform for geographic visualization of clinical trials and scientific articles. The platform queries information from ClinicalTrials.gov, PubMed, a geodatabase and geographic maps to enable geospatial study search and real-time rendering of study locations or research networks on a map. Thus, disease-specific clinical studies or whole research networks can be shown in geographic proximity. Moreover, a semantic layer enables multilingual disease input and autosuggestion of medical terms based on the Unified Medical Language System. The portal is accessible at https://studyportal.uni-muenster.de. This paper presents details on the implementation of the novel search platform, its search evaluation and future work.


Author(s):  
Hannes Seuss ◽  
Peter Dankerl ◽  
Matthias Ihle ◽  
Andrea Grandjean ◽  
Rebecca Hammon ◽  
...  

Purpose Projects involving collaborations between different institutions require data security via selective de-identification of words or phrases. A semi-automated de-identification tool was developed and evaluated on different types of medical reports natively and after adapting the algorithm to the text structure. Materials and Methods A semi-automated de-identification tool was developed and evaluated for its sensitivity and specificity in detecting sensitive content in written reports. Data from 4671 pathology reports (4105 + 566 in two different formats), 2804 medical reports, 1008 operation reports, and 6223 radiology reports of 1167 patients suffering from breast cancer were de-identified. The content was itemized into four categories: direct identifiers (name, address), indirect identifiers (date of birth/operation, medical ID, etc.), medical terms, and filler words. The software was tested natively (without training) in order to establish a baseline. The reports were manually edited and the model re-trained for the next test set. After manually editing 25, 50, 100, 250, 500 and if applicable 1000 reports of each type re-training was applied. Results In the native test, 61.3 % of direct and 80.8 % of the indirect identifiers were detected. The performance (P) increased to 91.4 % (P25), 96.7 % (P50), 99.5 % (P100), 99.6 % (P250), 99.7 % (P500) and 100 % (P1000) for direct identifiers and to 93.2 % (P25), 97.9 % (P50), 97.2 % (P100), 98.9 % (P250), 99.0 % (P500) and 99.3 % (P1000) for indirect identifiers. Without training, 5.3 % of medical terms were falsely flagged as critical data. The performance increased, after training, to 4.0 % (P25), 3.6 % (P50), 4.0 % (P100), 3.7 % (P250), 4.3 % (P500), and 3.1 % (P1000). Roughly 0.1 % of filler words were falsely flagged. Conclusion Training of the developed de-identification tool continuously improved its performance. 
Training with roughly 100 edited reports enables reliable detection and labeling of sensitive data in different types of medical reports.


2021 ◽  
pp. medethics-2020-107192
Author(s):  
David Shaw ◽  
Alex Manara ◽  
Anne Laure Dalle Ave

In this paper, we discuss the largely neglected topic of semantics in medicine and the associated ethical issues. We analyse several key medical terms from the informed perspective of the healthcare professional, the lay perspective of the patient and the patient’s family, and the descriptive perspective of what the term actually signifies objectively. The choice of a particular medical term may deliver different meanings when viewed from these differing perspectives. Consequently, several ethical issues may arise. Technical terms that are not commonly understood by lay people may be used by physicians, consciously or not, and may obscure the understanding of the situation by lay people. The choice of particular medical terms may be accidental use of jargon, an attempt to ease the communication of psychologically difficult information, or an attempt to justify a preferred course of action and/or to manipulate the decision-making process.


2021 ◽  
pp. e2021092
Author(s):  
Gulsen Akoglu ◽  
Pelin Esme ◽  
Irem Yildiz

Background: The use of medical terms and folk names (euphemisms) affects a patient’s understanding of diseases and perceptions of severity. Objectives: We aimed to determine the psychological effects of the medical and folk names of hidradenitis suppurativa on patients with the disease. Methods: This was a cross-sectional and exploratory study conducted at a tertiary referral university hospital in Turkey. A questionnaire on the medical and folk names of hidradenitis suppurativa was administered to 31 males and 25 females. Results: The patients expressed that they found the medical term hidradenitis suppurativa to be incomprehensible because it is a foreign term. When they heard it for the first time, it evoked negative responses such as confusion and worry about their health. Half of the patients preferred their doctors to use a more understandable and pronounceable name. More than 80% of patients expressed feeling depressed and stigmatized by the folk name of their disease. They preferred the terms boils, abscesses, or hidradenitis when referring to their disease. Conclusion: Both medical and folk names for hidradenitis suppurativa have negative effects on patients, and most patients feel stigmatized by either term.


2004 ◽  
Vol 10 (3) ◽  
pp. 295
Author(s):  
Jung Ae Lee ◽  
Hwa Jeong Seo ◽  
Kee Won Kim ◽  
Mingoo Kim ◽  
Seung Kwon Hong ◽  
...  

1999 ◽  
Vol 38 (04/05) ◽  
pp. 303-307 ◽  
Author(s):  
I. Antipov ◽  
W. Hersh ◽  
C. A. Smith ◽  
M. Mailhot ◽  
H. J. Lowe

Abstract This paper describes preliminary work evaluating automated semantic indexing of radiology imaging reports to represent images stored in the Image Engine multimedia medical record system at the University of Pittsburgh Medical Center. The authors used the SAPHIRE indexing system to automatically identify important biomedical concepts within radiology reports and represent these concepts with terms from the 1998 edition of the U.S. National Library of Medicine’s Unified Medical Language System (UMLS) Metathesaurus. This automated UMLS indexing was then compared with manual UMLS indexing of the same reports. Human indexing identified appropriate UMLS Metathesaurus descriptors for 81% of the important biomedical concepts contained in the report set. SAPHIRE automatically identified UMLS Metathesaurus descriptors for 64% of the important biomedical concepts contained in the report set. The overall conclusions of this pilot study were that the UMLS Metathesaurus provided adequate coverage of the majority of the important concepts contained within the radiology report test set and that SAPHIRE could automatically identify and translate almost two thirds of these concepts into appropriate UMLS descriptors. Further work is required to improve both the recall and precision of this automated concept extraction process.


Author(s):  
Наталия Октябревна Золотова ◽  
Людмила Константиновна Гордеева

The article discusses the results of two psycholinguistic experiments conducted with medical university students who were given medical terms naming diseases as stimuli: a free associative experiment, and subjective scaling, which was used to verify the associative results. Special attention is paid to the dynamics of the emotional-evaluative component in the psychological structure of the meaning of a medical term, which depends on the different levels of professional competence of the subjects perceiving the term.


2013 ◽  
Vol 14 (4) ◽  
pp. 224
Author(s):  
Jesús PEINADO RODRIGUEZ

The automated retrieval of files depends critically on the ability to generate precise sign concepts. Stemming is a very useful technique, but medical terms are highly complex and need special attention. Our main aim was to develop a modular algorithm for complex medical terms in order to open a new line of research in information retrieval. The algorithm was developed in LISP, a programming language, using an exhaustive controlled list of rules. As a result, we found good precision with lower recall when querying for concepts while saving sign concepts for each medical term. (Rev Med Hered 2003; 14:224-229).
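Rule-based stemming from a controlled list of rules can be sketched roughly as follows. The abstract does not publish its rule list and the original was written in LISP, so the suffixes in `SUFFIX_RULES`, the minimum stem length, and the function name `stem` are all hypothetical choices made for this Python illustration.

```python
# Hypothetical suffix rules; the abstract does not list the real ones.
SUFFIX_RULES = ["ectomy", "otomy", "itis", "osis", "oma"]
MIN_STEM_LENGTH = 3  # assumed guard against over-stripping short words

def stem(term):
    """Strip the longest matching suffix from a lowercase medical term."""
    word = term.lower()
    # Try longer suffixes first so "ectomy" wins over shorter overlaps.
    for suffix in sorted(SUFFIX_RULES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word  # no rule applied

print(stem("gastritis"), stem("gastrectomy"))
```

Mapping related terms to a shared stem is what lets a query for one form retrieve documents that use another; stricter rules tend to favor precision over recall, which matches the trade-off the abstract reports.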


2021 ◽  
Author(s):  
Ruoxue Wu ◽  
Mu Qiao ◽  
Jeffrey Zheng

Abstract COVID-19 has broken out worldwide. It has caused millions of infections, killed hundreds of thousands of people, and inflicted immeasurable trade losses on all countries. To understand SARS-CoV-2, researchers need to analyze variation information, such as multiple coronaviruses sampled at different times across different countries. In this paper, the metagenetic analysis system (MAS) is used to analyze SARS-CoV-2 genomes collected from different countries as input datasets, and special genomic indices based on the A1 and C1 modules of the MAS are provided as global characteristic quantities for visualization. In this method, one RNA sequence is split into M segments, and genetic probability measures are counted for the 16 combinations of four genomic symbols. After statistical probability processing, each probability distribution can be transformed into an entropy quantity, and the results for all collected genomes are shown on both 2D and 1D histograms. Under this approach, a pair of combinatorial entropies determines a 2D genomic index map, which generates a heatmap in which genomes with similar content form larger clusters; this provides basic quantitative invariants for organizing further collected genomes when constructing a phylogenetic tree. Further explorations are required.
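The segment-and-count step described in this abstract can be sketched as follows. The sketch assumes that the "16 combinations of four genomic symbols" are overlapping adjacent symbol pairs and that the entropy is Shannon entropy in bits; the abstract does not specify the A1 and C1 modules of the MAS, so the function name `segment_entropies` and these choices are illustrative assumptions only.

```python
import math
from collections import Counter
from itertools import product

# The 16 ordered pairs over the four RNA symbols (assumed interpretation).
PAIRS = ["".join(p) for p in product("ACGU", repeat=2)]

def segment_entropies(seq, m):
    """Split seq into m equal segments and return one entropy per segment.

    Each entropy is the Shannon entropy (bits) of the distribution of
    overlapping symbol pairs within that segment.
    """
    seg_len = len(seq) // m  # trailing remainder is ignored in this sketch
    entropies = []
    for k in range(m):
        segment = seq[k * seg_len:(k + 1) * seg_len]
        counts = Counter(segment[i:i + 2] for i in range(len(segment) - 1))
        total = sum(counts[p] for p in PAIRS)
        h = 0.0
        for p in PAIRS:
            if counts[p]:
                q = counts[p] / total
                h -= q * math.log2(q)
        entropies.append(h)
    return entropies

print(segment_entropies("ACGUACGUAAAAAAAA", 2))
```

A segment made of a single repeated symbol yields an entropy of zero, while a segment whose pairs are evenly spread yields a higher value, which is the kind of per-segment quantity that could feed the histograms and index maps the abstract mentions.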

