BACKGROUND
Electronic medical records (EMRs) are usually stored in relational databases that require structured query language (SQL) queries to retrieve information of interest. Writing such queries is a challenging task for medical experts, as it demands database expertise they typically lack. However, existing text-to-SQL generation methods have not been widely adopted in the medical domain.
OBJECTIVE
The objective of this study was to propose a neural generation model that jointly considers the characteristics of medical text and the structure of SQL, to automatically transform medical questions into SQL queries over EMRs.
METHODS
Rather than treating the SQL query as an ordinary word sequence, we introduced a syntax tree as an intermediate representation, which better matches the tree-structured nature of SQL and effectively reduces the search space during generation. We proposed a medical text-to-SQL model (MedTS), which employs a pretrained BERT model as the encoder and a grammar-based LSTM as the decoder to predict the tree-structured intermediate representation, which can then be deterministically transformed into the final SQL query. Experiments were conducted on the MIMICSQL dataset, and MedTS was compared against five competitor methods.
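To illustrate the idea of a tree-structured intermediate representation, the following is a minimal sketch of a toy SQL subset and its deterministic transformation to a final query string. All names here (Query, Cond, to_sql) and the simplified grammar are illustrative assumptions, not the authors' actual grammar or implementation:

```python
# Minimal sketch: a tree-structured intermediate representation for a
# toy SQL subset, deterministically rendered into a final SQL string.
# The grammar and class names are illustrative simplifications.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Cond:
    """A single WHERE-clause condition: column op value."""
    column: str
    op: str
    value: str

@dataclass
class Query:
    """Root of the intermediate tree for a simple SELECT query."""
    select_cols: List[str]
    table: str
    conds: List[Cond] = field(default_factory=list)

def to_sql(q: Query) -> str:
    """Deterministically transform the intermediate tree into SQL text."""
    sql = f"SELECT {', '.join(q.select_cols)} FROM {q.table}"
    if q.conds:
        where = " AND ".join(f'{c.column} {c.op} "{c.value}"'
                             for c in q.conds)
        sql += f" WHERE {where}"
    return sql

# Example: a tree a grammar-based decoder might emit for a question
# such as "how many female patients are there?" (hypothetical schema).
tree = Query(["COUNT(*)"], "demographic", [Cond("gender", "=", "F")])
print(to_sql(tree))
```

Because the decoder predicts grammar productions (tree nodes) rather than raw tokens, every decoded tree maps to a syntactically valid query, which is the search-space reduction mentioned above.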
RESULTS
Experimental results demonstrated that MedTS achieved accuracies of 0.770 and 0.888 on the test set in terms of logic form and execution, respectively, significantly outperforming existing state-of-the-art methods. Further analyses showed that accuracy across the components of the generated SQL was relatively balanced and substantially improved over the baselines.
CONCLUSIONS
The proposed MedTS was effective and robust in improving the performance of medical text-to-SQL generation, indicating its strong potential for application in real medical scenarios.