The IZA Evaluation Dataset Survey: A Scientific Use File

IZA Journal of European Labor Studies ◽

10.1186/2193-9012-3-6 ◽

2014 ◽

Vol 3 (1) ◽

pp. 6 ◽

Cited By ~ 6

Author(s):

Patrick Arni ◽

Marco Caliendo ◽

Steffen Künn ◽

Klaus F Zimmermann

Keyword(s):

Evaluation Dataset ◽

Scientific Use File

Evaluation Dataset and System for Japanese Lexical Simplification

10.3115/v1/p15-3006 ◽

2015 ◽

Cited By ~ 7

Author(s):

Tomoyuki Kajiwara ◽

Kazuhide Yamamoto

Keyword(s):

Evaluation Dataset

A DICOM dataset for evaluation of medical image de-identification

Scientific Data ◽

10.1038/s41597-021-00967-y ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Michael Rutherford ◽

Seong K. Mun ◽

Betty Levine ◽

William Bennett ◽

Kirk Smith ◽

...

Keyword(s):

Health Information ◽

National Cancer Institute ◽

Medical Image ◽

Cancer Imaging ◽

Protected Health Information ◽

Dicom Standard ◽

Clinical Imaging ◽

X Ray ◽

Evaluation Dataset ◽

Data Elements

Abstract: We developed a DICOM dataset that can be used to evaluate the performance of de-identification algorithms. DICOM objects (a total of 1,693 CT, MRI, PET, and digital X-ray images) were selected from datasets published in the Cancer Imaging Archive (TCIA). Synthetic Protected Health Information (PHI) was generated and inserted into selected DICOM Attributes to mimic typical clinical imaging exams. The DICOM Standard and TCIA curation audit logs guided the insertion of synthetic PHI into standard and non-standard DICOM data elements. A TCIA curation team tested the utility of the evaluation dataset. With this publication, the evaluation dataset (containing synthetic PHI) and de-identified evaluation dataset (the result of TCIA curation) are released on TCIA in advance of a competition, sponsored by the National Cancer Institute (NCI), for algorithmic de-identification of medical image datasets. The competition will use a much larger evaluation dataset constructed in the same manner. This paper describes the creation of the evaluation datasets and guidelines for their use.
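As a rough illustration of how a dataset with known synthetic PHI can be used to score a de-identification tool, the sketch below checks whether each inserted PHI value still appears anywhere in the tool's output. It is not the TCIA or NCI evaluation procedure: plain dictionaries stand in for DICOM objects, and the function name `score_deidentification` and all sample values are hypothetical.

```python
def score_deidentification(original, deidentified, synthetic_phi):
    """Return (removed, leaked) attribute names for one object.

    original / deidentified: dicts mapping attribute name -> string value
        (a stand-in for real DICOM objects; an assumption of this sketch).
    synthetic_phi: names of attributes into which synthetic PHI was inserted.
    """
    removed, leaked = [], []
    for name in sorted(synthetic_phi):
        phi_value = original[name]
        # PHI counts as leaked if the inserted value survives in ANY
        # attribute of the output, not only in the one it was inserted into.
        still_present = any(phi_value in value for value in deidentified.values())
        (leaked if still_present else removed).append(name)
    return removed, leaked


# Hypothetical example: the name was blanked in its own attribute
# but survives inside a free-text description.
original = {
    "PatientName": "DOE^JANE",
    "PatientID": "123456",
    "Modality": "CT",
    "StudyDescription": "Chest CT for DOE^JANE",
}
deidentified = {
    "PatientName": "ANON",
    "PatientID": "0001",
    "Modality": "CT",
    "StudyDescription": "Chest CT for DOE^JANE",
}
removed, leaked = score_deidentification(
    original, deidentified, {"PatientName", "PatientID"}
)
```

Checking every output attribute, rather than only the attribute that received the PHI, reflects the abstract's point that PHI can sit in non-standard data elements as well as standard ones.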

HOSTED—England’s Household Transmission Evaluation Dataset: preliminary findings from a novel passive surveillance system of COVID-19

International Journal of Epidemiology ◽

10.1093/ije/dyab057 ◽

2021 ◽

Author(s):

J A Hall ◽

R J Harris ◽

A Zaidi ◽

S C Woodhall ◽

G Dabrera ◽

...

Keyword(s):

Surveillance System ◽

Household Composition ◽

Passive Surveillance ◽

Household Contacts ◽

Household Transmission ◽

The North ◽

Passive Surveillance System ◽

Evaluation Dataset ◽

Households With Children ◽

Lower Transmission

Abstract: Background. Household transmission of SARS-CoV-2 is an important component of the community spread of the pandemic. Little is known about the factors associated with household transmission, at the level of the case, contact or household, or how these have varied over the course of the pandemic. Methods. The Household Transmission Evaluation Dataset (HOSTED) is a passive surveillance system linking laboratory-confirmed COVID-19 cases to individuals living in the same household in England. We explored the risk of household transmission according to: age of case and contact, sex, region, deprivation, month and household composition between April and September 2020, building a multivariate model. Results. In the period studied, on average, 5.5% of household contacts in England were diagnosed as cases. Household transmission was most common between adult cases and contacts of a similar age. There was some evidence of lower transmission rates to under-16s [adjusted odds ratio (aOR) 0.70, 95% confidence interval (CI) 0.66–0.74]. There were clear regional differences, with higher rates of household transmission in the north of England and the Midlands. Less deprived areas had a lower risk of household transmission. After controlling for region, there was no effect of deprivation, but houses of multiple occupancy had lower rates of household transmission [aOR 0.74 (0.66–0.83)]. Conclusions. Children are less likely to acquire SARS-CoV-2 via household transmission, and consequently there was no difference in the risk of transmission in households with children. Households in which cases could isolate effectively, such as houses of multiple occupancy, had lower rates of household transmission. Policies to support the effective isolation of cases from their household contacts could lower the level of household transmission.
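For readers unfamiliar with the odds ratios quoted in this abstract, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval from a 2x2 table. It only illustrates the statistic: the study reports adjusted odds ratios from a multivariable model, which this does not reproduce, and the counts and the function name `odds_ratio_ci` are invented for the example.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    a: group of interest, infected      b: group of interest, not infected
    c: reference group, infected        d: reference group, not infected
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from HOSTED:
# 40 of 1,000 child contacts infected vs 55 of 1,000 adult contacts.
or_, lo, hi = odds_ratio_ci(40, 960, 55, 945)
```

An interval that excludes 1 would indicate a difference at the 5% level; adjusting for covariates such as region or deprivation requires a regression model rather than a single table.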

WeDGeM: A Domain-Specific Evaluation Dataset Generator for Multilingual Entity Linking Systems

Lecture Notes in Computer Science - Web Information Systems Engineering – WISE 2017 ◽

10.1007/978-3-319-68786-5_18 ◽

2017 ◽

pp. 221-228

Author(s):

Emrah Inan ◽

Oguz Dikenelli

Keyword(s):

Entity Linking ◽

Domain Specific ◽

Evaluation Dataset ◽

Specific Evaluation

Associating Natural Language Comment and Source Code Entities

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6382 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8592-8599

Author(s):

Sheena Panthaplackel ◽

Milos Gligoric ◽

Raymond J. Mooney ◽

Junyi Jessy Li

Keyword(s):

Software Development ◽

Natural Language ◽

Open Source ◽

Source Code ◽

Initial Step ◽

Binary Classifier ◽

Sequence Labeling ◽

Evaluation Dataset ◽

Revision Histories

Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.

Forschungsdaten der Rentenversicherung zur Rehabilitation. Ein Scientific Use File im Längsschnittformat des Forschungsdatenzentrums der Rentenversicherung (FDZ-RV)

Soziale Welt ◽

10.5771/0038-6073-2021-2-237 ◽

2021 ◽

Vol 72 (2) ◽

pp. 237-251

Author(s):

Anja Bestmann ◽

Renate Grell ◽

Ute Kirst-Budžak

Keyword(s):

Scientific Use File

Since 2012, the Research Data Centre of the Pension Insurance (Forschungsdatenzentrum der Rentenversicherung) has provided scientific institutions with Scientific Use Files (SUFs) on completed rehabilitations and approved pension benefits in longitudinal format. The rehabilitation longitudinal Scientific Use File comprises a sample of more than three million persons, drawn from a full census. Thematically, the dataset covers sociodemographic details and information on the occupational background of the insured persons, as well as characteristics of their pension insurance status. The main focus, however, is on medical rehabilitation, benefits for participation in working life, and approved pension applications. A fixed 11-year observation window of the contribution history, with information on employment status and contribution records accurate to the month, enables detailed longitudinal analyses. The rehabilitation longitudinal Scientific Use File offers broad analytical potential: for example, it makes it possible to trace rehabilitation and employment trajectories over time. Typological research questions with a disease-specific or rehabilitation-specific focus benefit from the precise recording of the type of rehabilitation and the ICD-coded diagnoses. The Scientific Use File sample is representative of the actively insured members of the statutory pension insurance. In addition, conclusions about the sociodemographic and contribution-related composition of the population can be drawn using the demographic cohorts included in the dataset.

Requirements for Training and Evaluation Dataset of Network and Host Intrusion Detection System

Advances in Intelligent Systems and Computing - New Knowledge in Information Systems and Technologies ◽

10.1007/978-3-030-16184-2_51 ◽

2019 ◽

pp. 534-546 ◽

Cited By ~ 4

Author(s):

Petteri Nevavuori ◽

Tero Kokkonen

Keyword(s):

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Evaluation Dataset

Return to Work aus einer zeitlich befristeten Erwerbsminderungsrente

Das Gesundheitswesen ◽

10.1055/a-0883-5276 ◽

2019 ◽

Vol 82 (11) ◽

pp. 894-900 ◽

Cited By ~ 3

Author(s):

Elena Köckerling ◽

Odile Sauzet ◽

Bettina Hesse ◽

Michael Körner ◽

Oliver Razum

Keyword(s):

Return To Work ◽

Scientific Use File

Abstract: Aim of the study. At present, no statistics are kept in Germany on how many persons receiving a temporary reduced-earning-capacity pension (Erwerbsminderungsrente, EM pension) achieve a return to work (RTW). The aim of this study is to examine how many persons who received a temporary EM pension for the first time in 2006 achieved an RTW, and which sociodemographic, health-related and occupational characteristics these persons have. Methods. The Scientific Use File "Abgeschlossene Rehabilitation 2006–2013 im Versicherungsverlauf" of the Research Data Centre of the German Pension Insurance (Deutsche Rentenversicherung) was analysed. An RTW was counted if, in one of the 7 years following the start of the pension, a person was employed at least half-time on 183–365 days and earned at least 8.50 euros per hour. The development of the cohort was analysed descriptively. The association between the persons' sociodemographic, health-related and occupational characteristics and RTW was determined using Cox regressions. Results. Of the initial cohort (N = 9,789), 5.9% achieved an RTW during the observation period. Of these, about 25% achieved an RTW in every follow-up year. During the observation period, 10.6% died, 9.1% transitioned to an old-age pension, and 1.4% received a permanent EM pension. The regression analysis shows that sociodemographic, health-related and occupational characteristics of EM pensioners are significantly associated with an RTW: EM pensioners aged 18–39 years, with a somatic diagnosis as the basis for the pension award, with medical rehabilitation, or with employment before the EM pension have the highest probability of an RTW. Conclusion. The results show that only a few persons achieve an RTW from an EM pension. On the one hand, it could be concluded that the criteria for granting EM pensions on a temporary basis should be tightened, for example in order to reduce the workload of reassessments. On the other hand, one could also postulate a clear need for support among EM pensioners in achieving an RTW, and a corresponding need for research.
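The RTW criterion stated in this abstract (in one of the 7 follow-up years: employment on 183–365 days, at least half-time, at at least 8.50 euros per hour) can be written down as a small predicate. The sketch below is one reading of that definition, not the authors' code: the record layout `FollowUpYear`, the boolean half-time flag, and the function names are assumptions, since the abstract does not say how half-time employment is encoded in the Scientific Use File.

```python
from dataclasses import dataclass

MIN_DAYS = 183              # lower bound on employed days per year (from the abstract)
MAX_DAYS = 365              # upper bound on employed days per year (from the abstract)
MIN_HOURLY_WAGE_EUR = 8.50  # minimum hourly wage (from the abstract)
FOLLOW_UP_YEARS = 7         # follow-up years considered (from the abstract)


@dataclass
class FollowUpYear:
    """Hypothetical per-year record for one person."""
    days_employed: int
    at_least_half_time: bool
    hourly_wage_eur: float


def is_rtw_year(year: FollowUpYear) -> bool:
    """True if this single year satisfies all three RTW conditions."""
    return (
        MIN_DAYS <= year.days_employed <= MAX_DAYS
        and year.at_least_half_time
        and year.hourly_wage_eur >= MIN_HOURLY_WAGE_EUR
    )


def achieved_rtw(years: list[FollowUpYear]) -> bool:
    """True if any of the first seven follow-up years counts as RTW."""
    return any(is_rtw_year(y) for y in years[:FOLLOW_UP_YEARS])


# Hypothetical person: no qualifying employment in year 1, qualifying in year 2.
person = [
    FollowUpYear(days_employed=90, at_least_half_time=True, hourly_wage_eur=10.0),
    FollowUpYear(days_employed=200, at_least_half_time=True, hourly_wage_eur=9.0),
]
```

Keeping the thresholds as named constants makes it easy to test how sensitive the RTW rate is to the chosen cut-offs.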

Shapelet Discovery by Lazy Time Series Classification

Computational Intelligence and Neuroscience ◽

10.1155/2020/1978310 ◽

2020 ◽

Vol 2020 ◽

pp. 1-19

Author(s):

Wei Zhang ◽

Zhihai Wang ◽

Jidong Yuan ◽

Shilei Hao

Keyword(s):

Time Series ◽

Training Dataset ◽

Class Label ◽

Considerable Research ◽

Evaluation Dataset ◽

Differential Ability ◽

Global And Local ◽

Insight Into ◽

Specific Evaluation ◽

Feature Frequency

As a representation of discriminative features, the time series shapelet has recently received considerable research interest. However, most shapelet-based classification models evaluate the differential ability of the shapelet on the whole training dataset, neglecting characteristic information contained in each instance to be classified and the classwise feature frequency information. Hence, the computational complexity of feature extraction is high, and the interpretability is inadequate. To this end, the efficiency of shapelet discovery is improved through a lazy strategy fusing global and local similarities. In the prediction process, the strategy learns a specific evaluation dataset for each instance, and then the captured characteristics are directly used to progressively reduce the uncertainty of the predicted class label. Moreover, a shapelet coverage score is defined to calculate the discriminability of each time stamp for different classes. The experimental results show that the proposed method is competitive with the benchmark methods and provides insight into the discriminative features of each time series and each type in the data.
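As background for the shapelet terminology in this abstract, the sketch below shows the usual distance between a shapelet and a time series: the smallest Euclidean distance between the shapelet and any equal-length window of the series. This is the generic building block only, not the lazy discovery strategy or the coverage score proposed in the paper, and the function name `shapelet_distance` is chosen for this example. Many methods z-normalise each window first; that step is omitted here.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between `shapelet` and any window of `series`."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best_sq = math.inf
    # Slide the shapelet across the series and keep the closest match.
    for start in range(len(series) - m + 1):
        sq = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best_sq = min(best_sq, sq)
    return math.sqrt(best_sq)


# The peak [2, 3, 2] occurs exactly in the first series, so its distance is 0;
# the flat second series never matches it.
d_match = shapelet_distance([0, 1, 2, 3, 2, 1, 0], [2, 3, 2])
d_flat = shapelet_distance([0, 0, 0, 0, 0, 0, 0], [2, 3, 2])
```

A shapelet is discriminative when this distance is small for series of one class and large for the others, so a threshold on it can act as a simple classification feature.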
