Digital surveillance in Latin American diseases outbreaks: information extraction from a novel Spanish corpus

Journal
BMC Bioinformatics
ISSN
1471-2105
Date Issued
2022-12-23
Author(s)
Antonella Dellanzo
Viviana Cotik
Daniel Yunior Lozano Barriga
Jonathan Jimmy Mollapaza Apaza
Daniel Palomino
Fernando Schiaffino
Alexander Yanque Aliaga
José Eduardo Ochoa Luna (Departamento de Ciencia de la Computación)
DOI
http://dx.doi.org/10.1186/s12859-022-05094-y
Abstract
Background

To detect threats to public health and to be well prepared for endemic and pandemic disease outbreaks, countries usually rely on event-based surveillance (EBS) and indicator-based surveillance systems. EBS systems are key components of early warning systems and focus on rapidly capturing data to detect threat signals through channels other than traditional surveillance. In this study, we develop natural language processing tools that can be used within EBS systems. In particular, we focus on information extraction techniques that enable digital surveillance to monitor Internet data and social media.
Results

We created an annotated Spanish corpus from ProMED-mail health reports on disease outbreaks in Latin America. The corpus has been used to train algorithms for two information extraction tasks: named entity recognition and relation extraction. The algorithms, based on deep learning and rules, have been applied to recognize diseases, hosts, and the geographical locations where a disease is occurring, among other entities and relations. In addition, an in-depth analysis of micro-averaged F1 scores shows the suitability of our approaches for both tasks.
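As an illustration of the metric named above, the following is a minimal sketch of micro-averaged F1 for entity recognition. It is not the authors' evaluation code: the function name `micro_f1`, the exact-match criterion on (start, end, label) spans, and the toy labels DISEASE, LOCATION, and HOST are assumptions made for this example. Micro-averaging pools true positives, false positives, and false negatives over all documents and entity types before computing precision and recall, so frequent entity types weigh more than rare ones.

```python
def micro_f1(gold, predicted):
    """Compute micro-averaged precision, recall, and F1.

    gold, predicted: lists (one item per document) of sets of
    (start, end, label) tuples. A prediction counts as correct only
    if span and label both match exactly (an assumption of this sketch).
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, predicted):
        tp += len(gold_spans & pred_spans)   # predicted and in gold
        fp += len(pred_spans - gold_spans)   # predicted but not in gold
        fn += len(gold_spans - pred_spans)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy annotations for two documents.
gold = [
    {(0, 6, "DISEASE"), (20, 24, "LOCATION")},
    {(5, 11, "HOST")},
]
pred = [
    {(0, 6, "DISEASE"), (20, 24, "HOST")},  # second span has the wrong label
    {(5, 11, "HOST")},
]

precision, recall, f1 = micro_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

The same pooled counting applies to relation extraction if each item is a (head, tail, relation) tuple instead of a labeled span.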
Conclusions

The annotated corpus and algorithms presented could support the development of automated tools for extracting information from news and health reports written in Spanish. Moreover, this framework could be useful within EBS systems to support the early detection of Latin American disease outbreaks.
Project(s)
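Massive data analysis on social networks for detecting strategic trends in matters related to politics and health (Análisis de datos masivos en redes sociales para detección de tendencias estratégicas en asuntos relacionados a política y salud)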
Subjects

  • Investigación científ...
  • Projects
  • Análisis masivo datos...
  • Sentiment analysis
  • Social network analys...
  • Deep learning
  • Politics
  • Health

File(s)
Name
descarga.png
Size
4.22 KB
Format
PNG
Checksum
(MD5):0fcc63066f6e9e61bd12b2bbdc53b618

Universidad Católica San Pablo
COPYRIGHT © 2025 Universidad Católica San Pablo