Graph-based Topic Extraction Using Centroid Distance of Phrase Embeddings on Healthy Aging Open-ended Survey Questions

Dijana Kosmajac, Kirstie Smith, Vlado Keselj, Susan Kirkland

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

Open-ended questions are a very important part of research surveys. However, they can pose a challenge when it comes to processing since manual processing requires a labour-intensive human effort. Automation of the task requires application of NLP methods since free text does not ensure standardized structure. To tackle this problem, we present a solution for topic discovery and analysis of open-ended survey items. We use graph-based representation of the text that adds structure and enables easier manipulation and keyphrase retrieval. Additionally, we use pre-trained fastText aligned word vectors to cluster similar phrases even if they are written in different languages. The goal is to produce topic word and phrase representatives that are easy to interpret by a domain expert. We compare the method with traditional LDA and two state-of-the-art algorithms: BTM and WNTM. The resulting keyphrases representing topics are more intuitive to the domain experts than the ones obtained by reference topic models in similar experimental settings.
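The abstract does not spell out how the centroid distance named in the title is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a phrase is embedded as the mean of its word vectors and that phrases are ranked by cosine distance to the centroid of their cluster. The tiny `TOY_VECTORS` table and all function names are hypothetical; the invented 3-dimensional values stand in for pre-trained fastText aligned word vectors.

```python
import math

# Invented 3-dimensional vectors for illustration only; a real pipeline
# would look words up in pre-trained aligned word embeddings instead.
TOY_VECTORS = {
    "healthy": [0.90, 0.10, 0.00],
    "eating": [0.80, 0.20, 0.10],
    "diet": [0.85, 0.15, 0.05],
    "good": [0.70, 0.30, 0.10],
    "family": [0.10, 0.90, 0.20],
    "friends": [0.15, 0.85, 0.25],
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def phrase_vector(phrase):
    """Embed a phrase as the mean of its known word vectors (an assumption)."""
    vecs = [TOY_VECTORS[w] for w in phrase.lower().split() if w in TOY_VECTORS]
    if not vecs:
        raise ValueError(f"no known words in phrase: {phrase!r}")
    return mean_vector(vecs)


def cosine_distance(a, b):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_centroid_distance(phrases):
    """Return (distance, phrase) pairs, closest to the cluster centroid first."""
    vecs = [phrase_vector(p) for p in phrases]
    center = mean_vector(vecs)
    return sorted((cosine_distance(v, center), p) for p, v in zip(phrases, vecs))


if __name__ == "__main__":
    cluster = ["healthy eating", "good diet", "family friends"]
    for dist, phrase in rank_by_centroid_distance(cluster):
        print(f"{dist:.3f}  {phrase}")
```

Under these assumptions, the phrase nearest the centroid would serve as the most representative label for its cluster, and a phrase far from the centroid would be a candidate outlier.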

Original language: English
Title of host publication: Proceedings - 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020
Editors: Giuseppe Di Fatta, Victor Sheng, Alfredo Cuzzocrea, Carlo Zaniolo, Xindong Wu
Publisher: IEEE Computer Society
Pages: 621-628
Number of pages: 8
ISBN (electronic): 9781728190129
DOI: https://doi.org/10.1109/ICDMW51313.2020.00088
State: Published - Nov 2020
Event: 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020 - Virtual, Sorrento, Italy
Duration: Nov 17 2020 to Nov 20 2020

Publication series

Name: IEEE International Conference on Data Mining Workshops, ICDMW
Volume: 2020-November
ISSN (print): 2375-9232
ISSN (electronic): 2375-9259

Conference

Conference: 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020
Country/Territory: Italy
City: Virtual, Sorrento
Period: 11/17/20 to 11/20/20

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

ASJC Scopus Subject Areas

  • Computer Science Applications
  • Software

Cite this

Kosmajac, D., Smith, K., Keselj, V., & Kirkland, S. (2020). Graph-based Topic Extraction Using Centroid Distance of Phrase Embeddings on Healthy Aging Open-ended Survey Questions. In G. Di Fatta, V. Sheng, A. Cuzzocrea, C. Zaniolo, & X. Wu (Eds.), Proceedings - 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020 (pp. 621-628). Article 9346328 (IEEE International Conference on Data Mining Workshops, ICDMW; Vol. 2020-November). IEEE Computer Society. https://doi.org/10.1109/ICDMW51313.2020.00088