Semantic Representations for Interactive Text Mining

  • Milios, Evangelos (PI)

Project: Research project

Project details

Description

Key constraints on today's knowledge workers, whose jobs involve handling or using information, include (a) the amount of text they have to read and digest, and (b) the amount of time they spend searching for, gathering and organizing information in text form. Examples of text-intensive tasks on specialized corpora include: literature search on a given topic for the compilation of a systematic review; high-recall retrieval of patents, court decisions or incident reports in customer service or online communities; search and browsing of electronic medical records or health-related listserver content for tacit knowledge embedded in free text; and annotation of papers with research topics. Examples of tasks on informal text, such as social media, include rumour detection and propagation, dynamic topic detection and tracking, and analysis of interviews in sociology research. Core research problems underlying these use cases include:

(1) Semantic retrieval of documents, addressing vocabulary mismatch across related documents;

(2) The exploitation of semi-structured knowledge bases, such as Wikipedia, as well as weakly organized domain-specific corpora;

(3) Handling the dynamic nature of text data, including concept drift, and flexibly accommodating shorter or longer time frames;

(4) The need for human-in-the-loop text mining to guide the algorithms towards producing relevant results for the individual user. This requires interactive visualizations and algorithms open to user interaction.

Semantic relatedness methods have been proposed based on word and document embeddings derived from unsupervised training of various deep network architectures on tasks such as word or sentence prediction in large text corpora. Such embeddings have advanced the state of the art on a number of supervised downstream natural language processing tasks. However, a gap exists between semantic text representations based on embeddings, which are dense numeric vectors, and human intuition, whose elicitation requires interactive visual interfaces to involve a non-technical user effectively. The proposed research will aim to fill this gap by focusing on explainable, as opposed to black-box, machine learning algorithms and representations. Taking this one step further, we will build on interactivity to achieve explainability, allowing the human to efficiently steer the machine learning towards meaningful results.

Overall, we will aim for next-generation visual text analytics systems that build on the capabilities of modern word, term and document embeddings based on deep networks to capture semantics better than bag-of-words representations, without losing the intuitive nature of word- and term-based visualizations. The proposed research will contribute to the emerging research area of explainable deep networks, specialized to interactive machine learning for supporting knowledge workers.

Status: Active
Start date/End date: 1/1/20 → …

Funding

  • Natural Sciences and Engineering Research Council of Canada: US$ 26,377.00

ASJC Scopus Subject Areas

  • Artificial Intelligence
  • Information Systems
  • Information Systems and Management
  • Management Information Systems