String-based and Unification-based Methodology for Text mining and Processing

  • Keselj, Vlado (PI)

Project: Research project

Project Details

Description

The economic and social impact of information technology today is probably most direct and significant in the area of so-called social media and the Internet in a wider sense. The proportion of people contributing accessible content to the Web has grown from a minor fraction to a large portion, likely a majority. It has been established that this flood of "big data" can, on one side, increase our knowledge and efficiency, leading to a greater social benefit; on the other side, we are frequently lost and drowning in irrelevant information while missing relevant information. The proliferation of mobile technology, which is available anywhere but limited in screen size, user input rate, and processing power, has only increased the demand for more precise information retrieval, querying, and channelling of relevant information.

Here, we propose to develop three different core natural language processing methodologies that will make a strong contribution to solving this information management problem. Besides the theoretical results, we will develop several tools that implement the designed solutions, and we will apply these methodologies and tools to specific application areas.

Our approach can be divided into three levels based on the level of language processing: (1) Common N-Gram analysis (CNG), (2) regular expression based and finite state processing (RegEx), and (3) unification-based processing and matching of typed feature structures (Unif), with a goal of harmonizing these techniques. The N-gram model, regular expressions, and unification-based grammars are well understood in NLP. The novelty of our approach is in the more specific methodologies developed on top of these models: new n-gram profiling, distance functions, and visualisation; iterative regular expression substitutions; and stochastic unification-based matching and subgraph isomorphism methodology.
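To make the first level more concrete, the following is a minimal sketch of character n-gram profiling with a profile distance. It is an illustration only, not the project's implementation: the function names `ngram_profile` and `cng_distance`, the default parameters, and the choice of a squared relative-difference dissimilarity are assumptions made here, and the new distance functions proposed in the project are not specified in this description.

```python
from collections import Counter


def ngram_profile(text, n=3, size=1000):
    """Build a profile: the `size` most frequent character n-grams of `text`,
    each mapped to its normalised frequency (hypothetical helper)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {g: c / total for g, c in counts.most_common(size)}


def cng_distance(p1, p2):
    """Sum of squared relative frequency differences over the union of
    n-grams in both profiles (an assumed dissimilarity for illustration).
    An n-gram missing from a profile counts as frequency 0."""
    total = 0.0
    for g in set(p1) | set(p2):
        f1 = p1.get(g, 0.0)
        f2 = p2.get(g, 0.0)
        # f1 + f2 > 0 because g occurs in at least one profile.
        total += ((f1 - f2) / ((f1 + f2) / 2.0)) ** 2
    return total


def nearest_profile(text, labelled_profiles, n=3, size=1000):
    """Return the label whose profile is closest to the profile of `text`."""
    query = ngram_profile(text, n, size)
    return min(labelled_profiles,
               key=lambda label: cng_distance(query, labelled_profiles[label]))
```

Under these assumptions, a text would be assigned to the class (for example, an author or a topic) whose profile is at the smallest distance from the text's own profile. Identical profiles have distance 0, and each n-gram present in only one profile contributes 4 to the sum.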

Status: Active
Effective start/end date: 1/1/14 → …

Funding

  • Natural Sciences and Engineering Research Council of Canada: US$18,111.00

ASJC Scopus Subject Areas

  • Artificial Intelligence
  • Development