Succinct Data Structures with Applications to Large Data Sets

  • He, Meng (PI)

Proyecto: Proyecto de Investigación

Detalles del proyecto

Description

The problem of efficiently storing and retrieving information is an essential topic in computer science. During the past decades, various techniques have been developed to index data so that the useful information can be retrieved almost instantaneously by performing queries for keywords or phrases. In recent years, as the size of the data has grown rapidly, many techniques that were useful for small, older systems have become infeasible for large, modern applications because they occupy too much storage. Most of this space is not the raw data, but structural information added to improve search efficiency. Succinct data structures were proposed to address this problem, so that the information in large systems can be retrieved quickly, but the space requirement is little more than that of the raw data. In order to provide theoretical and practical solutions to modern systems that process large data sets such as web search engines, geographic information systems and bioinformatics applications, this program will extend the research on succinct data structures, and start new research directions on this subject. The proposed research will use succinct data structures to develop new solutions to fundamental problems in algorithms and computational geometry, such as text search and range search. It will start a new research direction that uses cache-oblivious model to improve the I/O efficiency of succinct data structures for applications that deal with large data sets stored in external memory. It will also start a new line of research by designing succinct data structures for bioinformatics applications and text databases by addressing useful types of searches performed in these systems, such as approximate search. In addition, algorithm engineering will be performed to study the efficiency of our solutions in practice, and code will be contributed to software libraries that deal with succinct data structures, to make them more complete and hence more useful for software development.

EstadoActivo
Fecha de inicio/Fecha fin1/1/13 → …

Financiación

  • Natural Sciences and Engineering Research Council of Canada: US$ 21.359,00

ASJC Scopus Subject Areas

  • Computer Science(all)
  • Mathematics (miscellaneous)