Project details
Description
Social media websites such as Twitter and Facebook host hundreds of millions of accounts, corresponding to about the same number of people, who post information about their interests, preferences, opinions, information needs, and the like. The company LeadSift mines these data streams in real time to extract targeted sales leads, that is, expressed user intents with commercial value. The textual streams mined are typically so-called microblogs, consisting of only about one hundred to two hundred characters of noisy text. We propose to enhance the accuracy of extracted leads through the use of semantic role labeling. Overlaying sentence structure, which identifies the dependencies between words and their syntactic roles in the sentence, with the semantic meaning of the words forming a phrase allows for precise identification and definition of patterns that could be perceived as leads. The most significant research challenges are associated with the length limitations of microblog posts, which typically lead to grammatical freedom and misspelled words and in turn significantly reduce the quality of analysis, and with the ambiguity and context of thoughts expressed in natural language, which are easy for humans to grasp but hard for algorithms. Because the mining runs in real time, the approach to be developed will need to be very efficient. The resulting benefit to the company should be a significant increase in the quality of leads generated from social media.
Status | Active |
---|---|
Start date/End date | 1/1/12 → … |
Funding
- Natural Sciences and Engineering Research Council of Canada: US$25,013.00
ASJC Scopus Subject Areas
- Computer Science (all)
- Development