- XSEEK: an Intelligent Search Engine for Semi-Structured Data
We are developing a search engine for databases. We are identifying the spectrum of problems involved in supporting keyword search on structured and semi-structured data, ranging from evaluation frameworks and the generation of high-quality results to
helping users analyze results, and we are developing techniques to
address the open challenges.
More information about XSEEK
SIGMOD 2009 Tutorial: Keyword Search on Structured and Semi-structured Data. Slides: pptx/ppt
ICDE 2011 Tutorial: Keyword based Search and Exploration on Databases. Slides: pptx
- Querying Incomplete and Inconsistent Web Databases
We are developing techniques for querying web databases despite the
imprecise nature of user queries and inconsistencies in
the data. More information about the project
- ExpertNet: Collaboration Network for Intelligent Social Computing
We are developing computational
foundations and quantitative frameworks to model, optimize, and search
collaborative social networks to expedite problem-solving and innovation. More Information
about ExpertNet
- SWAN: Smart Workflow Management
We are developing techniques for workflow management, including workflow
modeling, provenance reasoning, workflow search, and optimization, for both
scientific workflows and business processes, covering regular as well
as ad-hoc workflows. More
information about SWAN and its sub-project
SmartFlow, which manages
ad-hoc workflows specifically.
Overview:
Traditionally, information extraction systems are implemented as a pipeline of
special-purpose processing modules, which requires extraction to be
re-applied from scratch to the entire text corpus whenever the data, processing
modules, or extraction goals change. We propose an innovative paradigm for
information extraction: the parse trees output by natural language
processing on textual documents are stored in a database, and extraction is then
expressed as queries in our proposed structured query language over the database.
Such a paradigm has several advantages:
- avoiding writing special-purpose extraction programs,
- leveraging query optimization in databases,
- allowing incremental extraction upon changes.
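To make the paradigm concrete, here is a minimal sketch in Python using SQLite. It assumes a hypothetical `nodes` table holding one row per parse-tree node, and it uses plain SQL self-joins as a stand-in for the project's own structured query language, which is not described here. The query extracts subject-verb-object triples matching the pattern S → NP VP, VP → VB* NP; restricting it to newly added documents illustrates incremental extraction without touching the rest of the corpus. The helpers `store_tree` and `extract` and the example sentences are illustrative only.

```python
import sqlite3

# Hypothetical schema: one row per parse-tree node; preterminal nodes carry the word.
SCHEMA = """
CREATE TABLE nodes (
    doc_id    INTEGER,
    node_id   INTEGER,
    parent_id INTEGER,  -- NULL for the root
    label     TEXT,     -- constituent or POS tag, e.g. 'S', 'NP', 'VBD'
    word      TEXT,     -- token for preterminals, NULL otherwise
    PRIMARY KEY (doc_id, node_id)
)
"""

# Extraction goal expressed as a query: S -> NP VP, VP -> VB* NP (self-joins on parent_id).
SVO_QUERY = """
SELECT s.doc_id, subj.word, verb.word, obj.word
FROM nodes s
JOIN nodes np1  ON np1.doc_id = s.doc_id  AND np1.parent_id = s.node_id    AND np1.label = 'NP'
JOIN nodes subj ON subj.doc_id = s.doc_id AND subj.parent_id = np1.node_id AND subj.word IS NOT NULL
JOIN nodes vp   ON vp.doc_id = s.doc_id   AND vp.parent_id = s.node_id     AND vp.label = 'VP'
JOIN nodes verb ON verb.doc_id = s.doc_id AND verb.parent_id = vp.node_id  AND verb.label LIKE 'VB%'
JOIN nodes np2  ON np2.doc_id = s.doc_id  AND np2.parent_id = vp.node_id   AND np2.label = 'NP'
JOIN nodes obj  ON obj.doc_id = s.doc_id  AND obj.parent_id = np2.node_id  AND obj.word IS NOT NULL
WHERE s.label = 'S'
"""


def store_tree(conn, doc_id, tree):
    """Flatten a nested (label, child, ...) tuple into rows; a single str child is the word."""
    next_id = 0

    def visit(node, parent_id):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        label, *children = node
        is_leaf = len(children) == 1 and isinstance(children[0], str)
        word = children[0] if is_leaf else None
        conn.execute("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)",
                     (doc_id, node_id, parent_id, label, word))
        if not is_leaf:
            for child in children:
                visit(child, node_id)

    visit(tree, None)


def extract(conn, doc_ids=None):
    """Run the extraction query, optionally only over the given (e.g. newly added) documents."""
    sql, params = SVO_QUERY, ()
    if doc_ids:
        sql += " AND s.doc_id IN (%s)" % ",".join("?" * len(doc_ids))
        params = tuple(doc_ids)
    return conn.execute(sql + " ORDER BY s.doc_id", params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    store_tree(conn, 1, ("S", ("NP", ("NNP", "Einstein")),
                         ("VP", ("VBD", "developed"), ("NP", ("NN", "relativity")))))
    print(extract(conn))
    # A new document arrives: only it is queried; document 1 is not re-processed.
    store_tree(conn, 2, ("S", ("NP", ("NNP", "Curie")),
                         ("VP", ("VBD", "discovered"), ("NP", ("NN", "polonium")))))
    print(extract(conn, [2]))
```

In this sketch, changing the extraction goal means editing `SVO_QUERY` rather than rewriting a processing module, and SQLite's query planner, not the author of the query, decides the join order.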
Furthermore, to allow ordinary users to easily perform information extraction or
keyword search on the corpus without learning the structured query language, we are
investigating techniques that automatically generate structured queries based on
the user's keyword query and its pseudo-relevance feedback to obtain high-quality
results.
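The feedback step can be illustrated with a toy sketch in Python, under several assumptions: parse trees are nested tuples, the "structured query" is simply a hypothetical root-to-leaf label path under which a keyword must occur, and relevance is a naive keyword-overlap count. The top-k keyword matches are treated as pseudo-relevant, the most frequent syntactic position of the keywords in them becomes the generated query, and that query is then evaluated over the whole corpus. This is not the project's query-generation technique; `generate_structured_query` and `run_structured_query` are made-up names for illustration.

```python
from collections import Counter


def leaf_paths(tree, prefix=()):
    """Yield (label path from root, lowercased word) for each preterminal of a tuple tree."""
    label, *children = tree
    path = prefix + (label,)
    if len(children) == 1 and isinstance(children[0], str):
        yield path, children[0].lower()
    else:
        for child in children:
            yield from leaf_paths(child, path)


def keyword_score(tree, keywords):
    """Naive relevance: number of leaves whose word is one of the keywords."""
    return sum(word in keywords for _, word in leaf_paths(tree))


def generate_structured_query(corpus, keywords, k=2):
    """Treat the top-k keyword matches as relevant (pseudo-relevance feedback) and
    return the most frequent label path of the keywords within them."""
    keywords = {w.lower() for w in keywords}
    ranked = sorted(corpus, key=lambda t: keyword_score(t, keywords), reverse=True)
    feedback = [t for t in ranked[:k] if keyword_score(t, keywords) > 0]
    paths = Counter(path for t in feedback
                    for path, word in leaf_paths(t) if word in keywords)
    if not paths:
        return None
    best_path, _ = paths.most_common(1)[0]
    return best_path, keywords


def run_structured_query(corpus, query):
    """Return trees in which a keyword occurs exactly at the generated label path."""
    path, keywords = query
    return [t for t in corpus
            if any(p == path and w in keywords for p, w in leaf_paths(t))]


if __name__ == "__main__":
    corpus = [
        ("S", ("NP", ("NNP", "Einstein")), ("VP", ("VBD", "developed"), ("NP", ("NN", "relativity")))),
        ("S", ("NP", ("NN", "Relativity")), ("VP", ("VBD", "changed"), ("NP", ("NN", "physics")))),
        ("S", ("NP", ("NNS", "Students")), ("VP", ("VBP", "study"), ("NP", ("NN", "relativity")))),
    ]
    query = generate_structured_query(corpus, ["relativity"], k=3)
    print(query[0])  # ('S', 'VP', 'NP', 'NN')
    print(len(run_structured_query(corpus, query)))
```

Here the keyword "relativity" appears mostly as an object in the feedback set, so the generated query keeps only sentences where it plays that role; a real system would use a far richer query language and ranking function.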
Publications:
TKDE'12,
ICDE'10 (demo),
ICDE'06
- Completed Projects
- XML Stream Processing
- XML Databases
- XML Constraints
- Querying Linguistic Databases
A Complete List of Publications