Fall 2009
Instructor: Yi Chen (yi at asu.edu)
Time: Monday & Wednesday 3:30PM - 4:45PM
Location: BYAC 260
Office hours: Tuesday 2:40-3:30 and 5-6pm, or by appointment, BY 562
Description & Objective
This course will discuss recent advances in data management for
applications for which traditional databases are not suitable. A
traditional database, typically a relational database system or an
object-oriented database system, makes several assumptions. First, data conforms
to a fixed schema. Second, data is locally stored, clean and consistent. Third,
data can be queried using a structured query language (for example, SQL). As
web data continues to grow at an explosive pace, we are facing more and more
data that does not fit into a traditional database. For example, web data obtained
from independent sources requires a flexible data representation format such as
XML. Data obtained from integration or extracted from text documents may be
error prone and inconsistent. A web user is generally not able to formulate a
precise query using a structured query language. Furthermore, in publish/subscribe
systems and sensor networks, the assumption that data is locally
stored has been discarded. As we relax these assumptions, new research
challenges arise. In this course, we will explore in depth the research
problems in semi-structured data management and its applications.
What You Can Get Out of the Course
The goals of this course are to gain a better understanding of advanced data
management techniques, especially how to store, query, share, and interpret
data across the World-Wide Web. You will also have opportunities to learn how
to survey, analyze, and critique research papers, obtain hands-on experience with
database projects, and participate in research with other students.
Prerequisites:
A background in relational databases and programming ability in Java, C,
or C# are required.
Format
The course is organized around several themes. For each theme, we read and
discuss selected papers from the literature. There
are no required textbooks for this class, though you can refer to the following
book for additional reading.
The course consists of two lectures a week, class discussions, paper
reading and reviews, and a project. Your responsibilities include:
Topics and Schedule
(The schedule is subject to change. Please check it frequently.)
Course Overview (8/24)
XML Introduction (2.5 weeks)
8/26, 8/31, 9/2, 9/9, 9/14, 9/16 | XML data model, DOM and SAX interfaces, and the XPath and XQuery query languages as specified by the W3C
Hand out HW1, Project
Searching XML Data using Keywords (2.5 weeks)
9/21 | Cohen et al. XSearch: A Semantic Search Engine for XML. VLDB 03 | Parab, Sainath
9/23 | Li et al. Schema-Free XQuery. VLDB 04 | Rowe, Steven
9/28 | Guo et al. XRANK: Ranked Keyword Search over XML Documents. SIGMOD 03 | Motamarri, Lakshminarayana
(Reference: Brin and Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine.)
9/30 | Bao et al. Effective XML Keyword Search with Relevance Oriented Ranking. ICDE 09 | Ramanathan Babu, Prabhu
Keyword Search on Relational Databases (2 weeks)
10/5 | Bhalotia et al. Keyword Searching and Browsing in Databases using BANKS. ICDE 02 | Chandra, Abhishikth
10/7 | Sayyadian et al. Efficient Keyword Search Across Heterogeneous Relational Databases. ICDE 07 | Sudha, Chiranjeevi Vishnu Saran
10/12 | Talukdar et al. Learning to Create Data-Integrating Queries. VLDB 08 | Bonala, Teja
Indexing and Querying XML Data (1.5 weeks)
10/14 | Kaushik et al. Exploiting Local Similarity for Indexing Paths in Graph-Structured Data. ICDE 02 | Tikves, Sukru
10/19 | Shanmugasundaram et al. Relational Databases for Querying XML Documents: Limitations and Opportunities. VLDB 99 | Nagendra, Mithila
10/21 | Zhang et al. On Supporting Containment Queries in Relational Database Management Systems. SIGMOD 01 | Mojtahedi, Mahsa
Hand out HW 2
Querying XML Streams (1 week)
10/26 | Zachary G. Ives, Alon Y. Levy, Daniel S. Weld. Efficient Evaluation of Regular Path Expressions on Streaming XML Data. Technical Report UW-CSE-2000-05-02, University of Washington | Shah, Imran
10/28 | Altinel and Franklin. Efficient Filtering of XML Documents for Selective Dissemination of Information. VLDB 00 | Dukhande, Swapnil
11/2 | Chen et al. An Efficient XPath Query Processor for XML Streams. ICDE 06 | Sundaresan, Sivasubramanian
Workflow Management (1 week)
11/4 | Beeri et al. Querying Business Processes. VLDB 06 | Kumar, Archana
11/9 | Shao et al. Efficient Ticket Routing by Resolution Sequence Mining. | Cedeno, Juan
Information Extraction, Probabilistic and Uncertain Databases (1.5 weeks)
11/16 | Gupta & Sarawagi. Creating Probabilistic Databases from Information Extraction Models. VLDB 06 | An, Ho
11/18 | Dalvi & Suciu. Efficient Query Evaluation on Probabilistic Databases. VLDB 04 | Natarajan, Sivaramakrishnan
11/23 | Qi et al. Integrating and Querying Taxonomies with QUEST in the Presence of Conflicts. SIGMOD 07 | Alwahishi, Rabeia
Student Project Demo (2 weeks)
11/25, 11/30, 12/2, 12/7
Every group will give a project demo.
Project
Sample project topics will be discussed in class. You may also propose your own project,
provided it is closely related to the course and you discuss it with the instructor first.
The project consists of three parts. First, you need to submit a half-page
project proposal. Next, you need to give a midterm project presentation/report
stating the problem, existing literature and proposed algorithm. Finally, you
need to demo the project to the class and submit a project report detailing the
proposed solution.
Grading
Class attendance and discussion: 10%
Paper presentation: 20%
Paper reviews: 15%
Homework: 20%
Project: 35%