CSE 511: Semi-Structured Data Management

Fall 2009


Instructor:   Yi Chen     ( yi at asu.edu )

 

Time:   Monday & Wednesday 3:30PM - 4:45PM

Location:   BYAC 260
Office hours:   Tuesday 2:40-3:30 and 5-6pm, or by appointment,  BY 562

 




Description & Objective


This course discusses recent advances in data management for applications for which traditional databases are not suitable. A traditional database, typically a relational or an object-oriented database system, makes several assumptions. First, data conforms to a fixed schema. Second, data is locally stored, clean, and consistent. Third, data can be queried using a structured query language (for example, SQL). As web data continues to grow at an explosive pace, we face more and more data that does not fit into a traditional database. For example, web data obtained from independent sources requires a flexible data representation format such as XML. Data obtained from integration or extracted from text documents may be error-prone and inconsistent. A web user is generally not able to formulate a precise query using a structured query language. Furthermore, in publish/subscribe systems and sensor networks, the assumption that data is locally stored no longer holds. As we relax these assumptions, new research challenges arise. In this course, we will explore in depth the research problems in semi-structured data management and its applications.

What You Can Get Out of the Course
The goal of this course is to gain a better understanding of advanced data management techniques, especially how to store, query, share, and interpret data across the World-Wide Web. You will also have opportunities to learn how to survey, analyze, and critique research papers, to gain hands-on experience with database projects, and to participate in research with other students.

Prerequisites:
A background in relational databases and programming ability in Java, C, or C# are required.

 



Format

The course is organized around several themes. For each theme, we read and discuss selected papers from the literature. There is no required textbook for this class, though you can refer to the following book for additional reading.


The course consists of two lectures a week, class discussions, paper reading and reviews, and a project. Your responsibilities include:


Topics and Schedule

(The schedule is subject to change. Please check it frequently.)

Course Overview (8/24)


XML Introduction (2.5 weeks)

8/26, 8/31, 9/2, 9/9, 9/14, 9/16  XML data model, DOM and SAX interfaces, and the XPath and XQuery query languages as specified by the W3C

Hand out HW1, Project

 

Searching XML Data using Keywords  (2.5 weeks)
9/21  Cohen et al. XSearch: A Semantic Search Engine for XML. VLDB 03

Parab, Sainath


9/23  Li et al. Schema-Free XQuery. VLDB 04

Rowe, Steven


9/28  Guo et al. XRANK: ranked keyword search on XML documents. SIGMOD 03

Motamarri, Lakshminarayana

(Reference: Brin and Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine.)

 

9/30  Bao et al. Effective XML Keyword Search with Relevance Oriented Ranking. In ICDE 09

Ramanathan Babu, Prabhu

 


Keyword Search on Relational Databases (2 weeks)  

10/5  Bhalotia et al. Keyword Searching and Browsing in Databases using BANKS. In ICDE 02

Chandra, Abhishikth

 

10/7  Sayyadian et al. Efficient Keyword Search Across Heterogeneous Relational Databases. In ICDE 07

Sudha, Chiranjeevi Vishnu Saran

 

10/12  Talukdar et al. Learning to create data-integrating queries. In VLDB 08

Bonala, Teja

 

 

Indexing and Querying XML Data  (1.5 weeks)

10/14  Kaushik et al. Exploiting Local Similarity for Indexing Paths in Graph-Structured Data. ICDE 02

Tikves, Sukru

 

10/19  Shanmugasundaram et al. Relational Databases for Querying XML Documents: Limitations and Opportunities. VLDB 99

Nagendra, Mithila

 

10/21  Zhang et al. On Supporting Containment Queries in Relational Database Management Systems. SIGMOD 01

Mojtahedi, Mahsa

 

Hand out HW 2

 

 

Querying XML Streams (1 week)
10/26 Zachary G. Ives, Alon Y. Levy, Daniel S. Weld.  Efficient Evaluation of Regular Path Expressions on Streaming XML Data.  Technical Report UW-CSE-2000-05-02, University of Washington.

Shah, Imran

 

 

10/28  Altinel and Franklin. Efficient Filtering of XML Documents for Selective Dissemination of Information. VLDB 00

Dukhande, Swapnil

 

 

11/2  Chen et al. An Efficient XPath Query Processor for XML Streams. ICDE 06

Sundaresan, Sivasubramanian

 

Workflow Management (1 week)
11/4  Beeri et al. Querying Business Processes. VLDB 06

Kumar, Archana

 

11/9 Shao et al. Efficient Ticket Routing by Resolution Sequence Mining.

Cedeno, Juan

 

Information Extraction, Probabilistic and Uncertain Databases (1.5 weeks)
11/16  Gupta & Sarawagi. Creating Probabilistic Databases from Information Extraction Models. VLDB 06

An, Ho

 

11/18  Dalvi & Suciu. Efficient Query Evaluation on Probabilistic Databases. VLDB 04

Natarajan, Sivaramakrishnan


11/23  Qi et al. Integrating and querying taxonomies with QUEST in the presence of conflicts. SIGMOD 07

Alwahishi, Rabeia

 

Student Project Demo (2 weeks)

11/25, 11/30, 12/2, 12/7  Every group will give a project demo.


Project

Sample project topics will be discussed in class. You can also propose your own project, provided it is closely related to the course and you discuss it with the instructor first. The project consists of three parts. First, you submit a half-page project proposal. Next, you give a midterm project presentation/report stating the problem, the existing literature, and the proposed algorithm. Finally, you demo the project to the class and submit a project report detailing the proposed solution.



Grading

Class attendance and discussion: 10%
Paper presentation: 20%
Paper reviews: 15%
Homework: 20%
Project: 35%