CSE 591: Semantic Web Mining -- Fall 2005

Asst. Prof. Hasan Davulcu
Meeting Times: TuTh 10:40 - 11:55 am     Location: BYAC 220
Office Hours: TuTh 2:00 - 3:00pm     Location: BY 564

Introduction

According to a Nature article, the World Wide Web doubles in size approximately every 8 months.
There are approximately 20 million content areas on the Web. According to an article by L. Giles and
S. Lawrence:
"85% of users use search engines to find information. Consumers use search engines to
locate and buy goods or to research many decisions (such as choosing a vacation
destination, medical treatment or election vote). However, the search engines are currently
lacking in comprehensiveness and timeliness, and do not index sites equally. The current state
of search engines can be compared to a phone book which is updated irregularly, is biased
toward listing more popular information, and has most of the pages ripped out."
Though the Web is rich with information, gathering and making sense of this data is difficult because
the documents on the Web are largely unorganized. The biggest challenge of the next several decades
is how to effectively and efficiently extract a machine-understandable and queryable information and
knowledge layer, called the Semantic Web, from unorganized, human-readable Web data.
 

What will you learn in this course?

Prerequisites

This is an advanced course intended for graduate students with some background in databases,
compilers, and automata theory. Some exposure to HTML and XML is also desirable. In addition,
very good programming skills in Java, C++, and some scripting languages (such as Perl) are
necessary to complete the course projects. Students with a special interest and background in AI,
databases, data mining, information retrieval, machine learning, or NLP are encouraged to join.

Text Book

Other Recommended Books

Project

Each student, or group of students, must complete a project of their choice. I will propose some interesting
projects at MyASU, but students are encouraged to come up with relevant project proposals and discuss them
with me. The project should yield a working prototype implementation. The project will be graded in two parts.
First, students must submit a written proposal for their project detailing the problem formulation and
solution requirements. Next, students must submit a final project report detailing the proposed
solution (an algorithm) and a system architecture. We will also allocate slots for students to present
their projects and demonstrate their solutions.

Grading

Project Proposal            10%
Final Project Report        30%
Homework and Quiz           30%
Final Exam (Open Book)      30%

Course Materials

The course will be conducted as a series of lectures by the instructor covering the background material
for the following reading list, student presentations on items from that list, talks by invited speakers, and
problem-solving sessions on the students' projects. Students are encouraged to meet with me as frequently
as they need to make progress on their projects.

An Extended Reading List and Software Packages

Semantic Web and RDF
Regular Expressions, Web Agents
HMMs and Information Extraction
Rule Mining, ILP, F-Logic, Description Logic
Information Integration
Planning for Data Gathering

DAML, OIL, RuleML, XML, XSLT, XPATH, HTML
Ontologies, Learning, Editing
Text Classification
Applications: BioInformatics
Applications: E-Commerce

List of Projects

Please refer to the Course HomePage at MyASU for a list of projects and relevant pointers for reading.

Related Links

http://www.cs.utexas.edu/users/mfkb/related.html