CSE 591: Semantic Web Mining -- Spring 2012

Assoc. Prof. Hasan Davulcu
Meeting Times: Mon, Wed 2:00 - 3:15 pm     Location: ECGG 347
Office Hours: Mon, Thurs 3:30 - 4:30 pm    Location: BY 564


According to a Nature article, the World Wide Web doubles in size approximately every 8 months.
There are approximately 20 million content areas on the Web. According to L. Giles and S. Lawrence,
"85% of users use search engines to find information. Consumers use search engines to
locate and buy goods or to research many decisions (such as choosing a vacation
destination, medical treatment or election vote). However, the search engines are currently
lacking in comprehensiveness and timeliness, and do not index sites equally. The current state
of search engines can be compared to a phone book which is updated irregularly, is biased
toward listing more popular information, and has most of the pages ripped out."
Though the Web is rich with information, gathering and making sense of this data is difficult because
the documents on the Web are largely unorganized. The biggest challenge in the next several decades
is how to effectively and efficiently dig out a machine-understandable and queryable information and
knowledge layer, called the Semantic Web, from unorganized, human-readable Web data.

What will you learn in this course?


This is an advanced course intended for graduate students with some background in databases,
compilers and automata theory. Some exposure to HTML and XML is also desirable. In addition,
very good programming skills in Java, C++ and some scripting languages (such as Perl) are
necessary to complete the course projects. Students with special interest and background in AI,
databases, data mining, information retrieval, machine learning or NLP are encouraged to join.

Text Book

Other Recommended Books


Each student or group of students must complete a project of their choice. I will propose some interesting
projects on MyASU, but students are encouraged to come up with relevant project proposals and discuss them
with me. The project should yield a working prototype implementation. The project will be graded in two parts.
First, students must submit a written proposal for their project detailing the problem formulation and
solution requirements. Next, students must submit a final project report detailing the proposed
solution (an algorithm) and a system architecture. We will also allocate slots for students to present
their projects and demonstrate their solutions.


Project proposal
Final Project Report
Homework and Quiz
Final Exam (Open Book)

Course Materials

The course will consist of lectures by the instructor covering the background material for the reading
list below, student presentations of items from that list, invited speakers, and problem-solving
sessions on the students' projects. Students are encouraged to meet with me as frequently
as they need to make progress on their projects.

An Extended Reading List and Software Packages

Semantic Web and RDF
Web Agents - Regular Expressions
Information Extraction - Shallow and Deep Parsing, NLP
Relational Rule Mining, ILP, F-Logic, Description Logic
Information Integration
Ontologies - Creation, Mapping, Merging
Text Classification
Text Clustering and Summarization
Topic Detection
Sentiment Analysis
Streaming Data - Trend Detection
Recommendation Systems
Applications: BioInformatics
Applications: E-Commerce

List of Projects

Please refer to the course home page on MyASU for a list of projects and relevant pointers for reading.

Related Links