I. PAPER READING
Research papers on data mining in the following categories will
be studied in depth by individual students (including the data mining aspects
of Bioinformatics):
- Data and application security
- Data mining and privacy
- Streaming data extraction and mining
- Dealing with large data (either row-wise or column-wise)
- Overfitting data in classification (ensemble methods)
- Association rules (FP-trees, efficient implementations, e.g., MAFIA)
- Mining semi-structured or structured data
- Adversarial classification
- Semi-supervised learning
- Clustering data in forms of scenarios
Where to start (to be revised):
- Many conferences, such as AAAI, ICML, IJCAI, SIGKDD, ICDE, SIGMOD, VLDB, ICDM, PAKDD, PKDD, ECML
- You can use websites such as KDnuggets, CiteSeer, Google Scholar
- Another way to start is to browse the journals or conference proceedings of your specialty (if you don't know yours yet, you should know your adviser's; the chief point is to start understanding what your research interests are).
- Journals such as AIJ, MLJ, DMKD, JAIR, JMLR, IEEE TKDE, Bioinformatics, Genome Research, Nature, Science
When is it due?
- The report is submitted in hard copy by 5:00pm; the link to your slides is submitted via myASU.
What is needed for submission?
- A paper reading report that includes a summary, critique, and suggestions
- A set of PowerPoint slides that explains the technical details in depth (e.g., a 20-minute presentation may need roughly 15-20 slides)
- Some selected reports will be presented in class (we don't know yet who will present, so everyone should be prepared to present if chosen)
- Late penalty (as discussed in class) applies after the deadline
How are presentation slides evaluated?
Each paper presentation is evaluated in five categories: time
control, organization, clarity, preparation, and discussion.
Notes about paper presentation:
- Everyone is expected to read the listed papers to be presented.
- Presentation duration: 20-25 minutes, including 5 minutes for discussion. Three questions with solutions, suitable for a quiz and in-class discussion, should be submitted to the instructor before the presentation, together with a hard copy of the presentation slides.
II. PROJECTS
- Categories of projects (a list of examples is given in the course plan page)
- Solving a suitable problem from design to implementation
- Working on a specialized algorithm to make it publicly accessible
- Challenging problems and applications of data mining: identification and possible solutions
- Project proposal (no more than 3 pages in 12-point font): due on Monday, February 12. You should make clear the following:
1. what you're going to do
2. why it is a useful project
3. how the project should be evaluated in your view
4. what the value of your project will be when it's done
The proposal serves as a rough outline of your final project report.
A good beginning is half the success.
- Applications (as discussed in class)
- Data repositories
- Organizations (NIH, TGen, ...)
- Submission and deadlines: For all submissions, a hard copy is required; please also send the links to your presentation slides using the digital drop box at myASU before the deadline. We will host all slides at myASU for student access. You can keep revising your soft copy after the hard copy and the link are submitted. The following schedule is tentative (please also check myASU for updates).
- Project Proposal: due on 2/12/2007.
- Progress Report: due on 4/11/2007. Be brief. It should cover the progress you have made and the difficulties encountered, with key references.
- Presentation Slides: due on 4/22/2007, for a PowerPoint presentation of about 5 minutes (5 slides maximum), including title, problem statement, approach, and key results. A hard copy is due in class, in addition to uploading the link at myASU by midnight. Everybody should be prepared to present in class.
- Final Report: due on 5/2/2007 by 5:00pm in hard copy. Be concise and self-contained; include everything about your project, with references. You can submit it earlier than the deadline.
- Project Presentations
- The purpose of the project presentations is to share the projects among us so that we all know what others are doing. Many projects may be related to what you are doing or are interested in.
- The slides should be accessible via the Web. Please include the URL of your slides on the cover page.
- Participating in and presenting projects in class is a key element of learning and counts toward class participation.
- Final Report and/or Demo
- Please include Problem Statement, Approach, Results, Findings, and New Problems, based on your proposal and progress report. For example, you may try to answer some of the following questions in your report:
- What is the interesting problem your project tries to solve?
- Why should I take this approach?
- What is novel?
- What do I produce that adds value to the existing work or literature?
- What are the significant results (negative results can also be significant)?
- Does my work go beyond a mere reimplementation of some existing work?
- What are the lessons learned?
- About the final report
- The length (or number of pages) is immaterial. In fact, a concise technical report is preferred.
- It is a concise and self-contained write-up such that another student can read it and repeat the work.
- It should at a minimum include (1) the description of your project, (2) the technical details, (3) significance, usefulness, or impact, (4) findings or results, (5) future work, if any, and (6) important references.
- For an implementation-related project, you need to include a brief manual covering how to use and develop it.
- You should try to convey all your efforts on the project in the report in a simple manner.