I. PAPER READING

Each student will study in depth research papers on data mining in the following categories (including the data mining aspects of bioinformatics):

  1. Data and application security
  2. Data mining and privacy
  3. Streaming data extraction and mining
  4. Dealing with large data (either row-wise or column-wise)
  5. Overfitting data in classification (ensemble methods)
  6. Association rules (FP-trees, efficient implementations, e.g., MAFIA)
  7. Mining semi-structured or structured data
  8. Adversarial classification
  9. Semi-supervised learning
  10. Clustering data in forms of scenarios

Where to start (to be revised):

  • Conferences such as AAAI, ICML, IJCAI, SIGKDD, ICDE, SIGMOD, VLDB, ICDM, PAKDD, PKDD, ECML
    • You can use websites such as KDnuggets, CiteSeer, Google Scholar
    • Another way to start is to browse the journals or conference proceedings of your specialty (if you don't know yours yet, you should know your adviser's; the chief point is to start understanding what your research interests are).
  • Journals such as AIJ, MLJ, DMKD, JAIR, JMLR, IEEE TKDE, Bioinformatics, Genome Research, Nature, Science

When is it due?

  • The report is due in hard copy by 5:00pm; the link to your slides is submitted via myASU.

What is needed for submission?

  • A paper reading report that includes a summary, critique, and suggestions
  • A set of PowerPoint slides that explains the technical details in depth (e.g., a 20-minute presentation may need roughly 15-20 slides)
  • Selected reports will be presented in class (we don't know yet who will present, so everyone should be prepared to present if chosen)
  • Late penalty (as discussed in class) applies after the deadline

How are presentation slides evaluated?

    Each paper presentation is evaluated in five categories: time control, organization, clarity, preparation, and discussion.

Notes about paper presentation:

  • Everyone is expected to read the papers listed for presentation.
  • Presentation duration: 20-25 minutes, including 5 minutes for discussion. Three questions with solutions, suitable for a quiz and in-class discussion, should be submitted to the instructor before the presentation, together with a hard copy of the presentation slides.

II. PROJECTS

  • Categories of projects (a list of examples is given in the course plan page)
    • Solving a suitable problem from design to implementation
    • Working on a specialized algorithm to make it publicly accessible
    • Challenging problems and applications of data mining: identification and possible solutions
  • Project proposal, due on Monday, February 12 (no more than 3 pages in 12-point font). You should make clear the following:

        1. what you're going to do
        2. why it is a useful project
        3. how the project should be evaluated in your view
        4. what the value of your project will be when it's done

        The proposal serves as a rough outline of your final project report.

        A good beginning is half the success.

  • Where to get data?
    1. Applications (as discussed in class)
    2. Data repositories
    3. Organizations (NIH, TGen, ...)
  • Submission and deadlines: Every submission requires a copy; please also send the link to your presentation slides using the digital drop box at myASU before the deadline. We will host all slides at myASU for student access. You can keep revising your soft copy after the copy and the link are submitted. The following schedule is tentative (please also check myASU for updates).
    • Project Proposal: due on 2/12/2007.
    • Progress Report: due on 4/11/2007. Be brief. Describe the progress you have made and the difficulties encountered, with key references.
    • Presentation Slides: due on 4/22/2007 (for a presentation of about 5 minutes in PowerPoint, 5 slides maximum, including title, problem statement, approach, and key results). A hard copy is due in class, in addition to uploading the link at myASU by midnight. Everybody should be prepared to present in class.
    • Final Report: due on 5/2/2007 by 5:00pm in hard copy. Be concise and self-contained; cover everything about your project, with references.

You can submit it earlier than the deadline.

  • Project Presentations
    • The purpose of the project presentations is to share the projects among us so that we all know what others are doing. Many projects may be related to what you are doing or are interested in.
    • The slides should be accessible via the Web. Please include the URL of your slides on the cover page.
    • Participating in and presenting projects in class is a key element of learning and counts toward class participation.
  • Final Report and/or Demo
    • Please include Problem Statement, Approach, Results, Findings, and New Problems, based on your proposal and progress report. For example, you may try to answer some of the following questions in your report:
      • What is the interesting problem my project tries to solve?
      • Why should I take this approach?
      • What is novel?
      • What do I produce that adds value to the existing work or literature?
      • What are the significant results (negative results can also be significant)?
      • Does my work go beyond a mere reimplementation of some existing work?
      • What are the lessons learned?
    • About the final report
      • The length (or number of pages) is immaterial. In fact, a concise technical report is most preferred.
      • It should be a concise and self-contained write-up such that another student can read it and repeat the work.
      • It should at a minimum include (1) the description of your project, (2) the technical details, (3) significance, usefulness, or impact, (4) findings or results, (5) future work, if any, and (6) important references.
      • For an implementation-related project, you need to include a brief manual covering usage and development.
      • You should try to convey all your efforts on the project in the report in a simple manner.