Appendix A
Data Mining and Knowledge Discovery Sources
 
For the reader's convenience, we provide here some pointers related to machine learning (ML), data mining and knowledge discovery (KD). Links are periodically updated, added or deleted; for the current list, the reader should visit the web page:
http://www.public.asu.edu/~huanliu/Fsbook
 

 
OTHER WEB SITE LINKS
Legend:
 
B - Bibliographies; D - Data sets; I - Information on events or conferences; L - Links to KDD sites; N - Newsletters or mailing lists; S - Software or programs; P - Papers
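
As a reading aid, the resource codes that follow each entry can be expanded mechanically. The sketch below is a hypothetical helper (the name `decode_resource_code` is not part of this appendix) and assumes that an underscore simply marks a category the site does not provide.

```python
# Legend letters, in the order they appear in the resource codes.
LEGEND = {
    "B": "Bibliographies",
    "D": "Data sets",
    "I": "Information on events or conferences",
    "L": "Links to KDD sites",
    "N": "Newsletters or mailing lists",
    "S": "Software or programs",
    "P": "Papers",
}


def decode_resource_code(code):
    """Return the legend categories named by a code such as '_D___SP'.

    Assumption: '_' is a placeholder for an absent category, so only the
    letters that are present get looked up. Unknown characters raise
    ValueError rather than being silently ignored.
    """
    categories = []
    for ch in code.strip():
        if ch == "_":
            continue
        if ch not in LEGEND:
            raise ValueError(f"unknown legend letter: {ch!r}")
        categories.append(LEGEND[ch])
    return categories


print(decode_resource_code("_D___SP"))
```

Because the helper looks only at the letters present, it also tolerates codes whose placeholder underscores were lost in transcription.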
 



Name Site and Contact  Resource



The KD Mine http://www.kdnuggets.com/index.html  
G. Piatetsky-Shapiro gps@kdnuggets.com
_DILNSP
 
A recommended site for comprehensive information on KDD. Contains links to an e-newsletter, software, data sets, publications and other related sites.

 
The Data Mine http://www.cs.bham.ac.uk/~anp/TheDataMine.html   
A. Parke anp@cs.bham.ac.uk
B_IL_SP
 
A major server of KDD and OLAP information. Provides a repository of papers and bibliographies, and links to conferences, software and web sites.

 
UCI ML Site  http://www.ics.uci.edu/~mlearn/
UC Irvine jmuramat@ics.uci.edu  
B_IL_SP
 
A comprehensive ML site. Popular for its large repository of standard data sets and ML programs for experimental evaluation.

 
MLnet ML Archive 
at GMD 
http://www.gmd.de/ml-archive 
ML Group of GMD ml-archive@gmd.de 
BDIL_SP 
 
It offers a wide collection of ML information, data sets, programs and links to other ML resources.

 
CMU A.I. Repository http://www.cs.cmu.edu/Groups/AI/html/repository.html  
ftp://ftp.cs.cmu.edu/user/ai/  
CMU AI.Repository@cs.cmu.edu 
B_IL_SP 
 
Established to collect files, programs & publications of interest to A.I. researchers, educators, students and practitioners. Also links to web sites on LISP, PROLOG & SCHEME.

 
Machine Learning 
Online
http://mlis.www.wkap.nl/mach/ml_links.htm   
J. Schlimmer Schlimme@eecs.wsu.edu 
BDILNSP 
 
A comprehensive set of links to ML resources on the WWW. The links are sorted into categories such as software, bibliographies, data sets, conferences & people.

 
a. ML Folks 
b. David Aha's ML & CBR resources 
http://www.aic.nrl.navy.mil/~aha/people.html  
http://www.aic.nrl.navy.mil/~aha/research/machine-learning.html http://www.aic.nrl.navy.mil/~aha/research/case-based-reasoning.html 
D. Aha aha@aic.nrl.navy.mil 
B_ILNSP
 
a. The Navy Center for Applied Research in A.I. (NCARAI).
b. Aha's page provides information, programs, papers, bibliographies, and lists of researchers in different areas of ML and CBR.

 
Society for A.I. and Statistics  http://www.vuse.vanderbilt.edu/~dfisher/ai-stats/society.html  
D. Fisher dfisher@vuse.vanderbilt.edu 
B_ILN_P 
 
A list of links to other sites related to A.I. and statistics. Also contains publications & research on topics in A.I. and statistics. Includes the AI-Stats electronic mailing list.

 
WWW Virtual Library on 
Statistics 
http://www.stat.ufl.edu/vlib/statistics.html                                
Mike Conlon mconlon@stat.ufl.edu 
BDILNSP 
 
An up-to-date online resource for statistical software, data sets, links, etc.

 
PC Webopaedia DM Page http://www.pcwebopedia.com/data_mining.htm  
Sandy Bay Software, Inc. webmaster@sandybay.com 
__IL___ 
 
Provides definitions, news, articles and links to useful sites on data mining applications.

 
Evaluation of Intelligent Systems  http://eksl-www.cs.umass.edu/eis/                        
D. Jensen jensen@cs.umass.edu 
_DILN__ 
 
An online resource that provides one-stop shopping for managers, system builders, researchers, and users who wish to study the empirical behavior of information systems.

 
Publications of the A.I. Research 
Group Iowa State 
http://www.cs.iastate.edu/~honavar/publist.html  
V. Honavar honavar@cs.iastate.edu  
______P 
 
An archive of publications on A.I. research.

 
Univ. of Portsmouth 
ML Archive
http://www.sis.port.ac.uk/ml-algs/ 
S. Thompson sgt@sis.port.ac.uk 
_D___S_
 
Various LISP-interfaced programs of popular ML algorithms, such as AQ, Cobweb, CN2, FOIL, ID3, C4.5 and kNN, together with data sets in LISP format.

 
OFAI's Online ML Resources  http://www.ai.univie.ac.at/oefai/ml/ml-resources.html  
G. Widmer gerhard@ai.univie.ac.at 
_DILNSP 
 
A comprehensive set of links to ML resources on the WWW, such as journals, papers and software.

 
IBM Almaden Research Center - 
Quest DM project 
http://www.almaden.ibm.com/cs/quest/index.html  
P. Greissl greissl@vnet.ibm.com  
_D___SP 
 
Research in DM related to IBM Intelligent Miner. Also contains Synthetic Data Generation Codes for Associations, Sequential Patterns and Classification. 

 
The WWW  Virtual Library 
page on A.I. 
http://www.cs.reading.ac.uk/people/dwc/ai.html  
J. Bowen J.P.Bowen@reading.ac.uk 
_DILNSP 
 
An online resource for A.I. programs, software, data sets, bibliographies, links, etc., on the WWW.

 
ML group of LIACC http://www.ncc.up.pt/liacc/ML/  
P. Brazdil & J. Gama statlog-adm@ncc.up.pt
_D_L_SP 
 
Research on evaluation & characterization of learning systems using various ML methods. StatLog project: a comparative study of different ML, neural and statistical classification algorithms. Contains about 20 different algorithms and data sets.

 
KDD at GTE  http://info.gte.com/~kdd/kdd-at-gte.html  
GTE Laboratory absweb@gte.com 
_____S_ 
 
It contains predictive modeling techniques using a multi-strategy approach, combining techniques such as neural networks, decision trees, clustering & nearest neighbor classifiers. Tools include KEFIR, CHAMPS.

 

 ELECTRONIC NEWSLETTERS, PAGES AND JOURNALS
 



E-Newsletter Site and Moderator Frequency



Knowledge Discovery Nuggets http://www.kdnuggets.com/subscribe.html  
G. Piatetsky-Shapiro gps@kdnuggets.com 
Weekly
 
A mailing list focusing on Data Mining and KDD research & applications.

 
Machine Learning List  http://www.ics.uci.edu/~mlearn/MLList.html 
M Pazzani (UCI) ml@ics.uci.edu 
Monthly 
 
 
A mailing list focusing on the scientific study of Machine Learning.

 
AI and Statistics 
Mailing List
http://www.vuse.vanderbilt.edu/~dfisher/ai-stats/mailing-list.html
D. Fisher ai-stats@watstat.uwaterloo.ca
Monthly
 
A mailing list providing information on AI and Statistics.

 
DBWorld http://www.informatik.uni-trier.de/~ley/db/dbworld.html 
R. Ramakrishnan dbworld@cs.wisc.edu
Weekly
 
A mailing list providing messages of general interest to the database community.

 
Journal of Intelligent 
Data Analysis
http://www.elsevier.com/locate/ida  
A. Famili Elsevier Science Inc. editor@ida-ij.com
Quarterly
 
An electronic journal examining issues related to the research & applications of AI techniques in data analysis.

 
Journal of AI Research  http://www.jair.org/ 
Michael Wellman wellman@umich.edu 
Half Yearly
 
The journal includes research articles, technical notes, surveys & expository articles in A.I.

 
Journal on Data 
Mining & Knowledge Discovery
http://www.research.microsoft.com  
Usama Fayyad fayyad@microsoft.com
Quarterly
 
The journal consolidates papers on both the research & practice of KDD, surveys of important techniques & application papers.

 
Kluwer Machine
Learning Journal 
http://mlis.www.wkap.nl/  
Thomas G. Dietterich Services@wkap.nl
Quarterly
 
A journal on research in robotics, computer & information science, AI, machine learning, expert systems & cognitive science.

 
IEEE Trans KDE: SI 
Data Mining
http://www.computer.org:80/tkde/  
Farokh B. Bastani fbastani@uh.edu 
Bi-Quarterly
 
An archival journal on advances in Knowledge and Data Engineering.

 
IEEE Data 
Engineering Bulletin 
http://www.research.microsoft.com/research/db/debull/  
David Lomet lomet@microsoft.com 
Quarterly
 
An e-bulletin focusing on the design, implementation, modeling, theory and application of DB systems & their technology.

 

SOME PUBLICLY AVAILABLE TOOLS

 



Site Name Site and Contact Category



MLC++ http://www.sgi.com/Technology/mlc/source.html 
R. Kohavi mlc@postofc.corp.sgi.com
MPS - {Cls, Cst, Vis, Dev} & Pre - {Fs, Fd}.
 
A public ML Library in C++ (main engine behind MineSet). Various inducers and pre-processing wrappers are available.

 
MOBAL  http://nathan.gmd.de/projects/ml/mobal/mobal.html  
ML Group GMD mobal@gmd.de
MPS & ILP Learning, Knowledge Acquisition
 
An enhanced public version of the GMD ML and knowledge acquisition system for 1st-order KBS development. Uses multi-strategy ML methods for automated knowledge acquisition.

 
TOOLDIAG  http://www.inf.ufes.br/~thomas/www/home/tooldiag.html  
T. Rauber thomas@inf.ufes.br
MPS {Cls} 
 
A set of public tools (in C) for statistical pattern recognition and classification of multivariate numerical data.

 
DBMiner  http://db.cs.sfu.ca/DBMiner  
J. Han han@cs.sfu.ca
MPS {Cls, Asc, Sum, Vis}
 
An interactive tool for mining multiple-level knowledge in databases.

 
Emerald  http://aic.gmu.edu/kiconemerald.html  
A.I. Center, George Mason Univ. jwnek@aic.gmu.edu 
MPS {Cls, Cst, Sum} 
 
A research system of 5 different ML programs: AQ, INDUCE, CLUSTER, SPARC & ABACUS.

 
Kepler http://nathan.gmd.de/projects/ml/kepler-englisch.html  
S. Wrobel stefan.wrobel@gmd.de
 MPS {Cst, Cls - Dt, Nn} 
 
A multi-paradigm, multi-purpose DM research system, extensible through a "plug-in" interface. It builds on previous work on systems such as Mobal and Explora.

 
Weka (2.2)  http://www.cs.waikato.ac.nz/~ml  
ML Gp at Waikato wekasupport@cs.waikato.ac.nz
MPS {Cls, Sum}
 
A software workbench which integrates many different ML tools within a common framework and a uniform GUI.

 
Sipina-W  http://eric.univ-lyon2.fr/~ricco/sipina.html  
ftp://hp-eric.univ-lyon2.fr/pub/sipina  
D. Zighed zighed@diogene.univ-lyon2.fr 
Cls - Dt 
 
It includes an ANALYSIS Module consisting of CART (Two-ing Rule, Gini index), Elisee (Chi-2), Quinlan's ID3 and C4.5, ChAID and SIPINA; and 2 new methods - QR MDL (MDL principle) and WDTaiqm (Bayesian).

 
OC1 http://www.cs.jhu.edu/~salzberg/announce-oc1.html  
ftp://ftp.cs.jhu.edu/pub/oc1  
S. Salzberg salzberg@cs.jhu.edu
Cls - Dt 
 
"Oblique Classifier 1" is a Dt induction system designed for applications where the instances have continuous feature values. 

 
C4.5  ftp://ftp.cs.su.oz.au/pub/ml/  
http://www.rulequest.com/  
J.R. Quinlan quinlan@rulequest.com 
Cls - Dt 
 
The classical decision tree induction tool. Latest patches include a module for converting a decision tree to a set of rules. A commercial upgraded version for Windows environment is also available.

 
FOIL  ftp://ftp.cs.su.oz.au/pub/  
J. Quinlan quinlan@rulequest.com 
Cls - Rd 
 
FOIL reads extensional specifications of a set of relational data and produces Horn clause definitions (relationship).

 
NEuroNet  http://www.neuronet.ph.kcl.ac.uk/neuronet/software/software.html  
C. Hinton neuronet@kcl.ac.uk 
Cls - Nn
 
An on-line repository of information and available software on neural networks. 

 
AutoClass C AutoClass III  http://ic-www.arc.nasa.gov/ic/projects/bayes-group/group/autoclass/autoclass-cprogram.html
W. Taylor taylor@ptolemy.arc.nasa.gov 
Cls, Cst 
 
AutoClass is an unsupervised Bayesian classification system that seeks a maximum posterior probability classification.

 
Iris (Descartes) http://allanon.gmd.de/and/and.html 
G. L. Andrienko gennady@nathan.gmd.de 
Vis 
 
Iris is a research prototyping tool supporting the automated visualization and interactive manipulation of spatially (map) referenced data. It is marketed as Descartes. 

 
TiMBL  http://ilk.kub.nl/software.html  
ILK Research Group Timbl@kub.nl
Cls - Dt, Rd 
 
An implementation of several memory based learning techniques for discrete data. A representation of the training set is explicitly stored in memory, and new cases are classified by extrapolation from the most similar stored cases. 
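
The memory-based idea described in this entry can be illustrated with a minimal sketch. This is not TiMBL's implementation or API; it is a plain k-nearest-neighbour classifier over discrete features, assuming an overlap (mismatch-count) distance and majority voting. The names `MemoryBasedClassifier` and `overlap_distance` and the toy data are hypothetical.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions where two discrete instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


class MemoryBasedClassifier:
    """Stores training cases verbatim and classifies by nearest neighbours."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (features, label) pairs

    def fit(self, instances, labels):
        # "Training" is just storing every case in memory.
        self.memory = list(zip(instances, labels))
        return self

    def predict(self, instance):
        # Rank stored cases by distance to the new case; keep the k closest.
        nearest = sorted(
            self.memory, key=lambda case: overlap_distance(case[0], instance)
        )[: self.k]
        # Majority vote among the labels of the k closest cases.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy data (hypothetical): (outlook, windy) -> decision
train_x = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"), ("rainy", "no")]
train_y = ["play", "play", "stay", "stay"]

clf = MemoryBasedClassifier(k=1).fit(train_x, train_y)
print(clf.predict(("sunny", "no")))
```

The sketch keeps every training case and defers all work to prediction time, which is the defining trade-off of memory-based methods: no model is built in advance, but each query scans the stored cases.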

 
SNNS  ftp://ftp.informatik.uni-stuttgart.de/pub/SNNS/  
http://www.lans.ece.utexas.edu/winsnns.html  
A. Zell zell@informatik.uni-stuttgart.de
Cls - Nn, Cst, Vis
 
A simulation environment for research on and applications of neural networks.