
Recent papers in (biomedical) text mining / NLP

See here for a collection of papers on biomedical text mining (currently more than 300).


Natural language processing tool kits

JULIE Lab NLP Tool Suite --- http://www.julielab.de/content/view/122/179/
Extensions: wrapper for the UIMA framework.
OpenNLP --- http://opennlp.sourceforge.net/
Stanford NLP Group --- http://nlp.stanford.edu/software/index.shtml
Offers a parser, POS tagger, NER tool, and other tools.

Machine learning software and tool kits

Java-ML --- http://java-ml.sourceforge.net/
Targets developers with "a readily usable and easily extensible API."
Recent publication: Abeel et al., JMLR 10:931-934, 2009.
mySVM --- http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/index.html
Java implementation of a support vector machine. Uses the optimization algorithm of SVMlight.
Cool feature: comes with a DB version that runs inside Oracle 8.1+.
SVMlight --- http://svmlight.joachims.org/
Probably the most widely used SVM implementation, written in C.
Cool feature: check out SVMstruct for learning structured output -- for instance multi-class problems, HMMs, sequence alignment.
Extensions: wrappers/interfaces for Java, Perl, Matlab; access to kernels written in Java; algorithm for approximately training TSVMs; training algorithm for large datasets that can be faster than SVMlight (called SVMperf).
WEKA --- http://www.cs.waikato.ac.nz/ml/weka/
Huge library of data mining tools, written in Java: pre-processing, classification, regression, clustering, association rules, visualization.