Rosé, C. P., Roque, A., Bhembe, D., & VanLehn, K. (2003). A hybrid text classification approach for analysis of student essays. In J. Burstein & C. Leacock (Eds.), Proceedings of the HLT-NAACL 03 Workshop: Building Educational Applications Using Natural Language Processing (pp. 68-75). Edmonton, Alberta, Canada: Association for Computational Linguistics.

 

We present CarmelITC, a novel hybrid text classification approach for analyzing essay answers to qualitative physics questions, which builds upon the work presented in Rosé et al. (2002a). CarmelITC learns to classify units of text based on features extracted from a syntactic analysis of that text as well as on a Naive Bayes classification of that text. We explore the tradeoffs between symbolic and "bag of words" approaches. Our goal has been to combine the strengths of both approaches while avoiding some of their weaknesses. Our evaluation demonstrates that the hybrid CarmelITC approach outperforms two "bag of words" approaches, namely LSA and Naive Bayes, as well as a purely symbolic approach.
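To make the hybrid idea concrete, the following is a minimal sketch of how a Naive Bayes prediction could be combined with features from a syntactic analysis. It is not the CarmelITC implementation. The names `NaiveBayes`, `hybrid_features`, and `classify`, the toy sentences, the class labels, and the feature strings such as `subj=force` are all hypothetical. The syntactic features are supplied by hand here, standing in for the output of a parser, and the final nearest-neighbor step is only a stand-in for whatever learner is trained over the combined features.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words tokens, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def hybrid_features(tokens, syntactic_features, nb):
    """Combine symbolic features with the Naive Bayes prediction as one extra feature."""
    return set(syntactic_features) | {"nb=" + nb.predict(tokens)}


def jaccard(a, b):
    """Set overlap between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify(features, examples):
    """Stand-in final learner: label of the most similar training example."""
    return max(examples, key=lambda ex: jaccard(features, ex[0]))[1]


# Toy training data: (tokens, hand-written "syntactic" features, label).
train = [
    (["the", "force", "is", "zero"], {"subj=force", "pred=zero"}, "net_force_zero"),
    (["no", "net", "force", "acts"], {"subj=force", "neg"}, "net_force_zero"),
    (["the", "velocity", "is", "constant"], {"subj=velocity", "pred=constant"}, "constant_velocity"),
    (["speed", "stays", "constant"], {"subj=speed", "pred=constant"}, "constant_velocity"),
]

nb = NaiveBayes().fit([t for t, _, _ in train], [label for _, _, label in train])
examples = [(hybrid_features(t, s, nb), label) for t, s, label in train]

query_tokens = ["velocity", "stays", "constant"]
query_syntax = {"subj=velocity", "pred=constant"}
prediction = classify(hybrid_features(query_tokens, query_syntax, nb), examples)
print(prediction)
```

In this sketch the bag-of-words classifier contributes a single feature, so the final decision can still rely on the symbolic features when the two sources disagree. For brevity the Naive Bayes model is fitted and applied on the same toy sentences; a more careful setup would hold out data for that step.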

 
