Pappuswamy, U., Bhembe, D., Jordan, P. W., & VanLehn, K. (2005). A supervised clustering method for text classification. In A. Gelbukh (Ed.), Proceedings of Computational Linguistics and Intelligent Text Processing: 6th International Conference, CICLing: Vol. 3406 (pp. 704-714). Springer-Verlag.

This paper describes a supervised three-tier clustering method for classifying students' essays on qualitative physics in the Why2-Atlas tutoring system. Our main purpose in categorizing text in our tutoring system is to map the students' essay statements onto principles and misconceptions of physics. A simple "bag-of-words" representation with a naïve Bayes algorithm proved unsatisfactory for our analyses: it produced many misclassifications because the concepts themselves are closely related and because it could not handle misconceptions. Hence, we investigate the performance of the k-nearest neighbor algorithm coupled with clusters of physics concepts in classifying students' essays. We use a three-tier tagging schema (cluster, sub-cluster, and class) for each document and find that this kind of supervised hierarchical clustering leads to a better understanding of the student's essay.
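
As a rough illustration of the idea in the abstract, the sketch below shows one way a k-nearest neighbor vote over bag-of-words vectors could assign a three-tier tag (cluster, sub-cluster, class). It is not the paper's implementation: the cosine similarity measure, the top-down voting order, the function names, and the toy training statements and tag names are all assumptions made for this example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with their counts (an assumed, minimal representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def knn_three_tier(statement, training, k=3):
    """Tag a statement with (cluster, sub_cluster, class) from its k nearest neighbors.

    The vote is taken top-down: first the cluster, then the sub-cluster among
    neighbors in the winning cluster, then the class among neighbors in the
    winning sub-cluster. This ordering is an assumption of the sketch.
    """
    query = bag_of_words(statement)
    ranked = sorted(
        training,
        key=lambda example: cosine(query, bag_of_words(example[0])),
        reverse=True,
    )
    neighbors = ranked[:k]

    cluster = majority([tags[0] for _, tags in neighbors])
    in_cluster = [tags for _, tags in neighbors if tags[0] == cluster]
    sub_cluster = majority([tags[1] for tags in in_cluster])
    in_sub_cluster = [tags for tags in in_cluster if tags[1] == sub_cluster]
    label = majority([tags[2] for tags in in_sub_cluster])
    return (cluster, sub_cluster, label)


# Hypothetical training statements and tags, invented for illustration only.
TRAINING = [
    ("the ball keeps moving at constant velocity because no force acts on it",
     ("newton_laws", "first_law", "zero_force_constant_velocity")),
    ("with zero net force the velocity stays constant",
     ("newton_laws", "first_law", "zero_force_constant_velocity")),
    ("the heavier object falls faster than the lighter one",
     ("gravity", "free_fall", "misconception_heavier_falls_faster")),
    ("both objects fall with the same acceleration due to gravity",
     ("gravity", "free_fall", "same_acceleration")),
]

result = knn_three_tier("no net force so the velocity is constant", TRAINING, k=3)
```

Voting on the cluster first means a neighbor from an unrelated cluster cannot influence the sub-cluster or class decision, which is one plausible reading of why grouping related concepts might reduce confusions between them.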
