Title: Improving LSA word weights for document classification
Supervisor: RNDr. Kristína Malinovská, PhD.
Keywords: natural language processing, document classification, gradient descent, LSA
Abstract: Latent semantic analysis (LSA) may perform poorly on document classification tasks because it selects the most representative features rather than the most discriminative ones. We propose a new method, eLSA, which introduces an additional layer of weights w' trained with gradient descent. We show experimentally that eLSA training converges and that it achieves higher accuracy than LSA. We also use eLSA to analyze common weighting schemes and to identify words that are underweighted or overweighted in these schemes.

Thesis files: