A probabilistic approach to document classification