A probabilistic relational approach for web document clustering