In this paper, we investigate the impact of optimal hyper-parameter configuration in relational topic models. The main goal is to validate the hypothesis that single-objective Bayesian Optimization (BO) can discover a hyper-parameter setting that leads a set of relational topic models to achieve both good prediction capabilities and qualitatively meaningful topics. A comparative analysis performed on 7 state-of-the-art models, 5 performance measures and 3 datasets highlights three main findings: (1) the majority of relational topic models are not able to offer a good trade-off between classification capabilities and topic interpretability; (2) single-objective optimization of hyper-parameters, aimed at maximizing the F1-Measure, is able to create topics that are also optimal with respect to the Kullback-Leibler divergence measure; (3) the Pareto frontiers across several performance metrics reveal that the most promising trade-off between the performance metrics can be obtained by Constrained Relational Topic Models.
Terragni, S., Candelieri, A., Fersini, E. (2023). The role of hyper-parameters in relational topic models: Prediction capabilities vs topic quality. INFORMATION SCIENCES, 632(June 2023), 252-268 [10.1016/j.ins.2023.02.076].
The role of hyper-parameters in relational topic models: Prediction capabilities vs topic quality
Terragni S.; Candelieri A.; Fersini E.
2023