Cluster weighted model based on TSNE algorithm for high-dimensional data

Olobatuyi, K.; Parker, M.; Ariyo, O.
2024

Abstract

Cluster-weighted models (CWMs) are an important class of machine learning models commonly used for modelling complex datasets. However, their computational efficiency and estimator accuracy degrade on high-dimensional data. Previous work proposed a parsimonious technique that improves CWM performance in the high-dimensional setting, but this method falls short on very high-dimensional data, where the dimensionality exceeds 100. In this paper, we propose a new hybridised method that incorporates a dimensionality reduction technique, t-distributed stochastic neighbour embedding (TSNE), to enhance parsimonious CWMs in high-dimensional space. Additionally, we introduce a novel heuristic for detecting the hidden components of the underlying mixture model, which can be used with the popular R package FlexCWM. We evaluated the proposed method on two real datasets and found that it improves clustering power compared with both the parsimony methods and the TSNE methods combined with CWMs in the high-dimensional data setting. Our results suggest that the proposed method can improve the efficiency and accuracy of CWMs on high-dimensional data, making it a valuable tool for data scientists and statisticians.
Type: Journal article - Scientific article
Keywords: Cluster-weighted model; Expectation maximisation; FlexCWM; High-dimensional data; Parsimonious technique; T-distributed stochastic neighbour embedding
Language: English
Date: 1 Jul 2023
Year: 2024
Volume: 17
Issue: 3
First page: 261
Last page: 273
Access: reserved
Olobatuyi, K., Parker, M., Ariyo, O. (2024). Cluster weighted model based on TSNE algorithm for high-dimensional data. International Journal of Data Science and Analytics, 17(3), 261-273 [10.1007/s41060-023-00422-8].
Files in this item:
File: Olobatuyi-2024-International Journal of Data Science and Analytics-VoR.pdf
Access: Archive managers only
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 2.41 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/500189
Citations
  • Scopus 3
  • Web of Science 4