Pegoraro, M., Beraha, M. (2021). Fast PCA in 1-D Wasserstein Spaces via B-splines Representation and Metric Projection. In Proceedings of the AAAI Conference on Artificial Intelligence (pp.9342-9349). Association for the Advancement of Artificial Intelligence [10.1609/aaai.v35i10.17126].

Fast PCA in 1-D Wasserstein Spaces via B-splines Representation and Metric Projection

Pegoraro, M.; Beraha, M.
2021

Abstract

We address the problem of performing Principal Component Analysis over a family of probability measures on the real line, using the Wasserstein geometry. We present a novel representation of the 2-Wasserstein space, based on a well-known isometric bijection and a B-spline expansion. Thanks to this representation, we are able to reinterpret previous work and derive more efficient optimization routines for existing approaches. As shown in our simulations, the solution of these optimization problems can be costly in practice, which limits their use. We propose a novel definition of Principal Component Analysis in the Wasserstein space that, when used in combination with the B-spline representation, yields a straightforward optimization problem that is extremely fast to compute. Through extensive simulation studies, we show that our PCA performs similarly to those already proposed in the literature at a much smaller computational cost. We apply our method to a real dataset of mortality rates due to Covid-19 in the US, concluding that our analyses are consistent with the current scientific consensus on the disease.
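The representation described in the abstract rests on a standard identity: for probability measures on the real line with finite second moment, the 2-Wasserstein distance equals the L2(0, 1) distance between their quantile functions. Expanding quantile functions in a B-spline basis then turns this distance into a quadratic form in the spline coefficients, with the Gram matrix of the basis as the metric. The Python sketch below makes this concrete under stated assumptions: it uses NumPy and SciPy, and the knot placement, grid size, and every function name (`quantile_to_coefs`, `w2_from_coefs`, `linearized_pca`) are hypothetical illustrations. `linearized_pca` is plain PCA of quantile functions in L2, swapped in for illustration; it is not the authors' metric-projection PCA.

```python
import numpy as np
from scipy.interpolate import BSpline

# Hypothetical setup (not taken from the paper): cubic B-splines on [0, 1]
# with clamped knots and 6 equally spaced interior knots.
DEGREE = 3
INTERIOR = np.linspace(0.0, 1.0, 8)[1:-1]
KNOTS = np.concatenate([np.zeros(DEGREE + 1), INTERIOR, np.ones(DEGREE + 1)])
N_BASIS = len(KNOTS) - DEGREE - 1

# Midpoint grid on (0, 1); integrals over (0, 1) become grid averages.
M = 2000
T = (np.arange(M) + 0.5) / M

# Design matrix: column j is the j-th B-spline basis function evaluated on T.
BASIS = BSpline(KNOTS, np.eye(N_BASIS), DEGREE)(T)
# Gram matrix, G_ij ~ integral over (0, 1) of B_i(t) * B_j(t) dt.
GRAM = BASIS.T @ BASIS / M


def quantile_to_coefs(q_values):
    """Least-squares B-spline coefficients of a quantile function sampled on T."""
    coefs, *_ = np.linalg.lstsq(BASIS, q_values, rcond=None)
    return coefs


def w2_from_quantiles(q1, q2):
    """1-D W2 distance as the L2 distance between quantile functions."""
    return np.sqrt(np.mean((q1 - q2) ** 2))


def w2_from_coefs(a, b):
    """The same distance computed in coefficient space with the Gram metric."""
    d = a - b
    return np.sqrt(d @ GRAM @ d)


def linearized_pca(coefs, n_components):
    """Ordinary PCA of spline coefficients under the L2 metric of quantile functions.

    With GRAM = L L^T we have a^T GRAM a = ||L^T a||^2, so Euclidean PCA of
    the rows of (coefs - mean) @ L is PCA in L2 of the quantile functions.
    """
    mean = coefs.mean(axis=0)
    chol = np.linalg.cholesky(GRAM)
    z = (coefs - mean) @ chol
    _, svals, vt = np.linalg.svd(z, full_matrices=False)
    # Map orthonormal directions back to coefficients: v_coef = L^{-T} v_z.
    directions = np.linalg.solve(chol.T, vt[:n_components].T).T
    explained_var = svals[:n_components] ** 2 / (coefs.shape[0] - 1)
    return mean, directions, explained_var


# Usage: Uniform(lo, hi) has quantile function lo + (hi - lo) * t, which the
# cubic spline basis reproduces exactly.
qa = 0.0 + 1.0 * T  # U(0, 1)
qb = 1.0 + 3.0 * T  # U(1, 4)
a, b = quantile_to_coefs(qa), quantile_to_coefs(qb)
print(w2_from_quantiles(qa, qb), w2_from_coefs(a, b))  # both ~ sqrt(13/3) ~ 2.0817

# A family of uniforms with random location and width spans only {1, t},
# so at most two components carry variance.
rng = np.random.default_rng(0)
lo = rng.normal(size=50)
width = rng.uniform(1.0, 3.0, size=50)
family = np.array([quantile_to_coefs(l + w * T) for l, w in zip(lo, width)])
mean, directions, explained_var = linearized_pca(family, n_components=3)
print(explained_var)  # third entry is numerically zero
```

Note that neither the unconstrained least-squares coefficients nor reconstructions such as `mean + s * directions[0]` are guaranteed to be nondecreasing, so for large `s` they need not be valid quantile functions. This sketch does not handle that constraint and does not reproduce the optimization problems discussed in the paper.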
Type: paper
Keywords: Dimensionality Reduction/Feature Selection; Learning with Manifolds
Language: English
Conference: 35th AAAI Conference on Artificial Intelligence / 33rd Conference on Innovative Applications of Artificial Intelligence / 11th Symposium on Educational Advances in Artificial Intelligence, February 2-9, 2021
Published in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 10, pp. 9342-9349 (2021)
ISBN: 9781577358664
URL: https://ojs.aaai.org/index.php/AAAI/article/view/17126
Access rights: reserved
Files in this item:
Pegoraro-Beraha-2021-AAAI-VoR.pdf (Adobe PDF, 478.41 kB)
Access: archive managers only
Attachment type: Publisher's Version (Version of Record, VoR)
License: All rights reserved

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/545382
Citations: Scopus 1; Web of Science (ISI) 2