Feature allocation models generalize classical species sampling models by allowing every observation to belong to more than one species, now called features. Under the popular Bernoulli product model for feature allocation, we assume $n$ observable samples and consider the problem of estimating the expected number $M_n$ of hitherto unseen features that would be observed if one additional individual were sampled. The interest in estimating $M_n$ is motivated by numerous applied problems where the sampling procedure is expensive, in terms of time and/or financial resources, and further samples can only be justified by the possibility of recording new unobserved features. We consider a nonparametric estimator $\hat{M}_n$ of $M_n$ which has the same analytic form as the popular Good-Turing estimator of the missing mass in the context of species sampling models. We show that $\hat{M}_n$ admits a natural interpretation both as a jackknife estimator and as a nonparametric empirical Bayes estimator. Furthermore, we give provable guarantees for the performance of $\hat{M}_n$ in terms of minimax rate optimality, and we provide an interesting connection between $\hat{M}_n$ and the Good-Turing estimator for species sampling. Finally, we derive non-asymptotic confidence intervals for $\hat{M}_n$, which are easily computable and do not rely on any asymptotic approximation. Our approach is illustrated with synthetic data and SNP data from the ENCODE sequencing genome project.
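As a rough illustration of what "the same analytic form as the Good-Turing estimator" can look like in this setting, the sketch below assumes the estimator is the number of features observed exactly once among the n samples, divided by n. That formula, the function name `good_turing_new_features`, and the binary sample-by-feature matrix layout are assumptions made for illustration; the abstract does not state them, and the paper should be consulted for the exact definition.

```python
import numpy as np


def good_turing_new_features(Z):
    """Hypothetical helper: Good-Turing-style estimate of the expected
    number of new features carried by one additional sample.

    Z is an (n, F) binary array: Z[i, j] = 1 if sample i displays feature j.
    Assumed form (not stated in the abstract): K_{n,1} / n, where K_{n,1}
    is the number of features seen in exactly one of the n samples.
    """
    Z = np.asarray(Z)
    n = Z.shape[0]
    # Number of samples in which each feature appears.
    feature_counts = Z.sum(axis=0)
    # Features observed exactly once ("singletons").
    k1 = int(np.sum(feature_counts == 1))
    return k1 / n


# Toy data: 3 samples, 4 candidate features (the last one never observed).
Z = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
])
estimate = good_turing_new_features(Z)
print(estimate)
```

In the toy matrix two features appear in exactly one sample, so the sketch returns 2/3. Only singleton counts enter the estimate, which is what makes it cheap to compute from a table of feature frequencies.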

Ayed, F., Battiston, M., Camerlenghi, F., Favaro, S. (2019). A Good-Turing estimator for feature allocation models. Electronic Journal of Statistics, 13(2), 3775-3804 [10.1214/19-EJS1614].

A Good-Turing estimator for feature allocation models

Camerlenghi, F.
2019

Journal article - Scientific article
Bernoulli product model; Feature allocation model; Good-Turing estimator; Minimax rate optimality; Missing mass; Non-asymptotic uncertainty quantification; Nonparametric empirical Bayes; SNP data
Language: English
Pages: 3775-3804 (30 pages)
Files in this record:
19-EJS1614.pdf
Attachment type: Publisher's Version (Version of Record, VoR)
Format: Adobe PDF
Size: 360.42 kB
Access: archive managers only

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/258186