Thompson sampling for Performance Marketing and delayed conversions

Gigli, M

This work addresses the algorithmic optimisation of return on investment in digital Performance Marketing. Among the parameters that can be tuned, an advertiser must decide how to split the total allocated budget among campaigns and how to set the bids for the automated auctions that regulate appearance on web pages. This bid / budget selection problem has recently been cast as a Multi-armed Bandit problem. However, the state of the art has several shortcomings that limit practical application, and this dissertation is devoted to tackling them. First, Search Engine Marketing (SEM) campaigns have a hierarchical structure: a daily budget is assigned to a campaign, but bids can be fine-tuned across the ads that make up the campaign. The current state of the art does not exploit this freedom, since it decides a single bid per campaign; here the algorithm is extended to handle the hierarchical structure. Second, leveraging domain knowledge, a contextual bandit model is proposed that greatly reduces the cost of exploration through higher learning efficiency. This is especially important when non-stationarity is handled with a sliding window mechanism, that is, by discarding data older than a given threshold. Further, the model is interpretable, which eases the elicitation of Bayesian prior distributions on its parameters, and it can be optimised via local, as opposed to global, methods. To compare the proposed algorithm with the state of the art, a simulation environment was built, exploiting what is disclosed about a major SEM platform. Extensive numerical experiments on both synthetic and real-world data show that, on average, the proposed parametric bandit gains more conversions than state-of-the-art bandits. The gains are largest where an optimisation algorithm is needed most: tight budgets, many ad groups, few clicks or low conversion rates, rapidly changing markets, or short time horizons.
Since the proposed approach requires Markov chain Monte Carlo, a lightweight approximate alternative, Bootstrapped Thompson sampling, is also presented and tested. Another limitation of current state-of-the-art methods is that they ignore the delays between a click on an ad and later, stronger signs of interest, such as contacts or purchases. To address this issue, a thorough review of the literature on bandits with delayed rewards was conducted and is reported in this dissertation. A setting emerged which, although simpler than the full bid / budget selection problem, shares several of its defining characteristics: a continuous action space and partially observable rewards. Rewards are partially observable because a clear signal is sent to the advertiser when users decide to buy, while no signal is ever sent when they decide not to: users who will never buy cannot be distinguished from those who will buy in the future. In this setting, the state-of-the-art approach leaves room for improvement: it discards useful information on the magnitude of delays; it relies on a heuristic that proved brittle with respect to delayed rewards and model misspecification in related settings; this heuristic depends on a hyperparameter whose tuning requires some knowledge of the delay distribution; and it requires calculations that are hard to generalise to more complex settings. For these reasons, a new approximate Bayesian algorithm was developed. It extends Bootstrapped Thompson sampling to this setting, using an Expectation Maximisation model in lieu of a Maximum Likelihood Estimate, and it can accommodate a general delay distribution, with no requirement of a parametric form or light tails. The proposed approach was compared with the state of the art on a range of delay distributions and on real data. Tests show a significant improvement in performance in the great majority of cases, and comparable behaviour in the remaining ones.
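
The combination of Bootstrapped Thompson sampling and Expectation Maximisation mentioned above can be illustrated with a minimal sketch. It is not the algorithm of the dissertation: it assumes a finite set of arms rather than a continuous action space, and it assumes exponentially distributed delays purely to keep the EM updates short, whereas the abstract states that the proposed method places no such requirement on the delay distribution. All function names and numerical values below are hypothetical.

```python
import math
import random


def em_conversion_rate(clicks, n_iter=100):
    """EM estimate of the eventual conversion probability of one arm.

    clicks: non-empty list of (converted, t) pairs. If converted is True,
    t is the observed click-to-conversion delay; otherwise t is the time
    elapsed since the click with no conversion seen so far.
    ASSUMPTION: delays are Exponential(lam), chosen only for illustration.
    """
    n = len(clicks)
    delays = [t for c, t in clicks if c]
    ages = [t for c, t in clicks if not c]
    n_conv, sum_delay = len(delays), sum(delays)
    mean_t = sum(t for _, t in clicks) / n
    p, lam = 0.5, 1.0 / max(mean_t, 1e-9)
    for _ in range(n_iter):
        # E-step: probability that a still-silent click converts later.
        w = []
        for a in ages:
            s = p * math.exp(-lam * a)
            w.append(s / (1.0 - p + s))
        # M-step: re-estimate p and lam from expected sufficient statistics.
        # For a pending converter, E[delay | delay > a] = a + 1/lam.
        exp_conv = n_conv + sum(w)
        exp_delay = sum_delay + sum(wi * (a + 1.0 / lam) for wi, a in zip(w, ages))
        p = min(max(exp_conv / n, 1e-6), 1.0 - 1e-6)
        lam = max(exp_conv / max(exp_delay, 1e-12), 1e-9)
    return p, lam


def bootstrap_draw(clicks, rng):
    """One bootstrap draw: refit EM on a resample taken with replacement."""
    resample = rng.choices(clicks, k=len(clicks))
    return em_conversion_rate(resample)[0]


def choose_arm(history, rng):
    """Play the arm whose bootstrap draw of the conversion rate is largest."""
    draws = [bootstrap_draw(clicks, rng) for clicks in history]
    return max(range(len(history)), key=draws.__getitem__)


def simulate_clicks(p, mean_delay, n, horizon, rng):
    """Synthetic clicks as seen at time `horizon` (all values hypothetical)."""
    clicks = []
    for _ in range(n):
        age = rng.uniform(0.0, horizon)
        delay = rng.expovariate(1.0 / mean_delay)
        if rng.random() < p and delay <= age:
            clicks.append((True, delay))   # conversion already observed
        else:
            clicks.append((False, age))    # never converts, or not yet
    return clicks


rng = random.Random(0)
clicks = simulate_clicks(p=0.3, mean_delay=5.0, n=4000, horizon=20.0, rng=rng)
naive = sum(c for c, _ in clicks) / len(clicks)  # ignores pending conversions
p_hat, lam_hat = em_conversion_rate(clicks)
print(f"naive={naive:.3f}  em={p_hat:.3f}")
```

In this sketch the naive estimate counts only the conversions observed so far, so it can never exceed the EM estimate, which also credits the conversions that are still pending. The bootstrap resample stands in for a posterior draw, which is what lets the approach avoid Markov chain Monte Carlo.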


Gigli, M. (2024). Thompson sampling for Performance Marketing and delayed conversions. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2024).