Climate Change: Behavioral Responses from Extreme Events and Delayed Damages

Understanding how to sustain cooperation in the global climate change dilemma is crucial to mitigating its harmful consequences. Damages from climate change typically occur after long delays and can take the form of more frequent realizations of extreme and random events. These features generate a decoupling between emissions and their damages, which we study through a laboratory experiment. We find that some decision-makers respond to global emissions, as expected, while others respond to realized damages even when emissions are observable. On balance, the presence of delayed or stochastic consequences did not impair cooperation. However, we observed a worrisome increasing trend of emissions when damages hit with delay.


Introduction
Although scientists have convincingly established a causal link between greenhouse gas emissions and global climate change (IPCC, 2014), the way in which citizens perceive the issue may simply be through the experience of damages. News headlines generally focus on the consequences of extreme events such as record temperatures, hurricanes or flooding that are outcomes of pollution and affect specific geographical areas. Another peculiar feature of climate change is the lag built into the earth system between the polluting actions and the system's reaction in terms of climate-related human impacts. Both these features imply a decoupling between polluting actions and their consequences. A usually unspoken argument among politicians and climate change experts is that it will likely take one or more major disasters to motivate citizens and nations to jump-start mitigation efforts. Suffering environmental stress may trigger citizens into action to stop climate change more than national plans contemplating changes in emissions. This conjecture motivates our behavioral study.
We focus on the ability to reach ambitious mitigation policies through voluntary actions when no binding treaty is in place, as for example with the scheduling of periodic encounters after the Paris Agreement (Tollefson, 2016). More precisely, we design a climate change game as an N-person voluntary public bad game where decision-makers repeatedly interact under a long-run horizon (Dutta and Radner, 2004; Calzolari et al., 2016). Each decision-maker decides on a level of emissions, which brings individual benefits from production and consumption but generates a negative externality on everyone in terms of climate damages. Cooperation entails limiting the level of emissions. Through a laboratory experiment we vary how damages occur across treatments and study how this influences the ability to cooperate. The damage function is one of the fundamental elements for evaluating alternative policies to cope with climate change (Nordhaus, 2010) and has been the focus of a recent debate calling for a rethink of the way damage functions are designed within Integrated Assessment Models (Wagner and Weitzman, 2015; Stern, 2015). Here we target two critical dimensions of damage functions, the random and the delayed relation between polluting actions and their consequences, because they could both affect the behavioral ability of decision-makers to cooperate. All our specifications of damage vary its riskiness or timing but keep constant its overall level in terms of expected present value. We do so to ease the empirical comparison across treatments. In a Stochastic treatment the damage takes the form of a random accident, whose probability increases in the level of global emissions. This treatment models the consequences of emissions in terms of extreme events, like flooding, droughts, or hurricanes. The aim is not to capture a global catastrophe but instead low-probability, high-impact events that hit a country.
We contrast this setting with a Control treatment where the damage from climate change occurs deterministically in proportion to global emissions. In a Delay treatment the damage is deterministic but hits decision-makers with a delay of two rounds, unlike the other two treatments where current damages depend on current emissions.
While some aspects of the field nicely map into our experiment, we made three major simplifications in order to facilitate participants' understanding of the task and to ease the empirical identification of the effects of the different treatments. First, we model climate damages as a flow externality that linearly increases in emissions, although a more accurate function would be a stock externality with possible non-linearities between emissions and damages (Burke et al., 2015; Dannenberg et al., 2015). A previous experiment showed a negative effect of pollution persistence on the empirical levels of aggregate cooperation (Calzolari et al., 2016). Second, we consider a limited number of players. Third, we include the deep income inequalities that exist in the field (Nordhaus, 2010; Tavoni et al., 2011) by having two types of participants, rich and poor, who simply differ in their private benefits from emissions.
In all our treatments, monitoring is perfect. After each round of play decision-makers can observe the individual emission choices and damages of everyone else. These are propitious circumstances for cooperation to emerge. Under a long-run horizon like the one considered here, the mitigation of damages may in fact be achieved under the threat of a punishment activated by the observation of an unexpected increase in others' emissions (the folk theorem, e.g. Fudenberg and Maskin, 1986). Such a theoretical result assumes that all individuals follow strategies based on the observation of actions, i.e. emissions.
However, individuals may in practice adopt strategies that react to experienced damages rather than actions. The reason may be behavioral, either related to salience or the cognitive costs to process information. On the one hand, damages directly influence payoffs and thus could be more salient to the decision-maker. On the other hand, even when observable, actions have to be interpreted in terms of motivating intentions, particularly when decision-makers form heterogeneous beliefs.
To sum up, greenhouse gas emissions generate delayed, random damages and hence actions (emissions) can be decoupled from their consequences (damages). What motivates this study is the possibility that some decision-makers rely more on experienced damages than on actions, which calls for an empirical analysis of how different damage specifications could produce different outcomes in terms of mitigation.
The major result of our experiment concerns the strategies employed by participants in sustaining cooperative mitigation. We show that participants react both to emissions and to damages. In particular, some participants react to the emissions of others, as suggested by a canonical trigger strategy. Other participants, instead, react only to the extreme events or to the realized damages. A third group of participants responds to both the emissions of others and individual damages. In Section 7 we conjecture on how the presence of these different types of individuals can relate to the differences in the overall cooperation levels we detect, in particular the sustained levels of cooperation with stochastic and delayed damages and the increasing trend of emissions in the latter treatment.
The paper proceeds as follows. Section 2 places the contribution within the context of the literature about experiments on climate change and long-run cooperation. Section 3 presents the formal setup and experimental design. Section 4 puts forward some theoretical considerations about equilibrium predictions. Section 5 explains how the experiment was run. Section 6 describes the main results about aggregate emissions and strategies, while Section 7 discusses the results and some policy implications, and concludes.

Related Literature
We contribute to two branches of the literature, one on climate change and another about sustaining long-run cooperation.
There exists a small but growing experimental literature on mitigation policies for climate change. Some experiments model climate change as a problem of sustaining cooperation when facing an emission threshold that may activate a catastrophe, while others, including the present one, model it with an incremental damage from pollution.
Among the former category, the pioneering study is Milinski et al. (2008), who show that a higher probability of a catastrophe reduces emissions in the presence of a known tipping point. This result becomes weaker if the location of the tipping point is random, and more so in case of ambiguity (Barrett and Dannenberg, 2012, 2014; Dannenberg et al., 2015).
Income inequality and the ability to communicate also affect the frequency of avoiding a catastrophe: Tavoni et al. (2011) show that success is more likely in groups making choices that reduce inequality and able to communicate.
The experiments with a gradual impact of pollution on damages are relatively more recent. Sherstyuk et al. (2016) compare overlapping generations with long-lived agents and report that cooperation is harder to sustain for overlapping generations; Pevnitskaya and Ryvkin (2013) contrast finite and indefinite horizons and find that participants learn to cooperate faster in the former setting, although they experience a last-round drop; finally, Calzolari et al. (2016) study pollution persistence in a dynamic setting and show that it does not hamper cooperation per se but report a declining trend of cooperation for higher stocks of pollution. The novelty in our experimental design is to decouple actions and their consequences on damages, which in most studies are instead associated and indistinguishable. Our aim is to uncover the behavioral responses in a setting that replicates these key features present in the field.
The contribution of our paper to the vast literature about sustaining cooperation in repeated games rests on the distinction and observability of actions (emissions) and their consequences (damages). When the "shadow of the future looms sufficiently large", cooperative outcomes can be obtained, possibly also the socially optimal outcome, with strategies punishing actions that deviate from a cooperative norm (Friedman, 1971; Dal Bó and Fréchette, 2017). Beginning with Green and Porter (1984), Abreu et al. (1990), Fudenberg et al. (1994), and Dutta (1995), the standard folk theorem has been extended to the case in which decision-makers do not perfectly observe others' actions, either because actions are observed with delay, as in our Delay treatment, or because observability only refers to an imperfect signal, such as the accident realization in our Stochastic treatment.
Applying these results, we experimentally show that although the temptation to deviate from cooperation is generally stronger for strategies based on damages than for those based on emissions, cooperation could still be sustained when participants sufficiently value the payoffs from future interactions. Some experimental papers on cooperation are related to our study. Bereby-Meyer and Roth (2006) study a repeated game with observable actions where outcomes can be either deterministic or probabilistic, depending on treatments. Relying on the psychological concept of "reinforcement" (Robbins, 1971), they report how a deterministic environment, granting a systematic reinforcement in the learning process, fosters cooperation as compared with the partial reinforcement available with random outcomes. Fudenberg et al. (2012) study the effects on cooperation of errors in implementing intended actions. They show considerable diversity in strategies, as we document in our analysis, and that successful strategies are "lenient" and "forgiving": unexpected actions are not immediately punished, with attempts to restore cooperation. Camera and Casari (2009) manipulate monitoring of individual histories and aggregate information on past cooperation in ways that selectively add and remove the possibility to retaliate or adopt various punishment strategies. Finally, Nicklisch et al. (2016) experimentally find that when participants can jointly reduce the probability of a common stochastic damage, cooperation is enhanced. We confirm and extend this result to the extent that our stochastic damages are individual and participants have the possibility to observe emissions as well. In both cases participants appear to assess others' behavior with an ex post perspective, i.e. considering also the realization of outcomes.

Experimental Design
We model climate change as a repeated social dilemma under three treatments (Control, Delay, and Stochastic) that vary the form taken by damages from the pollution externality. In a group of $N = 4$ decision-makers, everyone simultaneously decides in every round $t = 1, 2, \dots$ how much to emit, $e_i \in \{1, 2, \dots, 18\}$. Individual payoffs are the difference between a benefit and a damage function:

$$\pi_i(t) = B_i(e_i(t)) - D_i(t), \tag{1}$$

where $E = \sum_{j=1}^{N} e_j$ denotes global emissions. The benefit of an extra unit of emissions is private as it falls entirely on the decision-maker, while only $1/N$ of the damage does. Hence, emissions generate a negative externality on others in the group. There are four modifications with respect to the usual public good experiment, which make our framework similar to the model of Dutta and Radner (2004) in terms of payoffs. First, the game is framed as a public bad where the public project is fully provided by default and every unit of emission corresponds to moving contributions away from the group account into the private account. Second, the theoretical benchmarks of the one-shot Nash equilibrium and the socially optimal emission are not on the boundary of the action space, which is a desirable feature of an experimental design (Laury and Holt, 2008). Our benefit function is non-linear in emissions, as additional units have a lower return, while the damage function is linear. As we will see, this generates an interior Nash equilibrium at 12, which is far from the upper bound of 18 and allows for anti-social behavior. Moreover, the socially optimal level of emission is at 3. Third, to mimic GDP inequality in the world arena, we introduced payoff heterogeneity within the group, with rich decision-makers enjoying a higher return from the private account (i.e. the benefit function) than poor ones while suffering identical levels of damages.
More precisely, the benefit function is, for a level of emission $e_i(t)$ at time $t$:

$$B_i(e_i(t)) = a_i + 100 \ln e_i(t). \tag{2}$$

The parameter $a_i$ is set at 40.05 for half of the group members (rich) and 8.01 for the others (poor). This asymmetry in $a_i$ could capture technological differences in carbon intensity leading decision-makers to achieve different benefits for the same level of emission.⁴ Fourth, we implement a long-run horizon to capture the long life of state entities and of the climate change problem. In the lab, the interaction is indefinite and is implemented through a random stopping rule. After every round there is a random draw: an additional round is played with probability $\delta = 0.92$ and the sequence stops with probability $1 - \delta$.
As a consequence, the length of a sequence is variable and nobody knows when the last round will take place. The "shadow of the future" remains the same as the rounds proceed because the continuation probability δ is constant and common knowledge. Such probability can be interpreted as the discount factor of a risk-neutral decision-maker who lives forever.
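The random stopping rule can be illustrated with a short sketch. It assumes that the first round is always played and that each further round is added with probability δ, so the number of rounds is geometric with mean 1/(1 − δ) = 12.5. The function names and the seed are hypothetical and are not part of the experimental software.

```python
import random

DELTA = 0.92  # continuation probability stated in the text


def expected_length(delta: float) -> float:
    """Mean number of rounds when round 1 is always played and every
    further round is played with probability delta."""
    return 1.0 / (1.0 - delta)


def draw_length(delta: float, rng: random.Random) -> int:
    """Simulate one sequence: add rounds while the random draw says 'continue'."""
    rounds = 1
    while rng.random() < delta:
        rounds += 1
    return rounds


def simulated_mean(delta: float, n_sequences: int, seed: int = 0) -> float:
    """Average length over many simulated sequences (seed is illustrative)."""
    rng = random.Random(seed)
    total = sum(draw_length(delta, rng) for _ in range(n_sequences))
    return total / n_sequences


if __name__ == "__main__":
    print(expected_length(DELTA))
    print(simulated_mean(DELTA, 20000))
```

Because the continuation probability does not change across rounds, the expected number of remaining rounds is the same at every point of a sequence, which is what keeps the "shadow of the future" constant.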
While the benefit function is identical in all treatments, the damage function is treatment-specific (Table 1). In the Control treatment damages from global emissions are deterministic and hit immediately in the same round of emissions according to the following damage function:

$$D_i(t) = c_1 \frac{E(t)}{N}, \tag{3}$$

where the parameter $c_1 = 33.375$ determines the magnitude of the damage for each unit of emissions. Damages are proportional to emissions to keep the design simple.⁵ In the Delay treatment, the damages are also deterministic but hit with a delay of two rounds. As a consequence, there will be no damages in the first two rounds:

$$D_i(t) = \begin{cases} 0 & \text{if } t \le 2, \\ c_2 \dfrac{E(t-2)}{N} & \text{if } t \ge 3. \end{cases} \tag{4}$$

4 Both types of decision-makers have the same emission capacity. To ensure rich and poor decision-makers have the same social optimum and stage-game Nash equilibrium (see Section 4) and ease empirical comparisons, the gap between rich and poor decision-makers is modeled as a gap in private benefits (Equation 2). While this is a strong simplification, the experiment roughly reflects stylized facts from IPCC (2014) and the RICE model (Nordhaus, 2010). Rich decision-makers mirror high income countries with a per capita GNI above USD 12,745 (World Bank threshold in 2010), whose GHG emissions amounted to 18.7 Gt in 2010 (IPCC, 2014). Instead, poor decision-makers approximately resemble countries with a per capita GNI lower than USD 12,745: upper-middle income countries' emissions were quite close to high income countries' emissions (18.3 Gt), while emissions from low and lower-middle income countries were lower (11.3 Gt). When focusing on the regions of the RICE model, rich regions have an average GNI per capita 4.8 times higher than poor regions. Poor regions are Africa, China, Eurasia, India, and Other Asia (N = 5, average GNI per capita = USD 7,125.9); rich regions are EU, Japan, Latin America, Middle East, Russia, USA, and Other High Income (N = 7, average GNI per capita = USD 34,085).
5 As already mentioned, in the field, damages are likely to be a convex function of temperatures (Burke et al., 2015) but in theoretical models others have also employed a linear approximation (Dutta and Radner, 2004). Moreover, pollution persistence is not included in the model to simplify the design.

The damage parameter $c_2 = 39.432$ is set taking as reference the value in the Control treatment, $c_2 = c_1 / \delta^2$, so as to keep the same present value for the damage generated by one unit of emissions. Finally, in the Stochastic treatment, damages hit immediately but at random: a fixed accident of magnitude $K = 830$ may hit one or more decision-makers with a probability which linearly increases at a constant rate of about 1 percentage point for every unit of global emissions ($\alpha = 0.01005$):

$$D_i(t) = \begin{cases} K & \text{with probability } \alpha E(t), \\ 0 & \text{with probability } 1 - \alpha E(t). \end{cases} \tag{5}$$

The accident's probability ranges from a minimum of 0.0402 if everyone emits 1 to 0.7236 if everyone emits 18. By design there is no way to reduce the accident risk to zero and, no matter how high emissions are, the accident always remains uncertain. All group members share an identical risk of suffering an accident as the probability depends on global rather than individual emissions. However, there are independent draws for each decision-maker to determine if an accident occurs. Hence, the damage level will be identical across group members only in the event of zero or $N$ accidents and will differ in all other random events. In expectation, the marginal damage from a unit of emissions is similar to the Control treatment, $\alpha \times N \times K \approx c_1$. There are many alternative ways to incorporate the randomness of climate change into the design. Through the Stochastic treatment we aim to model extreme events rather than global catastrophes. While a global catastrophe causes similar losses to all players, extreme events such as hurricanes tend to hit areas asymmetrically. This original feature sets this study apart from the previous climate change experiments. Furthermore, it facilitates a cleaner empirical identification of individual-level effects: a common shock to all participants would limit the variation of impacts and hence restrict the possibility to identify individual strategies, which is a main goal.
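As a quick check on how the three damage specifications are calibrated against each other, the sketch below recomputes the quantities quoted in the text from the stated parameters. The function names are illustrative only, and the comparison is approximate because the published parameters are rounded.

```python
N = 4            # group size
DELTA = 0.92     # continuation probability
C1 = 33.375      # Control damage parameter
C2 = 39.432      # Delay damage parameter
ALPHA = 0.01005  # accident probability per unit of global emissions
K = 830          # magnitude of an accident


def accident_probability(global_emissions: int) -> float:
    """Probability that a given decision-maker suffers an accident."""
    return ALPHA * global_emissions


def delay_marginal_damage_pv() -> float:
    """Group damage from one unit of emissions in Delay, discounted two rounds."""
    return DELTA ** 2 * C2


def stochastic_marginal_damage() -> float:
    """Expected group damage from one unit of emissions in Stochastic."""
    return ALPHA * N * K


if __name__ == "__main__":
    print(delay_marginal_damage_pv(), stochastic_marginal_damage(), C1)
    print(accident_probability(N * 1), accident_probability(N * 18))
```

Both marginal damages come out within about 0.01 of the Control value of 33.375, which is the sense in which the overall level of damage is held constant across treatments.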
In the Stochastic treatment, before the climate game, we elicited the risk preferences of all participants following the design of Karle et al. (2015). In particular, participants were administered two tasks, one in the gain domain and the other in the loss domain. In the former task, participants had to make six binary choices. Each decision was between a 50-50 lottery yielding either 0 or 3€, and a certain amount (0.3, 0.6, 0.9, 1.2, 1.5, or 1.8€). The latter task was similar except that the certain amount was always 0€ and the lottery either paid 3€ or involved a loss (-0.3, -0.6, -0.9, -1.5, -2.1, or -3€). One of these twelve decisions was randomly drawn at the end of the session, and participants were paid accordingly. Participants did not receive any feedback on the lottery outcomes until the end of the session.
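The present expected value of decision-maker $i$'s current and future payoffs from round $t$ onward is

$$\Pi_i = \mathbb{E}\left[\sum_{s=t}^{\infty} \delta^{\,s-t}\, \pi_i(s)\right]. \tag{6}$$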

Theoretical Benchmarks
This Section provides the theoretical benchmarks which will be useful to evaluate and interpret the experimental results. We proceed in three steps. In step one we identify the socially optimal level of emissions and in step two we present the level of emissions in the one-shot equilibrium with decentralized choices. The contrast between the two levels of emissions highlights the social dilemma dimension of the climate game. In step three we characterize some relevant equilibria of the repeated game. According to the standard folk theorem (Friedman, 1971), when the shadow of the future looms sufficiently large and monitoring is perfect, decision-makers can adopt strategies that support cooperative outcomes, possibly also the socially optimal one. These strategies can take the form of grim triggers where decision-makers contemplate permanent punishment when they observe a deviation from a cooperative norm. The punishment is collective because in our setting it is impossible to target a single decision-maker. As usual, a multiplicity of equilibria arises, hence coordination is a relevant empirical issue. We assume that all decision-makers are risk neutral.
If decision-makers cooperate by maximizing the unweighted sum of individual present-valued payoffs, they set a time-invariant socially optimal emission $e^{**} = 3$, where the marginal benefit from the individual emission, $100/e_i$, equals the marginal damage caused to the whole group. In the Control treatment the group's marginal damage is $N \times c_1/N = c_1$. In the other two treatments, given our parametrization, the group's marginal damage is equal, in expected present value, to the level in the Control treatment and hence the socially optimal emission is also $e^{**} = 3$.
When decision-makers act independently there always exists an equilibrium in which the level of emissions in any round corresponds to the Nash equilibrium of the one-shot stage-game. Here, each decision-maker equates the marginal benefit from her individual emission to the individual marginal damage, which is a fraction $1/N$ of the group's damage. As in standard public goods games, this condition does not depend on others' emissions, hence the one-shot Nash equilibrium emission $e^{*} = 12$ is unique and equal to $N \times e^{**}$ in all treatments.
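The two benchmarks can be recovered numerically on the action grid used in the experiment. The sketch below assumes a logarithmic benefit of the form a_i + 100 ln(e), which is one functional form consistent with the marginal benefit 100/e_i stated in the text, together with the Control damage function. The helper names are hypothetical.

```python
import math

N = 4
C1 = 33.375
B = 100.0               # marginal benefit is B / e_i, as stated in the text
ACTIONS = range(1, 19)  # feasible individual emissions, 1 through 18


def benefit(e: int, a_i: float = 0.0) -> float:
    """Assumed benefit function: a_i + B * ln(e). The constant a_i does not
    affect either benchmark, so it defaults to zero here."""
    return a_i + B * math.log(e)


def social_optimum() -> int:
    """Symmetric emission that maximizes the group's total payoff in Control."""
    # With everyone at e, each member bears C1 * (N * e) / N = C1 * e.
    return max(ACTIONS, key=lambda e: N * (benefit(e) - C1 * e))


def stage_nash(others_total: int = 0) -> int:
    """Best response in the one-shot game for a given total emitted by others."""
    return max(ACTIONS, key=lambda e: benefit(e) - C1 * (e + others_total) / N)


if __name__ == "__main__":
    print(social_optimum(), stage_nash(), stage_nash(others_total=54))
```

The best response is the same whatever the others emit, which illustrates why the stage-game equilibrium is unique. The continuous first-order conditions give 100/c1 ≈ 2.996 and 100N/c1 ≈ 11.985, so the integer benchmarks of 3 and 12 are the nearest grid points.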
We now turn to other levels of emission that can be supported in the repeated game, distinguishing between strategies based on the observation of others' emissions and strategies based on damages suffered.
"Observational Equilibria". Here we consider equilibria supported by strategies that are based on the past individual emissions of all $N$ decision-makers as observed at the end of each round, which are the most common class of strategies in the folk theorem literature. Observability allows decision-makers to use trigger strategies that contemplate a punishment upon observing levels of emission that are interpreted as deviations. One can easily prove the following.
Remark 1. In all treatments, if decision-makers are "observational", they can support in equilibrium any level of individual emission between $e^{**} = 3$ and $e^{*} = 12$.
A proof of Remark 1 hinges on the canonical grim trigger strategies. Consider the Control treatment first. Suppose $N - 1$ decision-makers rely on a strategy that contemplates emitting a low level $3 \le \hat{e} < 12$ if this is what happened in the past rounds and, instead, a permanent reversion to the Nash equilibrium emission $e^{*}$ if they observe an individual emission different from $\hat{e}$. One can show that at any round $t$, a decision-maker prefers to keep low emissions $\hat{e}$ instead of (the optimal deviation) $e^{*}$. The present value payoff of emitting $\hat{e}$ is

$$\Pi_i = \frac{1}{1-\delta}\left[B_i(\hat{e}) - c_1 \hat{e}\right]. \tag{7}$$

Alternatively, the payoff for emission $e^{*}$ is the sum of the current round payoff when everyone else emits $\hat{e}$ and the future rounds payoffs when everyone else enters the punishment mode and also emits $e^{*}$,

$$\Pi_i = B_i(e^{*}) - c_1 \frac{e^{*} + (N-1)\hat{e}}{N} + \frac{\delta}{1-\delta}\left[B_i(e^{*}) - c_1 e^{*}\right]. \tag{8}$$

The payoff $\Pi_i$ in Expression 7 is always larger than that in Expression 8 if the decision-maker is sufficiently patient. With the parameter values set in the experiment, the condition that guarantees this preference if we want to support the socially optimal outcome $\hat{e} = e^{**}$ is a discount factor $\delta$ above the critical threshold $\delta \approx 0.3$. This condition is well satisfied in the experiment given that $\delta = 0.92$.¹⁰ The same reasoning applies to the Stochastic treatment because it is isomorphic to the Control treatment. The proof for the Delay treatment is also similar, except that the cost of punishment hits the deviator only after three rounds. In fact, in the round of the deviation the other decision-makers are taken by surprise and start punishing the following round, with consequences accruing after two additional rounds. Although here the punishment is less effective because it hits with delay, this treatment admits the socially optimal outcome as an equilibrium for a discount factor higher than $\delta \approx 0.7$, which is larger than in the other treatments but still smaller than $\delta = 0.92$.
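The two critical discount factors can be approximated with a small numerical sketch. It rests on assumptions that go beyond the text: a logarithmic benefit with marginal benefit 100/e, a deviation from ê = 3 to e* = 12, and, in the Delay treatment, a deviator who keeps emitting e* from the deviation round onward while the others switch to e* one round later. The function names are hypothetical.

```python
import math

N = 4
DELTA = 0.92
C1 = 33.375   # Control damage parameter
C2 = 39.432   # Delay damage parameter
B = 100.0     # assumed log-benefit scale: marginal benefit is B / e
E_COOP = 3    # socially optimal emission
E_NASH = 12   # stage-game Nash emission

# Per-round benefit gain from emitting E_NASH instead of E_COOP.
GAIN = B * (math.log(E_NASH) - math.log(E_COOP))


def control_threshold() -> float:
    """Smallest delta at which grim trigger deters a deviation in Control."""
    # Deviation round: extra benefit minus own share of the extra damage.
    temptation = GAIN - C1 * (E_NASH - E_COOP) / N
    # Each later round: cooperative payoff minus Nash payoff.
    punishment = C1 * (E_NASH - E_COOP) - GAIN
    return temptation / (temptation + punishment)


def delay_deviation_value(delta: float) -> float:
    """Present value of deviating minus cooperating in Delay (deviation at t)."""
    own_hit = GAIN - C2 * (E_NASH - E_COOP) / N  # round t+2: own extra damage
    punished = GAIN - C2 * (E_NASH - E_COOP)     # rounds t+3 onward: all at e*
    return (GAIN * (1 + delta)
            + delta ** 2 * own_hit
            + delta ** 3 / (1 - delta) * punished)


def delay_threshold(lo: float = 0.01, hi: float = 0.99) -> float:
    """Bisection for the delta at which deviating stops being profitable."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if delay_deviation_value(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(control_threshold(), delay_threshold())
```

Under these assumptions the thresholds land close to the approximate values of 0.3 and 0.7 reported above, and both lie below the continuation probability of 0.92 used in the experiment.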
"Experiential Equilibria". We now consider all decision-makers who follow strategies exclusively based on realized damages rather than global emissions. In the Control treatment, the distinction between observational and experiential equilibria is immaterial since there is no decoupling between actions and damages.
In the Stochastic treatment, the experiential strategy is based on the realized accidents. We first consider the case of decision-makers keeping track of all realized accidents in their group, and then we briefly move to the case of decision-makers keeping track only of their own accidents.¹¹ Unlike "observational" decision-makers, "experiential" decision-makers can never be sure that a deviation has effectively occurred in the group. We define $A(t)$ as the event in which at least one accident occurred in round $t$ and $\Pr(A \mid \hat{E})$ as its probability for a given level of global emissions $\hat{E} = \hat{e} \times N$. A cooperative outcome can be sustained with a punishment mode that lasts for a finite number $T$ of rounds (instead of being permanent). When no accidents have occurred, decision-makers emit $\hat{e}$. Instead, when at least one accident has occurred in the group, they temporarily emit $e^{*}$ for the next $T$ rounds, regardless of additional accidents; this is the so-called "quasi-punishment" phase. On the equilibrium path the expected payoff is,
Here we construct the experiential equilibria as if monitoring were imperfect due to unobserved emissions, although in the experiment they were observable. Under imperfect monitoring, decision-makers do not know for sure if the realization of an accident is the consequence of a deviation or not, and this will trigger some high emissions $e^{*}$ also along the equilibrium path (the term in curly brackets). Moreover, a deviation may go "unnoticed" unless it triggers an accident, in which case it induces a continuation payoff that is the same as the one along the equilibrium path (the same curly brackets just described).
10 Emission levels larger than 12 should not occur in equilibrium because they are individually and collectively dominated by $e^{*}$.
11 Recall that in our experiment others' accidents are observable, and so are individual emissions. If decision-makers keep track of all individual realized damages, the game is one with "imperfect public information" (Fudenberg and Tirole, 1992). If instead decision-makers disregard the realizations of others' accidents, then the Stochastic treatment becomes a complex game of "imperfect private monitoring" where the possibility to obtain cooperation via a folk theorem argument is limited (Yamamoto, 2012).
Notwithstanding this reduced incentive power of punishments, one can show that there exists a (decreasing) function $T(\hat{e})$ such that $\pi_i$ is larger than the payoff associated with a deviation if $T \ge T(\hat{e})$. By keeping emissions low at $\hat{e} < e^{*}$, decision-makers keep the probability of triggering the quasi-punishment low. The longer the punishment phase, the more efficient the emission level that can be implemented ($\hat{e} = e^{**}$ for $T \ge 10$).
Hence, the type of cooperation reached by experiential decision-makers contemplates higher emissions in some rounds triggered by the realization of stochastic accidents. Even if in equilibrium decision-makers are able to sustain a level of individual emission $\hat{e}$ without accidents, they will switch to $T$ rounds of quasi-punishment with higher emissions after the realization of any accident and end up with an average level of emissions well above $\hat{e}$. This implies that, even in the most cooperative scenario, experiential decision-makers will emit on average more than the socially optimal emission $e^{**}$.
When decision-makers consider their own accidents only and disregard those of the others, what matters is the probability $\Pr(A_i \mid \hat{E})$ that decision-maker $i$ experiences an accident. The enhanced difficulty in sustaining cooperation is that quasi-punishment rounds are here asynchronous because different decision-makers respond to different and independent accidents. A quasi-punishment phase may thus trigger accidents for other decision-makers who, in turn, activate their own quasi-punishments, propagating even higher emissions. Although an explicit derivation of an equilibrium would considerably complicate the analysis, one can see that the difficulty of jointly identifying and reacting to deviations makes cooperation weaker although not impossible. What is relevant to us is that in any case decision-makers would individually react by increasing emissions when an individual accident has occurred (Yamamoto, 2012).
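To give a sense of magnitudes for the two accident probabilities used in this subsection, the sketch below computes the individual probability and the probability of at least one accident in the group. The group formula follows from the independent draws described in the experimental design and is our own derivation rather than an expression taken from the text. The function names are hypothetical.

```python
ALPHA = 0.01005  # accident probability per unit of global emissions
N = 4            # group size


def individual_accident_probability(e_hat: int) -> float:
    """Pr(A_i | E_hat): probability that one given member suffers an accident
    when every member emits e_hat."""
    return ALPHA * e_hat * N


def group_accident_probability(e_hat: int) -> float:
    """Pr(A | E_hat): probability of at least one accident in the group,
    assuming one independent draw per member."""
    p = individual_accident_probability(e_hat)
    return 1.0 - (1.0 - p) ** N


if __name__ == "__main__":
    for e_hat in (3, 12):
        print(e_hat,
              individual_accident_probability(e_hat),
              group_accident_probability(e_hat))
```

Even at the socially optimal emission the group-level event is far from rare, which illustrates why strategies triggered by any accident in the group spend a substantial share of rounds in the quasi-punishment phase.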
Finally, consider the Delay treatment. Here, experiential decision-makers can realize that a deviation occurred by simply inspecting the current damage. However, since this observed deviation refers to two preceding rounds, their punishment begins with a delay with respect to observational decision-makers. More precisely, a deviation is detected two rounds later so that in the second, third and fourth rounds after the deviation the other decision-makers' emissions are still lower at $\hat{e}$. Only from the fifth round onward is the deviator hit by the punishment, and all decision-makers revert to the Nash stage-game emissions $e^{*}$. The logic for the possibility to support emissions more cooperative than $e^{*}$ is the same as with no delayed damages, except for the "diluted" efficacy of the punishment.
Experiential decision-makers can support the socially optimal outcome in the Delay treatment if sufficiently patient, with a threshold value for $\delta$ now being $\delta = 0.84$.
We can now summarize the following theoretical results.
Remark 2. Experiential decision-makers react to damages and, although they are slower to react to deviations than observational decision-makers, they may still be able to cooperate in reducing emissions. (i) In the Stochastic treatment, they increase emissions after realized individual or collective accidents. (ii) In the Delay treatment, they increase emissions in reaction to damages caused by emissions from two rounds earlier.

Experimental Procedures
We ran 9 sessions at the University of Bologna, with a total of 180 participants.
Procedures aimed at ensuring that all participants had a good understanding of the instructions. To this end, in every session we recruited 25 participants but only 20 actually performed the main task: the selection was based on a quiz about the instructions.¹³ There was a sequence of "dry runs" played against robots that varied their emission level round after round. A session comprised three or four sequences of interaction with monetary incentives.¹⁴ After every sequence, all participants were rematched with completely different people to play the next sequence (perfect stranger matching protocol).
13 The excluded participants had to do a side task with a flat payment of 0.50€ per round plus a show-up fee of 5€.
14 Participants were recruited for up to three and a half hours. For long sessions (more than two hours and forty minutes), we informed participants that the current sequence of interaction was the last one and that the experiment would end within thirty minutes. In this case the exact termination moment was random, as we explained to the participants, with a random draw between 1 and 30. In one session of the Delay treatment, the session was terminated during the second sequence due to time constraints. Since long sessions were randomly and unexpectedly interrupted, this should have no impact on participants' behavior. We did not conduct any ex post debriefing to limit the duration of the session.

Notes:
Data from sessions in the Control treatment have also been analyzed in a related paper (Calzolari et al., 2016, Immediate treatment).
Average emissions are computed as the mean of the individual emissions in a group in a sequence.

Results
We report six main results, some about aggregate outcomes (Results 1-2) and others about the strategies followed by participants (Results 3-6).

Aggregate Results
Result 1 (Aggregate cooperation). Delayed damages lower aggregate emissions and Stochastic damages do so to a marginal extent.
Support for Result 1 comes from Figure 1 and Table 2. Figure 1 shows that the average emission is 7.9 in Delay, which is statistically significantly less than 9.4 in Control according to both a non-parametric test (Wilcoxon-Mann-Whitney test: p-value = 0.011, N_C = 55, N_D = 45) and OLS regressions (Table 2). [...] The unit of observation in Table 2 is a group in a sequence, and we control for sequence order and length. After checking for heterogeneous responses, we will discuss these observations in the concluding Section.

Notes (Figure 1): Individual emissions can range from 1 through 18. The vertical segments represent the 95% confidence interval. The upper red and the lower green horizontal lines respectively indicate the Nash individual emission of the stage-game (e* = 12) and the socially optimal level of individual emissions (e** = 3).

Notes (Table 2): Results from OLS regressions are reported. The unit of observation is a group in a sequence. Variables "Delay" and "Stochastic" are dummies respectively taking value 1 in the Delay and Stochastic treatments, and 0 in the Control treatment. The variable "Length of past sequence" counts the number of rounds in the previous sequence; in sequence 1 it is set to 12.5. * p < 0.1, ** p < 0.05, *** p < 0.01.
Result 2 (Time trends). With delayed damages, emissions exhibit a steadily increasing trend over the rounds. No clear trend emerges in Control and Stochastic treatments.
Support for Result 2 comes from Figure 2 and Table 3. Figure 2 illustrates the emissions trend within a sequence. The Delay treatment starts with emissions that are remarkably lower than in Control (Figure 1 and [...] Figure 2 shows an upward tendency that is not statistically significant (Table 3, [...]

16 To reconcile the apparent differences in average emissions reported in Figures 1 and 2, recall that the indefinite horizon naturally generates a declining number of observations (e.g. in our Control treatment, in round 23 there are only five groups). Therefore, observations in the last rounds "weigh" much less than those in the first rounds when calculating overall average emissions.

Notes (Table 3): Results from OLS regressions are reported. The unit of observation is a participant's emission choice in a round. Standard errors are clustered at the level of a group in a sequence. The variable "Length of past sequence" counts the number of rounds in the previous sequence; in sequence 1 it is set to 12.5. The variable "Mistakes in the quiz" counts the number of mistakes that a participant made in the quiz on the instructions. The variable "Limited liability" is a dummy taking value 1 if the emission decision was made under limited liability, and 0 otherwise. The dummy "Risk averse in the gain domain" is equal to 1 if the participant chose the lottery against the certain positive amount less than three times. The dummy "Risk seeking in the gain domain" is equal to 1 if the participant chose the lottery against the certain positive amount more than three times. Dummies "Risk averse in the loss domain" and "Risk seeking in the loss domain" are similarly defined. All risk dummies neglect whether the participant violated single crossing. * p < 0.1, ** p < 0.05, *** p < 0.01.
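The point made in footnote 16 about declining observation counts can be illustrated with a small numerical sketch. The emission values below are hypothetical and chosen only for illustration; the helper names `round_means` and `pooled_mean` are not from the paper, and pooling over all group-round observations is only one possible way of forming an overall average.

```python
def round_means(groups):
    """Mean emission per round, using only the groups still active in that round.

    `groups` is a list of per-round emission lists, one per group; sequences
    have different lengths because of the indefinite horizon.
    """
    max_len = max(len(g) for g in groups)
    means = []
    for t in range(max_len):
        active = [g[t] for g in groups if len(g) > t]
        means.append(sum(active) / len(active))
    return means


def pooled_mean(groups):
    """Overall mean over all group-round observations (late rounds weigh less)."""
    all_values = [x for g in groups for x in g]
    return sum(all_values) / len(all_values)


# Hypothetical data: two short sequences and one long sequence whose
# emissions rise in the late rounds.
example_groups = [[6, 6], [6, 6], [6, 6, 12, 12]]
per_round = round_means(example_groups)   # round-by-round picture, as in a time-trend plot
overall = pooled_mean(example_groups)     # single overall average
```

In this hypothetical case the late rounds show a mean of 12, yet the pooled average stays close to the early-round level because only one group contributes observations in those rounds.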
Two noteworthy patterns emerge from the data about inequality and risk preferences.
On average, poor participants emit more than rich ones in every treatment (9.7 vs. 9.1 in Control, 8.1 vs. 7.6 in Delay, and 9.1 vs. 7.9 in Stochastic), but the difference in aggregate behavior is statistically significant only in the Delay and Stochastic treatments according to non-parametric tests (two-sided sign tests: Control: p-value = 0.892, N_R = N_P = 55; Delay: p-value = 0.073, N_R = N_P = 45; Stochastic: p-value = 0.036, N_R = N_P = 45). The evidence is more mixed when using OLS regressions (

Strategies of the Representative Participant
Our experimental design allows us to go beyond the aggregate results about cooperation and to shed light on the type of strategies followed by participants in the repeated game. Here we study the strategies of the representative participant (Results 3-4), and in Section 6.3 we provide a simple classification of the individuals to further corroborate and specify the findings. The main theme of the analysis is how a participant who may want to cooperate in reducing emissions reacted to a perceived defection. We begin with the study of observational strategies.
Result 3 (Observational strategies). In the Control treatment, the representative participant responds to a perceived defection with a temporary increase in emissions.
Support for Result 3 comes from Figure 3 and Table 4. Data from the Control treatment suggest that when the representative participant observed high emissions by others in the group, she switched from a cooperative to a punishment mode. As seen in Section 4, an appropriately defined trigger strategy can sustain a fully cooperative equilibrium in our setting. While previous experiments with two players and two moves have already [...] Recall that, following a defection of some opponent, a trigger strategy involves a shift to a punishment mode with higher emissions for some number of subsequent rounds. For the Control treatment, the finding emerges from an OLS regression that explains individual emission choices using regressors that trace the strategy and a set of controls (Table 4, col. 1 and 2). Controls include dummies for round, sequence, participant, and limited liability, as well as the length of the past sequence. In the regression model, we assume that a defection occurs if the emissions of the other three group members are on average equal to or above 12, but we have checked other levels (see Table A.12 in Appendix). This analysis sheds light on the type of strategies employed by the representative participant and generalizes that of Camera and Casari (2009) to N players and a multi-level action space. Although the way to code regressors in order to trace strategies is subject to some discretion, the approach has the advantage of detecting whether participants followed theoretically well-known strategies, such as grim trigger or tit-for-tat.
The regressors that code the strategy aim to trace the response of the representative participant in the rounds that follow a perceived defection. We mostly focus on the four rounds after a defection by including four "Lag" regressors, each of which has a value of 1 only in one round following a defection and 0 otherwise. For example, the "Lag 1" regressor takes value 1 only in the round after the defection (0 otherwise). The "Lag 2" regressor takes value 1 only in the second round following a defection (0 otherwise). Similarly for the "Lag 3" and "Lag 4" regressors. However, we also consider a "grim trigger" regressor labeled "Any previous round", which has a value of 1 in all rounds following a defection and 0 otherwise. The pulse pattern of response to an observed defection suggests a temporary downward shift in cooperation levels immediately after a defection. The Lag 1 regressor is significantly different from zero, while the estimated coefficients of all other strategy regressors, including the grim trigger one, are not significantly different from zero (Table 4).
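The coding of the strategy regressors described above can be sketched as follows. This is only an illustration under stated assumptions, not the estimation code used for Table 4: the function name `strategy_regressors` is hypothetical, and the sketch assumes that each "Lag k" dummy is switched on k rounds after any round in which the others' average emission was at or above the threshold (the text leaves open whether lags are counted from every defection or only from the first one).

```python
from typing import Dict, List


def strategy_regressors(others_avg: List[float],
                        threshold: float = 12.0,
                        n_lags: int = 4) -> List[Dict[str, int]]:
    """Build 'Lag k' and 'Any previous round' dummies, one dict per round.

    others_avg[t] is the average emission of the other group members in
    round t (0-indexed). Assumption: a defection is observed in round t
    when others_avg[t] >= threshold.
    """
    defection = [x >= threshold for x in others_avg]
    rows = []
    for t in range(len(others_avg)):
        row = {}
        for k in range(1, n_lags + 1):
            # 1 only if a defection was observed exactly k rounds earlier
            row[f"Lag {k}"] = int(t - k >= 0 and defection[t - k])
        # grim-trigger style dummy: 1 in every round after any past defection
        row["Any previous round"] = int(any(defection[:t]))
        rows.append(row)
    return rows


# Hypothetical sequence: the others defect once, in the second round.
rows = strategy_regressors([5, 13, 6, 6, 6, 6, 6])
```

With this hypothetical sequence, the third round has "Lag 1" equal to 1, the sixth round has "Lag 4" equal to 1, and "Any previous round" stays at 1 from the third round onward.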
A pattern along the lines of Result 3 emerges also from the analyses of observational strategies in the other treatments. The finding comes from a similar estimation procedure carried out for the Delay and Stochastic treatments using the same emission threshold of 12.

18 If at least one of the five strategy regressors estimated in Table 4 has a positive coefficient, then this could be the consequence of a representative participant switching from a cooperative to a punishment mode. We can illustrate this by the following example: a representative participant who punishes for exactly three rounds following a perceived defection generates estimated positive coefficients for the Lag 1, Lag 2, and Lag 3 regressors.
In the Delay treatment, the representative participant immediately increases emissions after an observed defection in a statistically significant way (Lag 1, Table 5, col. 1); in the Stochastic treatment there is also a statistically significant immediate response (Table 6, col. 1). The main differences between Control and the other treatments seem to be (i) the presence of a more permanent punishment of a defection, as estimated by the "Any previous round" regressor (Tables 5 and 6, col. 1); (ii) a more moderate pulse response to defections in the Delay and Stochastic treatments as compared with the Control treatment. Figure 3 illustrates these pulse patterns of response to an observed defection through the solid lines in panels (b) and (c) labeled "Observational".

Notes (Figure 3): Own emission change comes from regression coefficients, summing up the estimated coefficients for "Any previous round" and "Lag X". For Control, see Table 4 col. 2; for Delay, see Table 5 col. 3; for Stochastic, see Table 6 col. 3.
Result 4 (Experiential strategies). The strategy of the representative participant responds both to the observed actions and to the experienced damage.
Support for Result 4 comes from Figure 3 and Tables 5-6. These findings emerge from the Stochastic and Delay treatments, where one can possibly decouple these observational and experiential strategies. Let us begin with the evidence from the Stochastic treatment, where the empirical distinction between the two classes of strategies is more intuitive. We exploit the presence of random accidents, which caused a large shock to current earnings, and track the reaction to them in terms of emissions of the representative participant. The empirical frequency of accidents in a group was as follows: in 2% of cases everyone in the group experienced an accident and in 21% of cases nobody experienced an accident. The mode was one accident in the group in a round (38%). The data suggest that the representative participant increased emissions immediately after experiencing an accident (Table 6, col. 2). The reaction was statistically significant but temporary, i.e. limited to Lag 1. As already mentioned, the estimate of observational strategies shows a strong immediate reaction (Lag 1) and a smaller but permanent effect (Any previous round, Table 6, col. 1). We also performed a joint estimate of observational and experiential strategies and the patterns do not change substantially, with coefficients slightly smaller in magnitude (Table 6, col. 3). The two classes of strategies are illustrated with the two (solid and dashed) lines in Figure 3 panel (c). Hence, the representative participant is responding with higher emissions both to others' actions when higher than a threshold and to personal payoff shocks.
A similar pattern emerges from the Delay treatment. Disentangling experiential and observational strategies is statistically more difficult in this design. We limit our focus to just the two rounds following a defection, plus a grim trigger regressor, to minimize the chances of confounding the reaction to actions with the reaction to damages (Table 5). In an experiential strategy, a defection occurs if the experienced damage in a round was the outcome of a (previous) average emission in the group above 12, but a robustness check has been performed for other threshold levels (available upon request). Notice that both specifications of experiential strategy in the Delay and Stochastic treatments measure an impact on payoffs that is the consequence of both own and others' emission choices.
The data suggest that the representative participant increased emissions after experiencing high damages (Table 5, col. 2). The coefficient of the Lag 1 regressor for the damage is statistically significant and suggests an immediate reaction, with a smaller coefficient for the Lag 2 regressor and an insignificant coefficient for the grim trigger regressor. As already mentioned, the stand-alone estimate of observational strategies using a threshold of 12 yielded statistically significant and positive coefficients for all regressors (Table 5, col. 1). In the Delay treatment, too, the reaction to an observed defection seems more permanent than in the Control treatment. When both experiential and observational strategies are jointly estimated, the patterns do not substantially change (Table 5, col. 3). The two classes of strategies are illustrated in Figure 3 panel (b) (solid and dashed lines). Hence, in the Delay treatment the representative participant is responding with higher emissions both to others' actions and to damage higher than a threshold.

[...] Table 4, col. 4). However, when jointly estimating an observational strategy of a trigger type together with the reaction to losses, the net effects are drastically different (Table 4, col. 5): the sign of the statistically significant coefficients becomes negative (Lag 3 and Any previous round), and remains so also when the coefficient of "Any previous round" is summed with those of the various lags. When taken together, these two regressions support Result 5 and show that without controlling for the use of a trigger strategy we would have drawn the wrong conclusions about the behavioral effects of a loss.
The reason is that the more canonical response due to a punishment for high emissions of others quantitatively dominates the behavioral response to losses, at least if we focus on the round immediately following the event.
In the Delay treatment the findings are analogous. An estimate that tracks the reaction to the experience of negative round-payoffs shows a permanent increase in emissions (positive coefficient for Any previous round in Table 5, col. 4). However, when jointly estimating an observational strategy of a trigger type together with the reaction to losses, the net effects are drastically different (Table 5, col. 5): we observe a statistically significant negative coefficient for the Lag 1 regressor, which remains negative also when summed up with the coefficient of "Any previous round". Also in this treatment, these two regressions support Result 5.

Strategies at the Individual Level
The empirical evidence on strategies from Section 6.2 is compatible both with everyone responding to emissions as well as damages and with the presence of two separate types of decision-makers: those who respond exclusively to emissions, and those who respond exclusively to damages. The theoretical and empirical implications of these two scenarios are rather different, which is why we also carried out a classification of individuals.
Our theoretical considerations in Section 4 reflect a scenario with homogeneous decision-makers in terms of strategy adoption. The presence of heterogeneity in behavior may require a significant period of learning to envisage other decision-makers' strategies and to build cooperation. One could expect that during this learning process in the Delay and the Stochastic treatments, where experience and observation may be decoupled for some decision-makers, initial emissions are kept cautiously low. At the same time, the learning process may not converge fast enough and the coexistence of experiential and observational decision-makers in the same group may induce spiraling emissions.
We now explain how we classified the participants. The algorithm we used aims at identifying strategies of a "trigger" type where an individual deterministically transitions from a cooperative mode to a punishment mode in the round following an event that is considered a defection. The definition of defection depends on the class of strategy, either experiential or observational, and is associated with a given threshold. The algorithm defines as defection either an observed average action of others above a threshold or the experience of damage, which takes the form of a random accident in the Stochastic treatment, or of a damage level beyond a threshold in the Delay treatment. We check whether each individual's behavior is compatible with an observational trigger strategy and/or an experiential trigger strategy. The unit of observation is a participant in a sequence.
In the Control treatment we cannot distinguish between observational and experiential strategies because there is no decoupling between emission actions and damages. Nonetheless, when basing our counting on observational strategies, about 36% (79) of the individuals can be classified. An individual belongs to the observational strategy category if her emission in the round immediately following a defection is strictly higher than in the previous round, when taking an average over all instances of defections in a sequence. Moreover, in the earliest instance of defection, the individual must have increased emissions in the following round. This definition applies to all treatments. A defection occurs if the average emission by the other three members of the group is above a given threshold. This threshold is individual-specific and is identified by looking at the participant's behavior when making the largest emission increment over a single round, e(t) − e(t − 1). To be considered as belonging to the observational trigger strategy, this emission increment must be in response to an emission increment of the other three group members in the previous round (strictly positive on average). This category is meant to capture individuals following grim trigger or T-round punishment strategies, although the conditions are neither necessary nor sufficient. Some individuals could also follow strategies other than trigger.
The algorithm always places an individual in a sequence of just one or two rounds in the "unclear" category because the data are too sparse. Individuals with constant emissions over time, or emissions that monotonically decline also belong to the "unclear" category.
Similarly, an individual belongs to the category of experiential trigger strategy if her emission in the round immediately following a defection is strictly higher than in the previous round when taking an average over all instances of defections in a sequence. Here, however, the definition of defection is tied to the personal level of damage. In the Delay treatment, the definition of a defection event follows a rule analogous to that for observational strategies but using damages. The threshold that an individual employs to define a defection could be different between experiential and observational strategies. Again, we identified the threshold of the experiential strategy by looking at the participant's behavior when making the largest emission increment over a single round, e(t) − e(t − 1). To be classified as following an experiential strategy, the individual must have performed this jump in emission in response to a strictly positive damage increment over the previous round. In the Stochastic treatment, a defection event occurs every time the individual experiences an accident. The outcomes of this classification algorithm are illustrated in Figure 4 and discussed below.
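The observational part of the classification described above can be sketched in code. This is a minimal illustration under explicit assumptions, not the algorithm actually run on the data: the function names are hypothetical, the individual-specific threshold is assumed to be the others' average emission in the round just before the participant's largest jump, a defection is assumed to be any round in which the others' average is at or above that threshold, and a largest jump occurring too early to check the others' previous increment is treated as unclassifiable.

```python
from typing import List, Optional


def largest_jump_round(own: List[float]) -> Optional[int]:
    """Round index t >= 1 of the largest increment own[t] - own[t-1].

    Returns None when emissions are constant or never increase.
    """
    increments = [own[t] - own[t - 1] for t in range(1, len(own))]
    if not increments or max(increments) <= 0:
        return None
    return increments.index(max(increments)) + 1


def classify_observational(own: List[float], others_avg: List[float]) -> str:
    """Classify one participant in one sequence (0-indexed rounds)."""
    n = len(own)
    if n <= 2:
        return "unclear"                      # data too sparse
    t_star = largest_jump_round(own)
    if t_star is None:
        return "unclear"                      # constant or declining emissions
    if t_star < 2:
        return "unclear"                      # assumption: no earlier increment to check
    # The jump must follow a strictly positive increment by the others.
    if others_avg[t_star - 1] - others_avg[t_star - 2] <= 0:
        return "not observational"
    threshold = others_avg[t_star - 1]        # assumption: individual-specific threshold
    # Defection rounds that are followed by at least one more round.
    defections = [t for t in range(n - 1) if others_avg[t] >= threshold]
    changes = [own[t + 1] - own[t] for t in defections]
    increases_on_average = sum(changes) / len(changes) > 0
    increases_after_first = changes[0] > 0
    if increases_on_average and increases_after_first:
        return "observational"
    return "not observational"


# Hypothetical participant who raises emissions right after the others do.
label = classify_observational([3, 3, 3, 12, 12, 3], [3, 3, 12, 4, 3, 3])
```

Under the same assumptions, an experiential check for the Delay treatment could reuse this function with the participant's own damage series in place of `others_avg`, since the text describes that rule as analogous.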
Result 6 (Heterogeneous strategies). In the Delay and Stochastic treatments, some participants react exclusively to others' actions, some exclusively to changes in payoffs, and a third group reacts to both payoffs and actions.
Among those participants who use trigger strategies, some exclusively respond to actions, others exclusively respond to damages, and another set responds to both actions and damages. These three sets of participants are roughly similar in size. Figure 4 illustrates that 44% and 46% of the classified individuals fall into the observational strategy category in the Delay (N = 42) and the Stochastic (N = 43) treatments, respectively, and 19% and 29% into the experiential strategy category (N = 18 and N = 27, respectively). About 38% and 25% of classified individuals, respectively, belong to both categories. We test whether there are differences in strategy adoption between the Delay and Stochastic treatments using a Probit regression and report no significant effects (Table 7). Although the games and the classification algorithm are in part treatment-specific, we find similar shares of participants who can be classified as observational or as experiential (p-values of the Stochastic dummy are p = 0.503 and p = 0.123, respectively).
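As a quick consistency check on the shares reported for Figure 4, the counts and percentages given in the text can be combined to back out the number of classified individuals per treatment. The sketch below uses only the figures quoted above; the implied totals are derived here by rounding and are not reported in the paper, and the helper name `implied_total` is hypothetical.

```python
# Counts and shares as quoted in the text for Figure 4.
reported = {
    "Delay": {"observational": (42, 0.44), "experiential": (18, 0.19), "both_share": 0.38},
    "Stochastic": {"observational": (43, 0.46), "experiential": (27, 0.29), "both_share": 0.25},
}


def implied_total(count: int, share: float) -> int:
    """Number of classified individuals implied by a count and its share (rounded)."""
    return round(count / share)


implied = {}
share_sums = {}
for treatment, data in reported.items():
    n_obs, s_obs = data["observational"]
    n_exp, s_exp = data["experiential"]
    # Both categories should imply roughly the same number of classified individuals.
    implied[treatment] = (implied_total(n_obs, s_obs), implied_total(n_exp, s_exp))
    # The three shares should add up to one, up to rounding.
    share_sums[treatment] = s_obs + s_exp + data["both_share"]
```

Both categories imply the same rounded total within each treatment, and the three shares add up to one within rounding error, so the reported figures are mutually consistent.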
No systematic difference between rich and poor emerges in the type of strategy adopted. Instead, the lower the level of rule understanding about the experiment (Mistakes in the quiz), the more likely it is that the participant follows an experiential strategy. Such a regularity does not appear for the adoption of observational strategies, but instead there is a positive and significant effect of the length of the current sequence. To evaluate this evidence one must adjust for the inclusion in Table 7 of all the unclassified individuals. When removing them, we find that the higher the level of rule understanding, the more likely it is that a participant follows an observational strategy (Probit regression, p-value = 0.073, N = 189). Moreover, the coefficient of the length of the current sequence loses significance: its effect in Table 7 most likely originates from the fact that participants in longer sequences are easier to classify.
We conclude our analyses by studying how within-group heterogeneity in strategies affects the group's cooperation. Table 8 reports estimates from an OLS regression where the dependent variable is the average group emission in a sequence and the main covariates are the numbers of experiential and observational participants in the group. The regression also controls for sequence length and order. We find that a larger number of experiential participants is associated with a significantly higher average group emission. Instead, the number of observational participants does not affect the average group emission in a statistically significant way. These patterns emerge both in the Delay and Stochastic treatments.

Discussion and Concluding Remarks
We show that two typical features of climate change, namely the delayed and uncertain damages originating from greenhouse gas emissions, have relevant and unexpected consequences on mitigation behavior. These features, which are reproduced in our experiment, remove the tight link between the emission action and its consequences in terms of damages. Such a link, which characterizes most studies on cooperation, facilitates learning about others' preferences, rationality level, and strategies, as well as about the rules of interaction (Bereby-Meyer and Roth, 2006).
With delayed or uncertain damages, coordinating on a mitigation policy may be harder because some decision-makers condition their actions on emissions while others condition them on actual damages. Sustaining cooperation on climate issues without a binding international treaty requires informal punishments upon deviations from an agreement. From a policy point of view, it is thus of interest whether decision-makers react to emissions or to damages. In fact, most theories of long-run cooperation assume homogeneous decision-makers who focus on actions. When instead decision-makers differ in what they react to, it is important to know if cooperation is at risk.
Here we design and carry out a laboratory experiment to study the ability of participants to cooperate under different damage functions and emissions-damages decoupling.
Clean evidence on these issues is hard to gather with observational data. We report two major sets of results. First, a sizable share of participants is of an experiential type, in the sense of conditioning their emission actions on the level of damages that they individually experience, despite having the possibility to observe also the emission choices of others. Experiential types simply employ personal payoffs as a rule of thumb for decisions.
This behavioral finding is novel and contrasts with customary assumptions in theoretical models of cooperation, where everyone does benefit from taking into account the most accurate and timely information available (observational type). A variety of reasons can explain why a decision-maker is experiential or observational. Our evidence suggests that the level of understanding of the situation is systematically worse for experiential than observational types. Experiential decision-makers may find it very costly to keep track and interpret all the information available and thus may fail to appreciate the exact causal connections or the strategies of the others.
Typically, groups are heterogeneous, with members of different types. That would also be very likely when dealing with many countries with widely different political regimes and institutions at the international level. In the experiment, those groups with more experiential types show significantly higher levels of emissions, a result that may originate from miscoordination. Consider the interaction between one experiential and one observational decision-maker. A stochastic accident would cause the experiential player to increase emissions. The observational player would interpret it as a unilateral deviation, which triggers a punishment. This can ignite a spiral of emissions that unravels cooperation. Similarly, with delayed damages, an observational type may underestimate the reaction of others to a deviation from a cooperation agreement because the experiential type acts later in response to damages. This reasoning may provide an explanation for the increasing time trend of emissions shown in the Delay treatment, the consequence of unfortunate realizations of spiraling reactions. The Stochastic treatment, instead, does not exhibit a time trend. In this treatment, the risk of spiraling reactions is in fact lower because the reaction of experiential types to a deviation by an observational type is smooth at the group level due to the asymmetries of the individual accidents. We thus expect that a situation with global common shocks would be more prone to spiraling reactions.
As a second set of results, we find that, overall, the presence of delayed and stochastic damages did not impair cooperation among decision-makers at an aggregate level. Delay and stochastic damages not only induce some decision makers to act on personally experienced damages. They are also relevant for other dimensions, with some of them improving the level of cooperation. In particular, risk and loss aversion could lead to lower emission with random or delayed damages (the latter because of the uncertain termination date). In light of the higher emissions in groups with several experiential decision-makers, the aggregate emissions containment under stochastic and delayed damages may be the consequence of lower emissions in groups with most observational decision-makers.
With the usual caveats on the limits of external validity of experimental investigations, we believe these findings are relevant for policy design because they call for a careful consideration of how international cooperation emerges and is enforced. In particular, the different attitude towards realized damages of national decision-makers (and their public opinion) may prove a key factor for cooperation. For example, considering different pollutants that display their negative consequences at different times after emission, policy makers may be less concerned by the pollutant with more delayed damages when observing lower emissions as we did in the first rounds. This may well turn out to be a missed opportunity, if not a mistake, when many decision-makers rely on observed damages reacting with higher and higher emissions, as we also observed.

Notes: The unit of observation is a group in a sequence. We consider the average emission over all rounds.

A.1 Regressions without Choices under Limited Liability
Notes: Results from Tobit regressions are reported. The unit of observation is a participant emission decision in a round. Decisions in round 1 are dropped. Observations are censored at 1 and 18. Standard errors are clustered at group level. The threshold refers to a given average emission level. * p < 0.1, ** p < 0.05, *** p < 0.01.