This study assesses inter-observer reliability (IOR) between two observers for trichotomous and four-level animal-based welfare indicators assessed at the individual level. The Body Condition Score (BCS) and Knee calluses (KNC) were chosen as trichotomous indicators; data were collected on fourteen intensively managed dairy goat farms in Italy (ITF1 to ITF7) and Portugal (PTF1 to PTF7) and on extensively managed dairy goat farms exploiting three alpine pastures (AP1, AP2 and AP3) in Italy. The Ear posture (EP) and Eye white (EW) were chosen as four-level indicators; data were collected on three intensively managed dairy cattle farms (F1, F2 and F3) in Italy. The performance of the most documented agreement indices was compared. For trichotomous indicators, Scott’s π, Cohen’s K, Cohen’s KC, Cohen’s weighted K and Krippendorff’s α were affected by the paradox effect: even when the concordance rate (P0) was high, they sometimes gave very low or even negative values (e.g. P0(BCS-ITF3) = 74%; Scott’s π = 0.05; Cohen’s K = 0.09; Krippendorff’s α = 0.06; P0(BCS-AP3) = 74%; Scott’s π = −0.12; Cohen’s K = Krippendorff’s α = −0.11). Bangdiwala’s B, Gwet’s γ(AC1) and Quatto’s weighted S were not affected by this phenomenon and provided values very close to P0 (e.g. P0(KNC-PTF1) = 88%; Bangdiwala’s B = Gwet’s γ(AC1) = 0.85; P0(BCS-AP1) = 82%; Bangdiwala’s B = Gwet’s γ(AC1) = 0.79). For four-level indicators, Cohen’s K and Krippendorff’s α were not affected by the paradox effect. However, Cohen’s KC in some cases exceeded the observed P0 (e.g. P0(EP-F3) = 78%; Cohen’s KC = 1). Gwet’s γ(AC1) showed the best results for four-level indicators (e.g. P0(EP-F1) = 88%; Gwet’s γ(AC1) = 0.86), followed by Quatto’s S and Holley and Guilford’s G (e.g. P0(EP-F1) = 88%; Quatto’s S = Holley and Guilford’s G = 0.84).
To evaluate IOR between two observers, Bangdiwala’s B, Gwet’s γ(AC1) and Quatto’s weighted S are suggested for trichotomous indicators, while Gwet’s γ(AC1), Quatto’s S and Holley and Guilford’s G are suggested for four-level indicators.
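The paradox effect described in the abstract can be illustrated with a small sketch. The code below computes the concordance rate P0 together with the standard unweighted textbook forms of Scott’s π, Cohen’s K, Gwet’s γ(AC1) and Bangdiwala’s B from a two-observer contingency table. The function name `agreement_indices` and the table `illustrative_table` are hypothetical, and the counts are invented for illustration only; they are not data from the study, and the study’s own computations (including weighted variants and bootstrap intervals) may differ.

```python
def agreement_indices(table):
    """Unweighted agreement indices for a square two-observer table.

    table[i][j] = number of animals scored i by observer 1 and j by observer 2.
    """
    q = len(table)                                   # number of categories
    n = sum(sum(row) for row in table)               # number of animals
    row_tot = [sum(row) for row in table]            # observer 1 marginals
    col_tot = [sum(table[i][j] for i in range(q)) for j in range(q)]  # observer 2

    p0 = sum(table[i][i] for i in range(q)) / n      # concordance rate
    row_p = [r / n for r in row_tot]
    col_p = [c / n for c in col_tot]
    mean_p = [(r + c) / 2 for r, c in zip(row_p, col_p)]

    # Chance-agreement terms; each index differs only in how it models chance.
    pe_kappa = sum(r * c for r, c in zip(row_p, col_p))
    pe_pi = sum(m * m for m in mean_p)
    pe_ac1 = sum(m * (1 - m) for m in mean_p) / (q - 1)

    def chance_corrected(pe):
        return (p0 - pe) / (1 - pe)

    # Bangdiwala's B: squared diagonal counts over products of the marginals.
    b = sum(table[i][i] ** 2 for i in range(q)) / sum(
        row_tot[i] * col_tot[i] for i in range(q)
    )

    return {
        "P0": p0,
        "scott_pi": chance_corrected(pe_pi),
        "cohen_kappa": chance_corrected(pe_kappa),
        "gwet_ac1": chance_corrected(pe_ac1),
        "bangdiwala_b": b,
    }


# Invented trichotomous example: almost all animals fall in the first category.
illustrative_table = [
    [88, 4, 1],
    [4, 1, 0],
    [1, 0, 1],
]
result = agreement_indices(illustrative_table)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts the observers agree on 90% of animals, yet Cohen’s K and Scott’s π come out at roughly 0.24 because the strongly unbalanced marginals inflate their chance-agreement term, while Gwet’s γ(AC1) and Bangdiwala’s B both stay near 0.89. This is the same qualitative pattern the abstract reports.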

Torsiello, B., Giammarino, M., Quatto, P., Battini, M., Mattiello, S., Battaglini, L., et al. (2024). Evaluation of inter-observer reliability in the case of trichotomous and four-level animal-based welfare indicators with two observers. ITALIAN JOURNAL OF ANIMAL SCIENCE, 23(1), 938-960 [10.1080/1828051x.2024.2367681].

Evaluation of inter-observer reliability in the case of trichotomous and four-level animal-based welfare indicators with two observers

Quatto P.
2024

Type: Journal article - Scientific article
Keywords: Agreement index; animal-based measure; bootstrap method; inter-observer reliability; three- and four-level indicators
Language: English
Date: 18 June 2024
Year: 2024
Volume: 23
Issue: 1
First page: 938
Last page: 960
Access: open
Files in this item:
File: Torsiello-2024-Italian Journal of Animal Science-VoR.pdf
Access: open access
Description: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/)
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 2.05 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/550852