Masserini, E., Clerissi, D., Micucci, D., Campos, J., Mariani, L. (2025). Towards the Assessment of Task-based Chatbots: From the TOFU-R Snapshot to the BRASATO Curated Dataset. In 2025 IEEE 36th International Symposium on Software Reliability Engineering (ISSRE) (pp.73-82). IEEE Computer Society [10.1109/issre66568.2025.00060].

Towards the Assessment of Task-based Chatbots: From the TOFU-R Snapshot to the BRASATO Curated Dataset

Masserini, Elena; Clerissi, Diego; Micucci, Daniela; Campos, J.; Mariani, Leonardo
2025

Abstract

Task-based chatbots are increasingly used to deliver real services, yet the assessment of their reliability, security, and robustness remains underexplored, partly because large-scale, high-quality datasets are lacking. Emerging automated quality assessment techniques for chatbots often rely on limited pools of subjects, such as custom-made toy examples or agents that are outdated, no longer available, or scarcely popular, which complicates the evaluation of these techniques. In this paper, we present two datasets and the tool support needed to create and maintain them. The first dataset is RASA TASK-BASED CHATBOTS FROM GITHUB (TOFU-R), a snapshot of the Rasa chatbots available on GitHub that represents the state of the practice in open-source chatbot development with Rasa. The second dataset is BOT RASA COLLECTION (BRASATO), a curated selection of the chatbots most relevant in terms of dialogue complexity, functional complexity, and utility, whose goal is to ease reproducibility and facilitate research on chatbot reliability.
Type: paper
Keywords: Chatbot; ChatGPT; Dataset; GitHub; Rasa
Language: English
Conference: IEEE International Symposium on Software Reliability Engineering (ISSRE), 21-24 October 2025
Published in: 2025 IEEE 36th International Symposium on Software Reliability Engineering (ISSRE)
ISBN: 9798350393026
Year: 2025
Pages: 73-82
Access: reserved
Files in this item:
File: Masserini-2025-Proceed Int Symposium Softw Reliability Engineer-VoR.pdf
Access: archive administrators only
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 647.28 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/588562
Citations
  • Scopus 1