Pido’, S., Pinoli, P., Crovari, P., Ieva, F., Garzotto, F., Ceri, S. (2023). Ask Your Data - Supporting Data Science Processes by Combining AutoML and Conversational Interfaces. IEEE ACCESS, 11, 45972-45988 [10.1109/ACCESS.2023.3272503].

Ask Your Data - Supporting Data Science Processes by Combining AutoML and Conversational Interfaces

Garzotto, Franca;
2023

Abstract

Data Science is increasingly applied to solve real-life problems, both in industry and in academic research, but mastering Data Science requires an interdisciplinary education that is still scarce on the market. Thus, there is a growing need for user-friendly tools that allow domain experts to apply data analysis methods directly to their datasets, without involving a Data Science expert. In this scenario, we present DSBot, an assistant that can analyze the user's data and produce answers by mastering several Data Science techniques. DSBot understands the research question through conversational interaction, produces a data science pipeline, and automatically executes the pipeline to generate the analysis. The strength of DSBot lies in the design of a rich domain-specific language for modeling data analysis pipelines, the use of a suitable neural network for machine translation of research questions, the availability of a vast dictionary of pipelines for matching the translation output, and the use of natural language technology provided by a conversational agent. We benchmarked DSBot on a set of 100 natural language questions and on a set of 30 prediction tasks, empirically evaluating the translation capabilities and the AutoML performance of the system. In the translation task, it obtains a median BLEU score of 0.75. In the prediction tasks, DSBot outperforms TPOT, an AutoML tool, on 19 of the 30 datasets.
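The abstract mentions matching the translation output against a dictionary of pipelines. The following is a minimal sketch of that general idea, not DSBot's actual implementation: it assumes a pipeline can be represented as a sequence of operation names and that "matching" means picking the most similar known sequence. The operation names, the `PIPELINES` list, and the `closest_pipeline` helper are all hypothetical, and the similarity measure (Python's `difflib.SequenceMatcher`) is a stand-in chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical dictionary of valid pipelines, each a sequence of operation names.
# These names are illustrative only; they are NOT DSBot's actual DSL vocabulary.
PIPELINES = [
    ("impute_missing", "scale", "kmeans", "scatter_plot"),
    ("impute_missing", "encode_categorical", "random_forest", "feature_importance"),
    ("impute_missing", "scale", "pca", "scatter_plot"),
]


def closest_pipeline(translated, candidates=PIPELINES):
    """Return the known pipeline most similar to the translator's output.

    Similarity is the SequenceMatcher ratio computed over operation names,
    an assumption made for this sketch rather than the paper's method.
    """
    def similarity(candidate):
        return SequenceMatcher(None, translated, candidate).ratio()

    return max(candidates, key=similarity)


# A noisy translation: one step is missing and one name is not in the dictionary.
noisy_translation = ("scale", "kmeans", "plot")
print(closest_pipeline(noisy_translation))
```

Snapping the translator's output to a known pipeline in this way means that an imperfect translation can still resolve to a sequence of steps that is valid to execute.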
Type: Journal article - Scientific article
Keywords: Automated machine learning; data science; human-computer interaction; intelligent systems; natural language understanding; pipeline optimization; Python
Language: English
Date: 2 Mar 2023
Year: 2023
Volume: 11
First page: 45972
Last page: 45988
Access: open
Files in this item:
File: Pidó-2023-IEEE Access-VoR.pdf
Access: open access
Description: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 2.23 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/524293
Citations
  • Scopus 4
  • Web of Science (ISI) 1