Alidu, A., Ciavotta, M., De Paoli, F. (2025). Prompt2DAG: A Modular Prompting Approach for Democratizing Data Pipeline Generation. In 2025 IEEE International Conference on Software Services Engineering (SSE) (pp. 1-11). Institute of Electrical and Electronics Engineers Inc. [10.1109/SSE67621.2025.00010].

Prompt2DAG: A Modular Prompting Approach for Democratizing Data Pipeline Generation

Alidu A.; Ciavotta M.; De Paoli F.
2025

Abstract

Building robust data pipelines often requires specialized engineering skills, creating barriers for domain experts with limited coding expertise. We introduce Prompt2DAG, a modular prompting methodology that transforms natural language descriptions into executable Apache Airflow workflows by decomposing generation into three sequential stages: structured analysis, configuration generation, and code implementation. This approach aligns with established software engineering principles of separation of concerns and progressive refinement. Our evaluation across five different LLMs demonstrates that Prompt2DAG significantly outperforms conventional end-to-end generation, improving code quality (+78.4%) and structural integrity (+43.2%) of generated pipelines. Using a data enrichment case study, we show how this approach enables the development of high-quality workflows through natural language, effectively democratizing data pipeline development.
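
The three sequential stages named in the abstract can be pictured as a simple chain in which each stage's output becomes the next stage's input. The sketch below is only an illustration of that staged structure, not the authors' prompts or implementation: the function names, the prompt wording, and the `LLM` callable type are all hypothetical, and any text-in, text-out model client could be plugged in.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a model client: takes a prompt, returns text.
LLM = Callable[[str], str]


def analyze(description: str, llm: LLM) -> str:
    """Stage 1 (structured analysis): extract steps and dependencies."""
    return llm("List the pipeline steps and their dependencies:\n" + description)


def configure(analysis: str, llm: LLM) -> str:
    """Stage 2 (configuration generation): turn the analysis into a config."""
    return llm("Write a pipeline configuration for this analysis:\n" + analysis)


def implement(config: str, llm: LLM) -> str:
    """Stage 3 (code implementation): turn the config into workflow code."""
    return llm("Write Apache Airflow DAG code for this configuration:\n" + config)


def staged_generation(description: str, llm: LLM) -> Dict[str, str]:
    """Run the three stages in order, feeding each output to the next stage."""
    analysis = analyze(description, llm)
    config = configure(analysis, llm)
    code = implement(config, llm)
    # Keeping the intermediate artifacts makes each stage inspectable.
    return {"analysis": analysis, "config": config, "code": code}
```

Calling `staged_generation("Load a CSV, enrich it, save the result", my_llm)` with a real client in place of the hypothetical `my_llm` would return all three artifacts, so the analysis and the configuration can be reviewed before the generated code is used.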
paper
Apache Airflow; Data pipelines; Directed Acyclic Graphs (DAGs); Large Language Models (LLMs); Modular prompting; Pipeline democratization; Software engineering; Workflow generation
English
2025 IEEE International Conference on Software Services Engineering, SSE 2025 - 07-12 July 2025
2025
2025 IEEE International Conference on Software Services Engineering (SSE)
9798331567897
2025
1
11
none
Use this identifier to cite or link to this document: https://hdl.handle.net/10281/588393