The design of data enrichment pipelines is a complex task that can be simplified by leveraging Large Language Models (LLMs) for natural language-based creation of Directed Acyclic Graphs (DAGs). This approach aims to democratize data pipeline development by enabling DAG generation through natural language text inputs. Our study explores using LLMs to generate Apache Airflow DAGs, which streamlines workflow design and makes complex data enrichment more accessible while preserving the flexibility and power of Airflow's ecosystem. We conducted preliminary experiments using the SemT framework to address data enrichment challenges at scale. The SemT service model, based on the Explore-Design-Operate methodology, enables flexible data transformation pipelines that work across different environments.
Alidu, A., Ciavotta, M., De Paoli, F. (2026). LLM-Based DAG Creation for Data Enrichment Pipelines in SemT Framework. In Service-Oriented Computing – ICSOC 2024 Workshops ASOCA, AI-PA, WESOACS, GAISS, LAIS, AI on Edge, RTSEMS, SQS, SOCAISA, SOC4AI and Satellite Events, Tunis, Tunisia, December 3–6, 2024, Revised Selected Papers, Part I (pp. 131–143). Springer Singapore [10.1007/978-981-96-7238-7_11].
LLM-Based DAG Creation for Data Enrichment Pipelines in SemT Framework
Alidu A.; Ciavotta M.; De Paoli F.
2026


