The design of data enrichment pipelines is a complex task that can be simplified by leveraging Large Language Models (LLMs) for natural language-based creation of Directed Acyclic Graphs (DAGs). This approach aims to democratize data pipeline development by enabling DAG generation through natural language text inputs. Our study explores using LLMs to generate Apache Airflow DAGs. This approach streamlines workflow design and makes complex data enrichment more accessible, while preserving the flexibility and power of Airflow's ecosystem. We conducted preliminary experiments using the SemT framework to address data enrichment challenges at scale. The SemT service model, based on the Explore-Design-Operate methodology, enables flexible data transformation pipelines that work across different environments.

Alidu, A., Ciavotta, M., De Paoli, F. (2026). LLM-Based DAG Creation for Data Enrichment Pipelines in SemT Framework. In Service-Oriented Computing – ICSOC 2024 Workshops ASOCA, AI-PA, WESOACS, GAISS, LAIS, AI on Edge, RTSEMS, SQS, SOCAISA, SOC4AI and Satellite Events, Tunis, Tunisia, December 3–6, 2024, Revised Selected Papers, Part I (pp.131-143). Springer Singapore [10.1007/978-981-96-7238-7_11].

LLM-Based DAG Creation for Data Enrichment Pipelines in SemT Framework

Alidu A.; Ciavotta M.; De Paoli F.
2026

paper
AI applications; Apache Airflow; Data enrichment; Data Semantics; Directed Acyclic Graphs; Large Language Models; Natural language processing; Pipeline orchestration; SemT-Model
English
Workshops and other Satellite Events which were held in conjunction with the 22nd International Conference on Service-Oriented Computing, ICSOC 2024 - December 3–6, 2024
2024
Kallel, S; Raibulet, C; Bouassida Rodriguez, I; Faci, N; Bennaceur, A; Cheikhrouhou, S; Ben Ayed, L; Sellami, M; Nakagawa, EY; Ben Halima, R
Service-Oriented Computing – ICSOC 2024 Workshops ASOCA, AI-PA, WESOACS, GAISS, LAIS, AI on Edge, RTSEMS, SQS, SOCAISA, SOC4AI and Satellite Events, Tunis, Tunisia, December 3–6, 2024, Revised Selected Papers, Part I
9789819672370
23-Jul-2025
2026
15833 LNCS
131
143
none


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/588395