Framework to Optimize Data Processing Pipelines Using Performance Metrics
[ 1 ] Politechnika Poznańska | [ 2 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ D ] PhD student | [ P ] employee
2020
chapter in a scientific monograph / conference paper
English
- ETL workflow
- ML workflow
- workflow optimization
- cost model
- parallelization
EN Optimizing Data Processing Pipelines (DPPs) is challenging in both data warehouse and data science architectures. Few approaches to this problem have been proposed so far. The most challenging issue is building a cost model of the whole DPP, especially when user-defined functions (UDFs) are used. In this paper, we address the problem of optimizing UDFs in data-intensive workflows and present our approach to constructing a cost model that determines the degree of parallelism for parallelizable UDFs.
11.09.2020
131 - 140
20
70