Chapter

Title

Towards a Cost Model to Optimize User-Defined Functions in an ETL Workflow Based on User-Defined Performance Metrics

Authors

[ 1 ] Wydział Informatyki, Politechnika Poznańska | [ 2 ] Instytut Informatyki, Wydział Informatyki, Politechnika Poznańska | [ P ] employee

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2019

Chapter type

chapter in monograph / paper

Publication language

English

Keywords
EN
  • ETL workflow
  • ETL execution optimization
  • user-defined functions
  • cost model
  • parallelization
Abstract

EN Today’s ETL tools provide capabilities for developing custom code as user-defined functions (UDFs) to extend the expressiveness of standard ETL operators. However, the custom code of a UDF may execute inefficiently because of a poor implementation (e.g., one that does not use parallel processing or adequate data structures). In this paper we address the problem of optimizing UDFs in data-intensive workflows and present our approach to constructing a cost model that determines the degree of parallelism for parallelizable UDFs.

Date of online publication

13.08.2019

Pages (from - to)

441 - 456

DOI

10.1007/978-3-030-28730-6_27

URL

https://link.springer.com/chapter/10.1007%2F978-3-030-28730-6_27

Book

Advances in Databases and Information Systems : 23rd European Conference, ADBIS 2019, Bled, Slovenia, September 8–11, 2019 : Proceedings

Presented on

23rd European Conference on Advances in Databases and Information Systems, ADBIS 2019, 8-11.09.2019, Bled, Slovenia

Ministry points / chapter

20

Ministry points / conference (CORE)

70
