Depending on the amount of data to process, file generation may take longer.

If it takes too long to generate, you can limit the data by, for example, reducing the range of years.

Chapter

Download BibTeX

Title

Data Engineering for Data Science: Two Sides of the Same Coin

Authors

[ 1 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ P ] employee

Scientific discipline (Law 2.0)

[2.3] Information and communication technology

Year of publication

2020

Chapter type

chapter in monograph / paper

Publication language

english

Keywords
EN
  • data science
  • data analytics
  • data engineering
  • data management
  • data processing pipeline
Abstract

EN A de facto technological standard of data science is based on notebooks (e.g., Jupyter), which provide an integrated environment to execute data workflows in different languages. However, from a data engineering point of view, this approach is typically inefficient and unsafe, as most of the data science languages process data locally, i.e., in workstations with limited memory, and store data in files. Thus, this approach neglects the benefits brought by over 40 years of R&D in the area of data engineering, i.e., advanced database technologies and data management techniques. In this paper, we advocate for a standardized data engineering approach for data science and we present a layered architecture for a data processing pipeline (DPP). This architecture provides a comprehensive conceptual view of DPPs, which next enables the semi-automation of the logical and physical designs of such DPPs.

Date of online publication

11.09.2020

Pages (from - to)

157 - 166

DOI

10.1007/978-3-030-59065-9_13

URL

https://link.springer.com/chapter/10.1007/978-3-030-59065-9_13

Book

Big Data Analytics and Knowledge Discovery : 22nd International Conference, DaWaK 2020, Bratislava, Slovakia, September 14–17, 2020 : Proceedings

Presented on

22nd International Conference on Big Data Analytics and Knowledge Discovery DaWaK 2020, 14-17.09.2020, Bratislava, Slovac Republic

Ministry points / chapter

20

Ministry points / conference (CORE)

70

This website uses cookies to remember the authenticated session of the user. For more information, read about Cookies and Privacy Policy.