On Case-Based Reasoning for ETL Process Repairs: Making Cases Fine-Grained
[ 1 ] Politechnika Poznańska | [ 2 ] Instytut Informatyki, Wydział Informatyki i Telekomunikacji, Politechnika Poznańska | [ D ] PhD student | [ P ] employee
2020
chapter in monograph / paper
English
- data source evolution
- ETL process repair
- case-based reasoning
EN Data sources (DSs) integrated in a data warehouse frequently change their structures. As a consequence, an already deployed ETL process often stops executing and generates errors. Since the number of deployed ETL processes may reach tens of thousands and structural changes in DSs are frequent, being able to (semi-)automatically repair an ETL process after DS changes would decrease ETL maintenance costs. In our approach, we developed the E-ETL framework for ETL process repairs. In E-ETL, an ETL process is repaired semi-automatically or automatically (depending on the case), so that it works with the changed DS. E-ETL supports two repair methods: (1) user-defined rules and (2) Case-Based Reasoning (CBR). Having experimented with CBR, we learned that large cases rarely fit a given DS change, even though they include elements that could be applied to repair a given ETL process; conversely, more complex DS changes cannot be handled by small cases. To solve this problem, in this paper we contribute algorithms for decomposing detected structural changes in DSs. The purpose of the decomposition is to divide a set of detected structural DS changes into smaller sets, which increases the probability that the CBR method finds a suitable case.
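The decomposition idea in the abstract can be illustrated with a minimal sketch. This is not the algorithm contributed by the paper: the grouping criterion (changes affecting the same table), the exact-match case lookup, and all names (`Change`, `decompose`, `find_case`, the sample case base) are assumptions made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A detected structural DS change (hypothetical representation)."""
    table: str
    column: str
    kind: str  # e.g. "rename", "drop", "change_type"


def decompose(changes):
    """Split a set of changes into smaller sets, one per affected table.

    Assumption: changes to different tables can be repaired independently.
    """
    groups = defaultdict(list)
    for change in changes:
        groups[change.table].append(change)
    return list(groups.values())


def signature(changes):
    """Order-independent summary of a change set: the sorted change kinds."""
    return tuple(sorted(change.kind for change in changes))


def find_case(changes, case_base):
    """Return the repair of a case whose signature matches exactly, else None."""
    return case_base.get(signature(changes))


# Hypothetical case base: change-set signature -> repair description.
case_base = {
    ("drop", "rename"): "remap renamed column; remove dropped column",
    ("change_type",): "insert a cast step",
}

changes = [
    Change("customer", "name", "rename"),
    Change("customer", "fax", "drop"),
    Change("orders", "total", "change_type"),
]

# No single case covers all three changes at once ...
whole = find_case(changes, case_base)
# ... but each smaller set produced by the decomposition finds a case.
parts = [find_case(group, case_base) for group in decompose(changes)]
```

In this toy setting the full change set matches no stored case, while both per-table subsets do, which is the effect the abstract attributes to decomposition. A realistic matcher would compare cases by similarity rather than by exact signature.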
12.08.2020
235 - 249
20
70