Abstract:
Extract-Transform-Load (ETL) workflows are data-centric
workflows responsible for transferring, cleaning, and loading data from their
respective sources to the warehouse. Previous research has identified graph-based
techniques that construct the blueprints for the structure of such
workflows. In this paper, we extend existing results by explicitly incorporating
the internal semantics of each activity in the workflow graph. Apart from the
value that blueprints have per se, we exploit our modeling to introduce rigorous
techniques for the measurement of ETL workflows. To this end, we build upon
an existing formal framework for software quality metrics and formally prove
how our quality measures fit within this framework.