Abstract:
Extraction-Transformation-Loading (ETL) tools are pieces of software
responsible for extracting data from several sources, cleansing and
customizing it, and loading it into a data warehouse (DW). In this paper, we
delve into the logical design of ETL scenarios and provide a generic and
customizable framework to support the DW designer in this task. First, we
present a metamodel specifically tailored to the definition of ETL activities. We
follow a workflow-like approach, where the output of a certain activity can
either be stored persistently or passed to a subsequent activity. Also, we
employ a declarative database programming language, LDL, to define the
semantics of each activity. The metamodel is generic enough to capture any
possible ETL activity. Nevertheless, in pursuit of higher reusability and
flexibility, we specialize our generic metamodel constructs with a
palette of frequently used ETL activities, which we call templates. Moreover,
to achieve a uniform extensibility mechanism for this library of
built-ins, we have to deal with specific language issues. Therefore, we also
discuss the mechanics of instantiating templates into concrete activities. The
design concepts that we introduce have been implemented in a tool, ARKTOS II,
which is also presented.
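To make the workflow and template ideas above more concrete, here is a minimal Python sketch. It is not the ARKTOS II implementation, and it substitutes plain Python predicates for the LDL rules the paper uses to define activity semantics. It assumes, for simplicity, that every activity is a row filter. The names `Template`, `Activity`, `run_workflow`, and the `NotNull` template are hypothetical. Each step's output is either passed only to the next activity or also stored persistently. In the sketch, persistent storage is simulated with a dictionary standing in for a table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical row type: attribute name -> value.
Row = Dict[str, object]


@dataclass
class Activity:
    """A concrete activity; restricted here to row filters (an assumption)."""
    name: str
    keep: Callable[[Row], bool]

    def run(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if self.keep(r)]


@dataclass(frozen=True)
class Template:
    """Hypothetical reusable template with named parameters."""
    name: str
    params: Tuple[str, ...]
    # Given concrete parameter bindings, return the row predicate.
    build: Callable[[Dict[str, object]], Callable[[Row], bool]]

    def instantiate(self, **bindings: object) -> Activity:
        # Every template parameter must be bound to obtain a concrete activity.
        missing = [p for p in self.params if p not in bindings]
        if missing:
            raise ValueError(f"unbound template parameters: {missing}")
        label = ", ".join(str(bindings[p]) for p in self.params)
        return Activity(f"{self.name}({label})", self.build(bindings))


def run_workflow(rows: List[Row], steps: List[Tuple[Activity, bool]],
                 store: Dict[str, List[Row]]) -> List[Row]:
    """Run activities in order; persist a step's output when its flag is True."""
    for activity, persist in steps:
        rows = activity.run(rows)
        if persist:
            store[activity.name] = list(rows)  # simulated persistent storage
    return rows  # output of the last activity


# Hypothetical template: reject rows whose parameter attribute is NULL (None).
NOT_NULL = Template("NotNull", ("attr",),
                    lambda b: lambda r: r.get(b["attr"]) is not None)

# Usage: two instantiations of the same template, chained in one workflow.
rows = [{"pkey": 1, "cost": 10}, {"pkey": None, "cost": 5}, {"pkey": 3, "cost": None}]
store: Dict[str, List[Row]] = {}
out = run_workflow(rows, [(NOT_NULL.instantiate(attr="pkey"), True),
                          (NOT_NULL.instantiate(attr="cost"), False)], store)
print(out)    # [{'pkey': 1, 'cost': 10}]
print(store)  # {'NotNull(pkey)': [{'pkey': 1, 'cost': 10}, {'pkey': 3, 'cost': None}]}
```

Keeping the parameter list separate from the predicate is what allows a single template to yield many concrete activities. A fuller sketch would also need non-filter activities, such as ones that add or rename attributes.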