Abstract:
Extraction-Transformation-Loading (ETL) tools are pieces of software
responsible for the extraction of data from several sources, their cleansing,
customization and insertion into a data warehouse. In this paper, we delve into
the logical design of ETL scenarios and provide a generic and customizable
framework in order to support the DW designer in his task. First, we present a
metamodel particularly customized for the definition of ETL activities. We
follow a workflow-like approach, where the output of a certain activity can
either be stored persistently or passed to a subsequent activity. Also, we
employ a declarative database programming language, LDL, to define the
semantics of each activity. The metamodel is generic enough to capture any
possible ETL activity. Nevertheless, in the pursuit of higher reusability and
flexibility, we specialize the set of our generic metamodel constructs with a
palette of frequently-used ETL activities, which we call templates. Moreover,
in order to achieve a uniform extensibility mechanism for this library of
built-ins, we have to deal with specific language issues. Therefore, we also
discuss the mechanics of template instantiation to concrete activities. The
design concepts that we introduce have been implemented in a tool, ARKTOS II,
which is also presented.
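For illustration only, the following Python fragment sketches the template idea described above: a generic activity construct, one parameterized template from a hypothetical built-in palette, and its instantiation to a concrete activity. The paper itself expresses activity semantics in LDL; the names and the simple filter semantics below are assumptions made for the example, not the metamodel's actual constructs.

    # Illustrative sketch only: a Python analogue of generic activities
    # specialized by reusable templates. The paper's own semantics are in LDL.
    from dataclasses import dataclass
    from typing import Callable, Iterable

    Row = dict  # a record of the input recordset, keyed by attribute name

    @dataclass
    class Activity:
        """A concrete activity: routes each input row to output or rejection."""
        name: str
        predicate: Callable[[Row], bool]

        def run(self, rows: Iterable[Row]):
            accepted, rejected = [], []
            for row in rows:
                (accepted if self.predicate(row) else rejected).append(row)
            return accepted, rejected

    @dataclass
    class ActivityTemplate:
        """A parameterized, frequently used operation from the built-in palette."""
        name: str
        make_predicate: Callable[..., Callable[[Row], bool]]

        def instantiate(self, **params) -> Activity:
            # Template instantiation: bind concrete parameters to the template.
            return Activity(f"{self.name}{params}", self.make_predicate(**params))

    # A one-element "palette": a NOT NULL check on a given attribute.
    NOT_NULL = ActivityTemplate("NotNull",
                                lambda attr: lambda row: row.get(attr) is not None)

    check_cost = NOT_NULL.instantiate(attr="COST")
    ok, bad = check_cost.run([{"PKEY": 1, "COST": 10}, {"PKEY": 2, "COST": None}])
    print(ok, bad)  # the second row fails the instantiated check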
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
Data visualization is one of the major issues of database research, and OLAP,
being a decision support technology, is clearly at the center of this effort.
Still, so far, visualization has not been incorporated in the abstraction
levels of DBMS architecture (conceptual, logical, physical), neither has it
been formally treated in this context. In this paper we start by reconsidering
the separation of the aforementioned abstraction levels to take visualization
into consideration. Then, we present the Cube Presentation Model (CPM), a novel
presentational model for OLAP screens. The proposal rests on the fundamental
idea of separating the logical part of a data cube computation from the
presentational part of the client tool. CPM can then be naturally mapped onto
the Table Lens, which is an advanced visualization technique from the
Human-Computer Interaction area, particularly tailored for cross-tab reports.
Based on the particularities of Table Lens, we propose automated proactive
support to the user for the interaction with an OLAP screen. Finally, we
discuss implementation and usage issues in the context of an academic prototype
system (CubeView) that we have implemented.
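As an informal illustration of the separation that CPM advocates, the Python sketch below keeps the logical result of a cube query (a set of aggregated cells) apart from its presentation as a cross-tab grid of the kind a Table Lens-style screen would display. The data and function names are hypothetical and are not part of CPM.

    # Illustrative sketch only: logical cells vs. their cross-tab presentation.
    def to_crosstab(cells):
        """Arrange logical (row_member, column_member, measure) cells on two screen axes."""
        rows = sorted({r for r, _, _ in cells})
        cols = sorted({c for _, c, _ in cells})
        lookup = {(r, c): v for r, c, v in cells}
        return rows, cols, [[lookup.get((r, c)) for c in cols] for r in rows]

    # Logical layer: the aggregated result of a (hypothetical) cube query.
    cells = [("Asia", "2023-Q1", 90), ("Asia", "2023-Q2", 110),
             ("Europe", "2023-Q1", 120), ("Europe", "2023-Q2", 150)]

    # Presentational layer: the same cells rendered as a cross-tab report.
    row_labels, col_labels, grid = to_crosstab(cells)
    print("\t" + "\t".join(col_labels))
    for label, line in zip(row_labels, grid):
        print(label + "\t" + "\t".join(str(v) for v in line))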
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
Data visualization is one of the big issues of database research.
OLAP, as a decision support technology, is closely related to
developments in the data visualization area. In this paper we
demonstrate how the Cube Presentation Model (CPM), a novel
presentational model for OLAP screens, can be naturally mapped
onto the Table Lens, which is an advanced visualization technique
from the Human-Computer Interaction area, particularly tailored
for cross-tab reports. We consider how the user interacts with an
OLAP screen and, based on the particularities of Table Lens, we
propose automated proactive user support. Finally, we discuss
the necessity and the applicability of advanced visualization
techniques in the presence of recent technological developments.
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
Extraction-Transformation-Loading (ETL) and Data Cleaning tools are pieces of
software responsible for the extraction of data from several sources, their
cleaning, customization and insertion into a data warehouse. To deal with the
complexity and efficiency of the transformation and cleaning tasks, we have
developed a tool, namely ARKTOS, capable of modeling and executing practical
scenarios, by providing explicit primitives for the capturing of common tasks.
ARKTOS provides three ways to describe such a scenario, including a graphical
point-and-click front end and two declarative languages: XADL (an XML variant),
which is more verbose and easy to read, and SADL (an SQL-like language), which
has a quite compact syntax and is thus easier for authoring.
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
Extraction-Transformation-Loading (ETL) tools are pieces of software
responsible for the extraction of data from several sources, their cleansing,
customization and insertion into a data warehouse. Literature and personal
experience have guided us to conclude that the problems concerning the ETL
tools are primarily problems of complexity, usability and price. To deal with
these problems, we provide a uniform metamodel for ETL processes, covering the
aspects of data warehouse architecture, activity modeling, contingency
treatment and quality management. The ETL tool we have developed, namely
ARKTOS, is capable of modeling and executing practical ETL scenarios by
providing explicit primitives for the capturing of common tasks. ARKTOS
provides three ways to describe an ETL scenario: a graphical point-and-click
front end and two declarative languages: XADL (an XML variant), which is more
verbose and easy to read, and SADL (an SQL-like language), which has a quite
compact syntax and is thus easier for authoring.
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
Extract-Transform-Load (ETL) workflows are data-centric
workflows responsible for transferring, cleaning, and loading data from their
respective sources to the warehouse. Previous research has identified graph-based
techniques that construct the blueprints for the structure of such
workflows. In this paper, we extend existing results by explicitly incorporating
the internal semantics of each activity in the workflow graph. Apart from the
value that blueprints have per se, we exploit our modeling to introduce rigorous
techniques for the measurement of ETL workflows. To this end, we build upon
an existing formal framework for software quality metrics and formally prove
how our quality measures fit within this framework.
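As a rough illustration of measuring an ETL workflow graph, the Python sketch below computes simple fan-in/fan-out counts over provider edges; these counts are stand-ins chosen for the example and are not the quality measures defined in the paper.

    # Illustrative sketch only: simple fan-in/fan-out counts over an ETL
    # workflow graph; not the quality measures defined in the paper.
    from collections import defaultdict

    # Directed provider edges: data flows from the first node to the second.
    edges = [("src.SALES", "NotNull(COST)"),
             ("NotNull(COST)", "SurrogateKey(PKEY)"),
             ("src.RATES", "SurrogateKey(PKEY)"),
             ("SurrogateKey(PKEY)", "dw.FACT_SALES")]

    fan_in, fan_out = defaultdict(int), defaultdict(int)
    for source, target in edges:
        fan_out[source] += 1
        fan_in[target] += 1

    for node in sorted(set(fan_in) | set(fan_out)):
        # Nodes with both high fan-in and high fan-out are coupling hot spots.
        print(f"{node:22s} fan-in={fan_in[node]} fan-out={fan_out[node]}")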
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
Extraction-Transformation-Loading (ETL) tools are pieces of
software responsible for the extraction of data from several
sources, their cleansing, customization and insertion into a data
warehouse. In this paper, we focus on the problem of the
definition of ETL activities and provide formal foundations for
their conceptual representation. The proposed conceptual model is
(a) customized for the tracing of inter-attribute relationships and
the respective ETL activities in the early stages of a data
warehouse project; (b) enriched with a 'palette' of frequently
used ETL activities, like the assignment of surrogate keys, the
check for null values, etc.; and (c) constructed in a
customizable and extensible manner, so that the designer can
enrich it with his own recurring patterns for ETL activities.
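Purely for illustration, the fragment below shows one hypothetical way to record conceptual inter-attribute relationships together with the palette activity applied to each mapping; the attribute and activity names are invented for the example.

    # Illustrative sketch only: hypothetical conceptual-level mappings from
    # source to target attributes, annotated with the applied palette activity.
    attribute_mappings = {
        "DW.SALES.SKEY":     (["SRC.SALES.PKEY"],      "SurrogateKeyAssignment"),
        "DW.SALES.COST_EUR": (["SRC.SALES.COST_USD"],  "CurrencyConversion"),
        "DW.SALES.CUSTOMER": (["SRC.SALES.CUST_NAME"], "NotNullCheck"),
    }

    for target, (sources, activity) in attribute_mappings.items():
        print(f"{', '.join(sources)} --[{activity}]--> {target}")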
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
Extract-Transform-Load (ETL) workflows are data-centric workflows
responsible for transferring, cleaning, and loading data from their respective
sources to the warehouse. In this paper, we build upon existing graph-based
modeling techniques that treat ETL workflows as graphs by (a) extending the
activity semantics to incorporate negation, aggregation and self-joins, (b)
complementing querying semantics with insertions, deletions and updates, and (c)
transforming the graph to allow zoom-in/out at multiple levels of abstraction (i.e.,
passing from the detailed description of the graph at the attribute level to more
compact variants involving programs, relations and queries and vice-versa).
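The following Python sketch illustrates the zoom-out direction in a simplified way: attribute-level provider edges are collapsed into activity-level edges by grouping nodes under their owning activity or recordset. The transformation shown is an assumption for the example, not the paper's exact graph transformation.

    # Illustrative sketch only: collapsing an attribute-level graph into an
    # activity-level one ("zoom-out"); not the paper's exact transformation.
    attr_edges = [("SALES.COST", "A1.COST_IN"), ("A1.COST_OUT", "DW.COST")]
    owner = {"SALES.COST": "SALES", "A1.COST_IN": "A1",
             "A1.COST_OUT": "A1", "DW.COST": "DW"}

    def zoom_out(edges, owner):
        """Collapse nodes by owner, dropping self-loops and duplicate edges."""
        return sorted({(owner[u], owner[v]) for u, v in edges if owner[u] != owner[v]})

    print(zoom_out(attr_edges, owner))  # [('A1', 'DW'), ('SALES', 'A1')]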
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. In this paper, we focus on the logical design of the ETL scenario of a data warehouse. Based on a formal logical model that includes the data stores, activities and their constituent parts, we model an ETL scenario as a graph, which we call the Architecture Graph. We model all the aforementioned entities as nodes and four different kinds of relationships (instance-of, part-of, regulator and provider relationships) as edges. In addition, we provide simple graph transformations that reduce the complexity of the graph. Finally, in order to support the engineering of the design and the evolution of the warehouse, we introduce specific importance metrics, namely dependence and responsibility, to measure the degree to which entities are bound to each other.
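As a simplified illustration of the dependence and responsibility measures, the Python sketch below counts, for each node of a toy provider graph, how many entities it transitively draws data from and how many transitively draw data from it; the actual definitions in the paper are more refined.

    # Illustrative sketch only: transitive counts over provider edges as
    # stand-ins for dependence and responsibility.
    from collections import defaultdict

    provider_edges = [("S1", "A1"), ("S2", "A1"), ("A1", "A2"), ("A2", "DW")]

    def reachable(start, adjacency):
        """All nodes transitively reachable from start (excluding start itself)."""
        seen, stack = set(), [start]
        while stack:
            for nxt in adjacency.get(stack.pop(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    successors, predecessors = defaultdict(list), defaultdict(list)
    for provider, consumer in provider_edges:
        successors[provider].append(consumer)
        predecessors[consumer].append(provider)

    for node in sorted(set(successors) | set(predecessors)):
        dependence = len(reachable(node, predecessors))    # entities node draws data from
        responsibility = len(reachable(node, successors))  # entities drawing data from node
        print(f"{node}: dependence={dependence}, responsibility={responsibility}")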
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
This article concerns the logical design of ETL (Extraction-Transformation-Loading) scenarios for data warehouses. Based on a formal logical model consisting of data stores, activities and their constituent parts, an ETL scenario is modeled as a graph, called the Architecture Graph. All the aforementioned entities form the nodes of the graph, and the four different kinds of relationships between them (instance-of, part-of, regulator and provider relationships) form its edges. To support the design and the evolution of the data warehouse, specific importance metrics are defined, namely dependence and responsibility, which measure the degree to which the entities of the scenario are bound to each other.
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
Nowadays, knowledge extraction methods are able to
produce artifacts (also called patterns) that concisely
represent data. Patterns are usually quite heterogeneous and
require ad-hoc processing techniques. So far, little emphasis
has been placed on developing an overall integrated
environment for uniformly representing and querying different
types of patterns. Within the larger context of modelling,
storing, and querying patterns, in this paper, we: (a) formally
define the logical foundations for the global setting of
pattern management through a model that covers data, patterns
and their intermediate mappings; (b) present a pattern
specification language for pattern management along with
safety restrictions; and (c) introduce queries and query
operators and identify interesting query classes.
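For illustration, the Python sketch below gives one hypothetical rendering of the setting described above: a pattern carries its concise structure, its quality measures, a reference to the underlying data source, and a mapping describing the data it represents, and simple queries can then be posed over a pattern base. None of the names below come from the paper's model.

    # Illustrative sketch only: a hypothetical pattern record and a toy query
    # over a small pattern base; names do not come from the paper's model.
    from dataclasses import dataclass

    @dataclass
    class Pattern:
        pattern_type: str   # e.g. "association_rule", "cluster"
        structure: dict     # the concise representation itself
        measures: dict      # quality measures, e.g. support and confidence
        data_source: str    # where the underlying raw data live
        mapping: str        # description of the data the pattern represents

    pattern_base = [
        Pattern("association_rule", {"body": ["chips"], "head": ["beer"]},
                {"support": 0.12, "confidence": 0.70}, "sales_2023",
                "transactions containing both chips and beer"),
        Pattern("cluster", {"centroid": [1.2, 3.4], "radius": 0.8},
                {"intra_distance": 0.3}, "customers",
                "customers falling within the centroid's radius"),
    ]

    # A toy query: association rules above a confidence threshold.
    strong_rules = [p for p in pattern_base
                    if p.pattern_type == "association_rule"
                    and p.measures.get("confidence", 0) >= 0.6]
    print([p.structure for p in strong_rules])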
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
It is commonly agreed that multidimensional data cubes form the basic logical data model for OLAP applications. Still, there seems to be no agreement on a common model for cubes. In this paper we propose a logical model for cubes based on the key observation that a cube is not a self-existing entity, but rather a view over an underlying data set. We accompany our model with syntactic characterisations for the problem of cube usability. To this end, we have developed algorithms to check whether (a) the marginal conditions of two cubes are appropriate for a rewriting, in the presence of aggregation hierarchies and (b) an implication exists between two selection conditions that involve different levels of aggregation of the same dimension hierarchy. Finally, we present a rewriting algorithm for the cube usability problem.
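As an informal illustration of cube usability, the Python sketch below treats a cube as a view described by a selection, a grouping level per dimension, and an aggregate function, and tests a crude sufficient condition for re-aggregating one cube from another; the paper's syntactic characterisations are considerably more general.

    # Illustrative sketch only: a crude sufficient condition for computing one
    # cube from another by re-aggregation along dimension hierarchies.
    HIERARCHY = {"time": ["day", "month", "year"],
                 "geo": ["city", "country", "continent"]}

    def coarser_or_equal(dim, level_a, level_b):
        """True if level_b is at the same or a coarser granularity than level_a."""
        return HIERARCHY[dim].index(level_b) >= HIERARCHY[dim].index(level_a)

    def usable(cube_a, cube_b):
        """Can cube_b be re-aggregated from cube_a? (Simplified checks only.)"""
        return (cube_a["selection"] == cube_b["selection"]
                and cube_a["aggregate"] == cube_b["aggregate"]
                and cube_a["aggregate"] in {"sum", "count", "min", "max"}
                and all(coarser_or_equal(d, cube_a["levels"][d], cube_b["levels"][d])
                        for d in cube_b["levels"]))

    sales_by_month_city = {"selection": "year = 2023", "aggregate": "sum",
                           "levels": {"time": "month", "geo": "city"}}
    sales_by_year_country = {"selection": "year = 2023", "aggregate": "sum",
                             "levels": {"time": "year", "geo": "country"}}
    print(usable(sales_by_month_city, sales_by_year_country))  # True: roll up month->year, city->country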
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
On-Line Analytical Processing (OLAP) is a trend in database technology based on the multidimensional view of information at the client level. Despite the common acceptance of multidimensional cubes as the central logical model for OLAP and the multitude of research proposals, there is little agreement on a common terminology and semantics for the logical data model. In this paper, an additional logical model for cubes is proposed, based on the observation that a cube is not a self-existing entity, but rather a view over an underlying data set. The proposed model is powerful enough to capture all the common OLAP operations, such as selection, roll-up and drill-down across levels of granularity, through a sound and complete algebra. It is also shown how this model can be used as the basis for processing operations on cubes, and syntactic characterisations are presented for the problem of cube usability (i.e., the problem of using data from one cube to compute another cube).
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Abstract:
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. Research has only recently dealt with the above problem and provided few models, tools and techniques to address the issues around the ETL environment [1,2,3,5]. In this paper, we present a logical model for ETL processes. The proposed model is characterized by several templates, representing frequently used ETL activities along with their semantics and their interconnection. In the full version of the paper [4] we present more details on the aforementioned issues and complement them with results on the characterization of the content of the involved data stores after the execution of an ETL scenario and impact-analysis results in the presence of changes.
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.