The data warehouse ETL toolkit: practical techniques for extracting, cleaning, conforming, and delivering data

Bibliographic Details
Other Authors: Kimball, Ralph; Caserta, Joe
Format: Book
Table of Contents:
  • Part I Requirements, Realities, and Architecture
    • Chapter 1 Surrounding the Requirements
      Requirements; Architecture; The Mission of the Data Warehouse; The Mission of the ETL Team
    • Chapter 2 ETL Data Structures
      To Stage or Not To Stage; Designing the Staging Area; Data Structures in the ETL System; Planning and Design Standards; Summary
  • Part II Data Flow
    • Chapter 3 Extracting
      Part 1: The Logical Data Map; Inside the Logical Data Map; Building the Logical Data Map; Integrating Heterogeneous Data Sources; Mainframe Sources; Flat Files; XML Sources; Web Log Sources; ERP System Sources; Part 3: Extracting Changed Data; Summary
    • Chapter 4 Cleaning and Conforming
      Defining Data Quality; Assumptions; Part 1: Design Objectives; Part 2: Cleaning Deliverables; Part 3: Screens and Their Measurements; Part 4: Conforming Deliverables; Summary
    • Chapter 5 Delivering Dimension Tables
      The Basic Structure of a Dimension; The Grain of a Dimension; The Basic Load Plan for a Dimension; Flat Dimensions and Snowflaked Dimensions; Date and Time Dimensions; Big Dimensions; Small Dimensions; One Dimension or Two; Dimensional Roles; Dimensions as Subdimensions of Another Dimension; Degenerate Dimensions; Slowly Changing Dimensions; Type 1 Slowly Changing Dimension (Overwrite); Type 2 Slowly Changing Dimension (Partitioning History); Precise Time Stamping of a Type 2 Slowly Changing Dimension; Type 3 Slowly Changing Dimension (Alternate Realities); Hybrid Slowly Changing Dimensions; Late Arriving Dimension Records and Correcting Bad Data; Multi-Valued Dimensions and Bridge Tables; Ragged Hierarchies and Bridge Tables; Technical Note: Populating Hierarchy Bridge Tables; Using Positional Attributes in a Dimension to Represent Text Facts; Summary
    • Chapter 6 Delivering Fact Tables
      The Basic Structure of a Fact Table; Guaranteeing Referential Integrity; Surrogate Key Pipeline; Fundamental Grains; Preparing for Loading Fact Tables; Factless Fact Tables; Augmenting a Type 1 Fact Table With Type 2 History; Graceful Modifications; Multiple Units of Measure in a Fact Table; Collecting Revenue in Multiple Currencies; Late Arriving Facts; Aggregations; Delivering Dimensional Data to OLAP Cubes; Summary
  • Part III Implementation and Operations
    • Chapter 7 Development
      Current Marketplace ETL Tool Suite Offerings; Current Scripting Languages; Time Is of the Essence; Using Database Bulk Loader Utilities to Speed Inserts; Managing Database Features to Improve Performance; Troubleshooting Performance Problems; Increasing ETL Throughput; Summary
    • Chapter 8 Operations
      Scheduling and Support; Migrating to Production; Achieving Optimal ETL Performance; Purging Historic Data; Monitoring the ETL System; Tuning ETL Processes; ETL System Security; Short Term Archiving and Recovery; Long Term Archiving and Recovery; Summary
    • Chapter 9 Metadata
      Defining Metadata; Business Metadata; Technical Metadata; ETL-Generated Metadata; Metadata Standards and Practices; Impact Analysis; Summary
    • Chapter 10 Responsibilities
      Planning and Leadership; Managing the Project; Summary
  • Part IV Real Time Streaming ETL Systems
    • Chapter 11 Real Time ETL Systems
      Why Real-Time ETL?; Defining Real-Time ETL; Challenges and Opportunities of Real-Time Data Warehousing; Real-Time Data Warehousing Review; Categorizing the Requirement; Real-Time ETL Approaches; Summary
    • Chapter 12 Conclusions
      Deepening the Definition of ETL; The Future of Data Warehousing and ETL in Particular
Library of Congress Subject Headings for this publication: Data warehousing. Database design.