{"product_id":"pentaho-kettle-solutions-isbn-9780470635179","title":"Pentaho Kettle Solutions","description":"\u003cb\u003eA complete guide to Pentaho Kettle, the Pentaho Data Integration toolset for ETL\u003c\/b\u003e  \u003cp\u003eThis practical book is a complete guide to installing, configuring, and managing Pentaho Kettle. If you’re a database administrator or developer, you’ll first get up to speed on Kettle basics and how to apply Kettle to create ETL solutions—before progressing to specialized concepts such as clustering, extensibility, and data vault models. Learn how to design and build every phase of an ETL solution.\u003c\/p\u003e \u003cul\u003e \u003cli\u003eShows developers and database administrators how to use the open-source Pentaho Kettle for enterprise-level ETL processes (Extracting, Transforming, and Loading data)\u003c\/li\u003e \u003cli\u003eAssumes no prior knowledge of Kettle or ETL, and brings beginners thoroughly up to speed at their own pace\u003c\/li\u003e \u003cli\u003eExplains how to get Kettle solutions up and running, then follows the 34 ETL subsystems model, as created by the Kimball Group, to explore the entire ETL lifecycle, including all aspects of data warehousing with Kettle\u003c\/li\u003e \u003cli\u003eGoes beyond routine tasks to explore how to extend Kettle and scale Kettle solutions using a distributed “cloud”\u003c\/li\u003e \u003c\/ul\u003e \u003cp\u003eGet the most out of Pentaho Kettle and your data warehousing with this detailed guide—from simple single table data migration to complex multisystem clustered data integration tasks.\u003c\/p\u003e  \u003cp\u003eIntroduction xxxi\u003c\/p\u003e \u003cp\u003ePart I Getting Started 1\u003c\/p\u003e \u003cp\u003eChapter 1 ETL Primer 3\u003c\/p\u003e \u003cp\u003eOLTP versus Data Warehousing 3\u003c\/p\u003e \u003cp\u003eWhat Is ETL? 
5\u003c\/p\u003e \u003cp\u003eThe Evolution of ETL Solutions 5\u003c\/p\u003e \u003cp\u003eETL Building Blocks 7\u003c\/p\u003e \u003cp\u003eETL, ELT, and EII 8\u003c\/p\u003e \u003cp\u003eELT 9\u003c\/p\u003e \u003cp\u003eEII: Virtual Data Integration 10\u003c\/p\u003e \u003cp\u003eData Integration Challenges 11\u003c\/p\u003e \u003cp\u003eMethodology: Agile BI 12\u003c\/p\u003e \u003cp\u003eETL Design 14\u003c\/p\u003e \u003cp\u003eData Acquisition 14\u003c\/p\u003e \u003cp\u003eBeware of Spreadsheets 15\u003c\/p\u003e \u003cp\u003eDesign for Failure 15\u003c\/p\u003e \u003cp\u003eChange Data Capture 16\u003c\/p\u003e \u003cp\u003eData Quality 16\u003c\/p\u003e \u003cp\u003eData Profiling 16\u003c\/p\u003e \u003cp\u003eData Validation 17\u003c\/p\u003e \u003cp\u003eETL Tool Requirements 17\u003c\/p\u003e \u003cp\u003eConnectivity 17\u003c\/p\u003e \u003cp\u003ePlatform Independence 18\u003c\/p\u003e \u003cp\u003eScalability 18\u003c\/p\u003e \u003cp\u003eDesign Flexibility 19\u003c\/p\u003e \u003cp\u003eReuse 19\u003c\/p\u003e \u003cp\u003eExtensibility 19\u003c\/p\u003e \u003cp\u003eData Transformations 20\u003c\/p\u003e \u003cp\u003eTesting and Debugging 21\u003c\/p\u003e \u003cp\u003eLineage and Impact Analysis 21\u003c\/p\u003e \u003cp\u003eLogging and Auditing 22\u003c\/p\u003e \u003cp\u003eSummary 22\u003c\/p\u003e \u003cp\u003eChapter 2 Kettle Concepts 23\u003c\/p\u003e \u003cp\u003eDesign Principles 23\u003c\/p\u003e \u003cp\u003eThe Building Blocks of Kettle Design 25\u003c\/p\u003e \u003cp\u003eTransformations 25\u003c\/p\u003e \u003cp\u003eSteps 26\u003c\/p\u003e \u003cp\u003eTransformation Hops 26\u003c\/p\u003e \u003cp\u003eParallelism 27\u003c\/p\u003e \u003cp\u003eRows of Data 27\u003c\/p\u003e \u003cp\u003eData Conversion 29\u003c\/p\u003e \u003cp\u003eJobs 30\u003c\/p\u003e \u003cp\u003eJob Entries 31\u003c\/p\u003e \u003cp\u003eJob Hops 31\u003c\/p\u003e \u003cp\u003eMultiple Paths and Backtracking 32\u003c\/p\u003e \u003cp\u003eParallel 
Execution 33\u003c\/p\u003e \u003cp\u003eJob Entry Results 34\u003c\/p\u003e \u003cp\u003eTransformation or Job Metadata 36\u003c\/p\u003e \u003cp\u003eDatabase Connections 37\u003c\/p\u003e \u003cp\u003eSpecial Options 38\u003c\/p\u003e \u003cp\u003eThe Power of the Relational Database 39\u003c\/p\u003e \u003cp\u003eConnections and Transactions 39\u003c\/p\u003e \u003cp\u003eDatabase Clustering 40\u003c\/p\u003e \u003cp\u003eTools and Utilities 41\u003c\/p\u003e \u003cp\u003eRepositories 41\u003c\/p\u003e \u003cp\u003eVirtual File Systems 42\u003c\/p\u003e \u003cp\u003eParameters and Variables 43\u003c\/p\u003e \u003cp\u003eDefining Variables 43\u003c\/p\u003e \u003cp\u003eNamed Parameters 44\u003c\/p\u003e \u003cp\u003eUsing Variables 44\u003c\/p\u003e \u003cp\u003eVisual Programming 45\u003c\/p\u003e \u003cp\u003eGetting Started 46\u003c\/p\u003e \u003cp\u003eCreating New Steps 47\u003c\/p\u003e \u003cp\u003ePutting It All Together 49\u003c\/p\u003e \u003cp\u003eSummary 51\u003c\/p\u003e \u003cp\u003eChapter 3 Installation and Configuration 53\u003c\/p\u003e \u003cp\u003eKettle Software Overview 53\u003c\/p\u003e \u003cp\u003eIntegrated Development Environment: Spoon 55\u003c\/p\u003e \u003cp\u003eCommand-Line Launchers: Kitchen and Pan 57\u003c\/p\u003e \u003cp\u003eJob Server: Carte 57\u003c\/p\u003e \u003cp\u003eEncr.bat and encr.sh 58\u003c\/p\u003e \u003cp\u003eInstallation 58\u003c\/p\u003e \u003cp\u003eJava Environment 58\u003c\/p\u003e \u003cp\u003eInstalling Java Manually 58\u003c\/p\u003e \u003cp\u003eUsing Your Linux Package Management System 59\u003c\/p\u003e \u003cp\u003eInstalling Kettle 59\u003c\/p\u003e \u003cp\u003eVersions and Releases 59\u003c\/p\u003e \u003cp\u003eArchive Names and Formats 60\u003c\/p\u003e \u003cp\u003eDownloading and Uncompressing 60\u003c\/p\u003e \u003cp\u003eRunning Kettle Programs 61\u003c\/p\u003e \u003cp\u003eCreating a Shortcut Icon or Launcher for Spoon 62\u003c\/p\u003e \u003cp\u003eConfiguration 63\u003c\/p\u003e 
\u003cp\u003eConfiguration Files and the .kettle Directory 63\u003c\/p\u003e \u003cp\u003eThe Kettle Shell Scripts 69\u003c\/p\u003e \u003cp\u003eGeneral Structure of the Startup Scripts 70\u003c\/p\u003e \u003cp\u003eAdding an Entry to the Classpath 70\u003c\/p\u003e \u003cp\u003eChanging the Maximum Heap Size 71\u003c\/p\u003e \u003cp\u003eManaging JDBC Drivers 72\u003c\/p\u003e \u003cp\u003eSummary 72\u003c\/p\u003e \u003cp\u003eChapter 4 An Example ETL Solution—Sakila 73\u003c\/p\u003e \u003cp\u003eSakila 73\u003c\/p\u003e \u003cp\u003eThe Sakila Sample Database 74\u003c\/p\u003e \u003cp\u003eDVD Rental Business Process 74\u003c\/p\u003e \u003cp\u003eSakila Database Schema Diagram 75\u003c\/p\u003e \u003cp\u003eSakila Database Subject Areas 75\u003c\/p\u003e \u003cp\u003eGeneral Design Considerations 77\u003c\/p\u003e \u003cp\u003eInstalling the Sakila Sample Database 77\u003c\/p\u003e \u003cp\u003eThe Rental Star Schema 78\u003c\/p\u003e \u003cp\u003eRental Star Schema Diagram 78\u003c\/p\u003e \u003cp\u003eRental Fact Table 79\u003c\/p\u003e \u003cp\u003eDimension Tables 79\u003c\/p\u003e \u003cp\u003eKeys and Change Data Capture 80\u003c\/p\u003e \u003cp\u003eInstalling the Rental Star Schema 81\u003c\/p\u003e \u003cp\u003ePrerequisites and Some Basic Spoon Skills 81\u003c\/p\u003e \u003cp\u003eSetting Up the ETL Solution 82\u003c\/p\u003e \u003cp\u003eCreating Database Accounts 82\u003c\/p\u003e \u003cp\u003eWorking with Spoon 82\u003c\/p\u003e \u003cp\u003eOpening Transformation and Job Files 82\u003c\/p\u003e \u003cp\u003eOpening the Step’s Configuration Dialog 83\u003c\/p\u003e \u003cp\u003eExamining Streams 83\u003c\/p\u003e \u003cp\u003eRunning Jobs and Transformations 83\u003c\/p\u003e \u003cp\u003eThe Sample ETL Solution 84\u003c\/p\u003e \u003cp\u003eStatic, Generated Dimensions 84\u003c\/p\u003e \u003cp\u003eLoading the dim_date Dimension Table 84\u003c\/p\u003e \u003cp\u003eLoading the dim_time Dimension Table 86\u003c\/p\u003e 
\u003cp\u003eRecurring Load 87\u003c\/p\u003e \u003cp\u003eThe load_rentals Job 88\u003c\/p\u003e \u003cp\u003eThe load_dim_staff Transformation 91\u003c\/p\u003e \u003cp\u003eDatabase Connections 91\u003c\/p\u003e \u003cp\u003eThe load_dim_customer Transformation 95\u003c\/p\u003e \u003cp\u003eThe load_dim_store Transformation 98\u003c\/p\u003e \u003cp\u003eThe fetch_address Subtransformation 99\u003c\/p\u003e \u003cp\u003eThe load_dim_actor Transformation 101\u003c\/p\u003e \u003cp\u003eThe load_dim_film Transformation 102\u003c\/p\u003e \u003cp\u003eThe load_fact_rental Transformation 107\u003c\/p\u003e \u003cp\u003eSummary 109\u003c\/p\u003e \u003cp\u003ePart II ETL 111\u003c\/p\u003e \u003cp\u003eChapter 5 ETL Subsystems 113\u003c\/p\u003e \u003cp\u003eIntroduction to the 34 Subsystems 114\u003c\/p\u003e \u003cp\u003eExtraction 114\u003c\/p\u003e \u003cp\u003eSubsystems 1–3: Data Profiling, Change Data Capture, and Extraction 115\u003c\/p\u003e \u003cp\u003eCleaning and Conforming Data 116\u003c\/p\u003e \u003cp\u003eSubsystem 4: Data Cleaning and Quality Screen Handler System 116\u003c\/p\u003e \u003cp\u003eSubsystem 5: Error Event Handler 117\u003c\/p\u003e \u003cp\u003eSubsystem 6: Audit Dimension Assembler 117\u003c\/p\u003e \u003cp\u003eSubsystem 7: Deduplication System 117\u003c\/p\u003e \u003cp\u003eSubsystem 8: Data Conformer 118\u003c\/p\u003e \u003cp\u003eData Delivery 118\u003c\/p\u003e \u003cp\u003eSubsystem 9: Slowly Changing Dimension Processor 118\u003c\/p\u003e \u003cp\u003eSubsystem 10: Surrogate Key Creation System 119\u003c\/p\u003e \u003cp\u003eSubsystem 11: Hierarchy Dimension Builder 119\u003c\/p\u003e \u003cp\u003eSubsystem 12: Special Dimension Builder 120\u003c\/p\u003e \u003cp\u003eSubsystem 13: Fact Table Loader 121\u003c\/p\u003e \u003cp\u003eSubsystem 14: Surrogate Key Pipeline 121\u003c\/p\u003e \u003cp\u003eSubsystem 15: Multi-Valued Dimension Bridge Table Builder 
121\u003c\/p\u003e \u003cp\u003eSubsystem 16: Late-Arriving Data Handler 122\u003c\/p\u003e \u003cp\u003eSubsystem 17: Dimension Manager System 122\u003c\/p\u003e \u003cp\u003eSubsystem 18: Fact Table Provider System 122\u003c\/p\u003e \u003cp\u003eSubsystem 19: Aggregate Builder 123\u003c\/p\u003e \u003cp\u003eSubsystem 20: Multidimensional (OLAP) Cube Builder 123\u003c\/p\u003e \u003cp\u003eSubsystem 21: Data Integration Manager 123\u003c\/p\u003e \u003cp\u003eManaging the ETL Environment 123\u003c\/p\u003e \u003cp\u003eSummary 126\u003c\/p\u003e \u003cp\u003eChapter 6 Data Extraction 127\u003c\/p\u003e \u003cp\u003eKettle Data Extraction Overview 128\u003c\/p\u003e \u003cp\u003eFile-Based Extraction 128\u003c\/p\u003e \u003cp\u003eWorking with Text Files 128\u003c\/p\u003e \u003cp\u003eWorking with XML files 133\u003c\/p\u003e \u003cp\u003eSpecial File Types 134\u003c\/p\u003e \u003cp\u003eDatabase-Based Extraction 134\u003c\/p\u003e \u003cp\u003eWeb-Based Extraction 137\u003c\/p\u003e \u003cp\u003eText-Based Web Extraction 137\u003c\/p\u003e \u003cp\u003eHTTP Client 137\u003c\/p\u003e \u003cp\u003eUsing SOAP 138\u003c\/p\u003e \u003cp\u003eStream-Based and Real-Time Extraction 138\u003c\/p\u003e \u003cp\u003eWorking with ERP and CRM Systems 138\u003c\/p\u003e \u003cp\u003eERP Challenges 139\u003c\/p\u003e \u003cp\u003eKettle ERP Plugins 140\u003c\/p\u003e \u003cp\u003eWorking with SAP Data 140\u003c\/p\u003e \u003cp\u003eERP and CDC Issues 146\u003c\/p\u003e \u003cp\u003eData Profiling 146\u003c\/p\u003e \u003cp\u003eUsing eobjects.org DataCleaner 147\u003c\/p\u003e \u003cp\u003eAdding Profile Tasks 149\u003c\/p\u003e \u003cp\u003eAdding Database Connections 149\u003c\/p\u003e \u003cp\u003eDoing an Initial Profile 151\u003c\/p\u003e \u003cp\u003eWorking with Regular Expressions 151\u003c\/p\u003e \u003cp\u003eProfiling and Exploring Results 152\u003c\/p\u003e \u003cp\u003eValidating and Comparing Data 153\u003c\/p\u003e \u003cp\u003eUsing a Dictionary for 
Column Dependency Checks 153\u003c\/p\u003e \u003cp\u003eAlternative Solutions 154\u003c\/p\u003e \u003cp\u003eText Profiling with Kettle 154\u003c\/p\u003e \u003cp\u003eCDC: Change Data Capture 154\u003c\/p\u003e \u003cp\u003eSource Data–Based CDC 155\u003c\/p\u003e \u003cp\u003eTrigger-Based CDC 157\u003c\/p\u003e \u003cp\u003eSnapshot-Based CDC 158\u003c\/p\u003e \u003cp\u003eLog-Based CDC 162\u003c\/p\u003e \u003cp\u003eWhich CDC Alternative Should You Choose? 163\u003c\/p\u003e \u003cp\u003eDelivering Data 164\u003c\/p\u003e \u003cp\u003eSummary 164\u003c\/p\u003e \u003cp\u003eChapter 7 Cleansing and Conforming 167\u003c\/p\u003e \u003cp\u003eData Cleansing 168\u003c\/p\u003e \u003cp\u003eData-Cleansing Steps 169\u003c\/p\u003e \u003cp\u003eUsing Reference Tables 172\u003c\/p\u003e \u003cp\u003eConforming Data Using Lookup Tables 172\u003c\/p\u003e \u003cp\u003eConforming Data Using Reference Tables 175\u003c\/p\u003e \u003cp\u003eData Validation 179\u003c\/p\u003e \u003cp\u003eApplying Validation Rules 180\u003c\/p\u003e \u003cp\u003eValidating Dependency Constraints 183\u003c\/p\u003e \u003cp\u003eError Handling 183\u003c\/p\u003e \u003cp\u003eHandling Process Errors 184\u003c\/p\u003e \u003cp\u003eTransformation Errors 186\u003c\/p\u003e \u003cp\u003eHandling Data (Validation) Errors 187\u003c\/p\u003e \u003cp\u003eAuditing Data and Process Quality 191\u003c\/p\u003e \u003cp\u003eDeduplicating Data 192\u003c\/p\u003e \u003cp\u003eHandling Exact Duplicates 193\u003c\/p\u003e \u003cp\u003eThe Problem of Non-Exact Duplicates 194\u003c\/p\u003e \u003cp\u003eBuilding Deduplication Transforms 195\u003c\/p\u003e \u003cp\u003eStep 1: Fuzzy Match 197\u003c\/p\u003e \u003cp\u003eStep 2: Select Suspects 198\u003c\/p\u003e \u003cp\u003eStep 3: Lookup Validation Value 198\u003c\/p\u003e \u003cp\u003eStep 4: Filter Duplicates 199\u003c\/p\u003e \u003cp\u003eScripting 200\u003c\/p\u003e \u003cp\u003eFormula 201\u003c\/p\u003e \u003cp\u003eJavaScript 202\u003c\/p\u003e 
\u003cp\u003eUser-Defined Java Expressions 202\u003c\/p\u003e \u003cp\u003eRegular Expressions 203\u003c\/p\u003e \u003cp\u003eSummary 205\u003c\/p\u003e \u003cp\u003eChapter 8 Handling Dimension Tables 207\u003c\/p\u003e \u003cp\u003eManaging Keys 208\u003c\/p\u003e \u003cp\u003eManaging Business Keys 209\u003c\/p\u003e \u003cp\u003eKeys in the Source System 209\u003c\/p\u003e \u003cp\u003eKeys in the Data Warehouse 209\u003c\/p\u003e \u003cp\u003eBusiness Keys 209\u003c\/p\u003e \u003cp\u003eStoring Business Keys 210\u003c\/p\u003e \u003cp\u003eLooking Up Keys with Kettle 210\u003c\/p\u003e \u003cp\u003eGenerating Surrogate Keys 210\u003c\/p\u003e \u003cp\u003eThe “Add sequence” Step 211\u003c\/p\u003e \u003cp\u003eWorking with auto_increment or IDENTITY Columns 217\u003c\/p\u003e \u003cp\u003eKeys for Slowly Changing Dimensions 217\u003c\/p\u003e \u003cp\u003eLoading Dimension Tables 218\u003c\/p\u003e \u003cp\u003eSnowflaked Dimension Tables 218\u003c\/p\u003e \u003cp\u003eTop-Down Level-Wise Loading 219\u003c\/p\u003e \u003cp\u003eSakila Snowflake Example 219\u003c\/p\u003e \u003cp\u003eSample Transformation 221\u003c\/p\u003e \u003cp\u003eDatabase Lookup Configuration 222\u003c\/p\u003e \u003cp\u003eSample Job 225\u003c\/p\u003e \u003cp\u003eStar Schema Dimension Tables 226\u003c\/p\u003e \u003cp\u003eDenormalization 226\u003c\/p\u003e \u003cp\u003eDenormalizing to 1NF with the “Database lookup” Step 226\u003c\/p\u003e \u003cp\u003eChange Data Capture 227\u003c\/p\u003e \u003cp\u003eSlowly Changing Dimensions 228\u003c\/p\u003e \u003cp\u003eTypes of Slowly Changing Dimensions 228\u003c\/p\u003e \u003cp\u003eType 1 Slowly Changing Dimensions 229\u003c\/p\u003e \u003cp\u003eThe Insert \/ Update Step 229\u003c\/p\u003e \u003cp\u003eType 2 Slowly Changing Dimensions 232\u003c\/p\u003e \u003cp\u003eThe “Dimension lookup \/ update” Step 232\u003c\/p\u003e \u003cp\u003eOther Types of Slowly Changing Dimensions 237\u003c\/p\u003e \u003cp\u003eType 3 Slowly Changing 
Dimensions 237\u003c\/p\u003e \u003cp\u003eHybrid Slowly Changing Dimensions 238\u003c\/p\u003e \u003cp\u003eMore Dimensions 239\u003c\/p\u003e \u003cp\u003eGenerated Dimensions 239\u003c\/p\u003e \u003cp\u003eDate and Time Dimensions 239\u003c\/p\u003e \u003cp\u003eGenerated Mini-Dimensions 239\u003c\/p\u003e \u003cp\u003eJunk Dimensions 241\u003c\/p\u003e \u003cp\u003eRecursive Hierarchies 242\u003c\/p\u003e \u003cp\u003eSummary 243\u003c\/p\u003e \u003cp\u003eChapter 9 Loading Fact Tables 245\u003c\/p\u003e \u003cp\u003eLoading in Bulk 246\u003c\/p\u003e \u003cp\u003eSTDIN and FIFO 247\u003c\/p\u003e \u003cp\u003eKettle Bulk Loaders 248\u003c\/p\u003e \u003cp\u003eMySQL Bulk Loading 249\u003c\/p\u003e \u003cp\u003eLucidDB Bulk Loader 249\u003c\/p\u003e \u003cp\u003eOracle Bulk Loader 249\u003c\/p\u003e \u003cp\u003ePostgreSQL Bulk Loader 250\u003c\/p\u003e \u003cp\u003eTable Output Step 250\u003c\/p\u003e \u003cp\u003eGeneral Bulk Load Considerations 250\u003c\/p\u003e \u003cp\u003eDimension Lookups 251\u003c\/p\u003e \u003cp\u003eMaintaining Referential Integrity 251\u003c\/p\u003e \u003cp\u003eThe Surrogate Key Pipeline 252\u003c\/p\u003e \u003cp\u003eUsing In-Memory Lookups 253\u003c\/p\u003e \u003cp\u003eStream Lookups 253\u003c\/p\u003e \u003cp\u003eLate-Arriving Data 255\u003c\/p\u003e \u003cp\u003eLate-Arriving Facts 256\u003c\/p\u003e \u003cp\u003eLate-Arriving Dimensions 256\u003c\/p\u003e \u003cp\u003eFact Table Handling 260\u003c\/p\u003e \u003cp\u003ePeriodic and Accumulating Snapshots 260\u003c\/p\u003e \u003cp\u003eIntroducing State-Oriented Fact Tables 261\u003c\/p\u003e \u003cp\u003eLoading Periodic Snapshots 263\u003c\/p\u003e \u003cp\u003eLoading Accumulating Snapshots 264\u003c\/p\u003e \u003cp\u003eLoading State-Oriented Fact Tables 265\u003c\/p\u003e \u003cp\u003eLoading Aggregate Tables 266\u003c\/p\u003e \u003cp\u003eSummary 267\u003c\/p\u003e \u003cp\u003eChapter 10 Working with OLAP Data 269\u003c\/p\u003e \u003cp\u003eOLAP Benefits and 
Challenges 270\u003c\/p\u003e \u003cp\u003eOLAP Storage Types 272\u003c\/p\u003e \u003cp\u003ePositioning OLAP 272\u003c\/p\u003e \u003cp\u003eKettle OLAP Options 273\u003c\/p\u003e \u003cp\u003eWorking with Mondrian 274\u003c\/p\u003e \u003cp\u003eWorking with XML\/A Servers 277\u003c\/p\u003e \u003cp\u003eWorking with Palo 282\u003c\/p\u003e \u003cp\u003eSetting Up the Palo Connection 283\u003c\/p\u003e \u003cp\u003ePalo Architecture 284\u003c\/p\u003e \u003cp\u003eReading Palo Data 285\u003c\/p\u003e \u003cp\u003eWriting Palo Data 289\u003c\/p\u003e \u003cp\u003eSummary 291\u003c\/p\u003e \u003cp\u003ePart III Management and Deployment 293\u003c\/p\u003e \u003cp\u003eChapter 11 ETL Development Lifecycle 295\u003c\/p\u003e \u003cp\u003eSolution Design 295\u003c\/p\u003e \u003cp\u003eBest and Bad Practices 296\u003c\/p\u003e \u003cp\u003eData Mapping 297\u003c\/p\u003e \u003cp\u003eNaming and Commentary Conventions 298\u003c\/p\u003e \u003cp\u003eCommon Pitfalls 299\u003c\/p\u003e \u003cp\u003eETL Flow Design 300\u003c\/p\u003e \u003cp\u003eReusability and Maintainability 300\u003c\/p\u003e \u003cp\u003eAgile Development 301\u003c\/p\u003e \u003cp\u003eTesting and Debugging 306\u003c\/p\u003e \u003cp\u003eTest Activities 307\u003c\/p\u003e \u003cp\u003eETL Testing 308\u003c\/p\u003e \u003cp\u003eTest Data Requirements 308\u003c\/p\u003e \u003cp\u003eTesting for Completeness 309\u003c\/p\u003e \u003cp\u003eTesting Data Transformations 311\u003c\/p\u003e \u003cp\u003eTest Automation and Continuous Integration 311\u003c\/p\u003e \u003cp\u003eUpgrade Tests 312\u003c\/p\u003e \u003cp\u003eDebugging 312\u003c\/p\u003e \u003cp\u003eDocumenting the Solution 315\u003c\/p\u003e \u003cp\u003eWhy Isn’t There Any Documentation? 316\u003c\/p\u003e \u003cp\u003eMyth 1: My Software Is Self-Explanatory 316\u003c\/p\u003e \u003cp\u003eMyth 2: Documentation Is Always Outdated 316\u003c\/p\u003e \u003cp\u003eMyth 3: Who Reads Documentation Anyway? 
317\u003c\/p\u003e \u003cp\u003eKettle Documentation Features 317\u003c\/p\u003e \u003cp\u003eGenerating Documentation 319\u003c\/p\u003e \u003cp\u003eSummary 320\u003c\/p\u003e \u003cp\u003eChapter 12 Scheduling and Monitoring 321\u003c\/p\u003e \u003cp\u003eScheduling 321\u003c\/p\u003e \u003cp\u003eOperating System–Level Scheduling 322\u003c\/p\u003e \u003cp\u003eExecuting Kettle Jobs and Transformations from the Command Line 322\u003c\/p\u003e \u003cp\u003eUNIX-Based Systems: cron 326\u003c\/p\u003e \u003cp\u003eWindows: The at utility and the Task Scheduler 327\u003c\/p\u003e \u003cp\u003eUsing Pentaho’s Built-in Scheduler 327\u003c\/p\u003e \u003cp\u003eCreating an Action Sequence to Run Kettle Jobs and Transformations 328\u003c\/p\u003e \u003cp\u003eKettle Transformations in Action Sequences 329\u003c\/p\u003e \u003cp\u003eCreating and Maintaining Schedules with the Administration Console 330\u003c\/p\u003e \u003cp\u003eAttaching an Action Sequence to a Schedule 333\u003c\/p\u003e \u003cp\u003eMonitoring 333\u003c\/p\u003e \u003cp\u003eLogging 333\u003c\/p\u003e \u003cp\u003eInspecting the Log 333\u003c\/p\u003e \u003cp\u003eLogging Levels 335\u003c\/p\u003e \u003cp\u003eWriting Custom Messages to the Log 336\u003c\/p\u003e \u003cp\u003eE‑mail Notifications 336\u003c\/p\u003e \u003cp\u003eConfiguring the Mail Job Entry 337\u003c\/p\u003e \u003cp\u003eSummary 340\u003c\/p\u003e \u003cp\u003eChapter 13 Versioning and Migration 341\u003c\/p\u003e \u003cp\u003eVersion Control Systems 341\u003c\/p\u003e \u003cp\u003eFile-Based Version Control Systems 342\u003c\/p\u003e \u003cp\u003eOrganization 342\u003c\/p\u003e \u003cp\u003eLeading File-Based VCSs 343\u003c\/p\u003e \u003cp\u003eContent Management Systems 344\u003c\/p\u003e \u003cp\u003eKettle Metadata 344\u003c\/p\u003e \u003cp\u003eKettle XML Metadata 345\u003c\/p\u003e \u003cp\u003eTransformation XML 345\u003c\/p\u003e 
\u003cp\u003eJob XML 346\u003c\/p\u003e \u003cp\u003eGlobal Replace 347\u003c\/p\u003e \u003cp\u003eKettle Repository Metadata 348\u003c\/p\u003e \u003cp\u003eThe Kettle Database Repository Type 348\u003c\/p\u003e \u003cp\u003eThe Kettle File Repository Type 349\u003c\/p\u003e \u003cp\u003eThe Kettle Enterprise Repository Type 350\u003c\/p\u003e \u003cp\u003eManaging Repositories 350\u003c\/p\u003e \u003cp\u003eExporting and Importing Repositories 350\u003c\/p\u003e \u003cp\u003eUpgrading Your Repository 351\u003c\/p\u003e \u003cp\u003eVersion Migration System 352\u003c\/p\u003e \u003cp\u003eManaging XML Files 352\u003c\/p\u003e \u003cp\u003eManaging Repositories 352\u003c\/p\u003e \u003cp\u003eParameterizing Your Solution 353\u003c\/p\u003e \u003cp\u003eSummary 356\u003c\/p\u003e \u003cp\u003eChapter 14 Lineage and Auditing 357\u003c\/p\u003e \u003cp\u003eBatch-Level Lineage Extraction 358\u003c\/p\u003e \u003cp\u003eLineage 359\u003c\/p\u003e \u003cp\u003eLineage Information 359\u003c\/p\u003e \u003cp\u003eImpact Analysis Information 361\u003c\/p\u003e \u003cp\u003eLogging and Operational Metadata 363\u003c\/p\u003e \u003cp\u003eLogging Basics 363\u003c\/p\u003e \u003cp\u003eLogging Architecture 364\u003c\/p\u003e \u003cp\u003eSetting a Maximum Buffer Size 365\u003c\/p\u003e \u003cp\u003eSetting a Maximum Log Line Age 365\u003c\/p\u003e \u003cp\u003eLog Channels 366\u003c\/p\u003e \u003cp\u003eLog Text Capturing in a Job 366\u003c\/p\u003e \u003cp\u003eLogging Tables 367\u003c\/p\u003e \u003cp\u003eTransformation Logging Tables 367\u003c\/p\u003e \u003cp\u003eJob Logging Tables 373\u003c\/p\u003e \u003cp\u003eSummary 374\u003c\/p\u003e \u003cp\u003ePart IV Performance and Scalability 375\u003c\/p\u003e \u003cp\u003eChapter 15 Performance Tuning 377\u003c\/p\u003e \u003cp\u003eTransformation Performance: Finding the Weakest Link 377\u003c\/p\u003e \u003cp\u003eFinding Bottlenecks by Simplifying 379\u003c\/p\u003e \u003cp\u003eFinding Bottlenecks by Measuring 
380\u003c\/p\u003e \u003cp\u003eCopying Rows of Data 382\u003c\/p\u003e \u003cp\u003eImproving Transformation Performance 384\u003c\/p\u003e \u003cp\u003eImproving Performance in Reading Text Files 384\u003c\/p\u003e \u003cp\u003eUsing Lazy Conversion for Reading Text Files 385\u003c\/p\u003e \u003cp\u003eSingle-File Parallel Reading 385\u003c\/p\u003e \u003cp\u003eMulti-File Parallel Reading 386\u003c\/p\u003e \u003cp\u003eConfiguring the NIO Block Size 386\u003c\/p\u003e \u003cp\u003eChanging Disks and Reading Text Files 386\u003c\/p\u003e \u003cp\u003eImproving Performance in Writing Text Files 387\u003c\/p\u003e \u003cp\u003eUsing Lazy Conversion for Writing Text Files 387\u003c\/p\u003e \u003cp\u003eParallel Files Writing 387\u003c\/p\u003e \u003cp\u003eChanging Disks and Writing Text Files 387\u003c\/p\u003e \u003cp\u003eImproving Database Performance 388\u003c\/p\u003e \u003cp\u003eAvoiding Dynamic SQL 388\u003c\/p\u003e \u003cp\u003eHandling Roundtrips 388\u003c\/p\u003e \u003cp\u003eHandling Relational Databases 390\u003c\/p\u003e \u003cp\u003eSorting Data 392\u003c\/p\u003e \u003cp\u003eSorting on the Database 393\u003c\/p\u003e \u003cp\u003eSorting in Parallel 393\u003c\/p\u003e \u003cp\u003eReducing CPU Usage 394\u003c\/p\u003e \u003cp\u003eOptimizing the Use of JavaScript 394\u003c\/p\u003e \u003cp\u003eLaunching Multiple Copies of a Step 396\u003c\/p\u003e \u003cp\u003eSelecting and Removing Values 397\u003c\/p\u003e \u003cp\u003eManaging Thread Priorities 397\u003c\/p\u003e \u003cp\u003eAdding Static Data to Rows of Data 397\u003c\/p\u003e \u003cp\u003eLimiting the Number of Step Copies 398\u003c\/p\u003e \u003cp\u003eAvoiding Excessive Logging 398\u003c\/p\u003e \u003cp\u003eImproving Job Performance 399\u003c\/p\u003e \u003cp\u003eLoops in Jobs 399\u003c\/p\u003e \u003cp\u003eDatabase Connection Pools 400\u003c\/p\u003e \u003cp\u003eSummary 401\u003c\/p\u003e \u003cp\u003eChapter 16 Parallelization, Clustering, and Partitioning 403\u003c\/p\u003e 
\u003cp\u003eMulti-Threading 403\u003c\/p\u003e \u003cp\u003eRow Distribution 404\u003c\/p\u003e \u003cp\u003eRow Merging 405\u003c\/p\u003e \u003cp\u003eRow Redistribution 406\u003c\/p\u003e \u003cp\u003eData Pipelining 407\u003c\/p\u003e \u003cp\u003eConsequences of Multi-Threading 408\u003c\/p\u003e \u003cp\u003eDatabase Connections 408\u003c\/p\u003e \u003cp\u003eOrder of Execution 409\u003c\/p\u003e \u003cp\u003eParallel Execution in a Job 411\u003c\/p\u003e \u003cp\u003eUsing Carte as a Slave Server 411\u003c\/p\u003e \u003cp\u003eThe Configuration File 411\u003c\/p\u003e \u003cp\u003eDefining Slave Servers 412\u003c\/p\u003e \u003cp\u003eRemote Execution 413\u003c\/p\u003e \u003cp\u003eMonitoring Slave Servers 413\u003c\/p\u003e \u003cp\u003eCarte Security 414\u003c\/p\u003e \u003cp\u003eServices 414\u003c\/p\u003e \u003cp\u003eClustering Transformations 417\u003c\/p\u003e \u003cp\u003eDefining a Cluster Schema 417\u003c\/p\u003e \u003cp\u003eDesigning Clustered Transformations 418\u003c\/p\u003e \u003cp\u003eExecution and Monitoring 420\u003c\/p\u003e \u003cp\u003eMetadata Transformations 421\u003c\/p\u003e \u003cp\u003eRules 422\u003c\/p\u003e \u003cp\u003eData Pipelining 425\u003c\/p\u003e \u003cp\u003ePartitioning 425\u003c\/p\u003e \u003cp\u003eDefining a Partitioning Schema 425\u003c\/p\u003e \u003cp\u003eObjectives of Partitioning 427\u003c\/p\u003e \u003cp\u003eImplementing Partitioning 428\u003c\/p\u003e \u003cp\u003eInternal Variables 428\u003c\/p\u003e \u003cp\u003eDatabase Partitions 429\u003c\/p\u003e \u003cp\u003ePartitioning in a Clustered Transformation 430\u003c\/p\u003e \u003cp\u003eSummary 430\u003c\/p\u003e \u003cp\u003eChapter 17 Dynamic Clustering in the Cloud 433\u003c\/p\u003e \u003cp\u003eDynamic Clustering 433\u003c\/p\u003e \u003cp\u003eSetting Up a Dynamic Cluster 434\u003c\/p\u003e \u003cp\u003eUsing the Dynamic Cluster 436\u003c\/p\u003e \u003cp\u003eCloud Computing 437\u003c\/p\u003e \u003cp\u003eEC2 438\u003c\/p\u003e 
\u003cp\u003eGetting Started with EC2 438\u003c\/p\u003e \u003cp\u003eCosts 438\u003c\/p\u003e \u003cp\u003eCustomizing an AMI 439\u003c\/p\u003e \u003cp\u003ePackaging a New AMI 442\u003c\/p\u003e \u003cp\u003eTerminating an AMI 442\u003c\/p\u003e \u003cp\u003eRunning a Master 442\u003c\/p\u003e \u003cp\u003eRunning the Slaves 443\u003c\/p\u003e \u003cp\u003eUsing the EC2 Cluster 444\u003c\/p\u003e \u003cp\u003eMonitoring 445\u003c\/p\u003e \u003cp\u003eThe Lightweight Principle and Persistence Options 446\u003c\/p\u003e \u003cp\u003eSummary 447\u003c\/p\u003e \u003cp\u003eChapter 18 Real-Time Data Integration 449\u003c\/p\u003e \u003cp\u003eIntroduction to Real-Time ETL 449\u003c\/p\u003e \u003cp\u003eReal-Time Challenges 450\u003c\/p\u003e \u003cp\u003eRequirements 451\u003c\/p\u003e \u003cp\u003eTransformation Streaming 452\u003c\/p\u003e \u003cp\u003eA Practical Example of Transformation Streaming 454\u003c\/p\u003e \u003cp\u003eDebugging 457\u003c\/p\u003e \u003cp\u003eThird-Party Software and Real-Time Integration 458\u003c\/p\u003e \u003cp\u003eJava Message Service 459\u003c\/p\u003e \u003cp\u003eCreating a JMS Connection and Session 459\u003c\/p\u003e \u003cp\u003eConsuming Messages 460\u003c\/p\u003e \u003cp\u003eProducing Messages 460\u003c\/p\u003e \u003cp\u003eClosing Shop 460\u003c\/p\u003e \u003cp\u003eSummary 461\u003c\/p\u003e \u003cp\u003ePart V Advanced Topics 463\u003c\/p\u003e \u003cp\u003eChapter 19 Data Vault Management 465\u003c\/p\u003e \u003cp\u003eIntroduction to Data Vault Modeling 466\u003c\/p\u003e \u003cp\u003eDo You Need a Data Vault? 
466\u003c\/p\u003e \u003cp\u003eData Vault Building Blocks 467\u003c\/p\u003e \u003cp\u003eHubs 467\u003c\/p\u003e \u003cp\u003eLinks 468\u003c\/p\u003e \u003cp\u003eSatellites 469\u003c\/p\u003e \u003cp\u003eData Vault Characteristics 471\u003c\/p\u003e \u003cp\u003eBuilding a Data Vault 471\u003c\/p\u003e \u003cp\u003eTransforming Sakila to the Data Vault Model 472\u003c\/p\u003e \u003cp\u003eSakila Hubs 472\u003c\/p\u003e \u003cp\u003eSakila Links 473\u003c\/p\u003e \u003cp\u003eSakila Satellites 474\u003c\/p\u003e \u003cp\u003eLoading the Data Vault: A Sample ETL Solution 477\u003c\/p\u003e \u003cp\u003eInstalling the Sakila Data Vault 477\u003c\/p\u003e \u003cp\u003eSetting Up the ETL Solution 477\u003c\/p\u003e \u003cp\u003eCreating a Database Account 477\u003c\/p\u003e \u003cp\u003eThe Sample ETL Data Vault Solution 478\u003c\/p\u003e \u003cp\u003eSample Hub: hub_actor 478\u003c\/p\u003e \u003cp\u003eSample Link: link_customer_store 480\u003c\/p\u003e \u003cp\u003eSample Satellite: sat_actor 483\u003c\/p\u003e \u003cp\u003eLoading the Data Vault Tables 485\u003c\/p\u003e \u003cp\u003eUpdating a Data Mart from a Data Vault 486\u003c\/p\u003e \u003cp\u003eThe Sample ETL Solution 486\u003c\/p\u003e \u003cp\u003eThe dim_actor Transformation 486\u003c\/p\u003e \u003cp\u003eThe dim_customer Transformation 488\u003c\/p\u003e \u003cp\u003eThe dim_film Transformation 492\u003c\/p\u003e \u003cp\u003eThe dim_film_actor_bridge Transformation 492\u003c\/p\u003e \u003cp\u003eThe fact_rental Transformation 493\u003c\/p\u003e \u003cp\u003eLoading the Star Schema Tables 495\u003c\/p\u003e \u003cp\u003eSummary 495\u003c\/p\u003e \u003cp\u003eChapter 20 Handling Complex Data Formats 497\u003c\/p\u003e \u003cp\u003eNon-Relational and Non-Tabular Data Formats 498\u003c\/p\u003e \u003cp\u003eNon-Relational Tabular Formats 498\u003c\/p\u003e \u003cp\u003eHandling Multi-Valued Attributes 498\u003c\/p\u003e \u003cp\u003eUsing the Split Field to Rows Step 499\u003c\/p\u003e 
\u003cp\u003eHandling Repeating Groups 500\u003c\/p\u003e \u003cp\u003eUsing the Row Normaliser Step 500\u003c\/p\u003e \u003cp\u003eSemi- and Unstructured Data 501\u003c\/p\u003e \u003cp\u003eKettle Regular Expression Example 503\u003c\/p\u003e \u003cp\u003eConfiguring the Regex Evaluation Step 504\u003c\/p\u003e \u003cp\u003eVerifying the Match 507\u003c\/p\u003e \u003cp\u003eKey\/Value Pairs 508\u003c\/p\u003e \u003cp\u003eKettle Key\/Value Pairs Example 509\u003c\/p\u003e \u003cp\u003eText File Input 509\u003c\/p\u003e \u003cp\u003eRegex Evaluation 510\u003c\/p\u003e \u003cp\u003eGrouping Lines into Records 511\u003c\/p\u003e \u003cp\u003eDenormaliser: Turning Rows into Columns 512\u003c\/p\u003e \u003cp\u003eSummary 513\u003c\/p\u003e \u003cp\u003eChapter 21 Web Services 515\u003c\/p\u003e \u003cp\u003eWeb Pages and Web Services 515\u003c\/p\u003e \u003cp\u003eKettle Web Features 516\u003c\/p\u003e \u003cp\u003eGeneral HTTP Steps 516\u003c\/p\u003e \u003cp\u003eSimple Object Access Protocol 517\u003c\/p\u003e \u003cp\u003eReally Simple Syndication 517\u003c\/p\u003e \u003cp\u003eApache Virtual File System Integration 517\u003c\/p\u003e \u003cp\u003eData Formats 517\u003c\/p\u003e \u003cp\u003eXML 518\u003c\/p\u003e \u003cp\u003eKettle Steps for Working with XML 518\u003c\/p\u003e \u003cp\u003eKettle Job Entries for XML 519\u003c\/p\u003e \u003cp\u003eHTML 520\u003c\/p\u003e \u003cp\u003eJavaScript Object Notation 520\u003c\/p\u003e \u003cp\u003eSyntax 521\u003c\/p\u003e \u003cp\u003eJSON, Kettle, and ETL\/DI 522\u003c\/p\u003e \u003cp\u003eXML Examples 523\u003c\/p\u003e \u003cp\u003eExample XML Document 523\u003c\/p\u003e \u003cp\u003eXML Document Structure 523\u003c\/p\u003e \u003cp\u003eMapping to the Sakila Sample Database 524\u003c\/p\u003e \u003cp\u003eExtracting Data from XML 525\u003c\/p\u003e \u003cp\u003eOverall Design: The import_xml_into_db Transformation 526\u003c\/p\u003e \u003cp\u003eUsing the XSD Validator Step 528\u003c\/p\u003e 
\u003cp\u003eUsing the “Get Data from XML” Step 530\u003c\/p\u003e \u003cp\u003eGenerating XML Documents 537\u003c\/p\u003e \u003cp\u003eOverall Design: The export_xml_from_db Transformation 537\u003c\/p\u003e \u003cp\u003eGenerating XML with the Add XML Step 538\u003c\/p\u003e \u003cp\u003eUsing the XML Join Step 541\u003c\/p\u003e \u003cp\u003eSOAP Examples 544\u003c\/p\u003e \u003cp\u003eUsing the “Web services lookup” Step 544\u003c\/p\u003e \u003cp\u003eConfiguring the “Web services lookup” Step 544\u003c\/p\u003e \u003cp\u003eAccessing SOAP Services Directly 546\u003c\/p\u003e \u003cp\u003eJSON Example 549\u003c\/p\u003e \u003cp\u003eThe Freebase Project 549\u003c\/p\u003e \u003cp\u003eFreebase Versus Wikipedia 549\u003c\/p\u003e \u003cp\u003eFreebase Web Services 550\u003c\/p\u003e \u003cp\u003eThe Freebase Read Service 550\u003c\/p\u003e \u003cp\u003eThe Metaweb Query Language 551\u003c\/p\u003e \u003cp\u003eExtracting Freebase Data with Kettle 553\u003c\/p\u003e \u003cp\u003eGenerate Rows 554\u003c\/p\u003e \u003cp\u003eIssuing a Freebase Read Request 555\u003c\/p\u003e \u003cp\u003eProcessing the Freebase Result Envelope 556\u003c\/p\u003e \u003cp\u003eFiltering Out the Original Row 557\u003c\/p\u003e \u003cp\u003eStoring to File 558\u003c\/p\u003e \u003cp\u003eRSS 558\u003c\/p\u003e \u003cp\u003eRSS Structure 558\u003c\/p\u003e \u003cp\u003eChannel 558\u003c\/p\u003e \u003cp\u003eItem 559\u003c\/p\u003e \u003cp\u003eRSS Support in Kettle 560\u003c\/p\u003e \u003cp\u003eRSS Input 561\u003c\/p\u003e \u003cp\u003eRSS Output 562\u003c\/p\u003e \u003cp\u003eSummary 567\u003c\/p\u003e \u003cp\u003eChapter 22 Kettle Integration 569\u003c\/p\u003e \u003cp\u003eThe Kettle API 569\u003c\/p\u003e \u003cp\u003eThe LGPL License 569\u003c\/p\u003e \u003cp\u003eThe Kettle Java API 570\u003c\/p\u003e \u003cp\u003eSource Code 570\u003c\/p\u003e \u003cp\u003eBuilding Kettle 571\u003c\/p\u003e \u003cp\u003eBuilding javadoc 571\u003c\/p\u003e \u003cp\u003eLibraries and the 
Class Path 571\u003c\/p\u003e \u003cp\u003eExecuting Existing Transformations and Jobs 571\u003c\/p\u003e \u003cp\u003eExecuting a Transformation 572\u003c\/p\u003e \u003cp\u003eExecuting a Job 573\u003c\/p\u003e \u003cp\u003eEmbedding Kettle 574\u003c\/p\u003e \u003cp\u003ePentaho Reporting 574\u003c\/p\u003e \u003cp\u003ePutting Data into a Transformation 576\u003c\/p\u003e \u003cp\u003eDynamic Transformations 580\u003c\/p\u003e \u003cp\u003eDynamic Template 583\u003c\/p\u003e \u003cp\u003eDynamic Jobs 584\u003c\/p\u003e \u003cp\u003eExecuting Dynamic ETL in Kettle 586\u003c\/p\u003e \u003cp\u003eResult 587\u003c\/p\u003e \u003cp\u003eReplacing Metadata 588\u003c\/p\u003e \u003cp\u003eDirect Changes with the API 589\u003c\/p\u003e \u003cp\u003eUsing a Shared Objects File 589\u003c\/p\u003e \u003cp\u003eOEM Versions and Forks 590\u003c\/p\u003e \u003cp\u003eCreating an OEM Version of PDI 590\u003c\/p\u003e \u003cp\u003eForking Kettle 591\u003c\/p\u003e \u003cp\u003eSummary 592\u003c\/p\u003e \u003cp\u003eChapter 23 Extending Kettle 593\u003c\/p\u003e \u003cp\u003ePlugin Architecture Overview 593\u003c\/p\u003e \u003cp\u003ePlugin Types 594\u003c\/p\u003e \u003cp\u003eArchitecture 595\u003c\/p\u003e \u003cp\u003ePrerequisites 596\u003c\/p\u003e \u003cp\u003eKettle API Documentation 596\u003c\/p\u003e \u003cp\u003eLibraries 596\u003c\/p\u003e \u003cp\u003eIntegrated Development Environment 596\u003c\/p\u003e \u003cp\u003eEclipse Project Setup 597\u003c\/p\u003e \u003cp\u003eExamples 598\u003c\/p\u003e \u003cp\u003eTransformation Step Plugins 599\u003c\/p\u003e \u003cp\u003eStepMetaInterface 599\u003c\/p\u003e \u003cp\u003eValue Metadata 605\u003c\/p\u003e \u003cp\u003eRow Metadata 606\u003c\/p\u003e \u003cp\u003eStepDataInterface 607\u003c\/p\u003e \u003cp\u003eStepDialogInterface 607\u003c\/p\u003e \u003cp\u003eEclipse SWT 607\u003c\/p\u003e \u003cp\u003eForm Layout 607\u003c\/p\u003e \u003cp\u003eKettle UI Elements 609\u003c\/p\u003e \u003cp\u003eHello World 
Example Dialog 609\u003c\/p\u003e \u003cp\u003eStepInterface 614\u003c\/p\u003e \u003cp\u003eReading Rows from Specific Steps 616\u003c\/p\u003e \u003cp\u003eWriting Rows to Specific Steps 616\u003c\/p\u003e \u003cp\u003eWriting Rows to Error Handling 617\u003c\/p\u003e \u003cp\u003eIdentifying a Step Copy 617\u003c\/p\u003e \u003cp\u003eResult Feedback 618\u003c\/p\u003e \u003cp\u003eVariable Substitution 618\u003c\/p\u003e \u003cp\u003eApache VFS 619\u003c\/p\u003e \u003cp\u003eStep Plugin Deployment 619\u003c\/p\u003e \u003cp\u003eThe User-Defined Java Class Step 620\u003c\/p\u003e \u003cp\u003ePassing Metadata 620\u003c\/p\u003e \u003cp\u003eAccessing Input and Fields 620\u003c\/p\u003e \u003cp\u003eSnippets 620\u003c\/p\u003e \u003cp\u003eExample 620\u003c\/p\u003e \u003cp\u003eJob Entry Plugins 621\u003c\/p\u003e \u003cp\u003eJobEntryInterface 622\u003c\/p\u003e \u003cp\u003eJobEntryDialogInterface 624\u003c\/p\u003e \u003cp\u003ePartitioning Method Plugins 624\u003c\/p\u003e \u003cp\u003ePartitioner 625\u003c\/p\u003e \u003cp\u003eRepository Type Plugins 626\u003c\/p\u003e \u003cp\u003eDatabase Type Plugins 627\u003c\/p\u003e \u003cp\u003eSummary 628\u003c\/p\u003e \u003cp\u003eAppendix A The Kettle Ecosystem 629\u003c\/p\u003e \u003cp\u003eKettle Development and Versions 629\u003c\/p\u003e \u003cp\u003eThe Pentaho Community Wiki 631\u003c\/p\u003e \u003cp\u003eUsing the Forums 631\u003c\/p\u003e \u003cp\u003eJira 632\u003c\/p\u003e \u003cp\u003e##pentaho 633\u003c\/p\u003e \u003cp\u003eAppendix B Kettle Enterprise Edition Features 635\u003c\/p\u003e \u003cp\u003eAppendix C Built-in Variables and Properties Reference 637\u003c\/p\u003e \u003cp\u003eInternal Variables 637\u003c\/p\u003e \u003cp\u003eKettle Variables 640\u003c\/p\u003e \u003cp\u003eVariables for Configuring VFS 641\u003c\/p\u003e \u003cp\u003eNoteworthy JRE Variables 642\u003c\/p\u003e \u003cp\u003eIndex 643\u003c\/p\u003e \u003cp\u003eMatt Casters is Founder of Kettle and works as Chief Data 
Integration at Pentaho, where he leads Kettle software development. Roland Bouman is an application developer focusing on open source web technology, databases, and business intelligence. Jos van Dongen is an independent business intelligence consultant and well-known author, analyst, and presenter.\u003c\/p\u003e \u003cp\u003eThe ultimate resource on building and deploying data integration solutions with Kettle\u003c\/p\u003e \u003cp\u003eKettle is a scalable and extensible open source ETL and data integration tool that lets you extract data from databases, flat and XML files, web services, ERP systems, and OLAP cubes. It provides over 120 built-in transformation steps to validate, cleanse, and conform data, as well as numerous options to load data into data warehouses and many other targets. Kettle is a comprehensive, low-cost alternative to traditional data integration tools like Informatica PowerCenter, IBM InfoSphere DataStage, and BusinessObjects Data Integrator.\u003c\/p\u003e \u003cp\u003eThis book explains in detail how to use Kettle to create, test, and deploy your own ETL and data integration solutions. You'll learn to use Kettle's programs to create transformations and jobs, use version control, audit data, and schedule your ETL solution. Then you'll progress to more advanced concepts such as clustering and cloud computing, real-time data integration, loading a Data Vault model, and extending Kettle by building your own plugins. 
In addition, you'll find hands-on examples and case studies that show exactly how to put Kettle's features into practice.\u003c\/p\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eExplore the components of the Kettle ETL toolset\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eDiscover how to install and configure Kettle and connect it to various data sources and targets\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eDesign and build every aspect of an ETL solution using Kettle\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eLearn how to load a data warehouse with Kettle\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eUnderstand the steps for deploying and scheduling ETL solutions\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eGain the skills to integrate Kettle with third-party products\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eLearn to extend Kettle and build your own plugins\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eUse clustering and cloud computing to scale and improve the performance of your Kettle ETL solutions\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eFind out how to use Kettle for real-time data integration\u003c\/p\u003e \u003c\/li\u003e \u003c\/ul\u003e","brand":"Wiley","offers":[{"title":"Default Title","offer_id":47989759508709,"sku":"NP9780470635179","price":50.0,"currency_code":"USD","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/1842\/7735\/files\/9780470635179.jpg?v=1761785379","url":"https:\/\/k12savings.com\/products\/pentaho-kettle-solutions-isbn-9780470635179","provider":"K12savings","version":"1.0","type":"link"}