Pentaho Data Integration 4 Cookbook
Over 70 recipes to solve ETL problems using Pentaho Kettle
- Cover
- Copyright
- Credits
- About the Authors
- About the Reviewers
- www.PacktPub.com
- Table of Contents
- Preface
Chapter 1: Working with Databases
- Introduction
- Connecting to a database
- Getting data from a database
- Getting data from a database by providing parameters
- Getting data from a database by running a query built at runtime
- Inserting or updating rows in a table
- Inserting new rows where a simple primary key has to be generated
- Inserting new rows where the primary key has to be generated based on stored values
- Deleting data from a table
- Creating or altering a database table from PDI (design time)
- Creating or altering a database table from PDI (runtime)
- Inserting, deleting, or updating a table depending on a field
- Changing the database connection at runtime
- Loading a parent-child table
Chapter 2: Reading and Writing Files
- Introduction
- Reading a simple file
- Reading several files at the same time
- Reading unstructured files
- Reading files having one field by row
- Reading files with some fields occupying two or more rows
- Writing a simple file
- Writing an unstructured file
- Providing the name of a file (for reading or writing) dynamically
- Using the name of a file (or part of it) as a field
- Reading an Excel file
- Getting the value of specific cells in an Excel file
- Writing an Excel file with several sheets
- Writing an Excel file with a dynamic number of sheets
Chapter 3: Manipulating XML Structures
- Introduction
- Reading simple XML files
- Specifying fields by using XPath notation
- Validating well-formed XML files
- Validating an XML file against DTD definitions
- Validating an XML file against an XSD schema
- Generating a simple XML document
- Generating complex XML structures
- Generating an HTML page using XML and XSL transformations
Chapter 4: File Management
Chapter 5: Looking for Data
- Introduction
- Looking for values in a database table
- Looking for values in a database (with complex conditions or multiple tables involved)
- Looking for values in a database with extreme flexibility
- Looking for values in a variety of sources
- Looking for values by proximity
- Looking for values consuming a web service
- Looking for values over an intranet
Chapter 6: Understanding Data Flows
- Introduction
- Splitting a stream into two or more streams based on a condition
- Merging rows of two streams with the same or different structures
- Comparing two streams and generating differences
- Generating all possible pairs formed from two datasets
- Joining two or more streams based on given conditions
- Interspersing new rows between existent rows
- Executing steps even when your stream is empty
- Processing rows differently based on the row number
Chapter 7: Executing and Reusing Jobs and Transformations
- Introduction
- Executing a job or a transformation by setting static arguments and parameters
- Executing a job or a transformation from a job by setting arguments and parameters dynamically
- Executing a job or a transformation whose name is determined at runtime
- Executing part of a job once for every row in a dataset
- Executing part of a job several times until a condition is true
- Creating a process flow
- Moving part of a transformation to a subtransformation
Chapter 8: Integrating Kettle and the Pentaho Suite
- Introduction
- Creating a Pentaho report with data coming from PDI
- Configuring the Pentaho BI Server for running PDI jobs and transformations
- Executing a PDI transformation as part of a Pentaho process
- Executing a PDI job from the Pentaho User Console
- Generating files from the PUC with PDI and the CDA plugin
- Populating a CDF dashboard with data coming from a PDI transformation
Chapter 9: Getting the Most Out of Kettle
- Introduction
- Sending e-mails with attached files
- Generating a custom log file
- Programming custom functionality
- Generating sample data for testing purposes
- Working with JSON files
- Getting information about transformations and jobs (file-based)
- Getting information about transformations and jobs (repository-based)
Appendix: Data Structures
- Index
This book provides step-by-step recipes for solving data manipulation problems with PDI. It includes well-organized tips, screenshots, tables, and examples to aid quick and easy understanding. If you are a software developer, or anyone involved or interested in developing ETL solutions or doing any kind of data manipulation, this book is for you. It does not cover PDI basics, SQL basics, or database concepts; you are expected to have a basic understanding of the PDI tool, the SQL language, and databases.
Book Details
Publication year: 2011
License: All rights reserved ©

