Joiner Transformation Overview
By PenchalaRaju.Yanamala

Transformation type: Active, Connected

Use the Joiner transformation to join source data from two related heterogeneous sources residing in different locations or file systems. You can also join data from the same source. The Joiner transformation joins sources with at least one matching column, using a condition that matches one or more pairs of columns between the two sources.

The two input pipelines include a master pipeline and a detail pipeline, or a master and a detail branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.
To join more than two sources in a mapping, join the output from the Joiner transformation with another source pipeline. Add Joiner transformations to the mapping until you have joined all the source pipelines.

The Joiner transformation accepts input from most transformations. However, consider the following limitations on the pipelines you connect to the Joiner transformation:

- You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.
- You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner transformation.

Working with the Joiner Transformation

When you work with the Joiner transformation, you must configure the transformation properties, join type, and join condition. You can configure the Joiner transformation for sorted input to improve Integration Service performance. You can also configure the transformation scope to control how the Integration Service applies transformation logic.

To work with the Joiner transformation, complete the following tasks:

- Configure the Joiner transformation properties. Properties for the Joiner transformation identify the location of the cache directory, how the Integration Service processes the transformation, and how the Integration Service handles caching. For more information, see Joiner Transformation Properties.
- Configure the join condition. The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the type of join selected, the Integration Service either adds the row to the result set or discards the row. For more information, see Defining a Join Condition.
- Configure the join type. A join is a relational operator that combines data from multiple tables in different databases or flat files into a single result set. You can configure the Joiner transformation to use a Normal, Master Outer, Detail Outer, or Full Outer join type. For more information, see Defining the Join Type.
- Configure the session for sorted or unsorted input. You can improve session performance by configuring the Joiner transformation to use sorted input. To configure a mapping to use sorted data, you establish and maintain a sort order in the mapping so that the Integration Service can use the sorted data when it processes the Joiner transformation. For more information about configuring the Joiner transformation for sorted input, see Using Sorted Input.
- Configure the transaction scope. When the Integration Service processes a Joiner transformation, it can apply transformation logic to all data in a transaction, all incoming data, or one row of data at a time. For more information about configuring how the Integration Service applies transformation logic, see Working with Transactions.

If you have the partitioning option in PowerCenter, you can increase the number of partitions in a pipeline to improve session performance.

Joiner Transformation Properties

Properties for the Joiner transformation identify the location of the cache directory, how the Integration Service processes the transformation, and how the Integration Service handles caching. The properties also determine how the Integration Service joins tables and files.

When you create a mapping, you specify the properties for each Joiner transformation. When you create a session, you can override some properties, such as the index and data cache size for each transformation. The following table describes the Joiner transformation properties:
Option: Description

Case-Sensitive String Comparison: If selected, the Integration Service uses case-sensitive string comparisons when performing joins on string columns.

Cache Directory: Specifies the directory used to cache master or detail rows and the index to these rows. By default, the cache files are created in a directory specified by the process variable $PMCacheDir. If you override the directory, make sure the directory exists and contains enough disk space for the cache files. The directory can be a mapped or mounted drive.

Join Type: Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.

Null Ordering in Master: Not applicable for this transformation type.

Null Ordering in Detail: Not applicable for this transformation type.

Tracing Level: Amount of detail displayed in the session log for this transformation. The options are Terse, Normal, Verbose Data, and Verbose Initialization.

Joiner Data Cache Size: Data cache size for the transformation. Default cache size is 2,000,000 bytes. If the total configured cache size is 2 GB or more, you must run the session on a 64-bit Integration Service. You can configure a numeric value, or you can configure the Integration Service to determine the cache size at runtime. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.

Joiner Index Cache Size: Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total configured cache size is 2 GB or more, you must run the session on a 64-bit Integration Service. You can configure a numeric value, or you can configure the Integration Service to determine the cache size at runtime. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.

Sorted Input: Specifies that data is sorted. Choose Sorted Input to join sorted data. Using sorted input can improve performance.

Master Sort Order: Specifies the sort order of the master source data. Choose Ascending if the master source data is in ascending order. If you choose Ascending, also enable sorted input. Default is Auto.

Transformation Scope: Specifies how the Integration Service applies the transformation logic to incoming data. You can choose Transaction, All Input, or Row.

Defining a Join Condition

The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the type of join selected, the Integration Service either adds the row to the result set or discards the row. The Joiner transformation produces result sets based on the join type, condition, and input data sources.
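
To make "adds the row to the result set or discards the row" concrete, here is a minimal Python sketch of how the join type could decide the fate of unmatched rows. It is an illustration, not the Integration Service's implementation. The function name join_rows, the join-type strings, and the sample rows are all hypothetical. The sketch assumes the usual PowerCenter meanings: a master outer join keeps every detail row, and a detail outer join keeps every master row.

```python
def join_rows(master, detail, condition, join_type="normal"):
    """Return (master_row, detail_row) pairs; None marks a missing side.

    join_type is one of "normal", "master_outer", "detail_outer",
    "full_outer" (hypothetical labels for the four join types).
    """
    result = []
    matched_master = set()
    for d in detail:
        hits = [i for i, m in enumerate(master) if condition(m, d)]
        matched_master.update(hits)
        for i in hits:
            result.append((master[i], d))
        # Assumption: unmatched detail rows survive only in master outer
        # and full outer joins; otherwise they are discarded.
        if not hits and join_type in ("master_outer", "full_outer"):
            result.append((None, d))
    # Assumption: unmatched master rows survive only in detail outer
    # and full outer joins.
    if join_type in ("detail_outer", "full_outer"):
        for i, m in enumerate(master):
            if i not in matched_master:
                result.append((m, None))
    return result


# Illustrative sample rows (not from the original document).
master = [{"ID": 1, "AGE": 30}, {"ID": 2, "AGE": 41}]
detail = [{"ID": 2, "POSITION": "Analyst"}, {"ID": 3, "POSITION": "Manager"}]


def same_id(m, d):
    return m["ID"] == d["ID"]


for jt in ("normal", "master_outer", "detail_outer", "full_outer"):
    print(jt, len(join_rows(master, detail, same_id, jt)))
```

With these sample rows only ID 2 appears on both sides, so the normal join yields a single pair and each outer variant adds back the unmatched rows it is meant to keep.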
Before you define a join condition, verify that the master and detail sources are configured for optimal performance. During a session, the Integration Service compares each row of the master source against the detail source. To improve performance for an unsorted Joiner transformation, use the source with fewer rows as the master source. To improve performance for a sorted Joiner transformation, use the source with fewer duplicate key values as the master.

By default, when you add ports to a Joiner transformation, the ports from the first source pipeline display as detail sources. Adding the ports from the second source pipeline sets them as master sources. To change these settings, click the M column on the Ports tab for the ports you want to set as the master source. This sets ports from this source as master ports and ports from the other source as detail ports.

You define one or more conditions based on equality between the specified master and detail sources. For example, if two sources with tables called EMPLOYEE_AGE and EMPLOYEE_POSITION both contain employee ID numbers, the following condition matches rows with employees listed in both sources:

EMP_ID1 = EMP_ID2

Use one or more ports from the input sources of a Joiner transformation in the join condition. Additional ports increase the time necessary to join two sources. The order of the ports in the condition can impact the performance of the Joiner transformation. If you use multiple ports in the join condition, the Integration Service compares the ports in the order you specify.

The Designer validates datatypes in a condition. Both ports in a condition must have the same datatype. If you need to use two ports in the condition with nonmatching datatypes, convert the datatypes so they match.
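
The advice to use the smaller source as the master can be pictured with a generic hash-join sketch in Python. This is an assumption about the general technique, not a description of how the Integration Service builds its caches. The helper names build_master_cache and join_detail, and the AGE and POSITION columns, are hypothetical. Only EMP_ID1 and EMP_ID2 come from the example condition above.

```python
from collections import defaultdict


def build_master_cache(master_rows, master_ports):
    """Index master rows by the tuple of their join-port values."""
    cache = defaultdict(list)
    for row in master_rows:
        key = tuple(row[p] for p in master_ports)
        # The Joiner transformation does not match null values,
        # so rows with a null join port are never indexed.
        if any(v is None for v in key):
            continue
        cache[key].append(row)
    return cache


def join_detail(detail_rows, detail_ports, cache):
    """Stream detail rows and yield one merged row per matching master row."""
    for row in detail_rows:
        key = tuple(row[p] for p in detail_ports)
        if any(v is None for v in key):
            continue
        for master_row in cache.get(key, []):
            yield {**master_row, **row}


# Hypothetical sample data: the smaller source plays the master role.
employee_age = [{"EMP_ID1": 1, "AGE": 30}, {"EMP_ID1": 2, "AGE": 41}]
employee_position = [
    {"EMP_ID2": 2, "POSITION": "Analyst"},
    {"EMP_ID2": 2, "POSITION": "Team Lead"},
    {"EMP_ID2": 3, "POSITION": "Manager"},
    {"EMP_ID2": None, "POSITION": "Unknown"},
]

cache = build_master_cache(employee_age, ["EMP_ID1"])
joined = list(join_detail(employee_position, ["EMP_ID2"], cache))
for row in joined:
    print(row)
```

In this sketch only the master rows are held in memory while the detail rows stream past, which is one way to see why a master with fewer rows keeps the cache small. A multi-port condition would simply list more port names in the same order on both sides.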
If you join Char and Varchar datatypes, the Integration Service counts any spaces that pad Char values as part of the string:

Char(40) = "abcd"
Varchar(40) = "abcd"

The Char value is abcd padded with 36 blank spaces, and the Integration Service does not join the two fields because the Char field contains trailing spaces.

Note: The Joiner transformation does not match null values. For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider them a match and does not join the two rows. To join rows with null values, replace null input with default values, and then join on the default values.

Defining the Join Type
In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation is similar to an SQL join except that data can originate from different types of sources.

You define the join type on the Properties tab in the transformation. The Joiner transformation supports the following types of joins:

- Normal
- Master Outer
- Detail Outer
- Full Outer

Note: A normal or master outer join performs faster than a full outer or