A Join stage is used to join two datasets with different
sets of column headers but with one or more shared headers. It accepts two inputs,
allows configuration of multiple join points and outputs combinations of matched
and unmatched rows.
The Join stage in Quantemplate accepts two inputs, allows
for multiple join points and is able to output multiple combinations of matched
and unmatched rows.
Workspace
Inputs
A join stage supports two input datasets, which are then
defined as Left and Right. The left dataset will appear on the left side of the
output dataset, the right dataset on the right side.
Join points
The join point is the column on which the source datasets
will be combined. When a value matches exactly in both source datasets, the row
will be joined.
Multiple join points can be configured. This allows joining
on multiple criteria, for instance when both the Policy No. and Insured Name match.
The join point column appears in both output datasets. Where
both sides contain a column with the same name, the column from the right side has
the source dataset name appended to it (see example below).
Outputs
The rows contained in output datasets are configured via
three toggles:
Matched Rows from both datasets
Unmatched from Left dataset
Unmatched from Right dataset
This allows the configuration of all standard join types:
Inner Join
Only matched rows from both datasets
Left Outer Join
Matched rows from both datasets, plus unmatched rows from Left dataset
Right Outer Join
Matched rows from both datasets, plus unmatched rows from Right dataset
Full Outer Join
Matched and unmatched rows from both datasets
Full Outer Join with Exclusion
Unmatched rows from both datasets
Left Outer Join with Exclusion
Unmatched rows from Left dataset
Right Outer Join with Exclusion
Unmatched rows from Right dataset
Multiple output files may be configured, for instance:
one file with the results of a Left Outer Join, plus another file with all the
unmatched rows from the join. To avoid creating duplicate output files, identical
output combinations will be exported only once.
Configuring a join
To configure a join:
1.
Insert a join stage.
2.
In the input selector, choose two input
files to join.
3.
Click on ‘Click to configure join’ to enter the join editor.
4.
Select the Left and Right datasets from the input files.
5.
Configure a join point: select the column to join on in the Left and Right datasets.
6.
Configure additional join points if required.
7.
Select what the output dataset should contain (e.g. matched rows, plus unmatched from left).
8.
Configure additional output files if required.
Example
KEY
LEFT
RIGHT
Matched row
One-to-many match
Unmatched row
Input: Table 1
A dataset of the policies written and their Inception and Expiry dates is
configured as the Left dataset in the join.
Policy Ref
Insured
Incept
Expiry
AB123456
General Insure
1/1/2010
31/12/2010
AB123457
Etc Insure
1/1/2010
31/12/2010
AB123458
More Insure
1/1/2010
31/12/2010
AB123459
Less Insure
1/1/2010
31/12/2010
AB123461
XYZ Insure
1/1/2010
31/12/2010
Input: Table 2
A list of contents amounts is configured
as the Right dataset in the join.
Policy Ref
Property
Contents
AB123457
5 Main Road
75,000
AB123457
1 Highway
80,000
AB123458
74 Highstreet
75,000
AB123459
24 Chestnut Avenue
50,000
AB123460
99 Red Balloon Way
50,000
AB123461
12 Grove Road
60,000
AB123462
36 Acacia Avenue
40,000
Output 1: Tables 1 and 2
Include matched rows
Include unmatched rows from left
Exclude unmatched rows from right
Table 1 and Table 2 are joined on Policy Ref, with the
output dataset containing all rows from the Table 1 and matching rows from Table
2. Two matches in were found in Table 2 for AB123457, creating two rows in the
output dataset. No match was found for AB123456 – a row is created in the output
dataset, but the columns from Table 2 are empty.
Note that the Policy Ref column is now duplicated, with the column from Table 2
renamed ‘Policy Ref (Table 2)‘. If desired, this column can be removed via a subsequent
Map Columns operation.
Policy Ref
Insured
Incept
Expiry
Policy Ref (Table 2)
Property
Contents
AB123456
General Insure
1/1/2010
31/12/2010
AB123457
Etc Insure
1/1/2010
31/12/2010
AB123457
5 Main Road
75,000
AB123457
Etc Insure
1/1/2010
31/12/2010
AB123457
1 Highway
80,000
AB123458
More Insure
1/1/2010
31/12/2010
AB123458
74 Highstreet
75,000
AB123459
Less Insure
1/1/2010
31/12/2010
AB123459
24 Chestnut Avenue
50,000
AB123461
XYZ Insure
1/1/2010
31/12/2010
AB123461
12 Grove Road
60,000
Output 2: Unmatched Claims
Exclude matched rows
Exclude unmatched rows from left
Include unmatched rows from right
These items could not be matched with a policy
number in the first dataset, so may require further investigation.