operation takes source files with differing header schemas and maps them to a unified
header schema. This allows the datasets to be
For example, you may have three source files, each with a
column of policy numbers, but named differently in each dataset: Policy Number, Pol.
#, Policy Num. These column names can be mapped to a common ‘Policy Number’ field.
A typical workflow incorporating column mapping would be:
Define the data area within each dataset
Map the column headers to a single schema
Union the datasets together
Additionally, columns in the output dataset can be added, removed
or re-ordered, and empty or missing columns in the source data populated with text or numbers.
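As an illustration only, the map-then-union workflow above can be sketched in plain Python. This is not Quantemplate's implementation: the helper names `map_headers` and `union`, the file names and the sample rows are all hypothetical, and each dataset is assumed to be a list of row dictionaries.

```python
# Hypothetical per-source mappings: source header -> master field.
MAPPINGS = {
    "file_a": {"Policy Number": "Policy Number"},
    "file_b": {"Pol. #": "Policy Number"},
    "file_c": {"Policy Num": "Policy Number"},
}

def map_headers(rows, mapping):
    """Rename each row's keys using the mapping; unmapped keys are left as-is."""
    return [{mapping.get(col, col): val for col, val in row.items()} for row in rows]

def union(*datasets):
    """Stack datasets that now share a schema into a single list of rows."""
    return [row for dataset in datasets for row in dataset]

# Made-up sample data: one row per source file.
file_a = [{"Policy Number": "P-001"}]
file_b = [{"Pol. #": "P-002"}]
file_c = [{"Policy Num": "P-003"}]

combined = union(
    map_headers(file_a, MAPPINGS["file_a"]),
    map_headers(file_b, MAPPINGS["file_b"]),
    map_headers(file_c, MAPPINGS["file_c"]),
)
print(combined)
```

After mapping, every row carries the same ‘Policy Number’ key, so the three datasets can be stacked without further alignment.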
List of columns in master schema.
Lists of columns in source schemas. Horizontal position of
mapped items indicates their mapping to the master.
The layout shows each column mapping
across the source schemas and the master schema horizontally, making it very
quick to visually deduce the mapping structure across all your sources.
Toggle between mapped and unmapped items.
Switch between a mode which enforces the exact
master schema, or a mode which only maps column header names. See
Enforcing the master schema.
Clicking on an unmapped item suggests mappings in the
master schema, alongside suggestions for new mappings.
On opening the operation, the initial view shows the distinct
column structures across the input files to the operation, with no master schema or
If there are multiple input files that have identical column
structures, then Quantemplate will stack these together, so that only distinct column
structures are displayed. To see which input files to the operation are stacked in
each distinct column structure, hover over the file name at the top of the column.
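The stacking behaviour can be pictured as grouping input files by their column structure. The sketch below is a minimal illustration, assuming that "identical" means the same header names in the same order; the function name `stack_by_structure` and the file names are hypothetical.

```python
from collections import defaultdict

def stack_by_structure(inputs):
    """Group file names by their ordered tuple of column headers.

    Assumption: two files share a structure only when their headers match
    exactly, in the same order.
    """
    groups = defaultdict(list)
    for file_name, headers in inputs.items():
        groups[tuple(headers)].append(file_name)
    return dict(groups)

# Made-up inputs: two files share a structure, one differs.
inputs = {
    "2021.csv": ["Policy Number", "Premium"],
    "2022.csv": ["Policy Number", "Premium"],
    "broker.csv": ["Pol. #", "Prem"],
}

stacked = stack_by_structure(inputs)
for structure, files in stacked.items():
    print(structure, "<-", files)
```

Here three input files collapse to two distinct column structures, which is what the initial view would display.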
Defining the master schema
Define a source schema as a master
The most common scenario is to use one of the displayed
file schemas as the baseline for the master schema and then to map any remaining
column headers from the other source schemas to the master.
Define master from this dataset
for the file schema that you wish to use as a starting point for the master
schema template. This will instantly:
Define the selected source schema as the master
Map all of the column headers in that schema to
the corresponding columns in the master schema.
Map columns in the other source schemas that
have the same name as the columns in the master schema. Column name
matching is case-insensitive.
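The three effects listed above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the product's logic: `define_master_from` is a hypothetical name, schemas are plain lists of header strings, and case-insensitive matching is approximated with `str.casefold`.

```python
def define_master_from(source_name, schemas):
    """Use one source schema as the master and auto-map same-named columns.

    Returns the master field list and, per source, a dict of
    source header -> master field.
    """
    master = list(schemas[source_name])
    by_fold = {field.casefold(): field for field in master}
    mappings = {}
    for name, headers in schemas.items():
        mappings[name] = {
            header: by_fold[header.casefold()]
            for header in headers
            if header.casefold() in by_fold
        }
    return master, mappings

# Made-up schemas for two source files.
schemas = {
    "a": ["Policy Number", "Premium"],
    "b": ["policy number", "Prem.", "Currency"],
}

master, mappings = define_master_from("a", schemas)
print(master)
print(mappings["b"])
```

In this example only ‘policy number’ in source `b` is mapped automatically; ‘Prem.’ and ‘Currency’ stay unmapped and would need to be mapped manually.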
Keep a dataset which represents your organisation’s
target mappings in your data repo, and reuse this when you wish to create
Map individual fields
Rather than taking a whole source schema as a starting
point, individual fields can be mapped in from one or more source schemas. See
Mapping in source files below.
Manually enter fields
Alternatively a master schema can be created from scratch
by manually entering fields. To add a new field, click the
button at the bottom of the
master schema, enter the field name and hit return (↵).
Mapping in source files
To view unmapped fields in a source schema, click on the
Unmapped toggle. Clicking on an unmapped
item reveals mapping suggestions.
Map to master
will map the source field to an existing column header in the master schema.
appears under suggestions that are not in the master schema, and will create a
field in the master schema and map the source item to it.
Add to master
will take the exact name of the source column and
add it to the master schema as a new field.
indicates Quantemplate’s statistical confidence level in the suggested mapping.
About suggested mappings
Quantemplate uses machine learning to provide
mapping suggestions. These combine suggestions from items in your master schema with
suggestions from your other pipelines and from organisations who have chosen to share
their mappings with you. Quantemplate learns from the mapping decisions you make to
continuously improve the quality of its suggestions. Mapping data can also be supplied
directly to train the mapping model – contact Quantemplate support for more information.
You can unmap, remap or replace currently mapped column headers.
To see the column headers that are currently mapped
for a given schema, click on the Mapped toggle. Clicking on a
mapped item reveals mapping suggestions.
Unmap the item from the master schema. Item returns to the unmapped column.
Remap the item to a different available item in the master schema.
Master items that have items from this source schema already mapped to them are not available.
Replace the item with one of the unmapped items in the source.
When the Enforce toggle
is on, enforcing the master schema is enabled. This is the default setting.
Schemas of output datasets will match the master schema exactly.
Mapped source column names will be changed.
Unmapped source columns will be removed.
Column order of output files matches the master.
Columns can be populated with values.
Enforce disabled (column name mapping only)
Output dataset headers are renamed, schema structure remains the same.
Mapped source column names will be changed to match the master.
Unmapped source columns will be output unchanged.
Column order of output files will be unchanged from the order in the source.
Enforcing the master is automatically enabled when defining a master schema from a source dataset.
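The difference between the two modes can be sketched for a single row of data. The function `apply_mapping` and the sample values are hypothetical, and representing an empty master column as `None` is an assumption made for the illustration.

```python
def apply_mapping(row, mapping, master, enforce=True):
    """Apply a source header -> master field mapping to one row.

    enforce=True: output has exactly the master fields, in master order;
    unmapped source columns are dropped and master fields with no mapped
    source column come out empty (None).
    enforce=False: mapped headers are renamed; unmapped columns and the
    original column order are kept.
    """
    if enforce:
        source_for = {field: col for col, field in mapping.items()}
        return {field: row.get(source_for.get(field)) for field in master}
    return {mapping.get(col, col): val for col, val in row.items()}

# Made-up row, mapping and master schema.
row = {"Pol. #": "P-002", "Broker": "Acme", "Prem": 100}
mapping = {"Pol. #": "Policy Number", "Prem": "Premium"}
master = ["Premium", "Policy Number", "Currency"]

enforced = apply_mapping(row, mapping, master, enforce=True)
renamed_only = apply_mapping(row, mapping, master, enforce=False)
print(enforced)
print(renamed_only)
```

With enforce on, ‘Broker’ is removed and an empty ‘Currency’ column appears in master order; with enforce off, ‘Broker’ is kept and only the two mapped headers are renamed.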
For a given source:
Apply the following mappings:
Outputs when enforce option is enabled / disabled:
indicate renamed items.
Default column mapping
Where a master schema has been defined and a new input is added to the stage, the operation will apply the following default mapping rules:
If the new input matches an existing schema, then the columns in the new input will be mapped identically to that schema.
If the new input doesn’t match an existing schema, then each column will be mapped to a matching column name in the master schema. Note that column name matches ignore differences in capitalisation.
A warning will be shown in the stage panel if, after applying the above rules:
There remain entries in the master schema without a mapped column from the new input.
There remain entries in the new input that are unmapped.
Once the mappings have been reviewed in the column header mapping operation, the warning will be removed.
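The default rules and the warning conditions above can be expressed as a short sketch. It is illustrative only: `default_mapping` is a hypothetical name, `known` is an assumed lookup from an existing schema (as a tuple of headers) to its mapping, and capitalisation is ignored via `str.casefold`.

```python
def default_mapping(new_headers, master, known):
    """Return (mapping, unmapped_master, unmapped_source) for a new input.

    known: dict of tuple(headers) -> {source header: master field} for
    schemas that have already been mapped (an assumed representation).
    """
    key = tuple(new_headers)
    if key in known:
        # Rule 1: identical schema, so reuse its mappings.
        mapping = dict(known[key])
    else:
        # Rule 2: match on column name, ignoring capitalisation.
        by_fold = {field.casefold(): field for field in master}
        mapping = {
            h: by_fold[h.casefold()] for h in new_headers if h.casefold() in by_fold
        }
    mapped_fields = set(mapping.values())
    unmapped_master = [f for f in master if f not in mapped_fields]
    unmapped_source = [h for h in new_headers if h not in mapping]
    return mapping, unmapped_master, unmapped_source

master = ["Policy Number", "Premium"]
known = {("Pol. #", "Prem"): {"Pol. #": "Policy Number", "Prem": "Premium"}}

same = default_mapping(["Pol. #", "Prem"], master, known)
different = default_mapping(["POLICY NUMBER", "Broker"], master, known)
print(same)
print(different)
```

A warning would correspond to either of the last two lists being non-empty, as is the case for the second input here.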
For a given source:
and the following mappings:
Adding another input with the same schema as the initial source will apply the same set of mappings to that new input.
Adding another input with a different set of column names will apply the default mapping rules, matching column names while ignoring capitalisation.
For a new source schema:
will result in the mappings:
Populating source fields
When Enforce is enabled and a source file has no data for a master field, that field will be
output as an empty column by default. This column can be populated with homogeneous
data if required. For example, if your master schema contains the field currency, yet
your source data has no currency column, then the empty fields can be populated
with a currency value.
Using the populate function
To populate a source column:
Click on an empty mapped item.
Enter text or numbers then hit return (↵).
The data you have entered is shown in the field, alongside the populate symbol.
To remove populated data
Click on an empty mapped item.
Delete the data from the populate field then hit return (↵).
Populate all unmapped fields at once
Sometimes source datasets may not contain all the columns in your master schema. It’s often useful to populate these blank columns with a single value, such as ‘Unmapped’, so when the data is used in Analyse, these datapoints can be identified and filtered out.
To populate all unmapped fields
Click on the cog next to ‘enforce schema’.
Enter a value for ‘Populate unmapped’. The unmapped fields update live as you type.
Auto-populated unmapped fields can be overridden with an individual populated value. If the value for the auto-populated field changes, the overridden value is retained.
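How the ‘Populate unmapped’ value interacts with individually populated fields can be sketched as follows. The function `fill_values` and its parameters are hypothetical; the sketch only models which constant each unmapped master field would receive.

```python
def fill_values(master, mapped_fields, populate_unmapped=None, overrides=None):
    """Return the constant used for each master field with no mapped source column.

    overrides holds individually populated values, which take precedence
    over the blanket populate_unmapped value.
    """
    overrides = overrides or {}
    return {
        field: overrides.get(field, populate_unmapped)
        for field in master
        if field not in mapped_fields
    }

# Made-up schema: only 'Policy Number' has a mapped source column.
master = ["Policy Number", "Currency", "Region"]
mapped_fields = {"Policy Number"}
overrides = {"Currency": "USD"}

first = fill_values(master, mapped_fields, "Unmapped", overrides)
second = fill_values(master, mapped_fields, "N/A", overrides)
print(first)
print(second)
```

Changing the blanket value from ‘Unmapped’ to ‘N/A’ updates ‘Region’ but leaves the individually populated ‘Currency’ value in place.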
The Automap function automates the column header mapping
process by applying all suggested mappings above a user-specified strength threshold.
The Automap preview mode shows the effect of adjusting the
strength threshold on the applied mappings.
Click the Automap button to reveal the automap preview.
Select the desired automap settings via the settings cog.
Drag the slider to the desired threshold.
Click the ‘Apply mappings’ button to apply the mappings
and leave the preview, or click ‘Exit preview’ to disregard the mappings.
Note that for any given schema, mapping suggestions and strength
ratings may change over time as the ML-powered decision-making model learns from your
Overwrite existing mappings
Mapped items will be overridden by suggestions.
This is a ‘destructive’ mapping option since any mappings you have made
previously will be overwritten. Off by default.
Overwrite populated fields
Manually populated fields will be overridden by
suggestions. This is a ‘destructive’ mapping option since any populated
fields you have made previously will be overwritten. Off by default.
Create new master fields
Suggestions not in the master schema will be
added to it. Off by default.
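The threshold and the three options above can be combined in a rough sketch. Everything here is an assumption made for illustration: the `Suggestion` type, the 0 to 1 strength scale, the inclusive threshold comparison and the rule that the strongest suggestion wins are not taken from Quantemplate.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    source: str      # column header in the source schema
    target: str      # suggested master field
    strength: float  # hypothetical confidence score in [0, 1]

def automap(suggestions, master, current, populated, threshold,
            overwrite_existing=False, overwrite_populated=False,
            create_new_fields=False):
    """Apply suggestions at or above threshold, strongest first."""
    master = list(master)
    mapping = dict(current)     # source header -> master field
    populated = set(populated)  # master fields holding a manual value
    used_sources, used_targets = set(), set()

    for s in sorted(suggestions, key=lambda s: s.strength, reverse=True):
        if s.strength < threshold:
            continue
        if s.source in used_sources or s.target in used_targets:
            continue  # a stronger suggestion already claimed this item
        is_new = s.target not in master
        if is_new and not create_new_fields:
            continue
        if mapping.get(s.source) != s.target:
            rivals = [src for src, tgt in mapping.items() if tgt == s.target]
            if (s.source in mapping or rivals) and not overwrite_existing:
                continue
            if s.target in populated and not overwrite_populated:
                continue
            if is_new:
                master.append(s.target)
            for src in rivals:
                del mapping[src]
            mapping[s.source] = s.target
            populated.discard(s.target)
        used_sources.add(s.source)
        used_targets.add(s.target)
    return master, mapping, populated

# Made-up starting state and suggestions.
master = ["Policy Number", "Premium", "Currency"]
current = {"Pol. #": "Policy Number"}
populated = {"Currency"}
suggestions = [
    Suggestion("Prem", "Premium", 0.92),
    Suggestion("Ccy", "Currency", 0.85),
    Suggestion("Pol. #", "Premium", 0.40),
    Suggestion("Broker", "Broker Name", 0.75),
]

safe = automap(suggestions, master, current, populated, threshold=0.7)
full = automap(suggestions, master, current, populated, threshold=0.7,
               overwrite_populated=True, create_new_fields=True)
print(safe)
print(full)
```

With all options off, only ‘Prem’ is mapped, because ‘Currency’ is manually populated and ‘Broker Name’ is not in the master. Enabling the two options applies those suggestions as well, which is why the overwrite options are described as destructive.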