Quantemplate breaks the data transformation process down into stages and operations, allowing you to sequence and structure your data transformation process in a way that is easy to browse and navigate.
Operations are individual transformation processes, grouped together within a Transform stage. Operations with sophisticated configuration options have an interactive user interface; more straightforward operations are currently configured via a simple script pre-populated with typical default values.
For a full list of operations available in Transform stages, see the Operations Index.
Stages can be visually expanded to show inputs, outputs and configuration options, and can be collapsed to provide an overview of the whole process. Operations slide across to reveal their configuration panel.
A Join stage combines two different datasets which share one or more columns of data. For example, a premium dataset could be joined to a claims dataset using the policy number column as the join point. A Join stage typically has one or two outputs. Learn more about the Join stage.
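Joins in Quantemplate are configured in the stage UI rather than in code, but the underlying idea can be sketched in plain Python. The dataset and column names below (policy_number, premium, claim_amount) are illustrative, not taken from the product:

```python
# Hypothetical premium and claims datasets sharing a policy_number column.
premium = [
    {"policy_number": "P001", "premium": 1200},
    {"policy_number": "P002", "premium": 850},
    {"policy_number": "P003", "premium": 2400},
]
claims = [
    {"policy_number": "P001", "claim_amount": 500},
    {"policy_number": "P003", "claim_amount": 1800},
]

def left_join(left, right, key):
    """Join two row lists on a shared key column, keeping every left-hand row."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        # Merge matching right-hand columns onto the left-hand row.
        joined.append({**row, **{k: v for k, v in match.items() if k != key}})
    return joined

result = left_join(premium, claims, "policy_number")
# Row P002 has no matching claim, so it carries no claim_amount column.
```

Here every premium row is kept whether or not a claim matches; the Join stage offers its own configuration options for how matched and unmatched rows are handled.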
A Union stage is used to combine multiple datasets with identical sets of column headers to produce a single output. Because the headers are the same in each source file the rows can be stacked on top of each other to produce a single unified output dataset. Learn more about the Union stage.
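As a conceptual sketch of the stacking behaviour described above (again in plain Python, with illustrative column names, not the product's implementation):

```python
# Two hypothetical input datasets with identical column headers.
q1 = [{"policy_number": "P001", "premium": 1200}]
q2 = [{"policy_number": "P101", "premium": 640}]

def union(*datasets):
    """Stack row lists on top of each other; all inputs must share identical headers."""
    headers = {tuple(sorted(d[0])) for d in datasets}
    if len(headers) > 1:
        raise ValueError("Union inputs must share identical column headers")
    return [row for d in datasets for row in d]

combined = union(q1, q2)  # one unified output dataset
```

Because the headers match, the rows stack into a single output; mismatched headers are rejected rather than silently combined.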
Operations to transform your data are accessed and sequenced in Transform stages. Transform stages allow multiple input datasets; the number of output datasets will usually equal the number of input datasets, unless an operation to create additional datasets (such as a pivot operation) has been used.
Data runs through the pipeline in sequential order, from top to bottom in the pipeline view. For the data to run between stages, the outputs of one stage are selected as the inputs to a lower stage.
The names of output files from each stage append the name of the stage to the name of the source file. This allows you to see the transformation steps the data has undergone when selecting stage inputs.
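The naming convention can be illustrated with a small sketch. The separator used between the source name and the stage name below is an assumption for illustration; the actual format in the product may differ:

```python
def output_name(source_name, stage_name, sep=" - "):
    """Append the stage name to the source file name (separator is illustrative)."""
    return f"{source_name}{sep}{stage_name}"

# A dataset passing through two stages accumulates both stage names,
# so the lineage is visible when selecting stage inputs.
name = output_name(output_name("Premiums 2020", "Cleanse"), "Union")
```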
As you make changes to a stage configuration, it may take a few moments for the changes to percolate through the pipeline. The stage spinner indicates that upstream changes are being processed and the data within that stage may not be up to date.
Trace is a feature that enables changes in pipeline configuration to be rapidly previewed downstream. As you build and edit a pipeline, Trace simulates and validates any changes you make along the way, without the need to execute the pipeline. This provides a real time view of the data structure at any point in a pipeline and helps to flag errors prior to running the pipeline.
As you make changes, it may take a few moments for Trace to run through the pipeline. Trace progress is indicated by a spinner on each stage being processed.
To configure a Transform stage:
To configure Join stages, add inputs and click on the join configuration panel. Learn more.
To configure Union stages, simply add inputs – no further configuration required. Learn more.
Use the input selector to define inputs to your stages.
To select stage inputs:
By default, the input selector displays your pipeline stage outputs; if no stage outputs have been configured yet, your uploaded datasets or data repo will be shown. Click ‘More input options’ to navigate to the full set of inputs available.
Raw datasets uploaded to this pipeline
Datasets in your data repo
Outputs from previous stages in this pipeline
Outputs from pipelines shared by other organisations (if your organisation has enabled this feature)
Stage outputs are the datasets resulting from the processes applied to the data in a stage. They can be connected to the inputs of subsequent stages for further processing, exported to your data repo, or downloaded directly.
By default, all stage outputs appear in your pipeline outputs, available for export. Disabling an output removes it from the list of exportable pipeline outputs, though it remains available for use in subsequent pipeline stages. Disabling outputs that are not required for export has two benefits:
1. Cleaner output list
Your output list only displays your final datasets, so it’s easier to configure their export destinations.
2. Accelerated pipeline run-times
Limiting the number of datasets available for export will speed up pipeline run time.
To disable a stage output, click on it. The disabled output indicator displays next to the output name. The next time the pipeline runs, the output will be removed from the pipeline outputs list. To re-enable the output, click it again.
To rename a stage, double click on the stage name, or click the edit icon to the right of the stage name.
To edit an operation, click on the operation to go to its edit panel. The stage the operation sits within is listed top left – click it to go back to the stages view.
To rename an operation, click on the operation to go to its edit panel. Click on the name to edit it.
To reorder stages, click on the stage number and edit it to the desired sequential position. Note that if a stage that uses outputs from a previous stage is reordered, its inputs will be removed to maintain the sequential order of the stages, and will need to be reselected. Stages that rely on the outputs of the reordered stage will also have their inputs removed.
To reorder operations, hover over the operation you wish to move then drag and drop it to its new position.
Transform and Join stages can be temporarily disabled without losing their configuration, allowing before-and-after comparison of the effects of that stage on your output data.
To disable an operation or Join stage, click on the diamond operation button next to the name. The operation or Join stage is greyed out, but its parameters can still be edited.
To disable all operations in a Transform stage, click on the master operation button next to the Operations heading. Stages with all operations disabled are greyed out in the pipeline editor.
To re-enable operations, click the operations button.
If a stage has been configured incorrectly, for example if it contains a script operation with unsupported values, you will be notified via an error message in the stage, and running the pipeline will be disabled.