Stages and operations

In this section

About stages and operations

Stage types

Working with stages

Adding and configuring stages and operations

Stage inputs

Stage outputs

Editing, renaming and re-ordering

Disabling stages and operations

Configuration errors

About stages and operations

Quantemplate breaks data transformation down into stages and operations, allowing you to sequence and structure your transformation process in a way that is easy to browse and navigate.

Stages are structural components with a defined number of inputs and outputs. There are three kinds of stage: Transform, Union and Join.

Operations are individual transformation processes, grouped together within a Transform stage. Operations with sophisticated configuration options have an interactive user interface; more straightforward operations are currently configured via a simple script pre-populated with typical default values.

For a full list of operations available in Transform stages, see the Operations Index.

Stages can be visually expanded to show inputs, outputs and configuration options, and collapsed to provide an overview of the whole process. Operations slide across to reveal their configuration panel.

Stage types

Join stage

A Join stage combines two datasets that share one or more columns of data. For example, a premium dataset could be joined to a claims dataset using the policy number column as the join point. A Join stage typically has one or two outputs. Learn more about the Join stage.
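The effect is similar to a left join in the relational sense. A minimal sketch in pandas, with hypothetical column names and values rather than Quantemplate's implementation:

```python
import pandas as pd

# Hypothetical premium and claims datasets sharing a 'policy_number' column
premiums = pd.DataFrame({
    "policy_number": ["P001", "P002", "P003"],
    "premium": [1200.0, 850.0, 2300.0],
})
claims = pd.DataFrame({
    "policy_number": ["P001", "P003"],
    "claim_amount": [400.0, 1500.0],
})

# Join on the shared column; premiums without a matching claim keep NaN
joined = premiums.merge(claims, on="policy_number", how="left")
print(joined)
```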

Union stage

A Union stage is used to combine multiple datasets with identical sets of column headers to produce a single output. Because the headers are the same in each source file, the rows can be stacked on top of each other to produce a single unified output dataset. Learn more about the Union stage.
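Conceptually this is a row-wise concatenation. A minimal sketch in pandas, with hypothetical data:

```python
import pandas as pd

# Two hypothetical source files with identical column headers
q1 = pd.DataFrame({"policy_number": ["P001", "P002"], "premium": [1200.0, 850.0]})
q2 = pd.DataFrame({"policy_number": ["P003"], "premium": [2300.0]})

# Because the headers match, the rows can simply be stacked
unioned = pd.concat([q1, q2], ignore_index=True)
print(unioned)  # one unified dataset containing all three rows
```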

Transform stage

Operations to transform your data are accessed and sequenced in Transform stages. Transform stages accept multiple input datasets; the number of output datasets will usually equal the number of input datasets, unless an operation that creates additional datasets (such as a pivot operation) has been used.
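To picture how an operation can change the dataset count, here is an illustrative pandas sketch of a pivot restructuring a dataset into an additional shape; the column names are hypothetical:

```python
import pandas as pd

# Hypothetical input: one row per policy per year
df = pd.DataFrame({
    "policy_number": ["P001", "P001", "P002"],
    "year": [2022, 2023, 2023],
    "premium": [1100.0, 1200.0, 850.0],
})

# A pivot-style operation produces an additional, restructured dataset:
# one row per policy, one column per year
pivoted = df.pivot(index="policy_number", columns="year", values="premium")
print(pivoted)
```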

Working with stages

Sequencing and connecting stages

Data runs through the pipeline in sequential order, from top to bottom in the pipeline view. For the data to run between stages, the outputs of one stage are selected as the inputs to a lower stage.
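As a rough mental model, in illustrative Python rather than Quantemplate's internal representation, each stage consumes references to the outputs of the stages above it:

```python
from dataclasses import dataclass, field

@dataclass
class Stage:
    # Illustrative model only: a stage with named inputs and outputs
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

cleanse = Stage("Cleanse", inputs=["Premiums 2023"], outputs=["premiums_cleaned"])

# A lower stage is wired up by selecting the upstream stage's outputs
join_claims = Stage("Join claims", inputs=list(cleanse.outputs))
```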

Output file naming

The name of each stage output appends the stage name to the name of the source file. This lets you see the transformation steps the data has undergone when selecting stage inputs.
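A sketch of the idea; the separator shown here is an assumption, not necessarily the one Quantemplate uses:

```python
def output_name(source_name: str, stage_name: str) -> str:
    # Append the stage name so the file's transformation history stays visible
    return f"{source_name} > {stage_name}"

# After passing through two stages, the history reads left to right
name = output_name(output_name("Premiums 2023", "Cleanse"), "Join claims")
print(name)  # Premiums 2023 > Cleanse > Join claims
```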

As you make changes to a stage's configuration, it may take a few moments for the changes to propagate through the pipeline. The stage spinner indicates that upstream changes are being processed and that the data within that stage may not yet be up to date.

Previewing changes with ‘Trace’

Trace is a feature that lets you rapidly preview the downstream effects of changes to the pipeline configuration. As you build and edit a pipeline, Trace simulates and validates each change without executing the pipeline, providing a real-time view of the data structure at any point in the pipeline and flagging errors before the pipeline is run.
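One way to picture this, as a conceptual sketch rather than Quantemplate's implementation, is to propagate only the column structure through each stage, so configuration errors surface without processing any rows:

```python
def trace_stage(columns: set, required: set, added: set) -> set:
    # Validate that the stage references existing columns, then
    # return the structure the stage would output
    missing = required - columns
    if missing:
        raise ValueError(f"stage references missing columns: {missing}")
    return columns | added

schema = {"policy_number", "premium"}
schema = trace_stage(schema, required={"policy_number"}, added={"claim_amount"})
print(schema)  # structure after the stage, computed without running it
```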

As you make changes, it may take a few moments for Trace to run through the pipeline. Trace progress is indicated by a spinner on each stage being processed.

Adding and configuring stages and operations

To configure a Transform stage:

  1. Click ‘Add stage’ and select ‘Transform stage’.
  2. Select inputs for the stage.
  3. Click ‘Add operation’ and select or search for an operation.
  4. Click on the operation to view its configuration panel.
  5. Add more operations as desired.

To configure Join stages, add inputs and click on the join configuration panel. Learn more.

To configure Union stages, simply add inputs – no further configuration required. Learn more.

Tip
Group operations which share the same inputs into a single stage. If you need a large number of operations, it may help to break the sequence down into separate Transform stages, grouping related processes together.

Stage inputs

Selecting stage inputs

Use the input selector to define inputs to your stages.

To select stage inputs:

  1. Click [+] in the inputs area.
  2. In the popup input selector, navigate and search your available input files.
  3. Select the desired inputs by clicking their individual checkboxes, or select all the results of a search by clicking ‘Select all results’.
Tip
Selections are retained when performing an additional search. This means you can search for ‘Premiums’ and select all results, then search for ‘Claims’ and select all results, to input all datasets with either ‘Premiums’ or ‘Claims’ in their filename.
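In other words, each search adds to a running selection rather than replacing it (illustrative Python, with hypothetical filenames):

```python
selection = set()

# 'Select all results' after searching for 'Premiums'
selection |= {"Premiums 2022.csv", "Premiums 2023.csv"}

# A later search for 'Claims' keeps the earlier selection
selection |= {"Claims 2023.csv"}

print(sorted(selection))  # all premium and claims files are now inputs
```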

Input options

By default, the input selector displays your pipeline stage outputs; if no stage outputs have been configured yet, your uploaded datasets or data repo will be shown. Click ‘More input options’ to navigate to the full set of inputs available.

Uploaded files: Raw datasets uploaded to this pipeline.

Data repo: Datasets in your data repo.

Pipeline stage outputs: Outputs from previous stages in this pipeline.

Link from external pipelines: Outputs from pipelines shared by other organisations (if your organisation has enabled this feature).

Stage outputs

Stage outputs are the datasets resulting from the processes applied to the data in a stage. They can be connected to the inputs of subsequent stages for further processing, exported to your data repo, or downloaded directly.

Disabling stage outputs

By default, all stage outputs appear in your pipeline outputs and are available for export. Disabling an output removes it from the list of exportable pipeline outputs, though it remains available for use in subsequent pipeline stages. Disabling outputs that are not required for export has two benefits:

1. Cleaner output list
Your output list only displays your final datasets, so it’s easier to configure their export destinations.

2. Accelerated pipeline run-times
Limiting the number of datasets available for export will speed up pipeline run time.

To disable a stage output, click on it. The disabled output indicator displays next to the output name, and the next time the pipeline runs the output is removed from the pipeline outputs list. To re-enable the output, click it again.

Editing, renaming and re-ordering

Renaming stages

To rename a stage, double-click on the stage name, or click the edit icon to the right of the stage name.

Editing and renaming operations

To edit an operation, click on the operation to go to its edit panel. The stage the operation sits within is listed at the top left; click it to go back to the stages view.

To rename an operation, click on the operation to go to its edit panel. Click on the name to edit it.

Reordering stages and operations

To reorder stages, click on the stage number and edit it to the desired sequential position. Note that if a reordered stage uses outputs from a previous stage, its inputs will be removed and will need to be reselected, to preserve the top-to-bottom flow of the pipeline. Stages which rely on the outputs of the moved stage will also have their inputs removed.
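The underlying rule, sketched below in illustrative Python, is that a stage may only consume outputs produced above it, so any connection that would point forwards after a move is cleared:

```python
def invalidate_inputs(stages):
    # stages: list of {'inputs': [...], 'outputs': [...]} dicts in their
    # new top-to-bottom order; drop any input not yet produced upstream
    produced = set()
    for stage in stages:
        stage["inputs"] = [i for i in stage["inputs"] if i in produced]
        produced.update(stage["outputs"])
    return stages
```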

To reorder operations, hover over the operation you wish to move then drag and drop it to its new position.

Disabling stages and operations

Transform and Join stages can be temporarily disabled without losing their configuration, allowing a before-and-after comparison of that stage's effect on your output data.

To disable an operation or Join stage, click the diamond operation button next to its name. The operation or Join stage is greyed out, but its parameters can still be edited.

To disable all operations in a Transform stage, click the master operation button next to the Operations heading. Stages with all operations disabled are shown greyed out in the pipeline editor.

To re-enable operations, click the operation button again.

Configuration errors

If a stage has been configured incorrectly, for example if it contains a script operation with unsupported values, an error message is shown in the stage and running the pipeline is disabled.