Quantemplate pipelines can be reused to perform the same
set of transformations on data sources which are regularly updated. Once you’ve
configured a Quantemplate pipeline to transform a particular set of data sources,
it’s easy to feed in identically formatted new data as you receive it.
For example, each month you receive a dataset from three
different sources. You have configured a pipeline to transform the data to the desired
output format. When new data comes in next month, it can be fed through the pipeline
and exported to the data repo.
To feed new data into a pipeline:
1. Go to the inputs tab and remove the old pipeline inputs by clicking the [-] button.
2. Upload the latest data.
3. Thread the new data through the pipeline: go to the pipeline tab, open the first stage and select the newly uploaded files as the stage inputs. The stage outputs are generated automatically.
4. Configure the inputs for the next stage, and repeat until all stages have their inputs configured. Union stages create a single output named after the stage, so the new data is threaded through all subsequent stages automatically.
5. Run the pipeline to generate your output data, then export it to your data repo.
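The monthly re-run described above can be modelled conceptually: the stages stay fixed and only the inputs are swapped each month. This is a hypothetical sketch, not the Quantemplate API; the function and stage names are illustrative assumptions.

```python
# Hypothetical sketch (not the Quantemplate API): a pipeline is a fixed
# sequence of stages; each month only the inputs change.

def run_pipeline(stages, inputs):
    data = inputs
    for stage in stages:       # each stage consumes the previous stage's output
        data = stage(data)
    return data

# Illustrative stage: remove blank rows (a typical cleanup transformation).
remove_blank_rows = lambda rows: [r for r in rows if r]
stages = [remove_blank_rows]

january = [["a", 1], [], ["b", 2]]
february = [["c", 3], []]

# Same stages, new inputs: the transformations are reused unchanged.
print(run_pipeline(stages, january))   # [['a', 1], ['b', 2]]
print(run_pipeline(stages, february))  # [['c', 3]]
```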
If you’re creating a pipeline that will be frequently re-used,
consider using this structure:
1. A transform stage to remove rows, detect headers and map each source to a common schema.
2. A union stage to combine the source data. This creates a single output with the output name based on the stage name, meaning the rest of the pipeline can flow in the new data automatically.
3. Subsequent transform and join stages as required.
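The structure above works because the union stage's output is keyed by the stage name rather than by the incoming file names, so downstream stages keep pointing at the same output when new files are fed in. A hypothetical sketch of that idea (the names are illustrative, not Quantemplate's):

```python
# Hypothetical sketch (not the Quantemplate API): a union stage keyed by
# stage name, so downstream stages are insulated from changing input files.

def transform(rows):
    # Per-source cleanup: drop blank rows (stand-in for header detection
    # and schema mapping).
    return [r for r in rows if r]

def union(stage_name, sources):
    # Combine all sources into one output keyed by the stage name, so
    # downstream stages reference the name regardless of which month's
    # files were supplied.
    return {stage_name: [row for src in sources for row in transform(src)]}

january = [["a", 1], [], ["b", 2]]
february = [["c", 3]]

outputs = union("Union source data", [january, february])
# Downstream stages read outputs["Union source data"]; swapping in next
# month's files changes only the union stage's inputs.
print(outputs["Union source data"])  # [['a', 1], ['b', 2], ['c', 3]]
```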
If your data transformation requirements mean you’re not able
to use this kind of pipeline structure, then the new data can still be threaded through
from stage to stage individually. Be sure to select the right inputs for each stage.
Dealing with source data format changes
If something changes in one of your source data formats, your
pipeline may produce unexpected results. Quantemplate allows you to fix this by editing
your transformations to accommodate the new data format.
For example, suppose one of your data providers has moved to a new
system, generating slightly different column header names. To fix this, open the
column mapping stage and update the mappings to match the new header names.