Legacy date operations
About legacy date operations
Legacy date operations are script operations which have been superseded by the easy-to-use operations
Date Cleanse and
Date Output, along with
date functions in Calculate.
They will soon be deprecates so they can no longer be added to a pipeline. They will continue to work in existing pipelines. Updating existing pipelines to remove legacy date operations is advised.
Given a column containing a term and a column
containing a date, this operation creates a new column containing the
result of adding (or subtracting) the term and the date. Dates in source column can be in any of the formats
checked by the
date cleansing operation.
Usage
addTermToDate:
dateColumn: "<insert column name>"
termColumn: "<insert column name>"
resultColumn: "<insert new column name>"
unit: "<days, months or years>"
Notes
1.
Output dates are in ISO format.
2.
The operation is date aware, so it will account for leap years.
3.
Adding months to dates will output the same day of the subsequent
month. On the last days of the month where the subsequent month is shorter, it will output the last day of
the month. For example, adding 1 month to 30 January will set an output date of 28 February (in a non-leap
year).
Example
Calculate the end date from a start date and a given term.
addTermToDate:
dateColumn: "start date"
termColumn: "term"
resultColumn: "end date"
unit: "days"
ID
Start date
Term
End date
This operation reads dates from a wide variety of formats and cleanses
into ISO format.
Usage
cleanseDateColumns:
columns:
- "<insert date column name>"
- "<insert date column name>"
Alternatively, apply date cleansing to columns whose name contains a specified
keyword:
cleanseDateColumnsWithNames:
containing: "header"
For example, cleanseDateColumnsWithNames: {containing:
"Date"} would cleanse the date of columns with headers: ‘Date of Birth’ and
‘Date of Inception’ but ignore headers with: ‘Premium’, ‘DoB’, ‘date of birth’. Note that the matching is case
sensitive.
The date cleansing goes through the following formats
in order to try and get a match:
yyyyMMdd
dd-MM-yyyy
dd/MM/yyyy
yyyy-MM-dd
dd-MMM-yyyy
dd-MMM-yy
ddMMyyyy
Excel date format (days since Jan 1st 1900)
The date will be output in basic ISO format (yyyyMMdd). If another output format is
required, follow the Date Cleansing operation with the
Extract date operation.
US date formats
To convert US date format to ISO format, use the following configuration:
cleanseDateColumns:
columns:
- "<date column name>"
format: us
This will check the following formats:
MM-dd-yyyy
MM/dd/yyyy
MMddyyyy
The date will be output in basic ISO format (yyyyMMdd). If another output format is
required, follow the Date Cleansing operation with the
Extract date operation.
Given two date columns this operation calculates the days,
months or years between these dates. Dates in source columns can be in any of the formats checked by the
date cleansing operation.
Usage
yearsBetween:
from: "<insert column name>"
to: "<insert column name>"
resultColumn: "<insert new column name>"
Replace
yearsBetween
with:
-
monthsBetween
to give the number of whole months between the dates
-
daysBetween
to give the number of days between the dates
Example
yearsBetween:
from: "Date of Birth"
to: "Report date"
resultColumn: "Age"
ID
Date of birth
Report date
Age
Extract Date
Take a column containing a date in one format and create
a new column containing that date in another format. This operation also handles
partial date formats. Allowing you to convert year-month or year-quarter into
a date.
Usage
extractDate:
column: "<insert column name>"
resultColumn: "<insert column name>"
Example
extractDate:
column: filename
resultColumn: reporting date eom
input:
pattern: '(\d{4}) (\d{2})'
groupKey:
year: 1
month: 2
output:
format: ISO
dayOfMonthRule: EOM
There are two optional arguments: Input and Output.
Input
used to determine the date in the column value
if not specified it tries a range of partial date formats:
Year month (2013-01)
Year quarter (2013Q1)
Named Month Year (Jan 2013)
Arguments
Either specify a named format, or a
pattern and group key to identify the year, month and optionally day
in any text.
Format
One of:
YearMonth
YearNamedMonth
ISO
YearQuarter
Pattern
Enter a regular expression containing 2 or 3 groupings, eg:
(\d{4}) (\d{2})
Group Key
Identify the groups in the pattern
year: 1
month: 2
day: 3 (optional)
Output
Once the date components are identified the
data in the output column can be defined. If not specified then the
output is the basic ISO date. If the input is a partial date (i.e.
has no day part) then the output will be the last day in that month.
dayOfMonthRule:
Either
1. EOM
2. PreviousEOM
3. SubsequentEOM
Format
One of:
1. "ISO"
2. "YearMonth"
3. "Year-Month"
4. "YearQuarter"
Pattern
A pattern can be used instead of a preset format and should be a Java DateTimeFormatter
string:
using predefined constants, such as
ISO_LOCAL_DATE
using pattern letters, such as yyyy-MM-dd
See the
Java DateTimeFormatter documentation for full details.
Extracts the calendar year and calculates the year
quarter from a given date. Results are given in a new column. Dates in the source column can be in any of the
formats checked by the
date cleansing operation.
Usage
extractYearQuarter:
dateColumn: "<insert column name>"
resultColumn: "<insert new column name>"
Example
For input data
extractYearQuarter:
dateColumn: "dt"
resultColumn: "quarter"