Calendar table on Vertica DB


Calendar Table



In this post we will describe how can we create a calendar table on Vertica db using Timeseries analytics.



TIMESERIES Clause

Provides gap-filling and interpolation (GFI) computation, an important component of time series analytics computation. More information you can find on timeseries info.


Our query consists of 3 parts in order to have all the necessary fields.


CREATE TABLE IF NOT EXISTS calendar AS

👉  SELECT ROW_NUMBER() over(partition by 1) AS id  , tmp.*
FROM
(
👉    SELECT  slice_time                                           AS  actual_timestamp,
                    CAST(slice_time AS DATE)                        AS  date,
                    DATE_PART('YEAR', slice_time)               AS  year,
                    DATE_PART('QUARTER', slice_time)       AS  quarter,
                    DATE_PART('MONTH', slice_time)           AS  month,
                    TO_CHAR(slice_time, 'Month')                  AS  month_description,
                    TO_CHAR(slice_time, 'MON')                   AS  month_desc_abbr,
                    TO_CHAR(slice_time, 'YYYY/MM')        AS  yyyymm,
                    DATE_PART('WEEK', slice_time)             AS  week, 
                    WEEK_ISO(slice_time)                              AS  iso_week,
                    DATE_PART('DAY', slice_time)                AS  day,
                    TO_CHAR(slice_time, 'Day')                     AS  dayname,
                    TO_CHAR(slice_time, 'DY')                      AS  dayname_abbr,
                    TO_CHAR(slice_time, 'D')                         AS  dayofweek
    FROM
    (
👉   SELECT  '2019-01-01 00:00:00'  AS sample_date
        UNION
        SELECT  '2025-12-31 00:00:00'  AS sample_date
    ) a TIMESERIES slice_time AS '1 DAY' OVER( ORDER BY sample_date::VARCHAR::TIMESTAMP) 

)tmp

 Explanation

  First subquery, contain start & end dates. Between these 2 dates we will fill the gap using TIMESERIES clause. Timeseries will perform an operation on a query result set.

⚠ Then we should  specify the length of time unit of time slice computation, which is '1 DAY'. This is the step that we use in order to fill the gap between the sample dates.

OVER() specifies that the timeseries operates on a query result set , that is the rows that are returned after the FROM . The timeseries clause requires an ORDER BY operation on the timestamp column, so we  sort the data based on timestamp generated.

  Second sub-query, contains all the necessary fields. Here, we use 3 different functions to succeed that:

  1. TO_CHAR():    Converts date/time and numeric values into text strings.
    • It's ideal to take abbreviations for day, month, year.
  2. DATE_PART(): Extracts a sub-field such as year or hour from a date/time expression.
  3. WEEK_ISO():   Returns the week of the year for the specified date

Last but not least, we need to create a PK for the table. For this reason we use the analytic function ROW_NUMBER() over(partition by 1). So the operation of the ROW_NUMBER()  is taking place of the result of the second query and is put everything into the same partition.

❗❗❗ partition by 1 : is used for syntax integrity. It can be omitted.



👀 The below video shows the result set of the query.





Σχόλια

Δημοφιλείς αναρτήσεις από αυτό το ιστολόγιο