Scheduling of Snowflake Tasks (Pic Credit: Thinketl.com)

Scheduling Snowflake Tasks using CRON

Priya Yogendra Rana
5 min readJul 30, 2023

Quite a few Snowflake developers are aware of Tasks. To a layman, Task is scheduling the execution of a SQL query or a stored procedure. There are 2 variants: CRON and Non-CRON. Here, I will restrict myself to CRON as that’s the extent of my real-world experience, as of now.

I have never covered server-side topics before but when one is assigned a project and, in the process, learns about new things, a blog is a sure way to re-enforce new learning and findings. And well I learnt, one should never say never.

Let’s begin first with the problem statement, followed by root cause, correct syntax, solution and suggestions.

Problem statement: A certain data figure which should ideally be less than 2000 was appearing in millions.

Root cause: Upon analysis, it was found that one of the tasks had an insert statement and this task, instead of running once a month, was running every minute of the execution date. This was happening due to a syntax error.

Before delving into why’s and how’s, let’s understand how to schedule a task. The syntax is Schedule and it has 2 options. They are as follows:

1. SCHEDULE = 5 MINUTE

This means that once you execute the task or schedule it at a set time, then from that time every 5 minute this task will run. The maximum value it supports is 11520 i.e. 8 days.

2. However, focus will be on the second option which provides more flexibility in scheduling a task. The syntax goes as:

SCHEDULE = ‘USING CRON * * * * * <time zone>’

There are 5 parameters to be addressed before specifying the time zone.

The first parameter stands for minutes. It can have numeric values between 0 to 59 only.

The second parameter stands for hours. It can have numeric values between 0 to 23 only.

The third parameter stands for day of month. It can have either numeric values between 1 to 31 or L.

The fourth parameter stands for month. It can have either numeric values 1 to 12 or values JAN to DEC.

The last parameter stands for day of the week. It can have either numeric values between 0 to 6 or L or values between SUN to SAT.

Now check out all the possible values that could be used for each of the 5 parameters:

a. * means any number of occurrences.

SCHEDULE = ‘USING CRON * * * * * UTC’

Here, it means the task will run every minute of the year.

b. Number

SCHEDULE = ‘USING CRON 10 18 * 9 3 <time zone>’

Here, the task will run on every Wednesday in the month of September at 6:10 pm as per the time zone specified.

c. number-number (hyphen separated numbers)

SCHEDULE = ‘USING CRON 2–4 2–4 2–4 2–4 * <time zone>’

Here, the task will run on 2nd, 3rd, and 4th of February, March and April at 2:02 am, 2:03 am, 2:04 am, 3:02 am, 3:03 am, 3:04 am, 4:02 am, 4:03 am, 4:04 am as per the time zone specified. In this example, every minute/hour/day of the month/month from 2nd to 4th will be considered.

d. number, number, number (comma separated numbers)

SCHEDULE = ‘USING CRON 2,6,15 12,14 * 3,7,8 1 <time zone>’

Here, every Monday of March, July and August at 12:02 am, 12:06 am, 12: 15 am, 02:02 pm, 02:06 pm, 02:15 pm the task will run. In this example, by using commas, one is specifying at what minute, hour, month and week the task should run.

e. L means last.

SCHEDULE = ‘USING CRON * * * * 5L <time zone>’

It means that the task will run every minute, hour on the last Friday of the month.

f. /n means the nth instance of a given unit of time. ‘USING CRON 30 12 2 4/3 * <time zone>’ means that the task will run at 12:30 am on 2nd of April, July and October. When the year changes the cycle will start again for the same months above. 4/3 means after every 3 months, starting of the 4th month. Personally, I find this confusing so I will recommend keeping it simple by specifying the month number.

Solution: Below mentioned are both the original syntax and the corrected syntax.

Original syntax: SCHEDULE = ‘USING CRON * * * * 5L UTC’

Corrected syntax: SCHEDULE = ‘USING CRON 0 0 * JAN-DEC 5L UTC’

Based on the knowledge gained, the former statement will see that the task runs every minute of the day of execution whereas, the latter statement will ensure that the task is executed at 00:00 am on the last Friday of every month.

Testing different syntaxes

The best way to handle confusion is to try out the options. Create a dummy task which doesn’t impact any main data. Schedule and execute the task on a test server. Then run the task history. Check and study the values especially in columns: SCHEDULED_TIME and NEXT_SCHEDULED_TIME. If the values in these columns are not as expected readjusts the parameter values till you get it right. Once you migrate the task to the production server, be sure to closely monitor this task for a few cycles as a precautionary measure.

There are 2 reasons to adopt this trial and error approach.

1. Sometimes, the product documentation might not be up to date or in sync with the version installed. Snowflake documentation suggests that if the day of the month and day of the week both are specified then tasks will run on day of the month and on the day of the week outside the day of the month. So ‘USING CRON 0 0 3 JAN THU UTC’ indicates that the task will run on 3rd of January as well as on Thursdays of the same month. According to this, syntax ‘USING CRON 0 0 * JAN-DEC 5L <time zone>’ should result in the task running on last Friday but also on any other day of the month. However, on checking the task history, the NEXT_SCHEDULED_TIME column showed the desired date which is the last Friday of the next month. Hence, the best way forward would be to test before migrating to production.

2. As analysts and developers, we need to ensure and emphasize more on data quality. Data quality and good performance is what keeps the clients engaged. And last but not the least, one must realise that there will be a point in life where moving on to other tasks or projects is unavoidable. Inheriting non-functional legacy is a nightmare that everyone wishes to avoid, especially in our world.

To conclude, in order to provide your best, it’s imperative to understand the syntaxes and their formats. Then test the output of the same in a test environment before migrating to the production server. This will not only solidify your knowledge but ensure data quality as well. Until next time.

--

--

Priya Yogendra Rana
Priya Yogendra Rana

Written by Priya Yogendra Rana

Ex-Software Developer, MBA, Data Analyst Enthusiast. Bring together business focus and data skills

No responses yet