Project OpenInsight - Analyze the history of Postgres pg_cron  Jobs

Project OpenInsight - Analyze the history of Postgres pg_cron Jobs

·

3 min read

By Itay Braun, CTO, Metis

Database maintenance is a critical aspect of managing PostgreSQL databases, and automation can greatly simplify routine tasks. In this blog post, we'll explore how to use the pg_cron extension to create a cron job in PostgreSQL. Then we'll use a Notebook to analyze the history of the jobs.

Using pg_cron to Automate Tasks

pg_cron is an extension for PostgreSQL that allows you to schedule and manage cron jobs directly within your database. It simplifies the process of automating database tasks, making it easier to execute SQL queries at specific intervals.

Installing pg_cron

Before we create our cron job, we need to ensure that pg_cron is installed and enabled in our PostgreSQL database. You can follow these steps to install pg_cron:

  1. Download the pg_cron extension from the official GitHub repository or use your distribution's package manager if available.

  2. Load the extension into your PostgreSQL database:

-- Connect to your PostgreSQL database
CREATE EXTENSION IF NOT EXISTS pg_cron;

Creating a Cron Job with pg_cron

Now that we have pg_cron installed, let's create a cron job that will delete old data from a PostgreSQL table every Saturday at 3:30 AM GMT.

-- Schedule a job to delete old data on Saturday at 3:30 AM GMT
SELECT cron.schedule(
    'delete_old_data',
    '30 3 * * 6', -- Cron schedule: 30 minutes past 3 AM every Saturday
    'DELETE FROM events WHERE event_time < now() - interval ''1 week'''
);

Here's a breakdown of the cron.schedule function call:

  • 'delete_old_data': This is the name of the cron job.

  • '30 3 * * 6': This is the cron schedule expressed as a string. It represents 3:30 AM (30 minutes past 3 AM) every Saturday (day of the week 6).

  • The SQL command 'DELETE FROM events WHERE event_time < now() - interval ''1 week''' is the query that will be executed when the cron job runs. In this case, it deletes old data from the "events" table where the "event_time" is more than one week old.

💡
Tip: Writing the cron schedule expression (such as "30 3 6") might be tricky. Use crontab.guru to view the schedule in simple English.
💡
Tip: If the SQL text is long, or contains deep internal logic, it is recommended to replace it with a function. This way you can test and debug the function, rather than a long SQL code.

Managing and Verifying Cron Jobs

You can view the list of scheduled cron jobs in your database using the following query:

SELECT * FROM cron.job;

To remove a scheduled cron job, you can use the cron.unschedule function:

SELECT cron.unschedule('delete_old_data');

Monitoring and Administering pg_cron Jobs in PostgreSQL

Effective management of pg_cron jobs in PostgreSQL involves:

  1. Success and Failure Rates: Regularly tracking these rates helps assess job reliability and pinpoint potential issues.

  2. Average Job Duration: Monitoring job run times aids in identifying performance anomalies or resource constraints.

  3. Identifying High Durations: Consistently high durations may indicate problems requiring investigation or optimization.

  4. Accessing Error Messages: Detailed error messages are crucial for quick troubleshooting and maintaining job reliability.

Incorporating these practices ensures the efficiency and reliability of your scheduled tasks.

💡
Insight: the table cron.job_run_details requires a job to delete old data. If you don't do that, the table will continue to grow. It is recommended to write a job that runs every day and delete data older than X days.

Observability Toolkit

I wrote about an open-source observability-toolkit-for-postgres. It now contains a new notebook for cron jobs. The Notebook covers the administrative tasks listed above:

  • Extension configuration

  • Success and failure rate

  • The current size of the table cron.job_run_details.

  • Average job duration

  • View detailed runs to understand the error messages and high durations.