Documentation for Deleting Old Data from RDS Aurora PostgreSQL Table

This documentation describes the process of setting up an automated deletion of old data from the table public.subscribers_subscriberupdateevent_v2 in an RDS Aurora PostgreSQL database. The deletion is performed using a cron job that runs daily on a bastion host.

Step 1: Create a User with Limited Privileges in RDS

  1. Connect to your RDS instance using an admin user:

    psql -h webpush-worker-rds-prd.cluster-cmpslvys4mms.us-east-2.rds.amazonaws.com -U root -d webpush
  2. Create a new user with specific privileges:

    CREATE USER cleanup_user WITH PASSWORD 'eilok!ioph0Eef2n';
    GRANT SELECT, DELETE ON TABLE public.subscribers_subscriberupdateevent_v2 TO cleanup_user;

Step 2: Create a Cron Job for Daily Execution on the Bastion Host

  1. Edit the crontab file to schedule the job:

    crontab -e
  2. Add the following line to schedule the script to run every day at 8:00 AM (UTC):

    0 8 * * * /opt/rds_db_cleanup/cleanup_script.sh >> /opt/rds_db_cleanup/logs/cleanup.log 2>&1
    

Step 3: Create the Cleanup Script

Create the script /opt/rds_db_cleanup/cleanup_script.sh to find and delete old data.

Script Content

#!/bin/bash
 
# Define variables
RDS_HOST="webpush-worker-rds-prd.cluster-cmpslvys4mms.us-east-2.rds.amazonaws.com"
RDS_DB="webpush"
RDS_USER="cleanup_user"
RDS_PASSWORD="eilok!ioph0Eef2n"
TABLE_NAME="public.subscribers_subscriberupdateevent_v2"
LOG_DIR="/opt/rds_db_cleanup/logs"
LOG_FILE="$LOG_DIR/cleanup_$(date +%Y%m%d_%H%M%S).log"
 
# Ensure log directory exists
mkdir -p $LOG_DIR
 
# Log start time
echo "$(date +'%Y-%m-%d %H:%M:%S') - Starting cleanup process" >> $LOG_FILE
 
# Execute SQL commands to find and delete old data
SQL="SELECT MAX(id) AS max_id, COUNT(*) AS old_data_count FROM $TABLE_NAME WHERE datetime < date_trunc('day', now());"
RESULT=$(PGPASSWORD=$RDS_PASSWORD psql -h $RDS_HOST -U $RDS_USER -d $RDS_DB -t -c "$SQL" | xargs)
MAX_ID=$(echo $RESULT | cut -d '|' -f 1 | xargs)
OLD_DATA_COUNT=$(echo $RESULT | cut -d '|' -f 2 | xargs)
 
if [ -n "$MAX_ID" ] && [ "$OLD_DATA_COUNT" -gt 1 ]; then
  # Log the number of old records found
echo "$(date +'%Y-%m-%d %H:%M:%S') - Found $OLD_DATA_COUNT records to delete, up to ID $MAX_ID" >> $LOG_FILE
 
  # Execute delete command
  DELETE_SQL="DELETE FROM $TABLE_NAME WHERE id < $MAX_ID;"
  # DELETE_SQL="SELECT count(id) FROM $TABLE_NAME WHERE id < $MAX_ID;"
  DELETE_COUNT=$(PGPASSWORD=$RDS_PASSWORD psql -h $RDS_HOST -U $RDS_USER -d $RDS_DB -c "$DELETE_SQL" -t -A | grep "DELETE" | awk '{print $2}')
 
  # Log the number of records deleted
echo "$(date +'%Y-%m-%d %H:%M:%S') - Deleted $DELETE_COUNT records" >> $LOG_FILE
 
  # Log remaining count
  REMAINING_COUNT=$(PGPASSWORD=$RDS_PASSWORD psql -h $RDS_HOST -U $RDS_USER -d $RDS_DB -t -c "SELECT COUNT(*) FROM $TABLE_NAME;" | xargs)
  echo "$(date +'%Y-%m-%d %H:%M:%S') - Remaining records: $REMAINING_COUNT" >> $LOG_FILE
else
  echo "$(date +'%Y-%m-%d %H:%M:%S') - No records found for deletion" >> $LOG_FILE
fi
 
# Log end time
echo "$(date +'%Y-%m-%d %H:%M:%S') - Cleanup process completed" >> $LOG_FILE

Logs

Logs of the script execution are stored in the /opt/rds_db_cleanup/logs directory and contain information about:

  • Start and end times of the process.
  • Number of records found and deleted.
  • Number of records remaining after deletion.