Documentation for Deleting Old Data from RDS Aurora PostgreSQL Table
This documentation describes the process of setting up an automated deletion of old data from the table public.subscribers_subscriberupdateevent_v2 in an RDS Aurora PostgreSQL database. The deletion is performed using a cron job that runs daily on a bastion host.
Step 1: Create a User with Limited Privileges in RDS
-
Connect to your RDS instance using an admin user:
psql -h webpush-worker-rds-prd.cluster-cmpslvys4mms.us-east-2.rds.amazonaws.com -U root -d webpush -
Create a new user with specific privileges:
CREATE USER cleanup_user WITH PASSWORD 'eilok!ioph0Eef2n'; GRANT SELECT, DELETE ON TABLE public.subscribers_subscriberupdateevent_v2 TO cleanup_user;
Step 2: Create a Cron Job for Daily Execution on the Bastion Host
-
Edit the crontab file to schedule the job:
crontab -e -
Add the following line to schedule the script to run every day at 8:00 AM (UTC):
0 8 * * * /opt/rds_db_cleanup/cleanup_script.sh >> /opt/rds_db_cleanup/logs/cleanup.log 2>&1
Step 3: Create the Cleanup Script
Create the script /opt/rds_db_cleanup/cleanup_script.sh to find and delete old data.
Script Content
#!/bin/bash
# Define variables
RDS_HOST="webpush-worker-rds-prd.cluster-cmpslvys4mms.us-east-2.rds.amazonaws.com"
RDS_DB="webpush"
RDS_USER="cleanup_user"
RDS_PASSWORD="eilok!ioph0Eef2n"
TABLE_NAME="public.subscribers_subscriberupdateevent_v2"
LOG_DIR="/opt/rds_db_cleanup/logs"
LOG_FILE="$LOG_DIR/cleanup_$(date +%Y%m%d_%H%M%S).log"
# Ensure log directory exists
mkdir -p $LOG_DIR
# Log start time
echo "$(date +'%Y-%m-%d %H:%M:%S') - Starting cleanup process" >> $LOG_FILE
# Execute SQL commands to find and delete old data
SQL="SELECT MAX(id) AS max_id, COUNT(*) AS old_data_count FROM $TABLE_NAME WHERE datetime < date_trunc('day', now());"
RESULT=$(PGPASSWORD=$RDS_PASSWORD psql -h $RDS_HOST -U $RDS_USER -d $RDS_DB -t -c "$SQL" | xargs)
MAX_ID=$(echo $RESULT | cut -d '|' -f 1 | xargs)
OLD_DATA_COUNT=$(echo $RESULT | cut -d '|' -f 2 | xargs)
if [ -n "$MAX_ID" ] && [ "$OLD_DATA_COUNT" -gt 1 ]; then
# Log the number of old records found
echo "$(date +'%Y-%m-%d %H:%M:%S') - Found $OLD_DATA_COUNT records to delete, up to ID $MAX_ID" >> $LOG_FILE
# Execute delete command
DELETE_SQL="DELETE FROM $TABLE_NAME WHERE id < $MAX_ID;"
# DELETE_SQL="SELECT count(id) FROM $TABLE_NAME WHERE id < $MAX_ID;"
DELETE_COUNT=$(PGPASSWORD=$RDS_PASSWORD psql -h $RDS_HOST -U $RDS_USER -d $RDS_DB -c "$DELETE_SQL" -t -A | grep "DELETE" | awk '{print $2}')
# Log the number of records deleted
echo "$(date +'%Y-%m-%d %H:%M:%S') - Deleted $DELETE_COUNT records" >> $LOG_FILE
# Log remaining count
REMAINING_COUNT=$(PGPASSWORD=$RDS_PASSWORD psql -h $RDS_HOST -U $RDS_USER -d $RDS_DB -t -c "SELECT COUNT(*) FROM $TABLE_NAME;" | xargs)
echo "$(date +'%Y-%m-%d %H:%M:%S') - Remaining records: $REMAINING_COUNT" >> $LOG_FILE
else
echo "$(date +'%Y-%m-%d %H:%M:%S') - No records found for deletion" >> $LOG_FILE
fi
# Log end time
echo "$(date +'%Y-%m-%d %H:%M:%S') - Cleanup process completed" >> $LOG_FILELogs
Logs of the script execution are stored in the /opt/rds_db_cleanup/logs directory and contain information about:
- Start and end times of the process.
- Number of records found and deleted.
- Number of records remaining after deletion.