Summary
To save costs on AWS infrastructure, we are migrating the ClickHouse database from an EKS cluster to on-premises servers. The migration covers the entire database, including all tables and data, to ensure continuity of service and data integrity. The following AWS workloads should also be migrated to the on-premises servers:
- statistics-api
- statistics-listener
- callback-processor
- directory-reach-extractor
- directory-reach-detector
- directory-awake-detection
- directory-offline-package-uploader
- callback-api
- messages-api - needs investigation to confirm whether it is used for statistics purposes
Database migration steps
1. Set up a new ClickHouse instance on-premises, ensuring it is properly configured and optimized for performance.
2. Set up the workloads on the on-premises servers, ensuring they are properly configured, have a working connection to the new ClickHouse instance, and are ready to handle the input data.
3. Prepare traffic mirroring to replicate data from the AWS ClickHouse instance to the on-premises instance.
4. Enable traffic mirroring and monitor the replication process to ensure data consistency and integrity (a monitoring sketch follows).
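   As a minimal monitoring sketch (run on both instances and compare the results; `page_events` and its `timestamp` column are taken from the validation queries in step 6):

   ```sql
   -- Rows ingested today for one of the mirrored tables; the counts on the
   -- AWS and on-premises instances should stay close while mirroring runs.
   select toDate(timestamp) as day, count() as rows_today
   from page_events
   where toDate(timestamp) = today()
   group by day;
   ```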
5. Wait for a couple of full days of traffic mirroring to ensure that the on-premises setup is capable of handling the load and that all data is being accurately replicated.
6. Validate and compare the NEW data between the AWS and on-premises ClickHouse instances to ensure they are in sync (a cross-server comparison sketch follows this list):
   - Take the day after the migration started and use it as the reference date (let's call it `{DATE}`), since this is the first full day for which both databases should have the same data.
   - Compare the `channels_stat_daily` table on both databases:

     ```sql
     select toDate(datetime), sum(sent_count), sum(delivered_count), sum(unique_click_count), sum(subscribe_count), sum(unsubscribe_count)
     from channels_stat_daily
     where toDate(datetime) >= '{DATE}'
     group by toDate(datetime);
     ```

   - Compare the `page_events` table on both databases:

     ```sql
     select toDate(timestamp), sum(visit_count)
     from page_events
     where toDate(timestamp) >= '{DATE}'
     group by toDate(timestamp);
     ```

   - Compare the `subscriber_sources_activities` table on both databases:

     ```sql
     select toDate(datetime) as date, sum(delivered_count), sum(clicked_count)
     from subscriber_sources_activities
     where datetime >= '{DATE}'
     group by date;
     ```

   - Compare the `subscription_events` table on both databases:

     ```sql
     select toDate(timestamp) as date, sum(subscribe_count), sum(unsubscribe_count)
     from subscription_events
     where toDate(timestamp) >= '{DATE}'
     group by date;
     ```

   - Compare the `sender_firebase_statistics` table on both databases:

     ```sql
     select date, sum(delivered_count) as delivers, sum(clicked_count) as clicks
     from sender_firebase_statistics
     where date >= '{DATE}'
     group by date;
     ```

   - Compare the `firebase_errors` table on both databases:

     ```sql
     select toDate(timestamp) as date, uuid, sum(count)
     from firebase_errors
     where date >= '{DATE}'
     group by date, uuid
     order by date;
     ```
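   If both servers can reach each other, the same comparisons can be run from a single client with the `remote()` table function; a sketch, assuming the AWS instance is reachable at the hypothetical address `aws-clickhouse:9000`, the database is `default`, and credentials are filled in:

   ```sql
   -- Same aggregation as the channels_stat_daily check above, but executed
   -- against the remote AWS instance, so both result sets can be diffed
   -- from one place.
   select toDate(datetime), sum(sent_count), sum(delivered_count)
   from remote('aws-clickhouse:9000', default.channels_stat_daily, 'default', '<password>')
   where toDate(datetime) >= '{DATE}'
   group by toDate(datetime);
   ```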
7. Build the list of tables we need to backfill data into:
   - Get the list of tables (`target_table`) that are populated by materialized views (see clickhouse-materialized-structure). Use the second query there, which excludes the views that add data from Kafka streams (a fallback query sketch follows this list).
   - Add the following 'raw' tables to the list:
     - **page_events**
     - **subscription_events**
     - subscriber_sources_activities
     - firebase_errors
     - firebase_subscription_errors
   - The highlighted tables should be backfilled with their materialized views disabled, so we don't duplicate data.
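   If the clickhouse-materialized-structure queries are not at hand, an approximation can be built from `system.tables`; a sketch, assuming everything lives in the current database and that Kafka-fed pipelines are recognizable by their `Kafka`-engine source tables:

   ```sql
   -- Materialized views in the current database; the TO clause inside
   -- create_table_query names the target table that the view populates.
   select name, create_table_query
   from system.tables
   where database = currentDatabase()
     and engine = 'MaterializedView';

   -- Kafka-engine source tables, for cross-checking which views (the
   -- *_stream_to_table ones) feed from Kafka and should be excluded.
   select name
   from system.tables
   where database = currentDatabase()
     and engine = 'Kafka';
   ```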
8. Now that the new data is validated and we have the list of tables to backfill, backfill the old data from the AWS ClickHouse to the on-premises ClickHouse:
   - Data older than `{DATE}` should be deleted from these tables first: `ALTER TABLE {TABLE} DELETE WHERE toDate(datetime) < '{DATE}';`
   - Create partition snapshots on the AWS ClickHouse with `ALTER TABLE ... FREEZE PARTITION`. This is suitable for all tables that have `PARTITION BY toYYYYMM(datetime)` (which should be the majority of them). After the snapshot is created, it is available in the `/var/lib/clickhouse/shadow/` directory on the ClickHouse server. We can use the `WITH NAME` clause to give the snapshot a custom name (see the example after these bullets).
   - Copy the snapshot files from the AWS ClickHouse server to the on-premises ClickHouse server using SCP or any other tool.
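   A minimal sketch of the freeze step above (the table name `channels_stat_daily` and the partition ID `202401` are illustrative):

   ```sql
   -- Freeze one monthly partition and give the snapshot a custom name; the
   -- hard-linked parts appear under /var/lib/clickhouse/shadow/migration_backfill/.
   ALTER TABLE channels_stat_daily FREEZE PARTITION 202401 WITH NAME 'migration_backfill';
   ```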
   - Copy the data from the `data/database/table/` directory inside the backup to the `/var/lib/clickhouse/data/database/table/detached/` directory on the on-premises ClickHouse server (https://clickhouse.com/docs/sql-reference/statements/alter/partition#attach-partitionpart).
   - Run `ALTER TABLE t ATTACH PARTITION` for all tables where it is safe to do so (all except the highlighted ones: `page_events` and `subscription_events`).
   - For the highlighted tables, we need to backfill them with the materialized views disabled (an end-to-end sketch for `page_events` follows this list):
     - Find all materialized views that are populated from these tables:
       - `page_events_analytics` for `page_events`
       - `subscription_events_analytics`, `subscription_events_channel_events`, `subscription_events_sender_firebase`, `subscription_events_window_events` for `subscription_events`
     - Back up their definitions using the `SHOW CREATE TABLE {TABLE}` command.
     - Also back up the definitions of the Kafka stream views:
       - `page_events_stream_to_table` for `page_events`
       - `subscription_stream_to_table` for `subscription_events`
     - Drop the Kafka stream views (`page_events_stream_to_table` and `subscription_stream_to_table`) to stop new data from being added to the base tables.
     - Drop the materialized views we found earlier.
     - Run `ALTER TABLE t ATTACH PARTITION` for these tables.
     - Recreate the materialized views (and the Kafka stream views) from the definitions we backed up earlier.
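   A minimal sketch of this sequence for `page_events`, assuming the view names listed above, that the table is month-partitioned like the others, and an illustrative partition ID of `202401`; the real view definitions must come from the `SHOW CREATE TABLE` backups:

   ```sql
   -- 1. Back up the view definitions (save the output somewhere safe).
   SHOW CREATE TABLE page_events_analytics;
   SHOW CREATE TABLE page_events_stream_to_table;

   -- 2. Stop new inserts and detach the materialized-view logic.
   DROP VIEW page_events_stream_to_table;
   DROP VIEW page_events_analytics;

   -- 3. Attach the backfilled partition copied into the detached/ directory.
   ALTER TABLE page_events ATTACH PARTITION 202401;

   -- 4. Recreate the views from the backed-up definitions (placeholders).
   -- CREATE MATERIALIZED VIEW page_events_stream_to_table ... ;
   -- CREATE MATERIALIZED VIEW page_events_analytics ... ;
   ```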
9. Final validation: execute step 6 again, but this time compare the full (or at least broader) data sets on both databases to ensure they are fully in sync (a full-range spot check is sketched below).
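   For example, a full-range spot check run on both instances might look like this (any of the tables from step 6 can be substituted; the `channels_stat_daily` columns are taken from the step 6 query):

   ```sql
   -- Full-range totals for one of the migrated tables; the numbers should
   -- match exactly between the AWS and the on-premises instance.
   select count() as rows,
          min(toDate(datetime)) as first_day,
          max(toDate(datetime)) as last_day,
          sum(sent_count) as total_sent
   from channels_stat_daily;
   ```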