Summary

To reduce AWS infrastructure costs, we are migrating the ClickHouse database from an EKS cluster to on-premises servers. The migration covers the entire database, including all tables and data, to ensure continuity of service and data integrity. The AWS workloads that feed data into ClickHouse should also be migrated to the on-premises servers.

Database migration steps

  1. Set up a new ClickHouse instance on-premises, ensuring it is properly configured and optimized for performance.

  2. Set up the workloads on the on-premises servers, ensuring they are properly configured, have a working connection to the new ClickHouse instance, and are ready to handle the incoming data.

  3. Prepare traffic mirroring to replicate data from the AWS ClickHouse instance to the on-premises instance.

  4. Enable traffic mirroring and monitor the replication process to ensure data consistency and integrity.

  5. Wait for a couple of full days of traffic mirroring to ensure that the on-premises setup is capable of handling the load and that all data is being accurately replicated.

  6. Validate and compare NEW data between the AWS and on-premises ClickHouse instances to ensure they are in sync:

    • Take the first full day after traffic mirroring was enabled as the reference date (let’s call it {DATE}), since this is the first full day for which both databases should contain the same data.
    • Compare channels_stat_daily table on both databases:
    select toDate(datetime), sum(sent_count), sum(delivered_count), sum(unique_click_count), sum(subscribe_count), sum(unsubscribe_count) from channels_stat_daily where toDate(datetime) >= '{DATE}' group by toDate(datetime);
    • Compare page_events table on both databases:
    select toDate(timestamp), sum(visit_count) from page_events where toDate(timestamp) >= '{DATE}' group by toDate(timestamp);
    • Compare subscriber_sources_activities table on both databases:
    select toDate(datetime) as date, sum(delivered_count), sum(clicked_count) from subscriber_sources_activities where datetime >= '{DATE}' group by date;
    • Compare subscription_events table on both databases:
    select toDate(timestamp) as date, sum(subscribe_count), sum(unsubscribe_count) from subscription_events where toDate(timestamp) >= '{DATE}' group by date;
    • Compare sender_firebase_statistics table on both databases:
    select date, sum(delivered_count) as delivers, sum(clicked_count) as clicks from sender_firebase_statistics where date >= '{DATE}' group by date;
    • Compare firebase_errors table on both databases:
    select toDate(timestamp) as date, uuid, sum(count) from firebase_errors where toDate(timestamp) >= '{DATE}' group by date, uuid order by date;
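    The queries above are run on both instances and the results compared. Optionally, the AWS-side aggregates can be pulled from the on-premises server in a single query via the remote() table function; the host, port, database, and credentials below are placeholders:
    -- Sketch: read the AWS-side aggregates from the on-premises server through remote(),
    -- so both result sets can be compared in one place. Connection details are placeholders.
    select toDate(datetime) as date, sum(sent_count), sum(delivered_count)
    from remote('aws-clickhouse.internal:9000', 'default', 'channels_stat_daily', 'default', '{PASSWORD}')
    where toDate(datetime) >= '{DATE}'
    group by date
    order by date;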
  7. Build the list of tables that need historical data backfilled:

    • Get the list of tables (target_table) that are populated by materialized views (clickhouse-materialized-structure). Use the second query there, which excludes views that ingest data from Kafka streams (a sketch of such a query is shown at the end of this step).
    • Add the following 'raw' tables to the list:
      • ** page_events **
      • ** subscription_events **
      • subscriber_sources_activities
      • firebase_errors
      • firebase_subscription_errors
    • The highlighted tables should be backfilled with their materialized views disabled, so we don’t duplicate data (see step 8).
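    A minimal sketch of such a listing query against system.tables is shown below; it assumes the Kafka-fed views are exactly the ones registered as dependents of Kafka engine tables, and it is not a substitute for the query in clickhouse-materialized-structure:
    -- Sketch: materialized views and their target tables, excluding views that
    -- consume directly from Kafka engine tables (their raw target tables are added
    -- to the list explicitly above).
    select name as view_name, extract(create_table_query, '\\sTO\\s+(\\S+)') as target_table
    from system.tables
    where engine = 'MaterializedView'
      and database = currentDatabase()
      and name not in (
          select arrayJoin(dependencies_table)
          from system.tables
          where engine = 'Kafka' and database = currentDatabase()
      );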
  8. Now that the new data is validated and we have the list of tables to backfill, backfill the historical data from the AWS ClickHouse to the on-premises ClickHouse.

    • Data older than {DATE} should be deleted from these tables:
    ALTER TABLE {TABLE} DELETE WHERE toDate(datetime) < '{DATE}';
    • Create a snapshot of each table with FREEZE:
    ALTER TABLE {TABLE} FREEZE PARTITION {PARTITION};

    ** This approach is suitable for all tables that are partitioned by toYYYYMM(datetime), which should be the majority of them **
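    The partition key of each table can be checked up front via system.tables:
    -- Confirm which tables are partitioned by toYYYYMM(datetime) before relying on this approach.
    select name, partition_key
    from system.tables
    where database = currentDatabase() and engine like '%MergeTree%'
    order by name;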

    After the snapshot is created, it will be available in the /var/lib/clickhouse/shadow/ directory on the AWS ClickHouse server. We can use the WITH NAME clause to give the snapshot a custom name.
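    For example (the snapshot name 'onprem_migration' is arbitrary):
    -- With a custom name, the snapshot is written to /var/lib/clickhouse/shadow/onprem_migration/
    -- instead of an auto-incremented directory such as /var/lib/clickhouse/shadow/1/.
    ALTER TABLE {TABLE} FREEZE PARTITION {PARTITION} WITH NAME 'onprem_migration';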

    • Copy the snapshot files from the AWS ClickHouse server to the on-premises ClickHouse server using scp or any other tool
    • Copy the data from the data/database/table/ directory inside the backup to the /var/lib/clickhouse/data/database/table/detached/ directory on the on-premises ClickHouse server (https://clickhouse.com/docs/sql-reference/statements/alter/partition#attach-partitionpart)
    • Run ALTER TABLE {TABLE} ATTACH PARTITION for every table where it is safe to do so (i.e. all except the highlighted ones: page_events and subscription_events)
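    A sketch of this step for a single table (the partition value 202401 is only an example for a toYYYYMM partition key):
    -- List the copied parts waiting in the detached directory.
    select partition_id, name from system.detached_parts where table = '{TABLE}';
    -- Attach one partition; repeat for every partition present in the backup.
    ALTER TABLE {TABLE} ATTACH PARTITION 202401;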
    • The highlighted tables need to be backfilled with their materialized views disabled (a condensed sketch of the full sequence is shown after these sub-steps):
      • Find all materialized views that are populating from these tables:
        • page_events_analytics for page_events
        • subscription_events_analytics, subscription_events_channel_events, subscription_events_sender_firebase, subscription_events_window_events for subscription_events
      • Back up their definitions using the SHOW CREATE TABLE {TABLE} command
      • Also back up the definitions of the Kafka stream views:
        • page_events_stream_to_table for page_events
        • subscription_stream_to_table for subscription_events
      • Drop the Kafka stream views (page_events_stream_to_table and subscription_stream_to_table) to stop new data from being added to the base tables
      • Drop materialized views we found earlier
      • Run ALTER TABLE t ATTACH PARTITION for these tables
      • Recreate the materialized views using the definitions we backed up earlier
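      A condensed sketch of this sequence for page_events (subscription_events follows the same pattern with its own views; the partition value is only an example):
      -- 1. Save the view definitions (store the output somewhere safe).
      SHOW CREATE TABLE page_events_analytics;
      SHOW CREATE TABLE page_events_stream_to_table;
      -- 2. Stop Kafka ingestion and the downstream materialized view.
      DROP VIEW page_events_stream_to_table;
      DROP VIEW page_events_analytics;
      -- 3. Attach the backfilled partitions (repeat per partition).
      ALTER TABLE page_events ATTACH PARTITION 202401;
      -- 4. Recreate both views from the saved CREATE statements, recreating the Kafka stream view last
      --    so no rows reach page_events while page_events_analytics is missing.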
  9. Final validation. Execute step 6 again, but this time compare the full data sets (or at least a broader date range) on both databases to ensure they are fully in sync.
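    For example, the full-history version of the channels_stat_daily comparison:
    select toDate(datetime), sum(sent_count), sum(delivered_count), sum(unique_click_count), sum(subscribe_count), sum(unsubscribe_count) from channels_stat_daily group by toDate(datetime) order by toDate(datetime);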