Here we will look into an example of extending a table with a new column and migrating the data from the old table to the new one.
Keep the old data safe
RENAME TABLE subscriber_sources_activities TO subscriber_sources_activities_old;Create a new table with a relevant schema
CREATE TABLE statistics.subscriber_sources_activities
(
`datetime` Date DEFAULT '1970-01-01',
`key` String,
`value` String,
`channel_id` Int64,
`sent_count` Int32,
`delivered_count` Int32,
`clicked_count` Int32,
`closed_count` Int32,
`subscribed_count` Int32,
`unsubscribed_count` Int32,
`page_view_count` Int32
)
ENGINE = SummingMergeTree((sent_count, delivered_count, clicked_count, closed_count, subscribed_count, unsubscribed_count, page_view_count))
PARTITION BY toYYYYMM(datetime)
ORDER BY (datetime, channel_id, key, value)
SETTINGS index_granularity = 8192Prepare data migration insert statements
Here is a main step that will take our old data and copy it to the new table with a new column (page_view_count). We will use the system.parts table to get the list of partitions and generate all the INSERT statements for each partition.
select concat('insert into subscriber_sources_activities (datetime, key, value, channel_id, sent_count, delivered_count, clicked_count, closed_count, subscribed_count, unsubscribed_count, page_view_count) select datetime, key, value, channel_id, sent_count, delivered_count, clicked_count, closed_count, subscribed_count, unsubscribed_count, toInt32(0) as page_view_count from subscriber_sources_activities_old where toYYYYMM(datetime)=',partition) as cmd,
database, table, partition, sum(rows), sum(bytes_on_disk), count()
from system.parts
where database='statistics' and table='subscriber_sources_activities_old'
group by database, table, partition
order by partition
-- the next is output
insert into subscriber_sources_activities (datetime, key, value, channel_id, sent_count, delivered_count, clicked_count, closed_count, subscribed_count, unsubscribed_count, page_view_count) select datetime, key, value, channel_id, sent_count, delivered_count, clicked_count, closed_count, subscribed_count, unsubscribed_count, toInt32(0) as page_view_count from subscriber_sources_activities_old where toYYYYMM(datetime)=202306
insert into subscriber_sources_activities (datetime, key, value, channel_id, sent_count, delivered_count, clicked_count, closed_count, subscribed_count, unsubscribed_count, page_view_count) select datetime, key, value, channel_id, sent_count, delivered_count, clicked_count, closed_count, subscribed_count, unsubscribed_count, toInt32(0) as page_view_count from subscriber_sources_activities_old where toYYYYMM(datetime)=202401
Start moving the data
After each query compare the number of rows in both tables. If the INSERT statement was interrupted and failed to copy data, drop the partition in subscriber_sources_activities and repeat the INSERT. If a partition was copied successfully, proceed to the next partition.
Links
- focus: