Emptying MySQL Tables Without Impact

In a discussion with some junior DBAs, I found that knowledge of how to prune tables in MySQL varies widely from person to person, so I thought I would write a post on my two preferred methods. I know, some people work with datasets small enough that they can just run DELETE FROM atable WHERE pk < 123456; but this is not the case for most of us.

Truncate = Bad

TRUNCATE TABLE, in modern versions of MySQL, takes out a global metadata lock, drops the old table, purges references from the buffer pool's LRU list, creates the new table, and then releases the lock. This can cause significant locking. In addition, it provides no rollback if, suddenly, your application team finds out in horror that they do actually need some or all of the data they asked to have removed.
In larger production environments there is still a need to remove data from tables, so how can one achieve this without causing impact?
The two methods I will discuss are the rename method and the pt-archiver method, both of which are kinder, gentler ways of doing the same thing; however, they are also significantly different from each other.

pt-archiver vs table rename method

+------------------------------+-----------------------------------------------+-------------------------------------------------+
| Concern                      | pt-archiver                                   | Rename                                          |
+------------------------------+-----------------------------------------------+-------------------------------------------------+
| Time to truncate a table     | Linear increase with table size               | Consistently fast                               |
| Reclaimed .ibd space         | No                                            | Yes (assuming innodb_file_per_table)            |
| Persistent trimming          | Yes                                           | No                                              |
| Complex initiation           | No                                            | Yes                                             |
| Needs to be monitored        | Yes                                           | No                                              |
| System resource utilization  | Proportional to time to completion            | Consistently small CPU and IO impact; the IO    |
|                              |                                               | impact can be scheduled separately from the     |
|                              |                                               | rename itself                                   |
| Works with PK targets        | Very well                                     | Not so well                                     |
| Rollback and data recovery   | Difficulty proportional to amount of data     | Fairly easy                                     |
| Locking concern              | Small range-lock potential on deleted         | Single sub-second lock during the atomic        |
|                              | records for the duration of a run segment     | rename                                          |
+------------------------------+-----------------------------------------------+-------------------------------------------------+

Summary:

pt-archiver is best when you have tables that need to be maintained at a small size, when certain records need to be purged, or when the tables are small enough that personal preference matters.

Table rename works best when you have large tables that need to be truncated. However, both methods provide overlapping areas of use.

Scenario: Truncate table

Method: pt-archiver

# Using the `--dry-run` argument to verify query before executing
$ pt-archiver --source h=db1020.region.company.com,D=mapping_shard99,t=user_move_queue --purge --primary-key-only --bulk-delete --bulk-delete-limit --limit 1000 --where 'user_id > 0' --dry-run
SELECT /*!40001 SQL_NO_CACHE */ `user_id` FROM `mapping_shard99`.`user_move_queue` FORCE INDEX(`PRIMARY`) WHERE (user_id > 0) AND (`user_id` < '12358132134') ORDER BY `user_id` LIMIT 1000
SELECT /*!40001 SQL_NO_CACHE */ `user_id` FROM `mapping_shard99`.`user_move_queue` FORCE INDEX(`PRIMARY`) WHERE (user_id > 0) AND (`user_id` < '12358132134') AND ((`user_id` >= ?)) ORDER BY `user_id` LIMIT 1000
DELETE FROM `mapping_shard99`.`user_move_queue` WHERE (((`user_id` >= ?))) AND (((`user_id` <= ?))) AND (user_id > 0) LIMIT 1000
  • Time to set up is typically 5 min or less
  • Time to complete is based on table size, row_size (io), limit_number, etc. 

Method: Table Rename

Impact is kept small and broken into several controllable areas

  • Rename (truncate)
  • Drop
  • File deletion

Rename

Table rename

master_mysql> CREATE TABLE sometable_tmp LIKE sometable;
master_mysql> RENAME TABLE sometable TO sometable_old, sometable_tmp TO sometable;

At this point the effects of the truncation are complete from the application's point of view. However, you still have an effective rollback, as all the data is still available.

Rollback:

  • You can rename the tables again and backfill the new information
  • Or copy specific records from the old table into the new one.

Whatever makes the most sense for your data. You can take a selective mysqldump of the old table on a slave to preserve the data in perpetuity, or wait for a period of time and then simply delete it. However, if it is a large table you may want to create a hard link before dropping it.
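As a sketch, the two rollback options might look like this (table names follow the example above; the user_id range in option 2 is purely illustrative):

```sql
-- Option 1: swap back atomically; the new, mostly-empty table is set
-- aside and the original data returns to service
RENAME TABLE sometable TO sometable_new, sometable_old TO sometable;

-- Option 2: backfill specific records into the live table
-- (the key range here is hypothetical)
INSERT INTO sometable
SELECT * FROM sometable_old
WHERE user_id BETWEEN 100 AND 200;
```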

Here are some sample times to create and rename a table

Create table

+----------------------+----------+    +----------------------+----------+ 
| Status               | Duration |    | Status               | Duration |
+----------------------+----------+    +----------------------+----------+
| starting             | 0.000067 |    | starting             | 0.000064 |
| checking permissions | 0.000006 |    | checking permissions | 0.000006 |
| checking permissions | 0.000005 |    | checking permissions | 0.000005 |
| Opening tables       | 0.000381 |    | Opening tables       | 0.000100 |
| creating table       | 0.002232 |    | creating table       | 0.002707 |
| After create         | 0.000012 |    | After create         | 0.000011 |
| query end            | 0.000031 |    | query end            | 0.000027 |
| closing tables       | 0.000009 |    | closing tables       | 0.000008 |
| freeing items        | 0.000017 |    | freeing items        | 0.000018 |
| cleaning up          | 0.000018 |    | cleaning up          | 0.000017 |
+----------------------+----------+    +----------------------+----------+

Rename profile

+----------------------+----------+   +----------------------+----------+   +----------------------+----------+    +----------------------+----------+
| Status               | Duration |   | Status               | Duration |   | Status               | Duration |    | Status               | Duration |
+----------------------+----------+   +----------------------+----------+   +----------------------+----------+    +----------------------+----------+
| starting             | 0.000017 |   | starting             | 0.000019 |   | starting             | 0.000057 |    | starting             | 0.000040 |
| checking permissions | 0.000002 |   | checking permissions | 0.000001 |   | checking permissions | 0.000004 |    | checking permissions | 0.000003 |
| checking permissions | 0.000002 |   | checking permissions | 0.000003 |   | checking permissions | 0.000004 |    | checking permissions | 0.000003 |
| checking permissions | 0.000001 |   | checking permissions | 0.000001 |   | checking permissions | 0.000001 |    | checking permissions | 0.000001 |
| checking permissions | 0.004110 |   | checking permissions | 0.002985 |   | checking permissions | 0.004485 |    | checking permissions | 0.005110 |
| query end            | 0.000019 |   | query end            | 0.000020 |   | query end            | 0.000038 |    | query end            | 0.000042 |
| closing tables       | 0.000005 |   | closing tables       | 0.000005 |   | closing tables       | 0.000010 |    | closing tables       | 0.000007 |
| freeing items        | 0.000013 |   | freeing items        | 0.000008 |   | freeing items        | 0.000033 |    | freeing items        | 0.000132 |
| cleaning up          | 0.000008 |   | cleaning up          | 0.000017 |   | cleaning up          | 0.000023 |    | cleaning up          | 0.000016 |
+----------------------+----------+   +----------------------+----------+   +----------------------+----------+    +----------------------+----------+

Drop the table:

Hard links

It can take several seconds to remove the underlying inodes of a large file, during which time InnoDB holds a global mutex, and this is where the impact of the original truncate comes into play. To help prevent this we can create hard links to large .ibd files so that the file system believes the inodes are still in use and will not remove them. This means that when MySQL reaches out to the underlying OS to remove the files, the OS simply removes the file handle and returns instantly.
At a later point you can delete your hard link, which will cause some IO load, but will otherwise not affect MySQL's operations.

On all servers:
Create hard link:

bash> ln dir/database/sometable_old.ibd  hardlinks/sometable_old.ibd 
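The behavior is easy to demonstrate with throwaway files; this sketch (using temp files, not real .ibd files) shows that the data outlives the removal of the original name:

```shell
#!/bin/sh
set -e
tmpdir=$(mktemp -d)

# Stand-in for the real .ibd file
echo "table data" > "$tmpdir/sometable_old.ibd"

# Second name for the same inode
ln "$tmpdir/sometable_old.ibd" "$tmpdir/hardlink.ibd"

# Analogue of the drop: removing one name is instant, the inode survives
rm "$tmpdir/sometable_old.ibd"
cat "$tmpdir/hardlink.ibd"        # prints: table data

# The real (slow, IO-heavy) deletion only happens with the last link
rm "$tmpdir/hardlink.ibd"
rmdir "$tmpdir"
```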

Drop Table:

To purge the ibdata from MySQL you simply drop the table as usual, and again, this can be done at any time, such as during non-peak hours. Unfortunately, this can still cause micro-stalls in opening tables due to InnoDB's need to go through the LRU list and discard the pages which belong to this table. But since you can do this at any time, you can reduce the impact. Furthermore, since the tables are not in use, you can do this one node at a time with SET sql_log_bin = 0.

Drop Table

master_mysql> DROP TABLE sometable_old

Again, the above can also be done without binary logging on any one server, giving you further control.
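A minimal sketch of the per-node drop, using the session-scoped sql_log_bin variable so the statement is not replicated to the other nodes:

```sql
SET SESSION sql_log_bin = 0;
DROP TABLE sometable_old;
SET SESSION sql_log_bin = 1;
```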

Here are some times for dropping some medium files:

drop table bigtest1                   drop table sbtest21 
(105GB 200Million Rows)               (48GB 200Million rows)
+----------------------+----------+   +----------------------+----------+
| Status               | Duration |   | Status               | Duration |
+----------------------+----------+   +----------------------+----------+
| starting             | 0.000053 |   | starting             | 0.000043 |
| checking permissions | 1.761864 |   | checking permissions | 4.183406 |
| query end            | 0.000037 |   | query end            | 0.000035 |
| closing tables       | 0.000010 |   | closing tables       | 0.000008 |
| freeing items        | 0.000024 |   | freeing items        | 0.000023 |
| logging slow query   | 0.000001 |   | logging slow query   | 0.000003 |
| logging slow query   | 0.000002 |   | logging slow query   | 0.000002 |
| cleaning up          | 0.000016 |   | cleaning up          | 0.000017 |
+----------------------+----------+   +----------------------+----------+

File deletion:
At some point we will want to remove the underlying inodes. This can cause some IO load, just like deleting any large amount of data, but it also frees up space, and since the file is no longer visible to MySQL it causes no internal locking.

rmfile

rm hardlinks/sometable_old.ibd

Caution:
Just like truncate, this can cause issues for tables that have triggers or foreign key relationships; ensure those are removed before proceeding.
Also consider not using them in the future with MySQL 😉

Epilogue

Other examples of overlap include things like keeping ~30 days of records in a table.

pt-archiver can continually delete anything over 30 days old. Alternatively, you can partition on time and drop old partitions. The method chosen depends mostly on the amount of data flowing and the complexity vs. performance desired.
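As a sketch, a nightly cron entry running pt-archiver for this kind of retention might look like the following (host, schema, table, and column names are hypothetical):

```
# Keep ~30 days of rows; runs nightly at 03:00
0 3 * * * pt-archiver --source h=localhost,D=appdb,t=events --purge --bulk-delete --limit 1000 --where 'created_at < NOW() - INTERVAL 30 DAY' --sleep 1
```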

Similarly, if you want to keep all records newer than a few days, and you need them prepopulated before swapping the tables, you would need to populate the new table with those records in a loop until the difference is small, then take out a global read lock (this will cause impact if your application does not auto-retry) while the final transfer and rename occur. This would spew errors and is not practical in many environments, making pt-archiver the obvious choice. If you use an external sequence for primary keys and an application smart enough to hold and retry through a half second of impact, then the rename may be a viable choice.

World's Largest CRIT Database

Mongo Achievement

Our CRIT database manages over a million signatures and, as of December 2014, is the world's largest and keeps growing. We have plenty of room for it to grow: with less than 3% of the cluster's theoretical capacity purchased, we can keep adding nodes for the foreseeable future. I had implemented Mongo for a few projects before this, but this was the first large auto-sharded system where I had to pre-allocate shards to keep the system from collapsing when the application started. I built these servers from scratch and configured the systems, and given the hardware I had to work with I am quite pleased with how it turned out.

World's Largest PKI Implementation

PKI Achievements

We are clearly one of the largest CAs in the world, though whether we are the largest depends highly on how you measure. In any case we represent, again depending on how you measure, between a quarter and a third of the world's total SSL certificates. And I am told by a number of people who have worked on our system and others, as well as by our auditors, that we are one of the finest CAs in the world.

While I have little to do with the application development, and the current architecture was mostly inherited, I do manage the databases for this. At one time we also boasted the world's most efficient OCSP responder, written by my former boss Chris G. I cannot say enough good things about him and the entire team I work with. They are all amazing people! The environment went stagnant for a bit, but I am proud of where we are going with PKI, as this handful of instances is now my secure baby 🙂

World's Largest Spacewalk Installation

PostgreSQL Achievement

We have the world's largest Spacewalk installation, with many tens of thousands of nodes. This is all managed by a single PostgreSQL database (with a WAL replica for disaster recovery). This lovely piece of kit was set up by an amazing former coworker, Bo J. I inherited it and really wish I could spend more time optimizing it. The server's resource utilization is suboptimal, as is its performance, and I wish I had more PostgreSQL skill, but my expertise lies in MySQL and Mongo. Give me six months and I will make PostgreSQL sing 😉

World's Largest Authoritative DNS

DNS Achievements:

I am responsible for the MySQL back end for the world's largest authoritative DNS, with over 37 million zones.
When I inherited this system (over 10 million zones ago) it was fraught with problems. However, with diligence, an eye for detail, and lots of help from the application team, I was able to remove the landmines, document the complicated infrastructure, and implement a streamlined design that lends itself to automation. I started with 4 significantly expensive anycast pods and a hodgepodge of masters. Since then we have increased our zones by over a third, and while we added a new pod I also worked hard to stabilize the masters and pods with little additional hardware.
We are on the verge of completing a new phase where the masters will be bi-coastal and dozens of new pods will drop our premium DNS response time.