World's Largest Authoritative DNS
DNS Achievements:
I am responsible for the MySQL back end of the world's largest authoritative DNS, with over 37 million zones.
When I inherited this system (over 10 million zones ago) it was fraught with problems. However, with diligence, an eye for detail, and lots of help from the application team, I was able to remove the land mines, document the complicated infrastructure, and implement a streamlined design that lends itself to automation. I started with 4 significantly expensive anycast pods and a hodgepodge of masters. Since then we have increased our zones by over a third, and while we added a new pod, I also worked hard to stabilize the masters and pods with little additional hardware.
We are on the verge of completing a new phase in which the masters will be bi-coastal and dozens of new pods will reduce our premium DNS response time.
The architecture I inherited was just a hodgepodge of master database servers in one data center and anycast pods in 4 other data centers. The replication map was so complicated that it could not be easily mapped using Visio, Excel, or anything flat. One night I broke down and modeled it using a relational database and a small script to harvest and map the environment.
The new design, however, will significantly increase the size and decrease the complexity. This is important as we automate more and more actions, including hot swings of the masters between data centers. Unfortunately, the new design will not be fully implemented for another two years. The upgrades and minesweeps to date have taken that long already, and I can say that some of those land mines were quite difficult to disarm without service interruption; I wish I could go into detail. As for the time, resource constraints are not unfamiliar to me, and we just keep moving forward. Once complete, I will post a picture of the architecture I inherited.
Manage the MySQL Databases for the world's largest authoritative DNS:
As of today we are authoritatively serving well above 37 million zones.
Achieve over 3 Million QPS
Some people may not be impressed with this. After all, I can get 200K QPS on a single stock Dell PE server (with an ideal setup; 10-20K is more realistic for most schemas). But our infrastructure is fairly compact, and the nature of anycast concentrates the queries on just a few nodes. In truth, this system has seen over 9 million queries per second during an attack, but that almost never happens thanks to our strong DDoS mitigation capabilities. And since we serve only our own zones authoritatively, we make up less than 10% of all DNS traffic most months.
Achieve over 99.999% uptime
With the exception of a networking issue a few years back (it is never good when your outage makes CNN), we have provided uninterrupted service. Our systems automatically address and mitigate failure. We still have room for improvement; the next few years should bring me to the end of my five-phase plan, and service will be greatly improved.
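To put the 99.999% figure in perspective, the short calculation below converts an availability percentage into the downtime it allows. It is a generic back-of-the-envelope sketch; the function name `downtime_budget_minutes` and the 365-day year are assumptions made for illustration.

```python
def downtime_budget_minutes(availability_pct, days=365):
    """Minutes of downtime allowed over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    # "Five nines" leaves a little over five minutes of downtime per year.
    print(round(downtime_budget_minutes(99.999), 3))
```

At five nines, the entire yearly allowance is roughly 5.26 minutes, so a single prolonged incident can consume the budget for that year.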
Achieve Top 10 DNS response times
We fluctuate a lot, but usually fall in the top 10. When our plans are done we should consistently be in the top 5.