DBA Dev Ops (http://onealeng.com): I manage your data. Would you like to know how?

People Don’t Scale: The fallacy of staffing for failure.
Sun, 08 Aug 2021 19:58:34 +0000

<quick rant while I sit in a waiting room>


I am sure you have all heard the adage that people don’t scale. You may have even heard it from your VP as a canned response to a staffing request. But is it true?


Kind of; yes and no.

Amazon is one of the world’s largest private employers, but it is also a leader in automation that reduces the need for people. In the end, people have helped Amazon scale very well, but only because it employed enough people to meet its current needs, plus more to increase the profitability of each employee.


As an analytical person I would like to propose the following hypothetical math problem.

Given:

  • You have an old boat that leaks.
  • The boat has a capacity of 100 tons of cargo. 
  • The more cargo you add the more the boat leaks.
  • There is no limit to the amount of cargo people want transported.

This inflow of water represents the toil generated by operational inefficiencies, be that poor procedures or a lack of suitable automation. In and of themselves some inefficiencies are OK, so let’s set some more boundaries on our math problem.

Given:

  • You can make one cargo run per day.
  • It takes 100 units of currency per day to keep the boat operating with no crew and no cargo.
  • It takes an additional 100 units per run to operate the ship.
  • For every ton of cargo added to your trip you gain 100 units of currency.
  • For every ton of cargo you transport you incur 50 units of cost in fuel, loading, unloading, etc.
  • After 5 tons of cargo the leaking is so bad your ship would sink before it completed a run.
  • You can hire a bailer at a rate of 2 units per day; each bailer keeps up with the leakage from one ton of cargo.

OK, let’s see if we can construct a profitability curve.

  • Cargo = X (in tons)
  • Profit = X*(100 - 50) - 2X - 200 (in units)

In this case it makes perfect sense to hire 100 people to bail water, since each ton still nets 48 units after paying its bailer.
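As a sanity check, here is that profit curve as a small Python sketch (my own illustration; the numbers come from the givens above, with the 2X labor term from the profit formula and the assumption of one bailer per ton):

```python
# Profit curve for the leaky-boat model: one bailer per ton of cargo
# keeps the ship afloat, so labor scales linearly with cargo.

def profit(tons: int) -> int:
    revenue = 100 * tons      # 100 units earned per ton transported
    cargo_cost = 50 * tons    # fuel, loading, unloading, etc.
    bailer_cost = 2 * tons    # one bailer per ton, 2 units per day
    fixed_cost = 100 + 100    # idle-ship upkeep plus per-run operating cost
    return revenue - cargo_cost - bailer_cost - fixed_cost

# Profit rises 48 units for every extra ton, so the best move is to max
# out the 100-ton capacity -- which requires hiring bailers to match.
best = max(range(101), key=profit)
print(best, profit(best))  # -> 100 4600
```

The curve is linear, which is exactly why "just hire more bailers" wins in this toy model.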

However, you may notice that this is an overly simplistic view, and that this is not how people work in your office! True, so let’s make things more complicated. For example, say that for every 10 employees you hire you need a manager who makes twice as much as the average employee they manage, and for every 10 managers you need a director of operations.

Now, we could model this as a more complicated step function, but not everyone in my audience has a minor in math, and since I am writing this from my dentist’s waiting room I don’t really want to start creating and sharing Excel sheets for simple examples like this. So I will simplify the labor cost model thusly:

Leak Mitigation Labor =

  • F(bailers) = 2x 
  • F(Managers) = (1/10 * F(bailers) ) * 2
  • F(Directors) =  (1/10 * F(Managers) ) * 2

Or

  • 2x+.4x+.08x

Or simply

  • 2.48x

Here again, it is most profitable to max out your bailers. But each bailer’s management overhead reduces your marginal return per ton of cargo, eventually to zero if you scale to enough ships and enough layers of management.
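The layered labor model is easy to check numerically. A small sketch using the 2x, 0.4x, and 0.08x terms above:

```python
# Leak-mitigation labor with management overhead folded in:
# bailers cost 2x; managers (1 per 10 bailers, paid double) add 0.4x;
# directors (1 per 10 managers, paid double again) add 0.08x.

def labor(tons: float) -> float:
    bailers = 2.0 * tons
    managers = (bailers / 10) * 2    # 0.4 * tons
    directors = (managers / 10) * 2  # 0.08 * tons
    return bailers + managers + directors  # 2.48 * tons in total

def profit(tons: float) -> float:
    return (100 - 50) * tons - labor(tons) - 200

print(labor(100))   # -> 248.0
print(profit(100))  # -> 4552.0
```

Each ton now nets 50 − 2.48 = 47.52 units before fixed costs, so maxing out cargo (and bailers) is still the winning strategy in this simplified model.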

Of course, when you have 100 bailers you can increase the efficiency of each bailer through process, but you can also decrease efficiency through process.

For example, a bucket brigade will bail many times faster than individuals running between decks getting in each other’s way. Ideally, that is what your manager and director should be supporting, but being removed from daily bailing they may have little context to do so, and thus need to listen to the bailers who do. The problem is that those who have the expertise may lack the legitimate power to enact the change, or even to gain an audience with those who can. Seldom in our industry do the people who can act on information incentivize the people who hold that information to share it. Please see my rant on rock stars for more about that.

Fortunately, our world is more complicated and more wonderful, because not only can we employ bailers, we can employ people to improve the bailing process, fit bilge pumps into the ship, fix leaks, increase the engine efficiency, and even build a second, better, boat!


So what seems to be the problem?
The problem is twofold: communication and improper scale.

Why, you may ask, do we seem to hire people to build a better boat, but then employ them as bailers? Because we don’t have enough bailers to keep us from sinking. And upper management then says: we have tried employing boat builders, but we still don’t have a better boat, so clearly people are not the solution.


Scale is something obvious to most people in operations. You require 5 people to do the daily toil work (bailing the water), and this number increases by 50% per year, but management only allots you one new req every 2 years. You will never be able to make much progress on reducing toil, installing pumps, or fixing leaks. Thus you will need to keep increasing your staff, because you can never gain efficiency: you cannot afford to assign people to toil reduction and system improvement.

Another scale issue arises when management gives you the req for high-end specialists, people who can install the bilge pumps, without realizing the problem is not a lack of skill. After all, you have 3 bilge pump experts already; they just don’t have time, and neither does the new one, who is now bailing water to keep the ship from sinking.

What is worse, the expert’s salary could have paid for 2 or 3 bailers, freeing the experts they already have. Instead, by hiring a new expert when the team can’t let the experts they have do what they were hired to do, management breeds resentment among existing employees, reducing efficiency and making it harder to replace them when they leave, thereby increasing the cost of labor even further.


The second issue is communication. The IT engineering world is not like other engineering disciplines. We don’t generate project pro formas or P&L reports. In fact, I would hazard a guess that most pure IT companies can’t tell you which of their cost centers are internally profitable and which need to be outsourced. Some of the worst accounting I have ever seen has been in Info Tech mega corporations.

So, how do we fix this? Show management how the resources provided will scale and return substantial benefit to the company.

  1. Build proper project cost and profit evaluations
    1. All projects must show how they generate value through revenue generation or cost reduction. Preferably in the central currency of the company. 
    2. Labor is not a sunk cost, it is an opportunity cost, and should be accounted for in all project evaluations.
    3. Every internal resource should show the costs of supporting projects up the chain so you have a better understanding of the true cost of any project.
    4. Projects should be configured so that value is generated from every phase of the project.
  2. Every project should be ranked by both total profitability and return on the individual investment before deciding where to put resources. 
  3. Proper communication of every project and its current state should be a top priority. Management cannot decide how to pivot without information from the people working on the projects, and those people will resent changes they don’t understand the reasons for, particularly when many projects depend on certain phases of other projects being completed on time.
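The ranking in point 2 can be sketched mechanically; the project names and figures below are invented purely for illustration:

```python
# Rank hypothetical projects by total profit and by return on investment;
# the two orderings can disagree, which is exactly why both belong in the
# evaluation before resources are assigned.
projects = [
    # (name, expected value in units, labor + resource cost in units)
    ("install bilge pumps", 500, 100),
    ("patch hull leaks",    300,  40),
    ("build second boat",  2000, 900),
]

by_profit = sorted(projects, key=lambda p: p[1] - p[2], reverse=True)
by_roi    = sorted(projects, key=lambda p: p[1] / p[2], reverse=True)

print([p[0] for p in by_profit])
# -> ['build second boat', 'install bilge pumps', 'patch hull leaks']
print([p[0] for p in by_roi])
# -> ['patch hull leaks', 'install bilge pumps', 'build second boat']
```

Here the big boat project wins on total profit, while the cheap leak patching wins on ROI, so a resource-constrained team might reasonably fund the small project first.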

In conclusion, people do scale. And while the cost of adding each new person may increase, the profit each person can generate may also increase if they are managed correctly. However, you have to show management how each project will help the company scale. Doing this at all levels will greatly increase both employee satisfaction and company profits.

Of course this is a massive oversimplification, and investment needs to be made just to start the process of understanding where investments need to be made. But I hope you can appreciate some of this rant, and understand how we, as a collective industry, found ourselves in this mess, and how the companies that work to get out of it can soar to the top.

Agile Development – A 30 Minute Summary
Tue, 15 Oct 2019 03:41:30 +0000

Some time back I was asked to do a 30ish minute presentation about Agile Development for the Phoenix Linux Users Group, or PLUG for short. Getting the reminder email around noon on the day of the presentation, I pulled up the slide deck I had started and, realizing that all I had was a title page, started adding content 😉
In the time since, I have changed it very little and have been asked for copies several times. And every time, I find it harder to locate a copy – so I am now posting it here 🙂

Here is the deck – I hope you enjoy 🙂


If you happen to find errors please send me some notes and I will make corrections 🙂

And when you are done with that I highly recommend the following videos

An equally quick and slightly cynical explanation: https://youtu.be/jNhRX-RBs_4
Some real online training: http://scrummethodology.com/
A humorous review that should be watched every few weeks while you are in the process of adopting:
And some more real training 😉 https://www.atlassian.com/agile




Below you will find a high-quality version of the slide deck. Let me know if you use it.

Increasing ownership in your environment.
Mon, 14 Oct 2019 16:44:00 +0000

Ownership:

What is ownership?

own·er·ship



noun: ownership;

  1. the act, state, or right of possessing, or being responsible for, something.

root: Own

adjective: own;

  1. of, relating to, or belonging to oneself or itself (usually used after a possessive to emphasize the idea of ownership, interest, or relation conveyed by the possessive):


What ownership provides for us:

There are three pieces to that definition I wish to unpack:

  1. Possession, or reaping the benefits of that which you own.
  2. Being responsible for, and having authority over, that which you own.
  3. The implication that ownership is related to a person.

A culture of ownership differs from a Rock Star culture in that, with ownership, you not only gain the personal benefit from things you produce or maintain for others, usually in the form of reputation; you also suffer when what you produce is not of sufficient quality, or does not make the lives of your users better. In a Rock Star culture, you gain the benefits simply by producing something new and flashy, or worse, by constantly “improving” what you produced to the detriment of those who use it: adding features no one wants or constantly changing contracts simply to keep buzz flowing, even when this reduces the stability and usability of the product.

For an organization ownership improves quality, accountability, and momentum.

In an organization, when employees understand their responsibilities across hierarchical levels, it helps in strategizing and achieving organizational goals more efficiently and effectively. From an overall organizational perspective, it ensures individuals will proactively identify potential loopholes in a process and take appropriate measures to prevent recurrence. It also increases the speed at which we can achieve changes, as people are invested in learning and understanding their product as much as possible.

This happens as a natural product of ownership. If a person owns a product, they are responsible for it. If the product does well, their reputation for quality products increases, they spend less time repairing the product, and they can move quickly on the improvements customers desire. Because they work with the product the most, they have the most experience with it: a positive feedback cycle that increases efficiency, reliability, and momentum.

Ironically, accountability also increases momentum; it feeds the Fail Fast / Get Shit Done culture that has fueled the information sector over the last 30 years. When people know they have the authority to do something, and the responsibility to fix it if things don’t work out, they move faster and provide better support for their users. This also benefits managers and project planners, as the owners can more accurately estimate the costs of a project related to the product. In addition, the costs will generally be lower due to higher familiarity and an increased desire for reliability. Ownership is thereby one of the key steps toward effective leadership and management.

Ownership also provides benefits to individuals. In the competitive environment of information technology, all individuals need to project their skills, talents, and attributes, not only to help them stand out but to move the company forward.

  1. Ownership serves as a key factor in accelerating the growth and development of an individual.
    1. It strengthens the employee-employer relationship and also instills a sense of mutual trust and confidence within the workplace.
    2. It also helps to build relationships across functions and departments in the organizational hierarchy.
  2. Employees become more productive and action-oriented.
    1. Problems are fixed faster by those who have the most knowledge on the product.
    2. Finding and solving these problems is done more proactively since they will be ultimately responsible.
    3. This in turn fosters personal growth in the skill set required for quality.
  3. Distributing ownership of things across a team:
    1. Increases team bandwidth, as fewer projects suffer from a single gatekeeper.
    2. Increases the number of failure domains, reducing the impact when people leave a team.
    3. Ensures no one is blocked by “Rock Stars” who seek to carve out a fiefdom but otherwise lack strong expertise on the product.


When we put this together we see the clear benefits. All organizations have a subset of resources (people) who do not take any initiative beyond their usual work responsibilities. For them, it’s just logging the hours while sitting in their comfort zone. Likewise, a lack of ownership increases the number of people who feel this way, as they do not feel empowered to act. So, when we increase ownership we affect employees in the following ways:

  1. Motivation levels increase and become more aligned with the organizational goals and objectives.
  2. Employees no longer lose out in competition with a more aggressive and vocal pool of resources who love taking the spotlight.

This increases the talent of all resources and increases the productivity of underutilized resources for the betterment of the organization as a whole.

Furthermore, ownership provides the authority and responsibility to act, which brings us to the controversial third part of the definition I wanted to talk about: who owns something.

All of the benefits described only work if the “thing” [project, product, program, feature, etc.] is owned by a person or a very, very small group of people. If the thing in question is owned by a large team or department, this will not work. To demonstrate, I ask: if you were suffering a heart attack and needed someone to call for an ambulance, would you prefer to be in a small room with just 1 or 2 other people, or in a large open space with a hundred other people who can hear your cry for assistance? The naive approach is to assume that the more people around, the more likely it is that any one of them will help. But this is not the case, due to the diffusion of responsibility. In fact, you are significantly more likely to get help quickly in a room with just one other person than in a stadium of tens of thousands. This is because no one is sure if they should be helping, whether they will be in the way, or what happens if they do the wrong thing. How will this affect them? After all, there are so many people around, certainly someone else will help. However, if there is only one other person in the room, they know they must act, and quickly, or you may die.
To bring that around to ownership: only when a person knows they are responsible for the outcome of their work will their goals align with those of the company. They personally feel the cost of producing poor products, and simultaneously feel empowered to produce quality products now, when they, the person most knowledgeable about the product or service in question, see the benefit of doing so.
On the other hand, in a “Rock Star” culture the emphasis is on forcing the creation and adoption of the new, then quickly divorcing yourself from what you created in the past, as those who use it suffer with ever-changing contracts, poor performance, buggy implementation, or simply the requirement to use a product that does not meet their needs.

Even if no one is a Rock Star, you still cannot sustain core knowledge across a large team, because the effort to properly communicate this knowledge increases at a rate of N*(N-1)/2. Similarly, as the number of projects your team manages increases, the amount of effort an employee would need to invest grows at the same alarming rate.
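That N*(N-1)/2 growth is worth seeing in actual numbers; a quick sketch:

```python
# Pairwise communication paths grow quadratically with team size:
# every new member must sync with everyone already on the team.
def paths(n: int) -> int:
    return n * (n - 1) // 2

for n in (2, 5, 10, 20):
    print(n, paths(n))
# 2 -> 1, 5 -> 10, 10 -> 45, 20 -> 190: doubling the team roughly
# quadruples the effort needed to keep shared knowledge current.
```

This is why a pair of owners can stay in sync almost for free, while a ten-person team sharing the same knowledge needs 45 open channels.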
As an analogy, think of medical professionals. Brain surgeons and heart surgeons both know a ton of advanced medicine, and both will be on the hospital’s surgical team. However, if you are going in for heart surgery, which one do you want working on you? Sure, if it is an emergency and the heart surgeon isn’t available, at worst you would call in the brain surgeon, but it is reasonable to expect sub-standard results!

The traditional Agile way is to decrease the size of the teams, but this increases the burden on managers and creates silos. These silos block closely related knowledge and foster duplication of work, or worse actively block architectural changes. However, in a culture of ownership you are responsible for meeting the needs of your customers and fixing what you built regardless of the team size.


Management Frameworks

It should be no surprise, then, given the benefits, that nearly all the popular management frameworks emphasize ownership, be it ITIL, COBIT, ISO 20000, or even Agile. However, each does this in different ways.

COBIT really focuses on ownership for compliance and accountability, but otherwise aligns with what I am trying to present here: every “thing” has an owner. And by owner, I mean a person who is responsible.

ITIL has a broader base over its many sections but here ownership is usually defined by who holds the knowledge related to the particular piece of the puzzle. But as I have mentioned, knowledge and ownership, i.e. accountability, responsibility, and authority, create a positive feedback loop.

Unfortunately, Agile is weak on ownership, having only two owners, neither of which are individual contributors. The product owner is responsible for the business objectives and for the cost-benefit analysis of what is done when. The scrum master is responsible for ensuring the product owner’s desires are achievable by the team. The problem is that when you have a large, diverse team responsible for many ongoing concerns, there is little room for responsibility, accountability, or authority in the hands of the individual contributors under the Scrum framework. This can actually slow down progress: an individual contributor may see an issue, but at best they have the authority to create a ticket to be groomed at some time, which may or may not be put into a sprint, and may or may not be assigned to them, at some point in the future when they have lost most of the context anyway. Probably after a problem has already occurred, because only then is the cost of not doing it most visible.
In addition, when you have a half dozen people on your team, the team’s poker meeting is nonsensical if only one or two of them have ever touched the product whose repair or new feature the scrum master is looking to scope. This wastes time and decreases engagement, which leads to other unhealthy practices, like skipping poker and planning altogether.
It is for these, and numerous other, reasons that I propose a simple addition: a complementary framework to solve the ownership problem in Agile Scrum and help alleviate the malaise that can develop in the face of these stagnant processes.


Ownership Implementation: The Rule of Two

“Always two there are, no more, no less: a master and an apprentice.” This Sith philosophy is not only good for rapidly taking over a galaxy from a superior opponent, it is a recipe for happier, more productive, and more engaged employees.

I have spent the last while extolling the virtues of ownership, but what happens when someone no longer wants to own a product? They want growth, they want change, or worse, they want to leave the company and take their knowledge with them. This is why you need multiple owners: typically, a master and an apprentice. The rankings matter for the simple reason that when everyone feels they are the smartest person in the room, nothing gets done; also, the buck must stop with a single person. But this too can rapidly fester if certain rules are not put into place.

  1. Every thing (project, service, application, etc.) that requires work at any point should have two, or potentially three, subject matter experts (SMEs): a Master (primary) and an Apprentice (secondary). This applies to all ongoing concerns, not just new projects. That program the team built, which is occasionally used but has no active work, ALSO needs owners. We cannot have feral, zombie projects if this is going to work.
    The exceptions to this are:
    1. Any “thing” so basic that any generally qualified individual off the street could deploy it with less than 20 minutes of focused training.
    2. Things we are not looking to automate or improve in any way.
    3. Things no one has any passion for.
    4. Things that are not an ongoing concern, which is rare but does pop up from time to time.
  2. A manager should be able to ask any question regarding that project to either SME and get an answer. If the secondary on the project cannot answer the question the primary is not handling their responsibility and should consider having fewer projects.
  3. Project ownership should be evenly distributed. No one person can have more than 1.25 times the team-average number of projects, and no one should have less than 0.75 times the team average. This forces people to focus on their passions, and masters to train their apprentices so they can move on. In addition to preventing people from being spread too thin, it prevents gatekeeping. Remember, the whole point of this is to increase ownership, and ownership means autonomy and self-authority; this cannot flourish if one person owns everything. Ideally every person should own 2N/E things, either as a primary owner or a secondary owner, where N is the number of projects and E is the number of people on the team. Preferably with a mix of highly active products/services and less active ones. And while some variation is allowed, and even necessary, it should be contained within the prescribed boundaries.
  4. People choose their projects based on passion.
    In practice this is only partially true. People will gravitate to projects they desire to learn from and projects that spark their passions. However, it is likely that any given project will have more than two people who want to work on it. It is up to the manager to take desires into account, along with skill sets, and business requirements, when solidifying the owners of each thing. But no one should be forced, perhaps incentivized, but never forced to work with someone, or on something, they do not desire to be part of.
  5. Mobility is also required. An apprentice should be able to switch projects as often as they desire in order to maximize their learning, or simply to find a teacher they can learn from or a project they are passionate about. A master, however, can only move once an apprentice is ready to take over that project.
  6. A third person should be added to a project when the master is ready to give up their role to the apprentice, during which time the third person becomes an added apprentice.
  7. A master/apprentice (primary/secondary) pair cannot be static. Whether you are primary or secondary on a project, you form a group of two people. Assuming a person works on multiple projects, which inevitably they will, this group should not be the only group the person belongs to. We want to avoid silos and have a natural flow of good practices and techniques, so a person should have multiple masters or apprentices spread across multiple projects.
  8. Every quarter someone (not everyone but someone) on the team MUST give up their role as master or apprentice on at least one project and move. It is likely people will choose to do this as new and interesting concerns are added to the product mix.
  9. A person may be master on some projects and apprentice on others. After the initial round of ownership, a person will always start as an apprentice to the existing project master.
  10. Meta project teams like unifying architecture, security, compliance, coding standards, etc. should have three equally ranked people to establish quorum. Each should be available for regular consultation for the design of other projects. These kinds of “things” produce nothing but documentation and help other people with their projects.
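Rule 3 above is mechanical enough to verify with a script. A sketch (the team names and counts are made up; the bounds come straight from the rule: each person should hold roughly 2N/E ownership roles, within 0.75x to 1.25x of that average):

```python
# Check that ownership roles (primary + secondary) are evenly spread:
# with N projects and E people there are 2N roles, so the average is
# 2N/E per person, and rule 3 allows 0.75x to 1.25x of that average.

def check_distribution(roles_per_person: dict[str, int], n_projects: int) -> dict[str, bool]:
    avg = 2 * n_projects / len(roles_per_person)
    return {name: 0.75 * avg <= count <= 1.25 * avg
            for name, count in roles_per_person.items()}

team = {"alice": 5, "bob": 4, "carol": 5, "dave": 2}  # 4 people, 8 projects
print(check_distribution(team, 8))
# -> {'alice': True, 'bob': True, 'carol': True, 'dave': False}
# dave holds fewer than 0.75 * (16/4) = 3 roles, so he is flagged.
```

A manager could run something like this against the team’s project inventory each quarter, which also enforces the side benefit mentioned below of actually keeping that inventory.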

With these simple rules we can have every project owned in a healthy and productive way. Increasing ownership increases employee satisfaction and produces multiple subject matter experts per project, which increases project quality and decreases the time and cost of adding features or maintaining products.
It also has the side benefit of forcing managers to keep an inventory of supported “things” and how much time they consume. This allows us to focus on stable improvements, not chasing the shiny squirrel.

Simulating a DC move with tc
Thu, 07 Mar 2019 19:57:33 +0000

I would first like to warn you that I am not a tc expert. In fact, I consider tc to be a form of black magic. That said, I will do my best to explain what little I know, mostly because I know I will need this information in the future, so it is important to write it down 😉

What is TC

tc is a way to interface with kernel-level traffic control settings. You can shape incoming or outgoing traffic; however, incoming traffic shaping is very primitive, so most of what I will show you is shaping traffic leaving your server, destined for some other server.

Why use TC

Simulate network lag, congestion, or other issues.
Personally, I have a situation where, for one of my products, the primary data center is just a few miles from the redundant data center. This worked out well; however, we just built a shiny new data center across the country, and the space we are leasing so this product can be redundant has been deemed unnecessary. I can totally see this. However…

  • The product has app servers in each data center.
  • These app servers write to a common database in one datacenter
  • These app servers will occasionally read from the master, no matter which DC houses it.
  • These app servers will otherwise read what they wrote from their own data center.

This means when we add some speed of light issues we may have trouble. So how do we test this and correct these troubles before we move?

Basic shaping examples

tc qdisc add dev bond0 handle 1: root htb
tc class add dev bond0 parent 1:1 classid 1:11 htb rate 1000Mbps
tc class add dev bond0 parent 1:1 classid 1:12 htb rate 1000Mbps
tc qdisc add dev bond0 parent 1:11 handle 10: netem delay 200ms 100ms distribution normal
tc qdisc add dev bond0 parent 1:12 handle 20: netem loss 3%
tc filter add dev bond0 protocol ip prio 1 u32 match ip dst flowid 1:11
tc filter add dev bond0 protocol ip prio 1 u32 match ip dst flowid 1:12

In the above, we create a root htb qdisc on bond0 with two classes under it, attach a netem qdisc to class 1:11 that adds a 200ms delay with 100ms of normally distributed variation, attach a netem qdisc to class 1:12 that drops 3% of packets, and add u32 filters that steer traffic into each class by destination IP.

Viewing and deleting existing rules


When viewing existing rules we use tc -s {qdisc|class} ls (show also works instead of ls, depending on how you prefer to remember such things). The -s flag adds statistics to the output, and you can just as easily scope the listing to a device, e.g. tc -s qdisc show dev eth0.

[root@devsrv04 STG boneal]# tc -s qdisc ls
qdisc pfifo_fast 0: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 3233631430439 bytes 1623041258 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc htb 1: dev bond0 r2q 10 default 0 direct_packets_stat 401748689
 Sent 991209608724 bytes 983293850 pkt (dropped 19860, overlimits 542341718 requeues 0)
 rate 0bit 0pps backlog 0b 5p requeues 0
qdisc netem 10: dev bond0 parent 1:11 limit 1000 delay 200.0ms  100.0ms
 Sent 35467782723 bytes 78783479 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 5p requeues 0
qdisc netem 20: dev bond0 parent 1:12 limit 1000 loss 3%
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0


Just use the del command: tc qdisc del {root|ingress} dev {device}
You can also delete all rules on a device by issuing tc qdisc del

tc qdisc del root dev eth0

Class Vs Classless

I have to admit that I am uncertain about this. But I do know that for more complex layered filters you seem to either need to apply them to a class, or layer them on a virtual interface.

In the end I need to do more research and testing on this


Again, I need to do more research here – come back for updates

Some links I found useful





reverting queue change






This post is dedicated to prior planning: finding problems when you can roll back and fix things at leisure, instead of finding them when you have no choices. 🙂

Emptying MySQL Tables Without Impact
Mon, 17 Dec 2018 03:41:00 +0000

In a discussion with some Jr. DBAs I found that knowledge of how to prune tables in MySQL varies widely from person to person, so I thought I would write a post on my two preferred methods. I know, some people work with datasets small enough that they can just run DELETE FROM atable WHERE pk < 123456; but this is not the case for most of us.

Truncate = Bad

Truncate table, in modern versions of MySQL, takes out a global metadata lock, drops the old table, purges references from the LRU list, creates the new table then releases the lock. This can cause some significant locking. In addition it provides no rollback if, suddenly, your application team finds out in horror that they do actually need some or all of the data they asked to be removed.
In larger production environments there is still a need to remove data from tables, so how can one achieve this without causing impact?
The two methods I will discuss are the rename method and the pt-archiver method, both of which are kinder, gentler ways of doing the same thing; however, they are also significantly different from each other.

pt-archiver vs table rename method

Concern                      | Archiver                                      | Rename
-----------------------------+-----------------------------------------------+------------------------------------------------
Time to truncate a table     | Linear increase with size of table            | Consistently fast
Reclaimed ibd space          | No                                            | Yes (assuming file-per-table)
Persistent trimming          | Yes                                           | No
Complex initiation           | No                                            | Yes
Needs to be monitored        | Yes                                           | No
System resource utilization  | Proportional to time to completion            | Consistently small CPU and IO impact; the IO
                             |                                               | impact can be scheduled for a time other than
                             |                                               | the rename
Works with PK targets        | Very well                                     | Not so well
Rollback and data recovery   | Difficulty proportional to amount of data     | Fairly easy
Locking concern              | Small range-lock potential on deleted records | Single sub-second lock during the atomic
                             | for the duration of a run segment             | rename


pt-archiver is best when you have tables that need to be maintained at a small size, when certain records need to be purged, or when the tables are small enough that personal preference matters.

Table rename works best when you have large tables that need to be truncated. However, both methods provide overlapping areas of use.

Scenario: Truncate table

Method: pt-archiver

# Using the `--dry-run` argument to verify query before executing
$ pt-archiver --source h=db1020.region.company.com,D=mapping_shard99,t=user_move_queue --purge --primary-key-only --bulk-delete --bulk-delete-limit --limit 1000 --where 'user_id > 0' --dry-run
SELECT /*!40001 SQL_NO_CACHE */ `user_id` FROM `mapping_shard99`.`user_move_queue` FORCE INDEX(`PRIMARY`) WHERE (user_id > 0) AND (`user_id` < '12358132134') ORDER BY `user_id` LIMIT 1000
SELECT /*!40001 SQL_NO_CACHE */ `user_id` FROM `mapping_shard99`.`user_move_queue` FORCE INDEX(`PRIMARY`) WHERE (user_id > 0) AND (`user_id` < '12358132134') AND ((`user_id` >= ?)) ORDER BY `user_id` LIMIT 1000
DELETE FROM `mapping_shard99`.`user_move_queue` WHERE (((`user_id` >= ?))) AND (((`user_id` <= ?))) AND (user_id > 0) LIMIT 1000
  • Time to set up is typically 5 min or less
  • Time to complete is based on table size, row_size (io), limit_number, etc. 

Method: Table Rename

Impact is kept small and broken into several controllable areas

  • Rename (truncate)
  • Drop
  • File deletion


Table rename

master_mysql> CREATE TABLE sometable_tmp LIKE sometable; RENAME TABLE sometable TO sometable_old, sometable_tmp TO sometable;

At this point the effects of the truncation are complete from the application’s point of view. However, you still have effective rollback options, as all the data is still available.


  • You can either rename the tables again and backfill the new information
  • Or fill specific records from the old table to the new one.

Whatever makes the most sense for your data. You can do a selective mysqldump from the old table on a slave to preserve the data in perpetuity, or wait for a period of time, or simply delete it. However, if it is a large table you may want to create a hard link before dropping it.

Here are some sample times to create and rename a table

Create table

+----------------------+----------+    +----------------------+----------+ 
| Status               | Duration |    | Status               | Duration |
+----------------------+----------+    +----------------------+----------+
| starting             | 0.000067 |    | starting             | 0.000064 |
| checking permissions | 0.000006 |    | checking permissions | 0.000006 |
| checking permissions | 0.000005 |    | checking permissions | 0.000005 |
| Opening tables       | 0.000381 |    | Opening tables       | 0.000100 |
| creating table       | 0.002232 |    | creating table       | 0.002707 |
| After create         | 0.000012 |    | After create         | 0.000011 |
| query end            | 0.000031 |    | query end            | 0.000027 |
| closing tables       | 0.000009 |    | closing tables       | 0.000008 |
| freeing items        | 0.000017 |    | freeing items        | 0.000018 |
| cleaning up          | 0.000018 |    | cleaning up          | 0.000017 |
+----------------------+----------+    +----------------------+----------+

Rename profile

+----------------------+----------+   +----------------------+----------+   +----------------------+----------+    +----------------------+----------+
| Status               | Duration |   | Status               | Duration |   | Status               | Duration |    | Status               | Duration |
+----------------------+----------+   +----------------------+----------+   +----------------------+----------+    +----------------------+----------+
| starting             | 0.000017 |   | starting             | 0.000019 |   | starting             | 0.000057 |    | starting             | 0.000040 |
| checking permissions | 0.000002 |   | checking permissions | 0.000001 |   | checking permissions | 0.000004 |    | checking permissions | 0.000003 |
| checking permissions | 0.000002 |   | checking permissions | 0.000003 |   | checking permissions | 0.000004 |    | checking permissions | 0.000003 |
| checking permissions | 0.000001 |   | checking permissions | 0.000001 |   | checking permissions | 0.000001 |    | checking permissions | 0.000001 |
| checking permissions | 0.004110 |   | checking permissions | 0.002985 |   | checking permissions | 0.004485 |    | checking permissions | 0.005110 |
| query end            | 0.000019 |   | query end            | 0.000020 |   | query end            | 0.000038 |    | query end            | 0.000042 |
| closing tables       | 0.000005 |   | closing tables       | 0.000005 |   | closing tables       | 0.000010 |    | closing tables       | 0.000007 |
| freeing items        | 0.000013 |   | freeing items        | 0.000008 |   | freeing items        | 0.000033 |    | freeing items        | 0.000132 |
| cleaning up          | 0.000008 |   | cleaning up          | 0.000017 |   | cleaning up          | 0.000023 |    | cleaning up          | 0.000016 |
+----------------------+----------+   +----------------------+----------+   +----------------------+----------+    +----------------------+----------+

Drop the table:

Hard links

It can take several seconds to remove the underlying inodes of a large file, during which time InnoDB holds a global mutex, and this is where the impact of the original truncate comes into play. To help prevent this we can create a hard link to the large .ibd file so that the file system believes the inodes are still in use and will not remove them. This means when MySQL reaches out to the underlying OS to remove the file, the OS simply removes the file handle and returns instantly.
At this point you can delete your hard link, which will cause some IO load, but will otherwise not affect MySQL’s operations.

On all servers:
Create hard link:

bash> ln dir/database/sometable_old.ibd  hardlinks/sometable_old.ibd 
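The mechanism is easy to demonstrate with any file: the data lives with the inode, not the name, and the blocks are only freed when the last link is removed. A toy sketch (the file names here are made up):

```shell
echo "pretend this is a 100GB .ibd file" > demo.ibd
ln demo.ibd demo_link.ibd     # second name, same inode
rm demo.ibd                   # fast: only a directory entry is removed
cat demo_link.ibd             # the data is still on disk via the remaining link
rm demo_link.ibd              # now the inode and its blocks are actually freed
```

This is exactly why the DROP TABLE below returns quickly: MySQL unlinks its name, but the expensive block deallocation is deferred until you remove the link yourself.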

Drop Table:

To purge the ibdata from MySQL you simply drop the table like always, and again, this can be done anytime, such as during non-peak hours. Unfortunately, this can still cause micro stalls in opening tables due to InnoDB’s need to go through the LRU list and discard the pages which belong to this table. But since you can do this at any time, you can reduce the impact. Furthermore, since the tables are not in use, you can do this one node at a time by first running SET sql_log_bin = 0 in your session.

Drop Table

master_mysql> DROP TABLE sometable_old

Again, the above can also be done without binary logging on any one server, giving you further control

Here are some times for dropping some medium files:

drop table bigtest1                   drop table sbtest21 
(105GB 200Million Rows)               (48GB 200Million rows)
+----------------------+----------+   +----------------------+----------+
| Status               | Duration |   | Status               | Duration |
+----------------------+----------+   +----------------------+----------+
| starting             | 0.000053 |   | starting             | 0.000043 |
| checking permissions | 1.761864 |   | checking permissions | 4.183406 |
| query end            | 0.000037 |   | query end            | 0.000035 |
| closing tables       | 0.000010 |   | closing tables       | 0.000008 |
| freeing items        | 0.000024 |   | freeing items        | 0.000023 |
| logging slow query   | 0.000001 |   | logging slow query   | 0.000003 |
| logging slow query   | 0.000002 |   | logging slow query   | 0.000002 |
| cleaning up          | 0.000016 |   | cleaning up          | 0.000017 |
+----------------------+----------+   +----------------------+----------+

File deletion:
At some point we will want to remove the underlying inodes, which can cause some IO load just like deleting any large amount of data, but this also frees up space, and since the file is no longer visible to MySQL it causes no internal locking.


rm hardlinks/sometable_old.ibd

Just like truncate this can cause issues for tables that have triggers or foreign key relationships. Ensure they are removed before proceeding.
Also consider not using them in the future with MySQL 😉


Other examples of overlap include things like keeping ~30 days of records in a table.

pt-archiver can continuously delete anything over 30 days old. Otherwise you can partition on time and drop old partitions. The method chosen depends mostly on the amount of data flowing and the complexity vs performance desired.

Similarly, if you want to keep all records newer than a few days, and you need them prepopulated before swapping the tables, you would simply populate the new table with those records in a loop until the difference is small, then take out a global read lock (this will cause impact if your application does not auto-retry) while the final transfer and rename occur. This would spew errors and is not practical in many environments, making pt-archiver the obvious choice. If you use an external sequence for primary keys and an application smart enough to hold and retry through a half second of impact, then rename may be a viable choice.

http://onealeng.com/emptying-mysql-tables-without-impact/feed/ 0 241
Should We Teach Cursive In School? http://onealeng.com/should-we-teach-cursive-in-school/ http://onealeng.com/should-we-teach-cursive-in-school/#respond Wed, 19 Jul 2017 03:14:58 +0000 http://onealeng.com/?p=223 This has been a controversial topic for a while, and I thought we had put this matter to rest. But across my social media feed I still see arguments, so it is time I simply wrote something out. This way I won’t have to repeat myself on each retort and can simply post this link.


Cementing knowledge by writing:

This is one I hear a lot: that forming the letters with your hands somehow causes you to remember the information better.
If this were the case, would it not be even better to mold the letters out of clay than to write them? And how does writing in cursive somehow make you remember better than printing or writing in shorthand? What magic in English cursive writing has rocketed those, globally, few people into memory stardom, and how jealous must the Chinese, or Japanese, be, given that their kids are not taught cursive writing?
The whole idea that cursive is magic in this regard is ridiculous. The fact is this memory boost comes from re-encoding; the form does not matter. It is funny, because if you ask a computer programmer how the syntax for some boilerplate piece of code goes, they will not write it! They will start miming the movements of their hands on a keyboard and mouse as they recall it. Cursive is nothing special in this regard.

It helps with spelling

Again, there is no proof that cursive has anything to do with the effect.
All the studies say it is purely a matter of re-encoding the information and rote practice, be it printing, cursive, or typing.

Our hands should be multilingual:

This was a direct quote from an article published by the Federalist on the topic, using some work by Dr. Virginia Berninger to justify it. The argument goes that writing in different characters activates different sections of the brain and aids in cognitive development. Why not use that time to teach the kids to be truly multilingual? Learn Mandarin or Cantonese or Korean, and learn how to write them. They would certainly be more useful than learning cursive or Latin, since both are dead and Chinese is not.

It is faster:

Actually, no. Typing is faster.
Most people can type 30 words per minute with very little effort. With a margin of practice you can double that, and if you work on it as much as kids work on handwriting you can perhaps achieve 120 WPM, although I have to admit getting much above that is very difficult.
In contrast, writing is painfully slow. Most people write cursive at around 40 characters a minute. The absolute fastest of us do 120 characters a minute (again, characters, not words), and this cannot be done for hours on end without significant pain. Of course, if they were to give up cursive and move to something performance driven, like shorthand, those same dedicated people could bump that to 300 words per minute.

It helps with Dyslexia:

This one offends me. Not only is it based on bad science, but dyslexia is a very real and debilitating problem. I can tell you, as someone who suffers from it, I have almost no issue reading printed words, preferably in a fixed-width serif font.
But cursive? It is certainly not easier to read. Perhaps it may have been easier to write as a child; I have no recollection of it being easier. However, it is in no way easier than typing. Don’t believe me? Look at the research put out by the associations for people with dyslexia.

But what about historic documents!

I have never heard anyone say this who has actually had to do primary research on historical documents. Why? Because they are not easy to read, even if you know cursive. Handwriting is almost universally atrocious, the documents are not in the best condition, the language has shifted, etc. The point I am trying to make is you need to put effort into it. Here is an example of something PRINTED http://www.bl.uk/learning/timeline/external/coffee-tl.jpg
Now here is something hand written with impeccable penmanship https://s-media-cache-ak0.pinimg.com/originals/c2/4e/5e/c24e5e5212bf1e24fd5676920a752f47.jpg
This one is also extremely legible http://www.bl.uk/learning/timeline/large126714.html
And you know what? Give this to a kid who has never seen cursive, along with a primer, and after a few weeks they can read it to you just as easily as anyone else.
All of this is in English, a rare treat when doing research.
But how many kids in our classrooms today are going to grow up to be historians and read original works that have not been scanned, translated into digital text, and put up on the web? Ask most 30 year olds today if they have ever gone to a library and pulled up a copy of a document from the 1600’s and had to read it for some reason.
And if this is the case, why are we not teaching shorthand so kids can read those documents? Or teaching English from before the great vowel shift so kids can understand those documents? Because very, very few people need that specialized knowledge. Are you shocked? I know I’m not.
(shorthand example http://www.thehistoryblog.com/wp-content/uploads/2014/01/WWI-diary.jpg)

It keeps our brains active in old age

I kid you not: one of the major arguments for wasting time teaching a 2nd grader to write in cursive is so they will have active minds in their old age. Using chopsticks and painting does the same thing; no need to learn cursive, something they won’t likely use in 80 years anyway.

You need it for your job:

These days typing and the ability to use computers are the new literacy; without these skills it is very difficult to remain employed. Cursive? Not so much.

There are simply NO jobs that require cursive writing – well, almost no jobs… If you teach cursive, such as a grade school teacher, or are a professional calligrapher, or an administrative assistant for some really old guy who still writes in cursive all day, then yes, your job will require you to read cursive, maybe even write it. I bet you could find perhaps a half dozen jobs in the US paying over $100K per year that require cursive.
Rounding, 0.00% of all US jobs paying $100K/year or more require cursive handwriting, while nearly 20% of all US jobs paying over $100K/year require programming experience. The cost is the same to teach our children a useful skill or a useless one. Which will it be?



It is simply a waste of time, like teaching kids how to use a slide rule in order to perform log functions. Did you need to learn how to use a slide rule in high school? No? Then shut up, you are too young to be giving the “Back in my day” speech to anyone!
Kids spend over 200 hours learning cursive writing. In 200 hours we could be teaching more math, more science, more logic, engineering and technology skills, how to program, serious home economics and other life skills – like how to manage money and invest! I think if we took 200 hours and taught kids serious macro and micro economics our country would be far better off than if we taught them cursive handwriting.
Let us face it: very few six-figure salaries are held by people who write by hand all day. Most are held by people who type all day.
If you are under 50 and you want to make a decent living and write by hand, you need to be a doctor, particularly one with a nurse who translates your handwritten notes into typed notes later. 10 years ago I said doctor or lawyer, but nowadays you are not going to make partner in a decent law firm if cursive is your primary form of written communication.

There is no reason our kids should have to learn writing, a difficult and painful process, twice. And if you are going to pick one, pick print. It is in every book, newspaper (remember those?), and web page.

So why the outrage?

So what is the real reason people are upset over the cut of cursive from the curriculum? Something is different from when they were kids, and that both scares and angers them. Much like whether Pluto is a planet or not.

But this is one old American’s opinion. If you want to see what our counterparts across the pond think, please read this article by Philip Ball 😉

Curse of cursive handwriting

http://onealeng.com/should-we-teach-cursive-in-school/feed/ 0 223
Advanced MySQL User Permissions http://onealeng.com/advanced-mysql-user-permissions/ http://onealeng.com/advanced-mysql-user-permissions/#respond Fri, 18 Mar 2016 15:21:00 +0000 http://onealeng.com/?p=177 At a recent (the first!) Arizona MySQL Professionals Meetup I gave a presentation on Advanced User Grants. Below you will find links to the video (poor quality; the recordings will improve) and the slides.
Hope you find it useful 🙂

Recording: https://youtu.be/D7jXVYFHI5A

Slides: Advanced User Permissions

Here is a PDF Version of the slides: Advanced User Permissions

http://onealeng.com/advanced-mysql-user-permissions/feed/ 0 177
Managing the OOM Killer among us http://onealeng.com/managing-the-oom-killer-among-us/ http://onealeng.com/managing-the-oom-killer-among-us/#respond Thu, 21 Jan 2016 19:30:32 +0000 http://onealeng.com/?p=139 Like me you have probably had your share of issues when something runs amok and OOM (the default out-of-memory killer) slaughters something that was behaving well and is, as it turns out, critical to your operation.
In my case MySQL is the primary target for OOM because it consumes the most memory on any given server. But occasionally someone’s PHP code will start to get out of hand on those same servers, and I want OOM to kill said out-of-hand code and thus protect my primary application! But OOM does not… Instead it kills my one big app to protect the small but hungry programs. And this is not a gentle death either; it is a full blown kill -9 to the face! So how do I make my app less of a target?

The method changes across operating systems and versions, but the two ways showcased here are mostly compatible and work on a wide variety of systems:
The Adjustment, and the Score.

The OOM Adjust

Adjustments are done live, in proc, per pid. You can see why this method has started to lose favor, but OOM still respects it, so go nuts.

echo -15 > /proc/1234/oom_adj

Valid values are from -16 to +15, where higher values incur more aggro from the OOM killer and thus make you a more likely target. There is also the special value -17, which exempts you entirely from the relentless blade of our favorite executioner.
However, you can see how this would be extremely cumbersome, manually adjusting our frightful pids after each restart! As such I quickly added the following to our init script

getpid() {
    # Give up to 10 seconds for a pidfile to be created
    for i in {1..20}; do
        if [[ -e $pidfile ]]; then
            pid=$(cat "$pidfile" 2> /dev/null)
            break
        fi
        sleep .5
    done
}

start() {
    if [[ -n $pid ]]; then
        echo "-1" > /proc/${pid}/oom_adj
    fi
}

NOTE: How you choose to set your pid variable is up to you. For MySQL we set a pid file in the conf, so all I need to do is read it. Others can be found in the default /var/run/ directory, particularly if you are using an init.d script with __pids_var_run(). For others still, you may be forced to run a ps | grep to retrieve your pid.
Also, this only sets the adjustment on start. If your app trips and an agile process restarts it, you lose your buff and can once again draw OOM aggro.


OOM Score

Another method is the oom score conf file! This one also provides finer control, with scores ranging from -1000 to +1000**. This method is also seriously easy: just lay down a conf file in /etc/init/${SERVICE}.conf (yes, ${SERVICE} is a variable to be replaced by the name of your service).
The contents of this file can run far and wide and I recommend looking into it; however, for the purpose of this discussion you will want a line that simply says something like oom score -1000. Again, the lower the score the better protected you are against OOM, and the higher the score the more likely our OOM friend will hunt you down.
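As far as I can tell (treat this as an assumption about upstart's internals), the oom score stanza ultimately just writes the value into /proc/<pid>/oom_score_adj, which you can also poke directly at runtime; raising your own score (making yourself more killable) needs no special privilege:

```shell
# Make the current shell a more attractive OOM target; no root needed
# when raising the value, only when lowering it below its current setting.
echo 500 > /proc/self/oom_score_adj
cat /proc/self/oom_score_adj
```

This is handy for spot-testing before you commit a value to the conf file.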
While I recommend using puppet (or chef or another configuration management system) to manage this file, or in the absence of that laying it down with the install rpm, I will provide an init script snippet you can use.

start() {
    # Only lay down the oom score line if it is not otherwise set
    if ! grep -q "oom score" /etc/init/${SERVICE}.conf 2> /dev/null; then
        echo "oom score -50" >> /etc/init/${SERVICE}.conf
    fi
}

**The score is based on either the old badness() function or a straight 10x percentage of available memory. If you are running a recent kernel (2.6.36 or later) then you are no longer using the badness heuristic, making calculations easy.


Panic at the Disco

I would like to point out that some people prefer a full reboot. If this is you, you can choose to panic on OOM by the following means

sysctl vm.panic_on_oom=1
sysctl kernel.panic=X
echo "vm.panic_on_oom=1" >> /etc/sysctl.conf
echo "kernel.panic=X" >> /etc/sysctl.conf


Kill the Killer

You can also disable the OOM Killer entirely. And while I do not recommend this course of action at all, I will show you how to do it.

sysctl vm.overcommit_memory=2
echo "vm.overcommit_memory=2" >> /etc/sysctl.conf


I do believe it’s working!

If you want some immediate gratification you can see the settings in place with a simple

tail /proc/${pid}/oom_*

But you can also allocate a ton of memory and see what gets killed, assuming you are not testing on production servers 😉
Personally I would just launch something and abuse it into allocating memory, because I am cautious but lazy. Or I would grab a cheap bit of C that does little more than stuff RAM, or run stress… But this page gives some decent ideas as well
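One cheap shell trick along those lines (my suggestion, not one of the tools above) is to abuse tail: it keeps whole lines in memory, and /dev/zero never emits a newline, so tail buffers the entire stream:

```shell
# tail holds the whole "line" (100MB of NULs, no newline) in RAM before
# writing it out; raise the head -c size to draw real OOM aggro.
head -c 100M /dev/zero | tail > /dev/null && echo "allocated and released"
```

Drop the head entirely (tail /dev/zero) and it will grow without bound until something gets killed.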


Hope this helps 🙂


This blog post is dedicated to Collectd and the person on our team who wrote the MySQL collector for it in PHP. As well as the person who complained enough about the oom killings one night that I stopped my work making other things better to improve our situation on this front with some quick puppet work 🙂

And no, it does not happen enough to warrant tracking down the memory problems in PHP. At least not for me. I hate PHP 😉

http://onealeng.com/managing-the-oom-killer-among-us/feed/ 0 139
Bash return code fun http://onealeng.com/bash-return-code-fun/ http://onealeng.com/bash-return-code-fun/#respond Fri, 31 Jul 2015 22:00:46 +0000 http://onealeng.com/?p=132 Why local=$(exit 1) will always exit 0

So a few things with return codes mess up beginning bash programmers

  1. Creating variables always returns 0
  2. Return codes are a single byte int

Let me provide a few examples:

Declared Variables

[root@mytest10][~] cat footest1

function foo() {
    echo 'foobar'
    return 123
}
function bar() {
    local rtc # Create local variable for housing the return code
    local value=$(foo)
    rtc=$?
    echo ${value}
    echo ${rtc}
}
bar

[root@mytest10][~] bash footest1
foobar
0

Now, you can see from the code that the return code from foo, as seen in bar, should have been 123. But instead it is 0.
This is purely because we assigned the value at the same time we created the local variable. Since the explicit declaration of that local variable was successful, $? is overwritten with that success, and 0 is what gets captured. Unfortunately this is not intuitive or easy to track down if you don’t know what is happening in advance.

Here is the debug output

+ bash -x footest1
+ bar
+ local rtc
++ foo
++ echo foobar
++ return 123
+ local value=foobar
+ rtc=0
+ echo foobar
+ echo 0
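The clobbering can be seen in isolation with a minimal sketch of the same pattern (my own example, not from the original script):

```shell
f() { return 42; }

# A plain assignment's status is that of the command substitution:
v=$(f)
echo "plain assignment: $?"   # 42

g() {
    # 'local' is itself a command; its (successful) status clobbers $?
    local v=$(f)
    echo "local assignment: $?"   # 0
}
g
```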

Here is a counter example:

[root@mytest10][~] cat footest1
function foo() {
    echo 'foobar'
    return 123
}
function bar() {
    local rtc   # Create local variable for housing the return code
    local value # Always declare variables before you assign values
    value=$(foo)
    rtc=$?
    echo ${value}
    echo ${rtc}
}
bar
[root@mytest10][~] bash footest1
foobar
123

Now you can see you get the desired effect!

Conclusion: If you use special declaration variables*, you must declare said variables before assigning values.

*A special declaration would be anything like local scope, a declared array, read-only, etc. The creation of standard global variables does not suffer from this return-code-overriding issue.

It is only an int

So when using bash we are often gluing together things from other applications, and those applications may return a wide, wide range of numeric error codes. You capture these and all is well… most of the time. The problem is that when you exceed the range 0-255, bash wraps around again. So when you get an error code of 512 (perhaps for something like “Auth controller unreachable”) you are shocked to see your script think it is full of success and keep on trucking.
This is because bash sees the error of 512 as 512 mod 256 = 0, or success. Here, take a look.

[root@mytest10][~] cat footest2
function foo() {
    return $1
}
for i in 254 255 256 257 511 512 513 514 1023 1024 1025 1026; do
    foo ${i}
    bar=$?
    echo $bar
done
[root@mytest10][~] bash footest2

Here is the debug output

+ for i in 254 255 256 257 511 512 513 514 1023 1024 1025 1026
+ foo 511
+ return 511
+ bar=255
+ echo 255
+ for i in 254 255 256 257 511 512 513 514 1023 1024 1025 1026
+ foo 512
+ return 512
+ bar=0
+ echo 0
+ for i in 254 255 256 257 511 512 513 514 1023 1024 1025 1026
+ foo 513
+ return 513
+ bar=1
+ echo 1

Common Bash Return Codes

Code   | Meaning                          | Example              | Comments
-------+----------------------------------+----------------------+----------------------------------------------------------------
0      | Catchall for success             | true                 | By default, if nothing else sets a status, a script or function
       |                                  |                      | implicitly returns 0
1      | Catchall for general errors      | let "var1 = 1/0"     | Miscellaneous errors, such as "divide by zero" and other
       |                                  |                      | impermissible operations
2      | Misuse of shell builtins         | empty_function() {}  | Missing keyword or command, or permission problem (per the
       |                                  |                      | Bash documentation)
126    | Command invoked cannot execute   | /dev/null            | Permission problem or command is not an executable
127    | Command not found                | illegal_command      | Possible problem with $PATH or a typo
128    | Invalid argument to exit         | exit 3.14159         | exit takes only integer args in the range 0-255 (see first footnote)
128+n  | Fatal error signal "n"           | kill -9 $PPID        | $? returns 137 (128 + 9)
130    | Script terminated by Control-C   | Ctrl-C               | Control-C is fatal error signal 2 (130 = 128 + 2, see above)
255*   | Exit status out of range         | exit -1              | exit takes only integer args in the range 0-255


*Taken from http://www.tldp.org/LDP/abs/html/exitcodes.html
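The 128+n row is easy to verify for yourself: kill a child shell with signal 9 and read its exit status.

```shell
bash -c 'kill -9 $$'   # the child kills itself with SIGKILL (signal 9)
echo $?                # prints 137 (128 + 9)
```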


**This post is dedicated to my friends who did not suffer under the regime of PHP or Bash, who thus did not dedicate 3 of the last five years to turning bash into a full-featured object-oriented programming language, and who come to me seeking sanity every time something fails and they start hating life and all the legacy code written in bash –
Long live the new paradigm of Python and Go – May your reign of glory never end 😉

http://onealeng.com/bash-return-code-fun/feed/ 0 132
Worlds Largest CRIT Database http://onealeng.com/worlds-largest-crit-database/ http://onealeng.com/worlds-largest-crit-database/#respond Fri, 01 May 2015 02:07:12 +0000 http://onealeng.com/?p=78 Mongo Achievement

Our CRIT database manages over a million signatures and, as of December 2014, is the world’s largest and keeps growing. We have plenty of room for it to grow: with less than 3% of the cluster’s theoretical capacity purchased, we can keep adding nodes for the foreseeable future. I had implemented Mongo for a few projects before this, but this was the first large auto-sharded system where I had to pre-allocate shards to keep the system from collapsing when the application started. I built these servers from scratch and configured the systems, and given the hardware I had to work with I am quite pleased with how it turned out.

http://onealeng.com/worlds-largest-crit-database/feed/ 0 78