The Power of Random

A few years ago I spent a great deal of time thinking about parallel programming. One of the counter-intuitive things that emerged was that organisation in a massively parallel system can be detrimental to performance. A recent LinkedIn article by Don P (To Be More Successful, Study Failures), which mentioned under-sampling, reminded me of one of my key counter-intuitive conclusions.
The key insight when organising data layout on a massively parallel system is that you are organising it to optimise performance for some circumstance; the converse is also true: other circumstances may deliver non-optimal performance. Sometimes the worst-case performance is truly terrible, while the best case is neither frequent enough nor significantly better than the average case. When this occurs, a random arrangement, which yields roughly average-case performance for nearly all queries, may be better overall - this is the power of random.
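To make this concrete, here is a minimal sketch in plain Python comparing a contiguous block layout (the kind you would choose to keep related keys together) against a purely random placement, when the workload turns out to be skewed towards a small hot range of keys. The node count, key count, and degree of skew are illustrative assumptions, not measurements from any real system.

```python
import random

NUM_NODES = 16
NUM_KEYS = 16_000
NUM_ACCESSES = 100_000

random.seed(0)

# Skewed workload: 90% of accesses hit a small contiguous "hot" range of keys.
accesses = [random.randrange(1_000) if random.random() < 0.9
            else random.randrange(NUM_KEYS)
            for _ in range(NUM_ACCESSES)]

def block_node(key):
    """Block (range) layout: consecutive keys live on the same node."""
    return key * NUM_NODES // NUM_KEYS

# Random layout: each key is placed on a node chosen uniformly at random.
placement = [random.randrange(NUM_NODES) for _ in range(NUM_KEYS)]

def random_node(key):
    return placement[key]

for name, node_of in [("block layout ", block_node), ("random layout", random_node)]:
    load = [0] * NUM_NODES
    for key in accesses:
        load[node_of(key)] += 1
    fair_share = NUM_ACCESSES / NUM_NODES
    print(f"{name}: busiest node handles {max(load) / fair_share:.1f}x its fair share")
```

On a typical run the block layout leaves one node doing well over ten times its fair share of the work, because the hot range lands on a single node, while under the random layout no node does much more than its fair share.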

This leads to other insights:
  • Persistent hotspots in a system are a common symptom of an optimised layout being used in a way where the optimisation works against us. Random layouts may develop hotspots, but they should not occur in a consistent location (see the sketch after this list).
  • The key question is how often a random layout works well. The answer seems to be: surprisingly often - particularly when large numbers of processors are used and the access pattern is poorly understood.
  • If you have a broadcast capability for queries, random layouts become very attractive: since any given item could be anywhere, sending a query to every node at once removes the cost of not knowing where the data lives.
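As a rough illustration of the hotspot point above (using the same hypothetical sixteen-node setup as before), the sketch below replays the skewed workload in a series of time windows and records which node is busiest in each one. With the block layout the same node tops the list every window; with the random layout the "busiest" node is just statistical noise and wanders around.

```python
import random
from collections import Counter

NUM_NODES, NUM_KEYS = 16, 16_000
WINDOWS, PER_WINDOW = 10, 5_000

random.seed(1)
placement = [random.randrange(NUM_NODES) for _ in range(NUM_KEYS)]

def block_node(key):
    return key * NUM_NODES // NUM_KEYS

def random_node(key):
    return placement[key]

for name, node_of in [("block layout ", block_node), ("random layout", random_node)]:
    busiest = []
    for _ in range(WINDOWS):
        load = Counter()
        for _ in range(PER_WINDOW):
            # 90% of traffic in every window hits the same contiguous hot range.
            key = random.randrange(1_000) if random.random() < 0.9 else random.randrange(NUM_KEYS)
            load[node_of(key)] += 1
        busiest.append(load.most_common(1)[0][0])
    print(f"{name}: busiest node in each window -> {busiest}")
```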

So, counter-intuitively, using less knowledge about a problem can yield better results in many circumstances.