The Power of Random
08/05/13 10:02 Filed in: Programming
A few years ago I spent a great deal of time thinking about parallel programming. One of the counterintuitive things that emerged was that organisation in a massively parallel system can be detrimental to performance. A recent article on LinkedIn by Don P (To Be More Successful, Study Failures), which mentioned undersampling, reminded me of one of my key counterintuitive conclusions.
The key insight when organising data layout on a massively parallel system is that you are organising it to optimise performance for some set of circumstances; the converse is also true: other circumstances may deliver non-optimal performance. Sometimes the worst-case performance is truly terrible, while the best case is neither frequent enough nor significantly better than the average case. When this occurs, a random arrangement, which yields average-case performance for nearly all queries, might be better overall - this is the power of random.
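A minimal simulation can make this concrete. The sketch below (hypothetical node count, key space, and workload - none of these come from the original article) compares a range-partitioned layout, optimised for contiguous scans, against a random layout when the actual workload turns out to be point queries skewed towards a narrow band of hot keys:

```python
import random
from collections import Counter

NODES = 16
KEYS = range(10_000)

# "Optimised" layout: contiguous key ranges per node (good for range scans).
range_layout = {k: k * NODES // 10_000 for k in KEYS}

# Random layout: each key assigned to a uniformly random node.
rng = random.Random(42)
random_layout = {k: rng.randrange(NODES) for k in KEYS}

# A workload the range layout was NOT designed for: point queries
# concentrated on a narrow "hot" band of keys.
queries = [rng.randrange(0, 700) for _ in range(100_000)]

def max_load(layout):
    # Busiest node's query count - a proxy for the hotspot severity.
    load = Counter(layout[q] for q in queries)
    return max(load.values())

# The range layout funnels nearly every query to one node;
# the random layout keeps every node near the average load.
print("range layout, busiest node :", max_load(range_layout))
print("random layout, busiest node:", max_load(random_layout))
```

With 16 nodes the average share is about 6,250 queries; the range layout sends roughly 90% of the hot-band traffic to a single node, while the random layout stays close to the average everywhere - the worst case has been traded away for a uniform average case.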
This leads to other insights:
- Persistent hotspots in a system are a common symptom of an optimised layout being used in a way where the optimisation works against us. Random layouts may still develop hotspots, but they should not recur in a consistent location.
- The key question is how often a random layout works well. The answer seems to be: surprisingly often - particularly when large numbers of processor resources are used and their usage patterns are poorly understood.
- If you have a broadcast capability for queries, random layouts become very attractive.
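The last point can be sketched briefly. With a broadcast, a query is simply sent to every node, so no directory of where each key lives is needed - random placement costs nothing at lookup time. This is an illustrative toy (the shard structure and `broadcast_lookup` helper are my own, not from the article):

```python
import random

NODES = 8
rng = random.Random(1)

# Random sharding: no directory records where any key lives.
shards = [set() for _ in range(NODES)]
for key in range(1_000):
    shards[rng.randrange(NODES)].add(key)

def broadcast_lookup(key):
    # The query is broadcast to all nodes; whichever shard holds the
    # key answers. Placement metadata is never consulted or maintained.
    return [n for n, shard in enumerate(shards) if key in shard]

# Exactly one node answers, despite nobody knowing the layout.
print("key 42 found on node(s):", broadcast_lookup(42))
```

The trade is clear: every query touches all nodes, but placement can be completely random, so the hotspot and rebalancing problems of an organised layout disappear.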
So, counterintuitively, using less knowledge about a problem can yield better results under many circumstances.