The IO Blender

At the VMUG Regional User Conference (#MVMUG, Feb 7, 2013), Stephen Foskett (@sfoskett) presented on the IO Blender. The essential idea was that a great deal of information that storage arrays could have exploited was lost because the data was hidden behind the hypervisor. This reminded me of some information theory and compression.
An intuition we use in compression is that compression removes redundancy from data by exploiting either structure (which we capture in the compression model) or repetition. The better the compression, the less recognisable repetition remains in the data, and the more the output resembles a random stream.

The IO blender essentially removes redundancy from a block stream using caches: the better each layer removes redundant accesses, the more the operations arriving at the bottom layer (the storage array) look like random accesses, and the harder it is for the storage array to provide high performance.
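The blending effect itself can be sketched with a toy model (all sizes and the interleaving policy are invented): several VMs each issue a perfectly sequential block stream, the hypervisor merges whichever stream is ready next, and the fraction of sequential accesses visible at the bottom collapses.

```python
import random

random.seed(42)

# Each "VM" issues a purely sequential block stream in its own disk region.
def vm_stream(base, length):
    return [base + i for i in range(length)]

vms = [vm_stream(base=100_000 * n, length=1000) for n in range(8)]

# The hypervisor interleaves whichever VM happens to be ready next
# (modelled here as a random choice among VMs with work remaining).
merged = []
cursors = [0] * len(vms)
while any(c < len(v) for c, v in zip(cursors, vms)):
    n = random.choice([i for i, c in enumerate(cursors) if c < len(vms[i])])
    merged.append(vms[n][cursors[n]])
    cursors[n] += 1

def sequential_fraction(stream):
    """Fraction of accesses that immediately follow the previous block."""
    hits = sum(1 for a, b in zip(stream, stream[1:]) if b == a + 1)
    return hits / (len(stream) - 1)

print(f"per-VM stream:  {sequential_fraction(vms[0]):.2f} sequential")
print(f"blended stream: {sequential_fraction(merged):.2f} sequential")
```

Each individual stream is 100% sequential, but the blended stream the array sees is mostly random, so prefetching and sequential optimisations at the array have little to work with.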

This parallel with information theory provides a useful intuition for the performance loss on storage arrays compared with network file systems presented directly to a VM. Hosts directly accessing a network file system (NFS or SMB) presented by a storage array can take advantage of the locality information inherent in file access, so the storage array can do a better job of caching and organising access to the data.