This is what a .zip file looks like on the inside

As part of our ongoing quality assurance effort, we subject our hardware and software randomness sources to every test we can get our hands on. This includes top-notch industry standard validators like dieharder, “good effort, have a lollipop” tools like NIST’s Statistical Test Suite, and some of our own in-house applications. One of these is a graphical inspection tool which generates covariance heatmaps to help us separate, at a glance, possible candidates from some classes of immediate discards.

[Images: "looks promising"; "don't sell the bike shop, Orville"]

On a less serious note, it got us wondering what covariances look like for non-entropic sources, like the contents of common file types. Most file formats have lots of empty space, or swaths of ASCII which poorly saturate the range of byte values. But encrypted or heavily compressed files might come out alright. Let’s take a look.


[Images: zip; gzip]

Darker images suggest fewer represented byte adjacencies, and gzip is noticeably less saturated. Some of the same -1, -½, and -2 slope artifacts appear in both.
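As a rough illustration of what "represented byte adjacencies" means, the Python sketch below tallies every pair of neighboring bytes into a 256×256 grid and reports the fraction of cells hit at least once. It is only a guess at the general idea, not a description of how entrospection builds its heatmaps; the names `adjacency_counts` and `saturation` are made up for this example.

```python
import os

def adjacency_counts(data: bytes) -> list[list[int]]:
    """Count how often byte value b immediately follows byte value a."""
    grid = [[0] * 256 for _ in range(256)]
    for a, b in zip(data, data[1:]):
        grid[a][b] += 1
    return grid

def saturation(grid: list[list[int]]) -> float:
    """Fraction of the 65,536 possible adjacencies seen at least once."""
    seen = sum(1 for row in grid for count in row if count > 0)
    return seen / (256 * 256)

# Hypothetical inputs: repetitive ASCII text versus the OS random source.
text = b"the quick brown fox jumps over the lazy dog " * 20000
random_bytes = os.urandom(len(text))

text_saturation = saturation(adjacency_counts(text))
random_saturation = saturation(adjacency_counts(random_bytes))

print(f"ascii text: {text_saturation:.4f}")
print(f"os.urandom: {random_saturation:.4f}")
```

Plain ASCII text touches only a tiny corner of the grid, while a sample of this size from `os.urandom` should hit nearly every cell.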


[Images: aes-cbc; aes-ecb]

In this case, a 30 MB video file was encrypted twice with the same key but different block cipher modes. It’s not surprising that cipher-block-chaining produces whiter output. ECB encrypts identical plaintext blocks to identical ciphertext blocks, so its darker image tells us that the source video had several 16-byte sequences that repeated often, and repetition diminishes unpredictability.
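One rough way to put a number on that kind of repetition is to count aligned 16-byte blocks that occur more than once in a buffer. The sketch below does this in Python on synthetic data rather than on real ciphertext; `repeated_block_fraction` is a hypothetical helper, and the zero-padded buffer merely stands in for a file with repetitive regions.

```python
import os
from collections import Counter

BLOCK_SIZE = 16  # AES block size in bytes

def repeated_block_fraction(data: bytes, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of aligned blocks whose value occurs more than once in data."""
    usable = len(data) - len(data) % block_size  # ignore a trailing partial block
    blocks = [data[i:i + block_size] for i in range(0, usable, block_size)]
    if not blocks:
        return 0.0
    counts = Counter(blocks)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(blocks)

# Stand-in inputs (assumptions for the demo, not the video from the post):
# half zero padding and half random data, versus purely random data.
structured = b"\x00" * 4096 + os.urandom(4096)
unstructured = os.urandom(8192)

structured_fraction = repeated_block_fraction(structured)
random_fraction = repeated_block_fraction(unstructured)

print(f"structured: {structured_fraction:.3f}")
print(f"random:     {random_fraction:.3f}")
```

Run against an ECB ciphertext, a nonzero result would point to repeated blocks in the plaintext. A CBC ciphertext of the same file should score at or near zero.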

Binary Formats

[Images: wav file; uncompressed initrd]

Several byte adjacencies seem to never appear in .wav files, hence the dark horizontal and vertical lines. The grid-like layout of the initrd image betrays an abundance of 64-bit aligned addresses, which by definition leave the least significant three bits unset.
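The alignment claim is easy to check with a few lines of Python. The sketch below packs a run of 8-byte-aligned addresses and looks at the low byte of each one. The base address and the little-endian layout are assumptions chosen for illustration, not values taken from the initrd in the post.

```python
import struct

# Hypothetical 8-byte-aligned addresses; the base value is arbitrary.
BASE = 0x7F0000000000
addresses = [BASE + 8 * i for i in range(1000)]

# An 8-byte-aligned address always has its three lowest bits clear.
low_bits_clear = all(addr & 0b111 == 0 for addr in addresses)

# Pack as little-endian 64-bit words (an assumption about the platform),
# then collect the least significant byte of every word.
packed = b"".join(struct.pack("<Q", addr) for addr in addresses)
low_bytes = {packed[i] for i in range(0, len(packed), 8)}

print(low_bits_clear)
print(len(low_bytes), "distinct low-byte values out of 256")
```

Only multiples of 8 can appear in that low byte, so a file full of such addresses leaves most byte values in that position unused. That regular spacing is a plausible source of the grid pattern.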

Quality entropy sources produce noisily homogeneous heatmaps without discernible artifacts, which like the AES-CBC example above come out looking like white-washed static. Dark spots, gradients, and straight lines can suggest structure and repetition; they can also bring to light subtle hardware deficiencies, like a biased successive approximation A/D converter. However we root it out, we will not suffer weak entropy to live.

All of the images in this post (except the .wav example) are the unedited output of entrospection, one of our open source validation tools.
