Entropy vs. Value

In thermodynamics, entropy measures disorder, or the amount of energy that has dispersed as heat and can no longer do useful work.

In information theory, entropy measures how difficult it is to ZIP a file, i.e. how little even the best lossless compressor can shrink it. This is because Claude Shannon based his definition of entropy upon the probabilities of seeing various outcomes, such as various strings of bits: the more predictable the data, the lower its entropy and the better it compresses.

Shannon entropy is typically measured in bits. For example, if it is known that your password is always a single lowercase English dictionary word, that your dictionary has 4096 words, and that each word is equally likely to be chosen, then the entropy of your password is 12 bits (because 2^12 = 4096), even though storing your password in plaintext may take up, say, 20 characters multiplied by 8 bits per character, for a total of 160 bits.
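
The password arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes every word in the 4096-word dictionary is equally likely, and the function name shannon_entropy_bits is made up for this illustration.

```python
import math

def shannon_entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), measured in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

DICTIONARY_SIZE = 4096  # dictionary size from the example above

# Assumption: every word is chosen with the same probability.
uniform = [1 / DICTIONARY_SIZE] * DICTIONARY_SIZE
password_entropy = shannon_entropy_bits(uniform)

# Plaintext storage cost from the example: 20 characters at 8 bits each.
plaintext_bits = 20 * 8

print(f"entropy: {password_entropy:.1f} bits, plaintext: {plaintext_bits} bits")
```

If some words were more popular than others, the same function would return less than 12 bits, since a skewed distribution is easier to guess than a uniform one.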

It may be tempting to think that lowering entropy increases value. For example, say we use a sequence of barometer readings to indicate whether or not to bring an umbrella, and that correctly being umbrella-clad or not is very valuable to us. That is, we feel very foolish if we bring an umbrella and it's not raining, and very foolish if we don't bring an umbrella and it is raining.

Say we can encode a barometer reading as one of 128 values (spanning a 1.28-inch range of mercury around the mean of 30.00 inches, with 0.01-inch precision), and that we need 48 such hourly readings in order to make a forecast. Each reading takes 7 bits, so that's 7 times 48 bits, which is 336 bits of raw data. Say we can do a lossless compression of that down to 150 bits, because measurements are clustered around the mean of 30.00 and because measurements from one hour to the next do not typically vary much.
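
As a rough sketch of that arithmetic, and of why slowly changing readings compress well, the Python below computes the raw size and then compares the empirical entropy of a short run of readings with the entropy of their hour-to-hour differences. The sample readings are made up for illustration, the helper empirical_entropy_bits is hypothetical, and the 150-bit figure above is not reproduced here.

```python
import math
from collections import Counter

LEVELS = 128    # distinct barometer values, from the text
READINGS = 48   # hourly readings per forecast, from the text

bits_per_reading = math.ceil(math.log2(LEVELS))  # 7 bits cover 128 values
raw_bits = bits_per_reading * READINGS           # 7 * 48 = 336

def empirical_entropy_bits(symbols):
    """Entropy, in bits per symbol, of the observed symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Made-up readings (not real data), encoded as levels 0..127.
sample = [64, 64, 65, 65, 66, 66, 65, 65, 64, 64, 63, 63]

# Hour-to-hour differences: mostly 0 or +/-1 when pressure drifts slowly.
deltas = [b - a for a, b in zip(sample, sample[1:])]

print(raw_bits)
print(empirical_entropy_bits(sample), empirical_entropy_bits(deltas))
```

On this toy sequence the differences have lower per-symbol entropy than the readings themselves, which is the property a lossless compressor would exploit. Entropy estimated from so few samples is only a crude indicator.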

Our outcome, umbrella vs. no umbrella, can be encoded in a single bit. Indeed, we have reduced entropy from 150 bits to 1 bit, and it is a bit that is very valuable to us. This corresponds to our notion of entropy from thermodynamics, where lower entropy means more usefulness.

But what if our algorithm for predicting rain based on barometer readings is not very good? Then the value of our single bit is reduced.
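
One way to put a number on this, offered as a sketch rather than as the definition of value, is to ask how much information the forecast bit carries about the actual weather. The code below assumes rain and no rain are equally likely and that the forecaster errs equally often in both directions; under those assumptions the information shared between forecast and weather is 1 minus the binary entropy of the error rate. The function names are made up for this illustration.

```python
import math

def binary_entropy_bits(p):
    """Entropy in bits of a yes/no outcome that is 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def forecast_information_bits(accuracy):
    """Information the forecast bit carries about the weather.

    Assumptions (not from the text): rain is 50% likely, and the
    forecaster is wrong equally often on rainy and on dry days.
    """
    error_rate = 1.0 - accuracy
    return 1.0 - binary_entropy_bits(error_rate)

for accuracy in (0.5, 0.7, 0.9, 1.0):
    print(accuracy, forecast_information_bits(accuracy))
```

Under these assumptions a coin-flip forecaster (accuracy 0.5) delivers 0 bits about the weather and a perfect one delivers the full 1 bit, even though both hand us exactly one bit of output.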

Thus, it is not possible to say simplistically that the goal of Data Science is to reduce entropy. It is painful to repeat the tired cliche, but the cliche is true: the goal of Data Science is to produce value. Producing a single-bit recommendation of umbrella vs. no umbrella is certainly a crisp and actionable recommendation, but its value depends upon its accuracy.