michaelmalak's blog

Multi-Tenancy Myths

Although the term "multi-tenancy" has been around Hadoop circles for a few years, 2014 saw its rise to prominence. And while it's fine for an organization mature in Hadoop to adopt multi-tenancy, the technology is still too immature to fire up in an organization new to Hadoop.

Entropy vs. Value

In thermodynamics, entropy measures disorder, or the amount of dispersed heat that is no longer available to do useful work.

In information theory, entropy measures how difficult it is to ZIP a file, i.e. how poorly a lossless compressor performs. This is because Claude Shannon based his definition of entropy on the probabilities of seeing particular outcomes, such as particular strings of bits.
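To make the link between entropy and compressibility concrete, here is a minimal Python sketch. It assumes the simplest possible model, byte frequencies only (so-called order-0 entropy), and uses zlib's DEFLATE as a stand-in for ZIP. The function names are made up for illustration.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Empirical order-0 Shannon entropy: H = sum over symbols of p * log2(1/p)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # frequency of each byte value
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib at maximum effort."""
    return len(zlib.compress(data, 9)) / len(data)


# Two extremes: one symbol repeated, versus bytes from the OS random source.
repetitive = b"A" * 10_000
random_bytes = os.urandom(10_000)

for label, blob in [("repetitive", repetitive), ("random", random_bytes)]:
    print(label,
          round(shannon_entropy_bits_per_byte(blob), 2), "bits/byte,",
          "compression ratio", round(compression_ratio(blob), 3))
```

The repetitive input has zero entropy and shrinks to almost nothing, while the random input sits near 8 bits per byte and does not shrink at all. Keep in mind that byte frequencies ignore ordering, so a real compressor can do better than this estimate suggests on data with repeating patterns.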

Active vs Passive Data Variety

The "3 V's" (volume, velocity, and variety) are typically portrayed as a problem to be solved by Big Data and Data Science. The trite knee-jerk response is that they're not problems but opportunities.

But what about taking that trite statement to heart? If we did that, we would seek out additional data, rather than be content with the Big Data that happens to be in our Hadoop cluster or that happens to be arriving via Spark Streaming!

Data Fusion is Data Destruction

Data Fusion originated as a technique in the defense industry of the 1980s, where the goal in digital intelligence work was to come up with a single, unified truth. For example, after combining information from multiple sources, the belief is that enemy tank serial number A123 is at location X.