How I wrote a beautiful, general, and super fast joint entropy method (in Python).

def entropy(*X): return = np.sum(-p * np.log2(p) if p > 0 else 0 for p in (np.mean(reduce(np.logical_and, (predictions == c for predictions, c in zip(X, classes)))) for classes in itertools.product(*[set(x) for x in X])))

I started with the method to compute the entropy of a single variable. Input is a numpy array with discrete values (either integers or strings).

import numpy as np def entropy(X): probs = [np.mean(X == c) for c in set(X)] return np.sum(-p * np.log2(p) for p in probs)

In my next version I extended it to compute the joint entropy of two variables:

def entropy(X, Y): probs = [] for c1 in set(X): for c2 in set(Y): probs.append(np.mean(np.logical_and(X == c1, Y == c2))) return np.sum(-p * np.log2(p) for p in probs)

Now wait a minute, it looks like we have a recursion here. I couldn't stop myself of writing en extended general function to compute the joint entropy of n variables.

def entropy(*X, **kwargs): predictions = parse_arg(X[0]) H = kwargs["H"] if "H" in kwargs else 0 v = kwargs["v"] if "v" in kwargs else np.array([True] * len(predictions)) for c in set(predictions): if len(X) > 1: H = entropy(*X[1:], v=np.logical_and(v, predictions == c), H=H) else: p = np.mean(np.logical_and(v, predictions == c)) H += -p * np.log2(p) if p > 0 else 0 return H

It was the ugliest recursive function I've ever written. I couldn't stop coding, I was hooked. Besides, this method was slow as hell and I need a faster version for my reasearch. I need my data tommorow, not next month. I googled if Python has something that would help me deal with the recursive part. I fould this great method: itertools.product, I's just what we need. It takes lists and returns a cartesian product of their values. It's the "nested for loops" in one function.

def entropy(*X): n_insctances = len(X[0]) H = 0 for classes in itertools.product(*[set(x) for x in X]): v = np.array([True] * n_insctances) for predictions, c in zip(X, classes): v = np.logical_and(v, predictions == c) p = np.mean(v) H += -p * np.log2(p) if p > 0 else 0 return H

No resursion, but still slow. It's time to rewrite loops to the Python-like style. As a sharp eye has already noticed, the second for loop with the np.logical_and inside is perfect for the reduce method.

def entropy(*X): n_insctances = len(X[0]) H = 0 for classes in itertools.product(*[set(x) for x in X]): v = reduce(np.logical_and, (predictions, c for predictions, c in zip(X, classes))) p = np.mean(v) H += -p * np.log2(p) if p > 0 else 0 return H

Now, we have to remove just one more list comprehension and we have a beautiful, general, and super fast joint etropy method.

def entropy(*X): return = np.sum(-p * np.log2(p) if p > 0 else 0 for p in (np.mean(reduce(np.logical_and, (predictions == c for predictions, c in zip(X, classes)))) for classes in itertools.product(*[set(x) for x in X])))