
Computing joint entropy (in Python)

BIOLAB

Jun 15, 2012

How I wrote a beautiful, general, and super fast joint entropy method (in Python).

    def entropy(*X):
        return np.sum(-p * np.log2(p) if p > 0 else 0 for p in
            (np.mean(reduce(np.logical_and, (predictions == c for predictions, c in zip(X, classes))))
                for classes in itertools.product(*[set(x) for x in X])))

I started with a method to compute the entropy of a single variable. The input is a numpy array of discrete values (either integers or strings).

    import numpy as np

    def entropy(X):
        probs = [np.mean(X == c) for c in set(X)]
        return np.sum(-p * np.log2(p) for p in probs)
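A quick sanity check on a toy example: a fair coin, with two outcomes at probability 0.5 each, should come out to exactly one bit.

    X = np.array([0, 0, 1, 1])
    print(entropy(X))  # 2 * (-0.5 * np.log2(0.5)) = 1.0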

In my next version I extended it to compute the joint entropy of two variables:

    def entropy(X, Y):
        probs = []
        for c1 in set(X):
            for c2 in set(Y):
                probs.append(np.mean(np.logical_and(X == c1, Y == c2)))

        # skip empty combinations; 0 * np.log2(0) would give nan
        return np.sum(-p * np.log2(p) for p in probs if p > 0)
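The same kind of check for two variables: with two independent fair coins the joint distribution is uniform over four pairs, so the joint entropy should be exactly two bits.

    X = np.array([0, 0, 1, 1])
    Y = np.array([0, 1, 0, 1])
    print(entropy(X, Y))  # 4 * (-0.25 * np.log2(0.25)) = 2.0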

Now wait a minute, it looks like we have a recursion here. I couldn't stop myself from writing an extended, general function to compute the joint entropy of n variables.

    def entropy(*X, **kwargs):
        predictions = np.asarray(X[0])
        H = kwargs.get("H", 0)  # entropy accumulated so far
        v = kwargs.get("v", np.ones(len(predictions), dtype=bool))  # rows matching the classes fixed so far

        for c in set(predictions):
            if len(X) > 1:
                H = entropy(*X[1:], v=np.logical_and(v, predictions == c), H=H)
            else:
                p = np.mean(np.logical_and(v, predictions == c))
                H += -p * np.log2(p) if p > 0 else 0
        return H
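Ugly, but it works: on the toy arrays from above it returns the same values, and it now takes any number of variables.

    print(entropy(X, Y))     # 2.0, same as the two-variable version
    print(entropy(X, Y, X))  # 2.0; the duplicated X adds no new information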

It was the ugliest recursive function I've ever written. I couldn't stop coding; I was hooked. Besides, this method was slow as hell, and I needed a faster version for my research. I need my data tomorrow, not next month. I googled whether Python had something that would help me deal with the recursive part and found this great method: itertools.product. It's just what we need: it takes lists and returns the Cartesian product of their values. It's the "nested for loops" in one function.
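For example, two nested for loops over two small sets collapse into a single iterator:

    import itertools

    print(list(itertools.product([0, 1], ["a", "b"])))
    # [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]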

    def entropy(*X):
        n_instances = len(X[0])
        H = 0
        for classes in itertools.product(*[set(x) for x in X]):
            v = np.array([True] * n_instances)
            for predictions, c in zip(X, classes):
                v = np.logical_and(v, predictions == c)
            p = np.mean(v)
            H += -p * np.log2(p) if p > 0 else 0
        return H

No recursion, but still slow. It's time to rewrite the loops in a more Pythonic style. As a sharp eye has already noticed, the second for loop with np.logical_and inside is a perfect fit for reduce.
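For example, reduce chains np.logical_and over any number of boolean arrays:

    from functools import reduce  # a builtin on Python 2; in functools on Python 3

    masks = [np.array([True, False, True]),
             np.array([True, True, False])]
    print(reduce(np.logical_and, masks))  # [ True False False]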

    def entropy(*X):
        H = 0
        for classes in itertools.product(*[set(x) for x in X]):
            v = reduce(np.logical_and, (predictions == c for predictions, c in zip(X, classes)))
            p = np.mean(v)
            H += -p * np.log2(p) if p > 0 else 0
        return H

Now we just have to fold the remaining for loop into a generator expression, and we have a beautiful, general, and super fast joint entropy method.

    def entropy(*X):
        return np.sum(-p * np.log2(p) if p > 0 else 0 for p in
            (np.mean(reduce(np.logical_and, (predictions == c for predictions, c in zip(X, classes))))
                for classes in itertools.product(*[set(x) for x in X])))
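One last sanity check with the toy arrays from above: the one-liner agrees with all the earlier versions, and for independent variables the joint entropy equals the sum of the individual entropies.

    print(entropy(X))     # 1.0
    print(entropy(X, Y))  # 2.0 == entropy(X) + entropy(Y), since X and Y are independent
    print(entropy(X, X))  # 1.0 == entropy(X); a duplicated variable adds nothing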
