adventures in optimization

The charming adventures of an analyst and his solver.

« Thursday, April 14, 2011 »

Reformed JAPHs in Python - Huffman Coding

Python obfuscation and Huffman Coding.

At this point, tricking python into printing strings via ever more pernicious mechanisms got a little boring. So I switched to obfuscating fundamental computer science algorithms. Here's a JAPH that takes in a Huffman coded version of 'just another python hacker', decodes, and prints it.

# Build coding tree
def build_tree(scheme):
    if scheme.startswith('*'):
        left, scheme = build_tree(scheme[1:])
        right, scheme = build_tree(scheme)
        return (left, right), scheme
    else:
        return scheme[0], scheme[1:]

def decode(tree, encoded):
    ret = ''
    node = tree
    for direction in encoded:
        if direction == '0':
            node = node[0]
        else:
            node = node[1]
        if isinstance(node, str):
            ret += node
            node = tree
    return ret

tree = build_tree('*****ju*sp*er***yct* h**ka*no')[0]
print(
    decode(tree, bin(10627344201836243859174935587).lstrip('0b').zfill(103))
)

The decoding tree is built like a true LISP-style tree as a sequence of pairs. '*' represents a branch in the tree while other characters are leaf nodes. This looks like ((((('j', 'u'), ('s', 'p')), ('e', 'r')), ((('y', 'c'), 't'), (' ', 'h'))), (('k', 'a'), ('n', 'o'))) after it's constructed.

The actual Huffman coded version of our favorite string looks like 0000000001000100101011010111011101010111001000110110000110100001010111111110011001111010100110000100011, which in base-2 encoding represents around a 50% savings in space. Well worth all the effort, right?

There's a catch here, which is that this is hard to obfuscate unless we turn it into a single expression. This means that we have to convert build_tree and decode into lambda functions. Unfortunately, they are recursive and lambda functions don't do that easily. Fortunately, we can use Y combinators and then the rest is simple. These are worth some study since they will pop up again later.

Y = lambda g: (
    lambda f: g(lambda arg: f(f)(arg))) (lambda f: g(lambda arg: f(f)(arg))
)

build_tree = Y(
    lambda f: lambda scheme: (
        (f(scheme[1:])[0], f(f(scheme[1:])[1])[0]),
        f(f(scheme[1:])[1])[1 ]
    ) if scheme.startswith('*') else (scheme[0], scheme[1:])
)

decode = Y(lambda f: lambda x: x[3]+x[1] if not x[2] else (
    f([x[0], x[0], x[2], x[3]+x[1]]) if isinstance(x[1], str) else (
        f([x[0], x[1][0], x[2][1:], x[3]]) if x[2][0] == '0' else (
            f([x[0], x[1][1], x[2][1:], x[3]])
        )
    )
))

tree = build_tree('*****ju*sp*er***yct* h**ka*no')[0]
print(
    decode([
        tree,
        tree,
        bin(10627344201836243859174935587).lstrip('0b').zfill(103), ''
    ])
)

The final version is really just a condensed (and expanded, weirdly) version of the above (again, make sure to use Python 3.2):

print((lambda t,e,s:(lambda g:(lambda f:g(lambda arg:f(f)(arg)))(lambda f:
g(lambda arg: f(f)(arg))))(lambda f:lambda x: x[3]+x[1]if not x[2]else f([
x[0],x[0],x[2],x[3]+x[1]])if isinstance(x[1],str)else f([x[0],x[1][0],x[2]
[1:],x[3]])if x[2][0]=='0'else f([x[0],x[1][1],x[2][1:],x[3]]))([t,t,e,s])
)((lambda g:(lambda f:g(lambda arg:f(f)(arg)))(lambda f:g(lambda arg:f(f)(
arg))))(lambda f:lambda p:((f(p[1:])[0],f(f(p[1:])[1])[0]),f(f(p[1:])[1])[
1])if p.startswith('*')else(p[0],p[1:]))('*****ju*sp*er***yct* h**ka*no')[
0],bin(10627344201836243859179756385-4820798).lstrip('0b').zfill(103),''))