How to use pytest in automatic code generation

This notebook shows how to write a plugin for pytest. The plugin lets us use pytest functionality, e.g. test discovery, in our automatic programming scenario. Another advantage is that many developers are already familiar with pytest, so it should be much easier for them to apply this development technique.
We have seen in the post about Abstract Syntax Trees that programs can be described as a tree structure, with each node being an expression. Each permutation of the expressions is a different program. Therefore, the number of possible programs grows exponentially with the length of the program. This makes creating executable code from given tests a hard problem.
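To get a feeling for this growth, here is a minimal sketch that counts syntactically distinct expression trees built from three variables and four binary operations. It counts by tree depth rather than by program length, treats x + y and y + x as different trees, and is written for Python 3; the function name count_expressions is only chosen for illustration.

```python
def count_expressions(max_depth, num_variables=3, num_operations=4):
    """Count distinct expression trees of depth at most max_depth."""
    count = num_variables  # depth 0: a bare variable
    for _ in range(max_depth):
        # A tree is either a bare variable or one operation
        # applied to two trees of the previous depth limit.
        count = num_variables + num_operations * count * count
    return count


for depth in range(4):
    print(depth, count_expressions(depth))
# 0 3
# 1 39
# 2 6087
# 3 148206279
```

Already at depth 3 there are roughly 148 million candidate expressions, so enumerating all of them quickly becomes impractical.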
Solving this problem at scale requires machine learning or statistical techniques. These techniques will be introduced in a future post. This post is about pytest and how to utilize it in our scenario. Nevertheless, we are going to generate a simple function in this notebook. The function takes three arguments and returns their sum. We generate new functions until one is found that fulfills the four given tests. These tests are defined in a separate module, and the following cell shows them. Initially I had planned to implement these tests within this notebook; however, I couldn't find an easy way to run them with pytest.

In [1]:
import os

import templates.TestSpecialAdd as TestSpecialAdd

tests_path = os.path.splitext(TestSpecialAdd.__file__)[0] + '.py'
with open(tests_path, 'r') as tests_file:
    print(tests_file.read())
def test_a():
    assert special_add(2, 3, 4) == 9  # @UndefinedVariable

def test_b():
    assert special_add(1, 1, 0) == 2  # @UndefinedVariable

def test_c():
    assert special_add(10, 2, 4) == 16  # @UndefinedVariable

def test_d():
    assert special_add(0, -5, 6) == 1  # @UndefinedVariable

The class CodeGenerator creates new candidate functions, which are confronted with the tests. The code generator can only create a small subset of all possible Python programs. These functions always have three arguments x, y, z and return the result of an operation. The operation is an addition, subtraction, multiplication or division and is sampled randomly. An operation takes two operands as input. Each operand is one of the arguments in 60% of the cases and otherwise the result of another operation. In this way, functions of varying length are produced recursively.

In [2]:
import ast
import random

class CodeGenerator():
    def __init__(self):
        self.arguments = [ast.Name(id='x', ctx=ast.Param()),
                          ast.Name(id='y', ctx=ast.Param()),
                          ast.Name(id='z', ctx=ast.Param())]

        self.variables = [ast.Name(id='x', ctx=ast.Load()),
                          ast.Name(id='y', ctx=ast.Load()),
                          ast.Name(id='z', ctx=ast.Load())]

        self.operations = [ast.Add(), ast.Sub(), ast.Mult(), ast.Div()]

    def sample_variable(self):
        return random.choice(self.variables)

    def sample_operation(self):
        return random.choice(self.operations)

    def get_operation_or_variable(self):
        # With probability 0.4 recurse into a nested operation,
        # otherwise stop at one of the arguments.
        if random.uniform(0.0, 1.0) > 0.6:
            return self.get_operation()
        return self.sample_variable()

    def get_operation(self):
        return ast.BinOp(op=self.sample_operation(), left=self.get_operation_or_variable(), right=self.get_operation_or_variable())

    def sample_function(self):
        arguments = ast.arguments(args=self.arguments, vararg=None, kwarg=None, defaults=[])
        func = ast.FunctionDef(name='special_add', args=arguments, body=[ast.Return(value=self.get_operation())], decorator_list=[])
        ast_module = ast.fix_missing_locations(ast.Module(body=[func]))
        return ast_module
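
The cell above targets Python 2.7, and the ast node layout differs in Python 3. As a rough sketch of how the same sampling idea could look in Python 3, the following code parses a small function template and swaps in a randomly sampled expression instead of building the FunctionDef node by hand. The helper names sample_operand, sample_operation and compile_function, as well as the p_operation parameter, are assumptions made for this example and not part of the original class.

```python
import ast
import random

OPERATIONS = [ast.Add, ast.Sub, ast.Mult, ast.Div]
ARGUMENT_NAMES = ['x', 'y', 'z']


def sample_operand(rng, p_operation=0.4):
    """Return a nested operation with probability p_operation, else an argument."""
    if rng.random() < p_operation:
        return sample_operation(rng, p_operation)
    return ast.Name(id=rng.choice(ARGUMENT_NAMES), ctx=ast.Load())


def sample_operation(rng, p_operation=0.4):
    """Return a random binary operation with two sampled operands."""
    return ast.BinOp(left=sample_operand(rng, p_operation),
                     op=rng.choice(OPERATIONS)(),
                     right=sample_operand(rng, p_operation))


def sample_function(rng):
    """Return a module AST that defines special_add(x, y, z)."""
    # Parse a template and replace the returned value, so that the
    # function node itself never has to be constructed manually.
    module = ast.parse('def special_add(x, y, z):\n    return 0\n')
    module.body[0].body[0].value = sample_operation(rng)
    return ast.fix_missing_locations(module)


def compile_function(module):
    """Compile the module AST and return the generated special_add."""
    namespace = {}
    exec(compile(module, '<generated>', 'exec'), namespace)
    return namespace['special_add']


rng = random.Random(0)
special_add = compile_function(sample_function(rng))
```

Passing an explicit random.Random instance keeps the sampling reproducible. Note that a generated function can still raise ZeroDivisionError when it is called, e.g. for an expression such as x / (y - y).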

The function has_passed checks whether a generated function passes all given tests. It injects the generated function into the globals of each test and runs the test. A failing test raises an exception, in which case has_passed returns False. If no test raises an exception, it returns True.

In [3]:
import copy

def has_passed(special_add, tests):
    for test in tests:
        current_func = copy.deepcopy(test.function)
        current_func.func_globals['special_add'] = special_add
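        try:
            # A failing assert raises AssertionError; a bad candidate
            # may also raise e.g. ZeroDivisionError.
            current_func()
        except Exception:
            return False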
    return True

Next we exploit pytest to generate source code that satisfies the unit tests. Pytest allows us to define plugins that hook into certain events, e.g. before a test is run or before a file is searched for tests. We use the itemcollected hook to store the collected tests for later use and run the code generation in the runtestloop hook. New code is generated in a loop until a function is found that passes all tests. The code is compiled with the built-in compile function and run with the exec statement. At the end the resulting code is printed.

In [4]:
import pytest
import codegen

class CodeGeneratorPlugin(object):
    def __init__(self):
        self.tests = []

    def pytest_itemcollected(self, item):
        # Remember every collected test so that runtestloop can use it.
        self.tests.append(item)

    def pytest_runtestloop(self, session):
        code_generator = CodeGenerator()

        found = False
        num_samples = 0
        while not found:
            ast_module = code_generator.sample_function()
            exec compile(ast_module, '<string>', 'exec') in globals(), locals()
            found = has_passed(special_add, self.tests)
            num_samples += 1

        print('number of samples: {}\n'.format(num_samples))
        # Turn the successful AST back into source code and show it.
        print(codegen.to_source(ast_module))

pytest.main([tests_path], plugins=[CodeGeneratorPlugin()])
============================= test session starts =============================
platform win32 -- Python 2.7.13, pytest-3.0.5, py-1.4.32, pluggy-0.4.0
rootdir: C:\Users\Artur\Development\workspace\MachineLearning, inifile: 
collected 4 items
number of samples: 229

def special_add(x, y, z):
    return z + x + y

..\templates\ ....

========================== 4 passed in 0.12 seconds ===========================
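
The plugin above relies on the Python 2 exec statement. As a hedged sketch of how the hook wiring could be kept independent of that, the following Python 3 class receives the sampling and checking logic as two callables. The class name CodeSearchPlugin and the parameters sample_candidate, has_passed and max_samples are assumptions made for this example; only the hook names pytest_itemcollected and pytest_runtestloop come from pytest.

```python
class CodeSearchPlugin:
    """Collect the tests, then sample candidates until one passes them all."""

    def __init__(self, sample_candidate, has_passed, max_samples=100000):
        self.sample_candidate = sample_candidate  # returns a new callable
        self.has_passed = has_passed              # has_passed(candidate, tests)
        self.max_samples = max_samples            # upper bound on the search
        self.tests = []
        self.solution = None
        self.num_samples = 0

    def pytest_itemcollected(self, item):
        # pytest calls this hook once for every collected test item.
        self.tests.append(item)

    def pytest_runtestloop(self, session):
        # Runs after collection has finished, so self.tests is complete here.
        for _ in range(self.max_samples):
            candidate = self.sample_candidate()
            self.num_samples += 1
            if self.has_passed(candidate, self.tests):
                self.solution = candidate
                break
        print('number of samples: {}'.format(self.num_samples))
        # No return value: as far as I understand the hook, pytest then
        # continues with its regular test loop.
```

An instance would be handed to pytest in the same way as above, i.e. via the plugins argument of pytest.main. Unlike the original loop, the max_samples bound makes the search stop even if no candidate ever passes.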

We see that the code generation works quite well. After 229 randomly generated functions in this run, a function is found that actually passes all tests and is in fact a proper implementation of what we wanted to achieve.
In this case four tests are enough to specify what the implementation should look like. I have tried to generate code with fewer than four tests. The generator then quite often found solutions that passed the tests but were nevertheless wrong implementations.
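To illustrate how too few tests can let a wrong implementation through, here is a small Python 3 sketch. The candidate x * x + y + z is picked by hand for illustration and is not taken from an actual generator run; the helper passing_tests is likewise just an assumption for this example.

```python
# The four test cases from the test module, as (arguments, expected result).
TEST_CASES = {
    'test_a': ((2, 3, 4), 9),
    'test_b': ((1, 1, 0), 2),
    'test_c': ((10, 2, 4), 16),
    'test_d': ((0, -5, 6), 1),
}


def wrong_candidate(x, y, z):
    # Hypothetical wrong implementation: squares x instead of using it as is.
    return x * x + y + z


def passing_tests(func, names):
    """Return the names of the test cases that func satisfies."""
    return [name for name in names
            if func(*TEST_CASES[name][0]) == TEST_CASES[name][1]]


print(passing_tests(wrong_candidate, ['test_b', 'test_d']))
# ['test_b', 'test_d']
print(passing_tests(wrong_candidate, sorted(TEST_CASES)))
# ['test_b', 'test_d']
```

With only test_b and test_d this candidate would be accepted, while test_a and test_c expose it as wrong.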
One difficulty is dealing with trivial variations. For example, the return value stays the same if it is multiplied by 1, and a 1 is easily generated, e.g. as x/x. The biggest challenges are how to scale to much more versatile generated code and to a huge number of tests. As I already mentioned, this could be handled by machine learning or statistical techniques. We will try that out in a future post.