This blog post follows my introductory post on automatic programming. The syntax of a programming language is the set of rules that defines correctly structured code. There are many possible syntax errors. Some examples are:
Opened parentheses are not closed
Indentation is not correct
Variable names are misspelled
It is hard to generate syntactically correct source code. We can make our life easier by not generating source code directly, but generating a different representation of code that doesn’t have these difficulties. I’m talking about abstract syntax trees (ASTs). An AST is a tree representation of the abstract syntactic structure of source code. Source code exists only because it is readable and writable by humans. An AST is a representation that fits better to how a computer works. It exists as an intermediate representation of the program that the compiler requires. In comparison to source code, ASTs don’t contain inessential punctuation and delimiters like braces, semicolons and parentheses. Abstract syntax trees are often used in program analysis and program transformation.
There are Python libraries to parse and unparse source code to and from ASTs. This allows us to take existing source code, parse it into its AST representation and use the result as examples for our machine learning model. It also makes it easy to create human-readable source code from automatically generated ASTs.
The following Jupyter notebook shows us what the AST representation of given Python source code looks like, and how to write our own AST.
The Python Standard Library gives us the ast module to create and use ASTs. With the parse function, Python source code is parsed into its tree representation. The dump function returns a formatted string of the AST, which we can print. The output shows how to create the AST directly with the ast module.
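As a minimal sketch of the two calls described above (the source string "x + y" is just an illustrative input), parsing and dumping could look like this. The exact text of the dump differs between Python versions, so no expected output is shown.

```python
import ast

# Illustrative input; any valid Python source string works here.
source = "x + y"

# parse() turns the source string into a tree rooted at an ast.Module node.
tree = ast.parse(source)

# dump() returns the tree as a string; it does not print by itself.
dumped = ast.dump(tree)
print(dumped)
```

The printed string is written in terms of the node classes of the ast module, which is why it doubles as a recipe for constructing the same tree by hand.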
The first example is a simple addition. There is an expression with the two variables x and y and the binary operation Add.
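To illustrate going the other way, here is a sketch that builds the same addition by hand from ast node classes and evaluates it. The values 2 and 3 bound to x and y are assumptions chosen for the demonstration.

```python
import ast

# Hand-built tree for the expression "x + y".
expr = ast.Expression(
    body=ast.BinOp(
        left=ast.Name(id="x", ctx=ast.Load()),
        op=ast.Add(),
        right=ast.Name(id="y", ctx=ast.Load()),
    )
)

# Hand-built nodes carry no line/column information; compile() needs it.
ast.fix_missing_locations(expr)

# Compile the tree and evaluate it with example values for x and y.
code = compile(expr, filename="<ast>", mode="eval")
result = eval(code, {"x": 2, "y": 3})
print(result)  # 5
```

No parentheses, indentation or other punctuation had to be generated; the tree structure alone determines the program.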
The PyPI package showast has a nice visualization function for ASTs. The visualization shows the tree structure.
In this example the addition is packed into a function. Here we see how actual Python code is represented as an AST. I don’t show the dump of the AST, since it is already very convoluted, but with such a dump you could see how to create the AST directly with the ast module.
The visualization shows us that there are several new nodes which define the function, arguments, return statement and variable definitions. We get the source code of a function with the inspect module. It would also be possible to visualize a function directly with show_source from the showast module.
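Without a visualization, the new node types can also be listed in text form. The sketch below parses a function from a source string and walks the tree; the function body is an assumed reconstruction of the add_func example, and the string is written inline so that the snippet does not depend on where the function was defined.

```python
import ast

# Assumed source of the example function, given as a plain string.
source = """
def add_func(x, y):
    return x + y
"""

tree = ast.parse(source)
func = tree.body[0]  # the FunctionDef node

# ast.walk() visits every node in the tree; collect the class names.
node_types = [type(node).__name__ for node in ast.walk(tree)]

print(func.name, [a.arg for a in func.args.args])
print(node_types)
```

For a function that lives in a file, inspect.getsource(add_func) could supply the source string instead of the inline literal.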
Python itself doesn’t provide a way to turn a compiled code object into an AST. There are third-party tools that can do this. Astor is such a package, and it can also turn an AST back into source code:
def add_func(x, y):
    return (x + y)
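As a standard-library alternative for the last step, Python 3.9 and later ship ast.unparse, which turns an AST back into source text. This is a sketch of that route, not the Astor call used in the notebook; the formatting of the regenerated code (for example redundant parentheses) may differ from Astor's output.

```python
import ast

# Parse the example function, then regenerate source from the tree.
tree = ast.parse("def add_func(x, y):\n    return (x + y)\n")

# ast.unparse() requires Python 3.9 or newer.
regenerated = ast.unparse(tree)
print(regenerated)

# The regenerated source parses back to an equivalent tree.
round_trip = ast.parse(regenerated)
```

Comments and the original formatting are not stored in the tree, so they are not recovered by unparsing.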
The last example shows how to create an AST directly with the ast module. Here you see the implementation of the Leibniz formula. This is a formula to calculate the value of pi. The more iterations we run, the more accurate the result becomes. Here we see why we are writing programs in a programming language and not as an AST. The tree representation is nearly unreadable. You can imagine that modules with hundreds or thousands of lines of code result in humongous trees.
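To give a feeling for how verbose a hand-written tree gets, here is a compact sketch of the Leibniz series as a single expression, 4 * sum((-1) ** k / (2 * k + 1) for k in range(n)). This is not the notebook's implementation; the helper name, the variable names and the choice of n = 1000 are assumptions made for illustration.

```python
import ast

def name(identifier):
    """Hypothetical helper: a Name node that reads a variable."""
    return ast.Name(id=identifier, ctx=ast.Load())

# (-1) ** k / (2 * k + 1)
term = ast.BinOp(
    left=ast.BinOp(left=ast.Constant(value=-1), op=ast.Pow(), right=name("k")),
    op=ast.Div(),
    right=ast.BinOp(
        left=ast.BinOp(left=ast.Constant(value=2), op=ast.Mult(), right=name("k")),
        op=ast.Add(),
        right=ast.Constant(value=1),
    ),
)

# for k in range(n)
loop = ast.comprehension(
    target=ast.Name(id="k", ctx=ast.Store()),
    iter=ast.Call(func=name("range"), args=[name("n")], keywords=[]),
    ifs=[],
    is_async=0,
)

# 4 * sum(<term> for k in range(n))
tree = ast.Expression(
    body=ast.BinOp(
        left=ast.Constant(value=4),
        op=ast.Mult(),
        right=ast.Call(
            func=name("sum"),
            args=[ast.GeneratorExp(elt=term, generators=[loop])],
            keywords=[],
        ),
    )
)
ast.fix_missing_locations(tree)

# Evaluate the tree with an assumed number of iterations.
approx_pi = eval(compile(tree, filename="<ast>", mode="eval"), {"n": 1000})
print(approx_pi)
```

Roughly thirty lines of nested constructor calls stand in for one line of source code, which is the readability gap described above.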