Automatic programming

Automatic programming is a type of computer programming whose goal is to generate a computer program, so that the programmer can write code at a more abstract level. This can mean switching to a higher-level programming language that compiles to a lower-level one, or programming declaratively. Declarative programming is any style of programming that is not imperative: the algorithm is not implemented as explicit steps; instead, one describes what the program must accomplish. For example, in logic programming, logical statements are executed by searching for proofs of those statements.
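To make the last example concrete, here is a minimal sketch of "execution as proof search" in Python. It assumes ground (variable-free) Horn clauses, so it is far simpler than a real logic language such as Prolog, which adds variables and unification. The rule set and the names `RULES` and `prove` are made up for illustration.

```python
# Hypothetical knowledge base: each goal maps to a list of alternative bodies.
# A body is a list of subgoals; an empty body means the goal is a plain fact.
RULES = {
    "mortal(socrates)": [["human(socrates)"]],
    "human(socrates)": [[]],
}

def prove(goal, rules, depth=0, max_depth=20):
    """Backward chaining: a goal holds if some body for it has all subgoals provable."""
    if depth > max_depth:
        # Give up on very deep searches so cyclic rules cannot loop forever.
        return False
    for body in rules.get(goal, []):
        if all(prove(sub, rules, depth + 1, max_depth) for sub in body):
            return True
    return False

print(prove("mortal(socrates)", RULES))  # True
print(prove("mortal(zeus)", RULES))      # False
```

The program never says *how* to compute `mortal(socrates)`; it only states facts and rules, and the search procedure finds a proof.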

There are tools that automatically generate code from UML diagrams, from drag-and-drop GUI designs, and from many other sources. These tools are not very flexible, and it is hard to ensure that the requirements are fulfilled. So what is a good method that creates code from requirements and is also very flexible? Many software developers already follow such a process. It is called TDD (test-driven development). The idea is to write the requirements of the program as tests before writing the actual code that fulfills them. TDD consists of five steps:

  • Add a small test
  • Run the tests and check that the newest one fails
  • Write the simplest code that makes the tests pass
  • Run the tests again to verify that everything works
  • Refactor the code
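One pass through these steps can be sketched with Python's built-in `unittest` module. The function `slugify` and its tests are hypothetical, chosen only to show how a hard-coded "simplest" answer from the first cycle is forced to generalize by the next test. The refactoring step is left out because the code is already minimal.

```python
import io
import unittest

# Hypothetical example: a `slugify` function developed test-first.
class SlugifyTest(unittest.TestCase):
    # Cycle 1, step 1: a small test written before any implementation exists.
    def test_two_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    # Cycle 2, step 1: a second test that a hard-coded answer cannot satisfy.
    def test_three_words(self):
        self.assertEqual(slugify("Test Driven Development"), "test-driven-development")

# Cycle 1, step 3: the simplest code passing the first test could be
#     def slugify(text): return "hello-world"
# Cycle 2, step 3: once the second test fails, the code is generalized.
def slugify(text):
    return "-".join(text.lower().split())

def run_tests():
    """Steps 2 and 4: run the whole suite; True means every test passed."""
    suite = unittest.TestLoader().loadTestsFromTestCase(SlugifyTest)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.wasSuccessful()

if __name__ == "__main__":
    print(run_tests())  # True
```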

TDD has the advantage that the developer has to think about the requirements and the design before implementing the software. In addition, it keeps the code from becoming brittle, which means that a change in one part of the software is less likely to break other parts. The disadvantage is that the tests have to be written first. Although TDD is said to save time in the long run, it feels as if it slows you down, because you write everything twice. Furthermore, the tests have to be refactored when the program is refactored. That doesn't feel right.

Wouldn't it be much nicer if you only had to write tests and the code were generated automatically? It would free you from the disadvantages of TDD and would also make the refactoring step obsolete. Readability no longer matters, since in this scenario no human reads or writes the code.

Compared to generating code completely out of nowhere, generating code from unit tests should be relatively easy, because each new test leads to only small changes in the code. The model that generates code from unit tests doesn't have to be right on the first try: we can validate its result by running the tests, so many different candidate programs can be generated and checked for correctness. This requires the tests to be very fast, but that is already a requirement in TDD, so it is fulfilled anyway. Sampling from a probability distribution (see my other post Introduction to Probabilistic Programming) seems like a nice way to generate code. Such a model could be trained on all the open-source GitHub projects that also use unit tests.
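The generate-and-test loop can be sketched in a few lines of Python. This is a toy under explicit assumptions: the "tests" are hypothetical input/output pairs rather than real unit tests, and the "probability distribution" is a tiny hand-weighted grammar of arithmetic expressions over `x`, not a model trained on GitHub projects. `TESTS`, `sample_expr`, `passes`, and `synthesize` are illustrative names.

```python
import random

# Hypothetical unit tests for an unknown function f: (input, expected output).
TESTS = [(0, 1), (1, 3), (2, 5), (5, 11)]

def sample_expr(rng, depth=0, max_depth=3):
    """Sample an expression over `x` from a tiny hand-weighted grammar."""
    # At the maximum depth (or with probability 0.4) emit a leaf, so recursion ends.
    if depth >= max_depth or rng.random() < 0.4:
        return rng.choice(["x", "1", "2"])
    op = rng.choice(["+", "-", "*"])  # no division, so no ZeroDivisionError
    left = sample_expr(rng, depth + 1, max_depth)
    right = sample_expr(rng, depth + 1, max_depth)
    return f"({left} {op} {right})"

def passes(expr, tests):
    """Run a candidate against every test; any mismatch rejects it."""
    f = eval(f"lambda x: {expr}")  # only safe because we generated `expr` ourselves
    return all(f(inp) == out for inp, out in tests)

def synthesize(tests, max_samples=100_000, seed=0):
    """Draw candidates until one passes all tests, or give up and return None."""
    rng = random.Random(seed)
    for _ in range(max_samples):
        expr = sample_expr(rng)
        if passes(expr, tests):
            return expr
    return None

if __name__ == "__main__":
    # Prints some expression consistent with the tests; its exact form depends on the seed.
    print(synthesize(TESTS))
```

A learned model would replace `sample_expr` with a distribution that puts much more probability on likely programs, while the accept/reject step stays the same. That step runs the tests once per candidate, which is why fast tests matter.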

In the following posts I want to investigate this idea further. The main topics will be the answers to these questions: How can abstract syntax trees be used? How can existing test frameworks be exploited? How can GitHub be crawled? And how should the code generator be modeled?
