Introduction to Probabilistic Programming

I love the idea of Probabilistic Programming. It is basically a language to describe probabilistic models and then perform automatic inference on those models. That means that you, as the domain expert, write a simulation of your problem in a Bayesian sense. For example, a generative model of text could first sample the number of words, then the main topics, and last the words conditioned on the main topics. This model is known as Latent Dirichlet Allocation (LDA). With Probabilistic Programming the problem is inverted automatically: instead of sampling the main topics and afterwards the words, the words are observed and the topics are estimated. Many problems can be expressed as such an inverse problem. For example, computer vision can be seen as the inverse problem of computer graphics.
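To make that generative story concrete, here is a minimal simulation of it in plain NumPy. The dimensions, the Dirichlet concentrations, and the Poisson document length are toy assumptions of mine, just enough to run the story forwards:

import numpy as np

rng = np.random.default_rng(0)

n_topics, vocab_size = 3, 50          # toy dimensions (assumed)
alpha, beta = 0.1, 0.01               # symmetric Dirichlet priors (assumed)

# Per-topic distributions over the vocabulary.
topic_word = rng.dirichlet(beta * np.ones(vocab_size), size=n_topics)

def generate_document():
    n_words = rng.poisson(100)                             # 1. sample the number of words
    topic_mix = rng.dirichlet(alpha * np.ones(n_topics))   # 2. sample the topic mixture
    topics = rng.choice(n_topics, size=n_words, p=topic_mix)  # 3a. one topic per word
    return [rng.choice(vocab_size, p=topic_word[z]) for z in topics]  # 3b. words given topics

doc = generate_document()  # inference runs this story backwards: words in, topics out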

In a Bayesian setting such an inversion is called inference. It can be expressed very simply by Bayes' rule:
P(A|B)=\frac{P(B|A)P(A)}{P(B)}
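As a toy numeric check of the formula (all numbers are made up):

# Toy numeric check of Bayes' rule (all numbers assumed).
p_a = 0.3                                    # prior P(A)
p_b_given_a = 0.8                            # likelihood P(B|A)
p_b_given_not_a = 0.1                        # likelihood P(B|not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # evidence P(B)
p_a_given_b = p_b_given_a * p_a / p_b        # posterior P(A|B) ~= 0.774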
However, applying Bayes' rule exactly is most of the time very hard or even impossible. This problem can be tackled with approximation methods like Markov chain Monte Carlo (MCMC) or Expectation-Maximization (EM), but they have certain flaws. EM can only be used for certain problems, and the math has to be done all over again if something in the model is changed. MCMC is general and flexible but very computation-heavy, and it is not clear when the calculations are actually finished.
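To see where both the generality and the cost of MCMC come from, here is a minimal random-walk Metropolis sketch; everything from the step size to the number of steps is an illustrative assumption:

import numpy as np

def metropolis(log_post, x0, n_steps=5000, step=0.5, seed=0):
    # Random-walk Metropolis: only an unnormalized log posterior is needed.
    rng = np.random.default_rng(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.normal(scale=step)      # propose a move
        lp_new = log_post(x_new)
        if np.log(rng.random()) < lp_new - lp:  # accept or reject
            x, lp = x_new, lp_new
        samples.append(x)
    return np.array(samples)

# Example: sample from a standard normal via its log density.
samples = metropolis(lambda x: -0.5 * x**2, x0=0.0)

Every step only needs the log posterior up to a constant, which is what makes the method so general; but every step also costs a full likelihood evaluation, and nothing in the loop tells you when the chain has converged.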

In the last years statisticians have done great work in approximating Bayes' rule with a method called Variational Inference. Variational Inference is not new, but it is now possible to combine it with deep learning, to obtain the best of the statistics and machine learning worlds. From the statisticians we get uncertainties, a natural way of handling missing data, and a white-box solution (it is possible to explain the results); from the machine learning community we get fast algorithms, proper software libraries, and hardware matching the software (like GPUs). I think both are equally important for solving real-world problems.
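Concretely, Variational Inference turns inference into optimization: pick a simple family of distributions q and maximize the evidence lower bound (ELBO), which in the notation of Bayes' rule above reads

\log P(B) \geq \mathbb{E}_{q(A)}[\log P(B|A)] - \mathrm{KL}(q(A)\,\|\,P(A))

Maximizing the right-hand side over q tightens the bound and at the same time pushes q towards the true posterior P(A|B).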

I am very interested in models like Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs). These are standard models which can be solved quite efficiently with EM and the Kalman filter. However, changing these models even slightly makes them intractable. Variational Inference could be the solution.
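For the tractable baseline case, a standard GMM can be fitted with EM in a few lines of scikit-learn (the data and parameters here are toy assumptions):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two 1-D clusters as toy data.
x = np.concatenate([rng.normal(-2, 0.5, 200),
                    rng.normal(3, 1.0, 300)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)  # fitted with EM
print(gmm.means_.ravel(), gmm.weights_)  # recovers the two means and their weights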

There are several probabilistic programming languages in existence today. As I program in Python, the most prominent examples are PyStan and PyMC3. So far I have tried out PyMC3, as it is written entirely in Python; therefore, the barrier to entry for this kind of programming should be lower. My experience so far is that it is relatively easy to use MCMC as the inference method, but it is very slow for bigger datasets. Variational Inference is already implemented to some extent. However, using it is still quite hard. The reason for this is either that the implementation is not yet mature enough or that my knowledge of the topic needs improvement. I think at the moment both are true. As I am not really able to enhance the implementation, I will try to enhance my understanding by dedicating the next post to one of the main contributions in Variational Inference: Auto-Encoding Variational Bayes. I am sure I will come back to PyMC3 at a later time.
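For illustration, here is roughly what the two inference methods look like side by side in PyMC3 on a toy model. The model and data are my own assumptions, and the ADVI entry point shown (pm.fit with method="advi") is the one from recent PyMC3 3.x versions; older releases exposed it differently:

import numpy as np
import pymc3 as pm

data = np.random.default_rng(0).normal(1.0, 2.0, size=1000)  # toy observations

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sd=10.0)
    sd = pm.HalfNormal("sd", sd=10.0)
    pm.Normal("obs", mu=mu, sd=sd, observed=data)

    # MCMC: a single call, but slow on bigger datasets.
    trace = pm.sample(1000)

    # Variational Inference: ADVI turns the same model into an optimization problem.
    approx = pm.fit(n=20000, method="advi")
    vi_trace = approx.sample(1000)

The appeal is that the model block stays identical; only the inference call changes.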
