Discrete energy minimization has recently emerged as an indispensable tool for computer vision problems. It enables inference of the maximum a posteriori solutions of Markov and conditional random fields, which can be used to model labelling problems in vision. When formulating such problems in an energy minimization framework, three main issues need to be addressed: (i) how to perform efficient inference to compute the optimal solution; (ii) how to incorporate prior knowledge into the model; and (iii) how to learn the parameter values. This thesis focusses on these issues and presents novel solutions to address them.

As computer vision moves towards the era of large videos and gigapixel images, computational efficiency is becoming increasingly important. We present two novel methods to improve the efficiency of energy minimization algorithms. The first method works by "recycling" results from previous problem instances. The second simplifies the energy minimization problem by "reducing" the number of variables in the energy function. We demonstrate a substantial improvement in the running time of various labelling problems, such as interactive image and video segmentation, object recognition, and stereo matching.

In the second part of the thesis we explore the use of natural image statistics for the single view reconstruction problem, where the task is to recover a theatre-stage representation (containing planar surfaces and their geometrical relationships to each other) from a single 2D image. To this end, we introduce a class of multi-label higher order functions to model these statistics, based on the distribution of geometrical features of planar surfaces. We also show that this new class of functions can be minimized exactly with efficient graph cut methods.

The third part of the thesis addresses the problem of learning the parameters of the energy function. Although several methods have been proposed to learn the model parameters from training data, they suffer from various drawbacks, such as limited applicability or noisy estimates due to poor approximations. We present an accurate and efficient learning method, and demonstrate that it is widely applicable.