In this paper we present a framework for pixel-wise object segmentation of road
scenes that combines motion and appearance features. It is designed to handle
street-level imagery such as that found on Google Street View and Microsoft Bing
Maps. We formulate the problem as a conditional random field (CRF) in order to
model the label likelihoods and prior knowledge probabilistically. An extended
set of appearance-based features is used, consisting of textons, colour,
location and histogram of oriented gradients (HOG) descriptors. A novel
boosting approach is then applied to combine the
motion and appearance-based features. We also incorporate higher-order
potentials in our CRF model, which produce segmentations with precise object
boundaries. We evaluate our method both quantitatively and qualitatively on the
challenging Cambridge-driving Labeled Video (CamVid) dataset. Our approach
achieves an overall recognition accuracy of 84%, compared to the previous
state-of-the-art accuracy of 69%.
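
For orientation, a CRF with higher-order potentials of the kind described above is commonly written as an energy over the pixel labelling. The form below is a generic sketch rather than the exact model of this paper: the symbols $\mathbf{x}$, $\mathcal{V}$, $\mathcal{E}$, $\mathcal{S}$ and the three potential terms are assumed notation, and the assignment of the boosted motion and appearance classifier to the unary term is likewise an assumption.

```latex
% Assumed notation (not taken from the paper):
%   x = (x_i)_{i \in V}  : labels of all pixels
%   V                    : set of pixels
%   E                    : set of neighbouring pixel pairs
%   S                    : set of image segments (cliques) for higher-order terms
\[
  E(\mathbf{x})
    = \sum_{i \in \mathcal{V}} \psi_i(x_i)
    + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j)
    + \sum_{c \in \mathcal{S}} \psi_c(\mathbf{x}_c),
  \qquad
  \mathbf{x}^{*} = \arg\min_{\mathbf{x}} E(\mathbf{x}).
\]
```

Under this reading, $\psi_i$ would carry the per-pixel label likelihoods from the boosted classifier, $\psi_{ij}$ would encode a smoothness prior between neighbouring pixels, and $\psi_c$ would encourage label consistency within a segment, which is what tends to sharpen object boundaries.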