In December I completed my first simple object tracker as a project for the “Intelligent Multimedia Systems” course. It is not a very advanced one, but it is built on very simple ideas and technologies and is very effective in the right environment. The tracker is a mean-shift one which, unlike a brute-force tracker, does not search for the object at every pixel around the last known position. Instead, mean-shift starts from the last known position, estimates the direction in which the object most probably moved, and shifts the search window that way, repeating until the object is re-centred. This requires far fewer calculations and gives better performance.
In short, the whole process goes like this:
We have a set of frames (either a movie or a sequence of images) and we choose an object to track in the first frame. This can be done by giving the coordinates of a rectangle or some other description of its location. I chose to select the object exactly, using a mask for the first frame. Below is an example of a first frame and its mask:
This is the input data. Then the processing starts. Using the given mask, the object is extracted from the first frame, and its histogram is computed and used throughout the whole application as a reference. The histogram reflects the set of colours in the object. Now the complicated math comes in: to find the same object in the next frame, a kernel is multiplied with the pixels of the second frame that lie at the position the object had in the first frame.
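The reference histogram step can be sketched in a few lines of Python. This is only an illustration, not the project code: it assumes NumPy, an RGB frame stored as a `(H, W, 3)` `uint8` array, a mask that is non-zero on the object, and a hypothetical choice of 8 bins per channel.

```python
import numpy as np

def masked_colour_histogram(frame, mask, bins_per_channel=8):
    """Normalised colour histogram of the pixels where mask is non-zero.

    frame: (H, W, 3) uint8 image; mask: (H, W) array, non-zero on the object.
    bins_per_channel is an illustrative choice, not a value from the project.
    """
    pixels = frame[mask > 0]  # (N, 3) array holding only the object pixels
    # Quantise each channel from 0..255 down to 0..bins_per_channel-1.
    q = (pixels.astype(np.int64) * bins_per_channel) // 256
    # Combine the three channel indices into a single flat bin index.
    idx = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    hist = np.bincount(idx, minlength=bins_per_channel ** 3).astype(float)
    # Normalise so the bins sum to 1 (guarding against an empty mask).
    return hist / max(hist.sum(), 1.0)
```

Coarse bins make the histogram less sensitive to small lighting changes, at the price of telling fewer colours apart.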
The kernel I used is the Epanechnikov kernel, a weighting function that looks like this:
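In its usual form the Epanechnikov kernel gives each point a weight proportional to 1 - r², where r is the distance from the centre scaled so that the edge of the window lies at r = 1, and a weight of 0 beyond that. A small sketch of such a weight map, assuming NumPy and a window of at least 3 by 3 pixels (the function name and the normalisation to a sum of 1 are my own choices for illustration):

```python
import numpy as np

def epanechnikov_weights(height, width):
    """Weight map proportional to 1 - r^2 inside the unit ellipse, 0 outside.

    Assumes height >= 3 and width >= 3 so that some weights are non-zero.
    """
    # Coordinates scaled so the window edges sit at -1 and +1.
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    r2 = xx ** 2 + yy ** 2
    # The profile falls linearly in r^2 and is clipped to 0 outside the ellipse.
    w = np.clip(1.0 - r2, 0.0, None)
    return w / w.sum()
```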
What it means is that points in the middle of the object count more than points on the sides or corners. When we weight the pixel values with this function and compare the result to the reference colour histogram, the values change towards the direction in which our object is moving, and we then need to re-centre our position to maximize the match.
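One common way to write the re-centring step is as a weighted centroid: every pixel in the window gets the weight sqrt(q/p), where q is the reference histogram value for that pixel's colour bin and p is the kernel-weighted histogram of the current window, and the window moves to the centroid of those weights. The sketch below follows that formulation. It is not necessarily identical to what the project did, and the function names, the 8 bins per channel and the `(row, col)` conventions are assumptions made for the example.

```python
import numpy as np

def bin_indices(patch, bins=8):
    """Flat colour-bin index for every pixel of a (h, w, 3) uint8 patch."""
    q = (patch.astype(np.int64) * bins) // 256
    return (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]

def mean_shift_step(frame, centre, half_size, ref_hist, bins=8):
    """One re-centring step; returns the new (row, col) centre."""
    cy, cx = centre
    hy, hx = half_size
    # Clip the search window to the frame borders.
    y0, y1 = max(cy - hy, 0), min(cy + hy + 1, frame.shape[0])
    x0, x1 = max(cx - hx, 0), min(cx + hx + 1, frame.shape[1])
    idx = bin_indices(frame[y0:y1, x0:x1], bins)
    yy, xx = np.mgrid[y0:y1, x0:x1]
    # Epanechnikov weights: 1 - r^2 inside the window's ellipse, 0 outside.
    r2 = ((yy - cy) / (hy + 1)) ** 2 + ((xx - cx) / (hx + 1)) ** 2
    k = np.clip(1.0 - r2, 0.0, None)
    # Kernel-weighted histogram of the candidate window, normalised to sum 1.
    cand = np.bincount(idx.ravel(), weights=k.ravel(), minlength=bins ** 3)
    cand /= max(cand.sum(), 1e-12)
    # Pixels whose colour is under-represented compared to the reference
    # get a large weight, which pulls the window towards the object.
    w = np.sqrt(ref_hist[idx] / np.maximum(cand[idx], 1e-12)) * (r2 <= 1.0)
    total = w.sum()
    if total == 0:
        return centre  # nothing in the window resembles the object
    return (int(round((yy * w).sum() / total)),
            int(round((xx * w).sum() / total)))
```

For each new frame the step would be repeated, starting from the previous frame's position, until the centre stops changing or a small iteration limit is reached.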
It is a very effective method as long as the colours are not ambiguous, which happens, for example, when the object has a colour very similar to the background. Below are some examples of final movies with objects tracked. Two of them are from the set everybody received, a football match, and the third one I had to find myself. For that I used a set of images from the PETS 2006 International Workshop. The purpose of that workshop was to track people who left their luggage unattended for more than 30 seconds, so most of the images are CCTV footage from stations or airports, but that suited me just fine.
First Football match tracking clip:
Second Football match tracking clip:
CCTV tracking clip:
Of course, problems can appear with this technique if other objects have similar colours. For example, there is a spike in the second clip when the tracked player passes behind another one and both contain some black colour in their histograms. All in all, though, it is a very powerful way of tracking and doesn’t use a lot of processing power.