Quis custodiet ipsos custodes? is a Latin phrase from Juvenal's Satires, literally translated as “Who will guard the guards themselves?” but commonly rendered as “Who watches the watchmen?”, a line popularised in modern culture by Alan Moore's Watchmen.
With recent events such as the London riots and the Boston Marathon bombing, it is slowly becoming clear that the technology exists for the public themselves to supplant CCTV as the all-seeing eye; the crowd acts as the Eye of Providence. It falls upon the computer scientist to provide the missing technological link.
During these events, there is an ever-increasing number of people with cameras or ordinary camera-phones taking pictures, sometimes of friends, family or their surroundings. Each picture may in turn shed light on the scene; effectively a piece in a puzzle.
Virtually all cameras now embed Exif metadata in each picture: usually the camera model, focal length and aperture, but often also the date and time of the shot, and an ever-increasing number of smartphones include geodata from the GPS unit.
Piecing this together, you can work out when and where pictures were taken. Sites like Flickr already allow users to search for pictures by time of shot and/or geographic region.
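To give a flavour of what working with that geodata involves, here is a minimal sketch in plain Python (no third-party libraries, all values invented for illustration): Exif stores latitude and longitude as degree/minute/second values plus a hemisphere reference, which most mapping tools want as signed decimal degrees.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert Exif-style degrees/minutes/seconds plus an
    'N'/'S'/'E'/'W' hemisphere reference to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South and West hemispheres are negative in decimal notation.
    return -value if ref in ("S", "W") else value

# e.g. 51 deg 30' 26" N, 0 deg 7' 39" W -- roughly central London
lat = dms_to_decimal(51, 30, 26, "N")
lon = dms_to_decimal(0, 7, 39, "W")
```

Once every photo has a decimal position and a timestamp, the geographic and temporal searches described above become simple range queries.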
Technology therefore allows for the first piece of the puzzle, getting all the pictures from a location spanning a certain time period.
Remember watching Enemy of the State, where they use video footage to construct a 3D model of a bag to see if a drop had been made? Everyone thought that was silly and wouldn’t work in real life.
Well, the University of Washington and Microsoft Live Labs provided the real technology behind this next piece of the puzzle.
They developed the ability to analyse a group of pictures and their corresponding metadata, placing each photo within a 3D space and constructing a 3D model of the scene.
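The geometric core of such systems is triangulation: the same scene point seen from two photos with known camera positions pins down a 3D location. Below is a toy NumPy-only sketch of the standard linear (DLT) triangulation step; real pipelines first have to *estimate* the camera matrices from matched image features, whereas here the cameras and the test point are simply made up to keep the example short.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Triangulate one 3D point from two 2D observations using the
    linear (DLT) method: stack the projection constraints and take the
    null space of the resulting 4x4 system via SVD."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenise

# Two simple cameras: one at the origin, one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])  # a point in front of both cameras
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]

X_est = triangulate(P1, P2, x1, x2)
```

Repeat this across thousands of matched points and you get the sparse 3D point clouds those systems produce.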
The processing power, and the need, now exist to take this concept one stage further. Events such as those in Boston have thousands of onlookers taking tens of thousands of pictures over several hours, so enough source images exist to build these 3D models from time slices, not just from the entire data set. Video footage could also be interwoven, since it is effectively just a series of images. Variations and inaccuracies between cameras could be determined either via Mechanical Turk or by analysing the white balance and colour range of the corresponding camera data to estimate the lighting conditions of the scene, and thus interpolate the time of day.
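The time-slice idea itself is straightforward to sketch: bucket photos by timestamp so each slice can feed its own reconstruction. A minimal plain-Python illustration, with invented photo records:

```python
from datetime import datetime, timedelta
from collections import defaultdict

def bucket_by_time(photos, slice_minutes=5):
    """Group (photo_id, timestamp) pairs into fixed-width time slices,
    returning a dict of slice index -> list of photo ids."""
    slices = defaultdict(list)
    width = timedelta(minutes=slice_minutes)
    for photo_id, taken in photos:
        # Integer index of the slice this timestamp falls into.
        slot = int((taken - datetime.min) / width)
        slices[slot].append(photo_id)
    return slices

# Hypothetical photos from a single afternoon.
photos = [
    ("img_001", datetime(2013, 4, 15, 14, 48, 10)),
    ("img_002", datetime(2013, 4, 15, 14, 49, 55)),
    ("img_003", datetime(2013, 4, 15, 14, 53, 30)),
]
slices = bucket_by_time(photos)  # two slices: {img_001, img_002} and {img_003}
```

Each bucket then becomes the input set for one 3D model, and stepping through consecutive buckets gives the animation over time.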
What does this all mean? Well, the technology exists to create a 3D animation of an event based on real images. If something suspicious is spotted, it can be tracked through time and 3D space back to a point where an image was taken that accurately portrays the individual.
…But that’s just my 2 cents.