Geolocation currently means "where am I". Can it evolve to also cover "what I see"?

For example, I'm standing at a location and take a picture of a bird on a tree using a device. My location is (x1, y1, 0) and the location of the bird is (x2, y2, <relative angle, height and distance from me>). Assume the device is able to compute "<relative angle, height and distance from me>" and to geotag the picture with the bird's location as well as my own.
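
To make this concrete, here is a rough sketch of how the bird's position could be derived from mine. It rests on assumptions that are mine and not part of any existing API: positions are latitude/longitude in degrees, the "relative angle" is a compass bearing measured clockwise from north, the distance is measured along the ground in meters, and the Earth is treated as a sphere. The function name target_position and its parameters are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def target_position(lat_deg, lon_deg, bearing_deg, distance_m, height_m):
    """Estimate where the observed subject is, given the observer's position.

    Assumptions (illustrative only): bearing is clockwise from true north,
    distance is ground distance in meters, height is relative to the observer.
    Returns (latitude_deg, longitude_deg, height_m) of the subject.
    """
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)
    angular_dist = distance_m / EARTH_RADIUS_M

    # Standard great-circle "destination point" formula on a sphere.
    lat2 = math.asin(
        math.sin(lat1) * math.cos(angular_dist)
        + math.cos(lat1) * math.sin(angular_dist) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(angular_dist) * math.cos(lat1),
        math.cos(angular_dist) - math.sin(lat1) * math.sin(lat2),
    )
    # Height is passed through unchanged as an offset from the observer.
    return math.degrees(lat2), math.degrees(lon2), height_m


# Example: a bird 40 m away to the north-east, 6 m above the observer.
bird = target_position(49.2827, -123.1207, 45.0, 40.0, 6.0)
```

A geotag for the picture would then carry both positions: the observer's (x1, y1, 0) and the computed subject position.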

The above case can be generalized to video geotagging too.

From: BALA S KAMMELA <bkammela@yahoo.com>
To: "public-webapps@w3.org" <public-webapps@w3.org>; "whatwg@whatwg.org" <whatwg@whatwg.org> 
Sent: Wednesday, April 25, 2012 10:05 AM
Subject: Standard for geotagging video with video-frame metadata

I wanted to share a blog post requesting a standard for tagging video with "device" metadata such as location, device orientation, etc.
In addition, there is a need for a user or web app to access and reference (in terms of space) remote video frames.

http://blog.safe.com/2012/04/gps-meets-video-do-we-need-a-standard-for-geotagging-videos/