Re: POI based Open AR proposal from Alex Hill on 2010-09-06 (public-poiwg@w3.org from September 2010)

From: Alex Hill <ahill@gatech.edu>
Date: Mon, 6 Sep 2010 12:34:43 -0400
To: "Public POI @ W3C" <public-poiwg@w3.org>
Message-Id: <6F2443F7-5F10-4432-8B60-64E64EAD396D@gatech.edu>
On Sep 1, 2010, at 10:23 AM, Thomas Wrobel wrote:

> I'd just like to give my two cents here: I think whatever we come up
> with should be as simple as possible for the creation of
> content.
Agreed. I think it is time to mention our KHARMA framework here [1].
We basically combined KML (to indicate where content is) with HTML.
We added some extensions to orient and scale the HTML content in ways that KML did not provide.
The description of what is available for presentation is the result of a geospatial query influenced by the bandwidth/capacity of the device (i.e. request all data in X range about the user at a frequency of Y).
And, the decision about what to show is a reflection of the rendering limitations (i.e. viewing range, LOD) of the device.
KML has some conventions for network queries and we followed them because they allow existing Google Earth content to be loaded/rendered correctly in our client.
These "author defined" update rates are suggestions that the client may not be able to meet and could override.
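A minimal Python sketch of that query model, assuming placemarks reduced to a name plus latitude/longitude. The names `Placemark`, `within_range` and `effective_refresh` are hypothetical and are not part of KHARMA or KML; the sketch only illustrates "all data in X range" and a client overriding an author-suggested update rate.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass
class Placemark:
    """Hypothetical, stripped-down stand-in for a KML placemark."""
    name: str
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_range(placemarks, user_lat, user_lon, range_m):
    """Return the placemarks no farther than range_m from the user."""
    return [
        p for p in placemarks
        if haversine_m(user_lat, user_lon, p.lat, p.lon) <= range_m
    ]


def effective_refresh(author_interval_s, client_min_interval_s):
    """The author's interval is a suggestion; the client never polls
    faster than its own minimum interval."""
    return max(author_interval_s, client_min_interval_s)
```

Here the range X and the refresh floor are client-side parameters, so the same content can be served to devices with different bandwidth or rendering limits.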
> Much like how a hyperlink associates data more or less like;
> 
> <a href = "links to this"> (automatically triggered when the user clicks here)
> 
> We should aim for an association more or less like;
> 
> <this data appears> (under these trigger/pattern conditions)
We are using straight HTML for the presentation content and for the user interaction.
We currently only have onClick, onHover as events, but want to add events for proximity and entered/exited KML regions.
We have not added 3D; one of the problems is that X3D and HTML don't operate in the same context.
There is work going on in this area [2], and there will no doubt eventually be a 3D standard for the web.
Our current plans are to do a sort of 3D-lite, with KML regions generating events that can (yes, in JS) fire animations in COLLADA models defined by the KML standard.
Manipulating the visibility and position of these models is just manipulation (again in JS) of the KML parts of the DOM.
I am explaining this work not because I see it as the solution, but because it is a concrete example of trying to approach standards-compliant AR.
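To make the planned entered/exited region events more concrete, here is a rough Python sketch, assuming a region is a simple latitude/longitude box (in the spirit of a KML LatLonAltBox, ignoring altitude). `Region`, `RegionWatcher` and the event names are hypothetical illustrations, not our client's actual API.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical lat/lon bounding box (altitude ignored)."""
    name: str
    north: float
    south: float
    east: float
    west: float

    def contains(self, lat, lon):
        # Simplification: does not handle boxes that cross the antimeridian.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


class RegionWatcher:
    """Tracks which regions the user is inside and fires events on change."""

    def __init__(self, regions):
        self.regions = list(regions)
        self.inside = {r.name: False for r in self.regions}
        self.handlers = {"entered": [], "exited": []}

    def on(self, event, handler):
        """Register a callback for 'entered' or 'exited'."""
        self.handlers[event].append(handler)

    def update(self, lat, lon):
        """Feed a new user position; fire handlers only on transitions."""
        for region in self.regions:
            now_inside = region.contains(lat, lon)
            if now_inside != self.inside[region.name]:
                self.inside[region.name] = now_inside
                event = "entered" if now_inside else "exited"
                for handler in self.handlers[event]:
                    handler(region.name)
```

A handler registered for "entered" could then start a model animation, much as an onClick handler would for a page element.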

In short, I think that an AR standard makes a linkage between location and content (i.e. HTML, X3D and the rich interactivity embedded within).
One approach is to see presentation standards (HTML, WebGL, X3D) as beyond our scope.
We could have added something like location or orientation to the HTML standard, but we wanted AR authors to easily appropriate non-customized content.

I think that "triggers" can range from simple (i.e. looking at, walking by) to complex (i.e. browsing related topics, my friends recommend).
I feel that this glue between dynamic content and the user is the other critical ingredient (obviously most agree here).
I can't help but wonder if a significant part of the actual trigger descriptions won't be authored into the content.
For example, I can imagine that a building or an object triggers "you are looking at me".
And, if the resulting action affects other objects, then this "routing" will be the purview of the "AR authoring".
Consider a few cases:
1) I walk around a corner but don't turn to look - I become visible to some object and its sequence starts (i.e. Zombie materializes and approaches)
2) I reach to touch an object in front of me - the author would certainly want control over the relationship between hand proximity and "touched" activation
3) I pause to stare at some Facebook content related to a location and it responds with more related/friends' content - here the Facebook content will want some control over what "being gazed upon" means
Just as X3D triggers have routing, and potentially code, between user actions and the animations/behaviors that ensue, we may want to "wrap" existing content (i.e. a webpage expecting onClick, a model ready to fire an animation) with a trigger that routes "looking at"/"touched"/"visible to" to these inherent media capabilities defined using other standards (HTML5 canvas, audio, video, X3D animations).
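One way such a wrapper might look, as a hedged Python sketch: a "looking at" signal is routed to whatever action the content already exposes (standing in for an existing onClick handler) once the gaze has dwelled long enough. `DwellTrigger` and the `dwell_s` threshold are hypothetical; the point is only that the author controls what "being gazed upon" means.

```python
class DwellTrigger:
    """Routes sustained 'looking at' input to an existing content action."""

    def __init__(self, action, dwell_s=1.5):
        self.action = action      # e.g. the content's existing click handler
        self.dwell_s = dwell_s    # author-chosen gaze duration threshold
        self._since = None        # time at which the current gaze started
        self._fired = False       # fire at most once per continuous gaze

    def update(self, looking_at, now_s):
        """Feed the current gaze state; returns True when the action fires."""
        if not looking_at:
            # Gaze broken: reset so a later gaze can fire again.
            self._since = None
            self._fired = False
            return False
        if self._since is None:
            self._since = now_s
        if not self._fired and now_s - self._since >= self.dwell_s:
            self._fired = True
            self.action()
            return True
        return False
```

The same shape would work for "touched" (a hand-proximity threshold instead of a dwell time) or "visible to", with the wrapped content left unmodified.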

> 
> (not in formatting, just in principle).
> 
> I really don't think whatever we come up with should require manual
> coding by all creators. It should be a simple association between
> trigger requirements, and data to be triggered....and it should be
> up to a standards-supporting client to use them correctly.
This is where I am not sure if we are imagining the same things.
Some aspects of AR/3D are going to be well codified by regions, LODs, etc.
And, the "content" will get rendered into the world in a standards-compliant manner.
And, if the data exists on the internet, then some form of AR Google will find it nearby, some form of personalization will filter it, and by virtue of the relationship between client location and the content, it will become visible.
However, almost any other more complicated triggering is likely to need the kind of "routing" and/or the ECMA scripting that X3D provides.
I think we need some concrete examples here.
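For instance, the simple end of this spectrum (a location pattern linked to a web request) could be evaluated entirely by the client, roughly as in the Python sketch below. It assumes numeric sensor values, a single GPS pattern per trigger, and a client-chosen range; `fire`, `resolve` and `distance_m` are hypothetical names, and the dictionaries only loosely mirror the conceptual trigger structure quoted further down. It builds request descriptions rather than sending them.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation in metres; adequate for short ranges."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6_371_000.0 * math.hypot(x, y)


def resolve(value, sensors):
    """Replace a '$sensors.a.b' reference with sensors['a']['b']."""
    if isinstance(value, str) and value.startswith("$sensors."):
        node = sensors
        for key in value.split(".")[1:]:
            node = node[key]
        return node
    return value


def fire(trigger, sensors, client_range_m):
    """Return the requests to make if the GPS pattern matches, else []."""
    gps = sensors["gps"]
    pattern = trigger["pattern"]["gps"]
    dist = distance_m(gps["latitude"], gps["longitude"],
                      pattern["latitude"], pattern["longitude"])
    if dist > client_range_m:
        return []
    return [
        {
            "url": action["url"],
            "method": action["method"],
            "data": {k: resolve(v, sensors) for k, v in action["data"].items()},
        }
        for action in trigger["actions"]
    ]
```

Anything beyond this kind of match-and-request (the zombie that approaches, the touch threshold) is where I expect routing or scripting to come in.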
> 
> So I don't really think the model of how Web 2.0 applications work is a
> very good fit for this system. Web 2.0 apps really have to do a fair
> bit of work (coding wise) in order to get functionality flowing
> between a server and a client. This sets the bar rather high for those
> wanting to make simple collections of geolocated information. In this
> case we really want to move some of that into what's required from the
> client itself.
We think that some of the most interesting things that AR can do will involve accessing web services like Flickr, Twitter, Facebook, etc. to generate content on the fly.
This is why, similar to Google Maps/Google Earth, we are making a client side API to create/manipulate/delete the AR content using JavaScript.
Again, I'm not endorsing this approach, but see it as a reality of doing AR today.
Perhaps in the future, we can hit these sites with an AR browser and get AR Facebook content in the AR standard we develop here.
> 
> Also, by keeping the client in charge of the pulling of data, it means
> one source of data can be more easily suitable for many devices. The
> client knows how often it can/should "refresh" and see if any of the
> conditions have been met. It can recheck based on movement or time,
> without sending that data to a server unnecessarily, or without the
> content creator having to code request intervals themselves.
> 
> On 1 September 2010 16:01, Alex Hill <ahill@gatech.edu> wrote:
>> If I understand correctly, you are suggesting that "triggers" should be
>> formulated in a flexible pattern language that can deal with and respond to
>> any form of sensor data.
>> This would be in contrast to the strictly defined "onClick" type of events
>> in JavaScript or the existing VRML trigger types such as CylinderSensor [1].
>> I think this idea has merit and agree that some significant flexibility in
>> the way authors deal with the multiple visual and mechanical sensors at
>> their disposal is vital to creating compelling AR content.
>> However, the flexibility that this approach would give, seems at first
>> glance, to take away some of the ease of authoring that "baked" in
>> inputs/triggers give.
>> And, it is not obvious to me now how one incorporates more general
>> computation into this model.
>> Take the aforementioned CylinderSensor; how would you describe the behavior
>> of this trigger using patterns of interest?
>> While there may be standards that will eventually support this (i.e. the W3C
>> Sensor Incubator Group [2]), I wonder if this type of "sensor filtering
>> language" is beyond our scope.
>> The second main point you make is that we should reconsider the
>> request-response nature of the internet in the AR context.
>> Again, this is an important idea and one worth seriously considering.
>> But in a similar fashion to my concerns about pattern of interest filtering,
>> I worry that this circumvents an existing model that has merit.
>> The data-trigger-response-representation model you suggest already happens
>> routinely in rich Web 2.0 applications.
>> The difference is that it happens under the programmatic control of the
>> author where they have access to a multitude of libraries and resources
>> (i.e. jQuery, database access, hardware, user settings, etc.)
>> (this point is related to another thread about (data)<>-(criteria) [3] where
>> I agree with Jens that we are talking about multiple data-trigger-responses)
>> I may need some tutoring on what developing standards means, but in my view,
>> things like ECMA scripting are an unavoidable part of complex interactivity.
>> Perhaps you can give an example where the cutoff between the current
>> request-response model ends and automatic data-POI-response-presentation
>> begins?
>> On Aug 20, 2010, at 10:19 AM, Rob Manson wrote:
>> 
>> Hi,
>> 
>> great to see we're onto the "Next Steps" and we seem to be discussing
>> pretty detailed structures now 8)  So I'd like to submit the following
>> proposal for discussion.  This is based on our discussion so far and the
>> ideas I think we have achieved some resolution on.
>> 
>> I'll look forward to your replies...
>> 
>> roBman
>> 
>> PS: I'd be particularly interested to hear ideas from the linked data
>> and SSN groups on what parts of their existing work can improve this
>> model and how they think it could be integrated.
>> 
>> 
>> 
>> What is this POI proposal?
>> A simple extension to the "request-response" nature of the HTTP protocol
>> to define a distributed Open AR (Augmented Reality) system.
>> This sensory based pattern recognition system is simply a structured
>> "request-response-link-request-response" chain.  In this chain the link
>> is a specific form of transformation.
>> 
>> It aims to extend the existing web to be sensor aware and automatically
>> event driven while encouraging the presentation layer to adapt to
>> support dynamic spatialised information more fluidly.
>> 
>> One of the great achievements of the web has been the separation of data
>> and presentation. The proposed Open AR structure extends this to
>> separate out: sensory data, triggers, response data and presentation.
>> 
>> NOTE1: There are a wide range of serialisation options that could be
>> supported and many namespaces and data structures/ontologies that can be
>> incorporated (e.g. Dublin Core, geo, etc.).  The focus of this proposal
>> is purely at a systemic "value chain" level.  It is assumed that the
>> definition of serialisation formats, namespace support and common data
>> structures would make up the bulk of the work that the working group
>> will collaboratively define.  The goal here is to define a structure
>> that enables this to be easily extended in defined and modular ways.
>> 
>> NOTE2: The example JSON-like data structures outlined below are purely
>> to convey the proposed concepts.  They are not intended to be realised
>> in this format at all and there is no attachment at this stage to JSON,
>> XML or any other representational format.  They are purely conceptual.
>> 
>> This proposal is based upon the following structural evolution of
>> devices and client application models:
>> 
>> PC Web Browser (Firefox, MSIE, etc.):
>>   mouse      -> sensors -> dom      -> data
>>   keyboard   ->                     -> presentation
>> 
>> Mobile Web Browser (iPhone, Android, etc.):
>>   gestures   -> sensors -> dom      -> data
>>   keyboard   ->                     -> presentation
>> 
>> Mobile AR Browser (Layar, Wikitude, Junaio, etc.):
>>   gestures   -> sensors -> custom app            -> presentation [*custom]
>>   keyboard   ->                                  -> data [*custom]
>>   camera     ->
>>   gps        ->
>>   compass    ->
>> 
>> Open AR Browser (client):
>>   mouse      -> sensors -> triggers ->  dom      -> presentation
>>   keyboard   ->                                  -> data
>>   camera     ->
>>   gps        ->
>>   compass    ->
>>   accelerom. ->
>>   rfid       ->
>>   ir         ->
>>   proximity  ->
>>   motion     ->
>> 
>> NOTE3: The key next step from Mobile AR to Open AR is the addition of
>> many more sensor types, migrating presentation and data to open web
>> based standards and the addition of triggers.  Triggers are explicit
>> links from a pattern to 0 or more actions (web requests).
>> 
>> Here is a brief description of each of the elements in this high level
>> value chain.
>> 
>> clients:
>> - handle events and request sensory data then filter and link it to 0 or
>> more actions (web requests)
>> - clients can cache trigger definitions locally or request them from one
>> or more services that match one or more specific patterns.
>> - clients can also cache response data and presentation states.
>> - since sensory data, triggers and response data are simply HTTP
>> responses all of the normal cache control structures are already in
>> place.
>> 
>> infrastructure (The Internet Of Things):
>> - networked and directly connected sensors and devices that support the
>> Patterns Of Interest specification/standard
>> 
>> 
>> patterns of interest:
>> The standard HTTP request response processing chain can be seen as:
>> 
>> event -> request -> response -> presentation
>> 
>> The POI (Pattern Of Interest) value chain is slightly extended.
>> The most common Mobile AR implementation of this is currently:
>> 
>> AR App event -> GPS reading -> get nearby info request -> Points Of
>> Interest response -> AR presentation
>> 
>> A more detailed view clearly splits events into two to create possible
>> feedback loops. It also splits the request into sensor data and trigger:
>> 
>>               +- event -+               +-------+-- event --+
>> sensor data --+-> trigger -> response data -> presentation -+
>> 
>> - this allows events that happen at both the sensory and presentation
>> ends of the chain.
>> - triggers are bundles that link a pattern to one or more actions (web
>> requests).
>> - events at the sensor end request sensory data and filter it to find
>> patterns that trigger or link to actions.
>> - these triggers or links can also fire other events that load more
>> sensory data that is filtered and linked to actions, etc.
>> - actions return data that can then be presented.  As per standard web
>> interactions supported formats can be defined by the requesting client.
>> - events on the presentation side can interact with the data or the
>> presentation itself.
>> 
>> sensory data:
>> Simple (xml/json/key-value) representations of sensors and their values
>> at a point in time.  These are available via URLs/HTTP requests
>> e.g. sensors can update these files on change, at regular intervals or
>> serve them dynamically.
>> {
>> HEAD : {
>>   date_recorded : "Sat Aug 21 00:10:39 EST 2010",
>>   source_url : "url"
>> },
>> BODY : {
>>   gps : {  // based on standard geo data structures
>>     latitude : "n.n",
>>     longitude : "n.n",
>>     altitude : "n"
>>   },
>>   compass : {
>>     orientation : "n"
>>   },
>>   camera : {
>>     image : "url",
>>     stream : "url"
>>   }
>> }
>> }
>> NOTE: All sensor values could be presented inline or externally via a
>> source URL which could then also reference streams.
>> 
>> trigger:
>> structured (xml/json/key-value) filter that defines a pattern and links
>> it to 0 or more actions (web requests)
>> {
>> HEAD : {
>>   date_created : "Sat Aug 21 00:10:39 EST 2010",
>>   author : "roBman@mob-labs.com",
>>   last_modified : "Sat Aug 21 00:10:39 EST 2010"
>> },
>> BODY : {
>>   pattern : {
>>     gps : [
>>       {
>>         name : "iphone",
>>         id : "01",
>>         latitude : {
>>           value : "n.n"
>>         },
>>         longitude : {
>>           value : "n.n"
>>         },
>>         altitude : {
>>           value : "n.n"
>>         }
>>       }
>>       // NOTE: GPS value patterns could have their own ranges defined
>>       //       but usually the client will just set its own at the
>>       //       filter level
>>       // range : "n",
>>       // range_format : "metres"
>>       // This is an area where different client applications can add
>>       // their unique value
>>     ],
>>     cameras : [
>>       {
>>         name : "home",
>>         id : "03",
>>         type : "opencv_haar_cascade",
>>         pattern : {
>>           ...
>>         }
>>       }
>>     ]
>>   },
>>   actions : [
>>     {
>>       url : "url",
>>       // Support for referring to sensor values, e.g.
>>       // $sensors.gps.latitude & $sensors.compass.orientation
>>       data : {..},
>>       method : "POST"
>>     }
>>   ]
>> }
>> }
>> 
>> data
>> HTTP Responses
>> 
>> presentation
>> client rendered HTML/CSS/JS/RICH MEDIA (e.g. Images, 3D, Video, Audio,
>> etc.)
>> 
>> 
>> 
>> At least the following roles are supported as extensions of today's
>> common "web value chain" roles.
>> 
>>       publishers:
>>       - define triggers that map specific sensor data patterns to
>>       useful actions (web requests)
>>       - manage the acl to drive traffic in exchange for value creation
>>       - customise the client apps and content to create compelling
>>       experiences
>> 
>>       developers:
>>       - create sensor bundles people can buy and install in their own
>>       environment
>>       - create server applications that allow publishers to register
>>       and manage triggers
>>       - enable the publishers to make their triggers available to an
>>       open or defined set of clients
>>       - create the web applications that receive the final actions
>>       (web requests)
>>       - create the client applications that handle events and map
>>       sensor data to requests through triggers (Open AR browsers)
>> 
>> [1] http://www.web3d.org/x3d/wiki/index.php/CylinderSensor
>> [2] http://www.w3.org/2005/Incubator/ssn/charter
>> [3]
[1] http://research.cc.gatech.edu/kharma
Received on Monday, 6 September 2010 16:34:59 UTC