Re: POI based Open AR proposal from Alex Hill on 2010-09-10 (public-poiwg@w3.org from September 2010)

From: Alex Hill <ahill@gatech.edu>
Date: Fri, 10 Sep 2010 08:31:18 -0400
To: roBman@mob-labs.com
Cc: "Public POI @ W3C" <public-poiwg@w3.org>
Message-Id: <05D0D5C2-79F5-48FF-A4BD-CEA0D94A80F0@gatech.edu>
Rob,
I think this response addresses most of my concerns.
I like the angle that it can potentially be an extension to CSS.
And I like your comment about how it can be first implemented in JS and then integrated later.
It seems that this is a great way to introduce new standards instead of dreaming them up and hoping they get adopted.
I think this could play a part in a new AR standard.
For me, at least, I need to see the nuts-and-bolts of the proposal in action.
This way, I can get a better feel for where it is unwieldy or cumbersome.
I feel you are in a position to flesh out some examples now.
Can we decide on a few canonical cases for which we can develop working examples?
Obviously the freshness/delta movement case is one that applies to Layar, Wikitude, etc.

On Sep 2, 2010, at 4:48 AM, Rob Manson wrote:

> Hey Alex,
> 
>> If I understand correctly, you are suggesting that "triggers" should
>> be formulated in a flexible pattern language that can deal with and
>> respond to any form of sensor data.
> 
> That's a great summary.  I may re-use that if you don't mind 8)
> 
> 
>> This would be in contrast to the strictly defined "onClick" type of
>> events in JavaScript or the existing VRML trigger types such as
>> CylinderSensor [1].
> 
> Well...I see it more as creating a broader, more flexible super-set that
> wraps around current ecma style events, etc.
> 
> 
>> I think this idea has merit and agree that some significant
>> flexibility in the way authors deal with the multiple visual and
>> mechanical sensors at their disposal is vital to creating compelling
>> AR content.
>> However, the flexibility that this approach would give, seems at first
>> glance, to take away some of the ease of authoring that "baked" in
>> inputs/triggers give.
> 
> Well...I think we're generally on the same track here.  But let me
> expand my points a little below.  I hope I can present a case that this
> could make this type of authoring "easier" rather than "less easy".
> 
> 
>> And, I it is not obvious to me now how one incorporates more general
>> computation into this model.
> 
> Attached is a simple diagram of the type of Open Stack that I think will
> enable this type of standardisation.  However, there is a lot of hidden
> detail in that diagram so I would expect we may need to bounce a few
> messages back and forth to walk through it all 8)
> 
> This type of system could easily be implemented in one of the really
> modern browsers simply within javascript, however to get the full
> benefit of the dynamic, sensor rich, new environment it would be built
> as a natively enhanced browser (hopefully by many vendors).  
> 
> My underlying assumption is that all of this should be based upon open
> web standards and the existing HTTP related infrastructure.
> 
> 
>> Take the aforementioned CylinderSensor; how would you describe the
>> behavior of this trigger using patterns of interest?
> 
> That is a good question.  I think tangible examples really help our
> discussions.  CylinderSensor binds (at quite a programmatic level)
> pointer motion to 3D object manipulation/rotation.
> 
> My proposal would allow you to treat the pointer input as one type of
> sensor data.  With a simple pattern language you could then map this (or
> at least defined patterns within this) to specific URIs.  In many ways
> this could be seen as similar to a standardised approach to creating
> listeners.
> 
> So the first request is the sensor data event.
> The response is 0 or more URIs.  These URIs can obviously contain
> external resources or javascript:... style resources to link to dynamic
> local code or APIs.  The values from the sensor data should also easily
> be able to be mapped into the structure of this URI request.  e.g.
> javascript:do_something($sensors.gps.lat)
> 
> The processing of these generated URIs are then the second layer of
> requests.  And their responses are the second layer of response.  These
> responses could be any valid response to a URI.  For standard http:// or
> similar requests the response could be HTML, SVG, WebGL or other valid
> mime typed content.  For javascript: style requests the responses can be
> more complex and may simply be used to link things like orientation to
> the sliding of HTML, SVG or WebGL content in the x dimension to simulate
> a moving point of view.
> 
> But pointers are just one very simple type of input sensor.  I'm sure
> we'd all agree that eye tracking, head tracking, limb/body tracking and
> other more abstract gestural tracking will soon be flooding into our
> systems from more than one point of origin.
> 
> 
>> While there may be standards that will eventually support this (i.e.
>> the W3C Sensor Incubator Group [2]), I wonder if this type of "sensor
>> filtering language" is beyond our scope.
> 
> This could well be true, however I think it would simply be building on
> top of the work from the SSN-XG.  And I also think that by the time we
> completed this work just for a lat/lon/alt based Point of Interest the
> standard would be out-dated as this space is moving so quickly.  From my
> perspective this window is only a matter of months and not years.
> 
> With this simple type of language and the most basic version of this
> Open AR Client Stack a web standards version of any of the current
> Mobile AR apps could easily be built.
> 
> 1. lat/lon/alt/orientation are fed in as sensor data
> 2. based on freshness/deltas then the following requests are composed
> a - GET http://host/PointsOfInterest?lat=$lat&lon=$lon
> b - javascript:update_orientation({ z:$alt, x:$x, y:$y })
> 3. The results from 2a are loaded into a local data store (js object)
> 4. The 2b request updates the current viewport using the orientation
> params and the updated data store.
> 
> NOTE: One key thing is that the current browser models will need to be
> re-thought to be optimised for dynamic streamed data such as
> orientation, video, etc.
> 
> 
>> The second main point you make is that we should reconsider the
>> request-response nature of the internet in the AR context.
>> Again, this is an important idea and one worth seriously considering.
>> But in a similar fashion to my concerns about pattern of interest
>> filtering, I worry that this circumvents an existing model that has
>> merit.
>> The data-trigger-response-representation model you suggest already
>> happens routinely in rich Web 2.0 applications.
>> The difference is that it happens under the programatic control of the
>> author where they have access to a multitude of libraries and
>> resources (i.e. jQuery, database access, hardware, user settings,
>> etc.)
> 
> I think that's the great opportunity here.  To take the best practices
> and benefits from this type of 2.0 interaction...and abstract this out
> to integrate the growing wave of sensor data AND make it more
> accessible/usable to the common web user.
> 
> The type of system outlined in the attached diagram would extend this in
> two ways.  Each of the browser vendors that implement this type of
> solution could compete and innovate at the UI level to make the full
> power of this standard available through simple
> point/click/tap/swipe/etc. style interfaces.  
> 
> They could also compete by making it easy for developers to create
> re-usable bundles at a much more programmatic level.
> 
> Outside of this publishers can simply use the existing open HTML, SVG
> and WebGL standards to create any content they choose.  This leaves this
> space open to re-use the existing web content and services as well as
> benefiting as that space continues to develop.
> 
> And the existing HTTP infrastructure already provides the framework for
> cache management, scalability, etc.  But I'm preaching to the choir here
> 8)
> 
> 
>> (this point is related to another thread about (data)<>-(criteria) [3]
>> where I agree with Jens that we are talking about multiple
>> data-trigger-reponses)
> 
> I agree.  That's why I propose enabling multiple triggers with
> overlapping input criterion that can each create 0 or more linked
> requests delivers just that.
> 
> 
>> I may need some tutoring on what developing standards means, 
> 
> Ah...here I just meant SVG, WebGL and the current expansion that's
> happening in the CaptureAPI/Video space.
> 
> 
>> but in my view, things like ECMA scripting are an unavoidable part of
>> complex interactivity.
> 
> I agree...but would be fantastic if we could open a standard that also
> helped the browser/solution vendors drive these features up to the user
> level.
> 
> 
>> Perhaps you can give an example where the cutoff between the current
>> request-response model ends and automatic
>> data-POI-response-presentation begins?
> 
> In it's simplest form it can really just be thought of as a funky form
> of dynamic bookmark.  But these bookmarks are triggered by sensor
> patterns.  And their responses are presented and integrated into a
> standards based web UI (HTML, SVG, WebGL, etc.).
> 
> 
> I hope my rant above makes sense...but I'm looking forward bouncing this
> around a lot more to refine the language and knock the rough edges off
> this model.
> 
> Talk to you soon...
> 
> 
> roBman
> 
> 
> 
>> 
>> On Aug 20, 2010, at 10:19 AM, Rob Manson wrote:
>> 
>>> Hi,
>>> 
>>> great to see we're onto the "Next Steps" and we seem to be
>>> discussing
>>> pretty detailed structures now 8)  So I'd like to submit the
>>> following
>>> proposal for discussion.  This is based on our discussion so far and
>>> the
>>> ideas I think we have achieved some resolution on.
>>> 
>>> I'll look forward to your replies...
>>> 
>>> roBman
>>> 
>>> PS: I'd be particularly interested to hear ideas from the linked
>>> data
>>> and SSN groups on what parts of their existing work can improve this
>>> model and how they think it could be integrated.
>>> 
>>> 
>>> 
>>> What is this POI proposal?
>>> A simple extension to the "request-response" nature of the HTTP
>>> protocol
>>> to define a distributed Open AR (Augmented Reality) system.
>>> This sensory based pattern recognition system is simply a structured
>>> "request-response-link-request-response" chain.  In this chain the
>>> link
>>> is a specific form of transformation.
>>> 
>>> It aims to extend the existing web to be sensor aware and
>>> automatically
>>> event driven while encouraging the presentation layer to adapt to
>>> support dynamic spatialised information more fluidly.
>>> 
>>> One of the great achievements of the web has been the separation of
>>> data
>>> and presentation. The proposed Open AR structure extends this to
>>> separate out: sensory data, triggers, response data and
>>> presentation.
>>> 
>>> NOTE1: There are a wide range of serialisation options that could be
>>> supported and many namespaces and data structures/ontologies that
>>> can be
>>> incorporated (e.g. Dublin Core, geo, etc.).  The focus of this
>>> proposal
>>> is purely at a systemic "value chain" level.  It is assumed that the
>>> definition of serialisation formats, namespace support and common
>>> data
>>> structures would make up the bulk of the work that the working group
>>> will collaboratively define.  The goal here is to define a structure
>>> that enables this to be easily extended in defined and modular ways.
>>> 
>>> NOTE2: The example JSON-like data structures outlined below are
>>> purely
>>> to convey the proposed concepts.  They are not intended to be
>>> realised
>>> in this format at all and there is no attachment at this stage to
>>> JSON,
>>> XML or any other representational format.  They are purely
>>> conceptual.
>>> 
>>> This proposal is based upon the following structural evolution of
>>> devices and client application models:
>>> 
>>> PC Web Browser (Firefox, MSIE, etc.):
>>>   mouse      -> sensors -> dom      -> data
>>>   keyboard   ->                     -> presentation
>>> 
>>> Mobile Web Browser (iPhone, Android, etc.):
>>>   gestures   -> sensors -> dom      -> data
>>>   keyboard   ->                     -> presentation
>>> 
>>> Mobile AR Browser (Layar, Wikitude, Junaio, etc.):
>>>   gestures   -> sensors -> custom app            -> presentation
>>> [*custom]
>>>   keyboard   ->                                  -> data [*custom]
>>>   camera     ->
>>>   gps        ->
>>>   compass    ->
>>> 
>>> Open AR Browser (client):
>>>   mouse      -> sensors -> triggers ->  dom      -> presentation
>>>   keyboard   ->                                  -> data
>>>   camera     ->
>>>   gps        ->
>>>   compass    ->
>>>   accelerom. ->
>>>   rfid       ->
>>>   ir         ->
>>>   proximity  ->
>>>   motion     ->
>>> 
>>> NOTE3: The key next step from Mobile AR to Open AR is the addition
>>> of
>>> many more sensor types, migrating presentation and data to open web
>>> based standards and the addition of triggers.  Triggers are explicit
>>> links from a pattern to 0 or more actions (web requests).
>>> 
>>> Here is a brief description of each of the elements in this high
>>> level
>>> value chain.
>>> 
>>> clients:
>>> - handle events and request sensory data then filter and link it to
>>> 0 or
>>> more actions (web requests)
>>> - clients can cache trigger definitions locally or request them from
>>> one
>>> or more services that match one or more specific patterns.
>>> - clients can also cache response data and presentation states.
>>> - since sensory data, triggers and response data are simply HTTP
>>> responses all of the normal cache control structures are already in
>>> place.
>>> 
>>> infrastructure (The Internet Of Things):
>>> - networked and directly connected sensors and devices that support
>>> the
>>> Patterns Of Interest specification/standard
>>> 
>>> 
>>> patterns of interest:
>>> The standard HTTP request response processing chain can be seen as:
>>> 
>>> event -> request -> response -> presentation
>>> 
>>> The POI (Pattern Of Interest) value chain is slightly extended.
>>> The most common Mobile AR implementation of this is currently:
>>> 
>>> AR App event -> GPS reading -> get nearby info request -> Points Of
>>> Interest response -> AR presentation
>>> 
>>> A more detailed view clearly splits events into two to create
>>> possible
>>> feedback loops. It also splits the request into sensor data and
>>> trigger:
>>> 
>>>               +- event -+               +-------+-- event --+
>>> sensor data --+-> trigger -> response data -> presentation -+
>>> 
>>> - this allows events that happen at both the sensory and
>>> presentation
>>> ends of the chain.
>>> - triggers are bundles that link a pattern to one or more actions
>>> (web
>>> requests).
>>> - events at the sensor end request sensory data and filter it to
>>> find
>>> patterns that trigger or link to actions.
>>> - these triggers or links can also fire other events that load more
>>> sensory data that is filtered and linked to actions, etc.
>>> - actions return data that can then be presented.  As per standard
>>> web
>>> interactions supported formats can be defined by the requesting
>>> client.
>>> - events on the presentation side can interact with the data or the
>>> presentation itself.
>>> 
>>> sensory data:
>>> Simple (xml/json/key-value) representations of sensors and their
>>> values
>>> at a point in time.  These are available via URLs/HTTP requests
>>> e.g. sensors can update these files on change, at regular intervals
>>> or
>>> serve them dynamically.
>>> {
>>> HEAD : {
>>>   date_recorded : "Sat Aug 21 00:10:39 EST 2010",
>>>   source_url : "url"
>>> },
>>> BODY : {
>>>   gps : {  // based on standard geo data structures
>>>     latitude : "n.n",
>>>     longitude : "n,n",
>>>     altitude : "n",
>>>   },
>>>   compass : {
>>>     orientation : "n"
>>>   },
>>>   camera : {
>>>     image : "url",
>>>     stream : "url"
>>>   }
>>> }
>>> }
>>> NOTE: All sensor values could be presented inline or externally via
>>> a
>>> source URL which could then also reference streams.
>>> 
>>> trigger:
>>> structured (xml/json/key-value) filter that defines a pattern and
>>> links
>>> it to 0 or more actions (web requests)
>>> [
>>> HEAD : {
>>>   date_created : "Sat Aug 21 00:10:39 EST 2010",
>>>   author : "roBman@mob-labs.com",
>>>   last_modified : "Sat Aug 21 00:10:39 EST 2010"
>>> },
>>> BODY : {
>>>   pattern : {
>>>     gps : [
>>>       {
>>>         name : "iphone",
>>>         id : "01",
>>>         latitude : {
>>>           value : "n.n"
>>>         },
>>>         longitude : {
>>>           value : "n.n"
>>>         },
>>>         altitude : {
>>>           value : "n.n"
>>>         }
>>>       },
>>>       // NOTE: GPS value patterns could have their own ranges
>>> defined
>>>       //       but usually the client will just set it's own at the
>>> filter level
>>>       // range : "n",
>>>       // range_format : "metres"
>>>       // This is an area where different client applications can
>>> add their unique value
>>>     ],
>>>     cameras : [
>>>       {
>>>         name : "home",
>>>         id : "03",
>>>         type : "opencv_haar_cascade"
>>>         pattern : {
>>>           ...
>>>         }
>>>       }
>>>     ]
>>>   },
>>>   actions : [
>>>     {
>>>       url : "url",
>>>       data : {..},  // Support for referring to sensor values
>>> $sensors.gps.latitude & $sensors.compass.orientation
>>>       method : "POST"
>>>     },
>>>   ]
>>> }
>>> ]
>>> 
>>> data
>>> HTTP Responses
>>> 
>>> presentation
>>> client rendered HTML/CSS/JS/RICH MEDIA (e.g. Images, 3D, Video,
>>> Audio,
>>> etc.)
>>> 
>>> 
>>> 
>>> At least the following roles are supported as extensions of today's
>>> common "web value chain" roles.
>>> 
>>>       publishers:
>>>       - define triggers that map specific sensor data patterns to
>>>       useful actions (web requests)
>>>       - manage the acl to drive traffic in exchange for value
>>> creation
>>>       - customise the client apps and content to create compelling
>>>       experiences
>>> 
>>>       developers:
>>>       - create sensor bundles people can buy and install in their
>>> own
>>>       environment
>>>       - create server applications that allow publishers to
>>> register
>>>       and manage triggers
>>>       - enable the publishers to make their triggers available to
>>> an
>>>       open or defined set of clients
>>>       - create the web applications that receive the final actions
>>>       (web requests)
>>>       - create the clients applications that handle events and map
>>>       sensor data to requests through triggers (Open AR browsers)
>>> 
>>> 
>> [1] http://www.web3d.org/x3d/wiki/index.php/CylinderSensor
>> [2] http://www.w3.org/2005/Incubator/ssn/charter
>> [3] 
> <OpenARClientStack-01.png>

Alex Hill Ph.D.
Postdoctoral Fellow
Augmented Environments Laboratory
Georgia Institute of Technology
http://www.augmentedenvironments.org/lab
Received on Friday, 10 September 2010 12:31:36 UTC