
Re: POI based Open AR proposal

From: Thomas Wrobel <darkflame@gmail.com>
Date: Fri, 3 Sep 2010 13:27:16 +0200
Message-ID: <AANLkTinQaWWGwnSmhhiFyEt_E8Ga2aZrM6p1H+4zwaKf@mail.gmail.com>
To: roBman@mob-labs.com
Cc: Alex Hill <ahill@gatech.edu>, "Public POI @ W3C" <public-poiwg@w3.org>

"But pointers are just one very simple type of input sensor.  I'm sure
we'd all agree that eye tracking, head tracking, limb/body tracking and
other more abstract gestural tracking will soon be flooding into our
systems from more than one point of origin."

Just a little idea here. Again, maybe beyond our scope, but it could
make things easier.

Rather than ever defining precise (pointer/cursor) input methods,
maybe a general form could be:
X/Y/Z + Radius, which would cover most pointer input types, and
radius would simply be a gauge of their accuracy?

Given there is *huge* scope for different input methods over the next
decade or so, I don't think any designers of content should ever really
have to worry about them. Much like a stylus/mouse on a PC basically
amounts to the same thing, an "eye-tracked location" or a
"finger-pointed location" should be able to be interpreted as the same
thing? It's just the user specifying a location to perform an action
on, basically. And I think that would always mostly be an OS thing.
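As a rough sketch of what I mean (all names here are purely
illustrative, not a proposed API), any pointing device could be reduced
to a position plus an accuracy radius:

```javascript
// Hypothetical normalised "pointing" event: every input device reduces
// to a position plus a radius expressing its accuracy.
function makePointing(x, y, z, radius, source) {
  return { x, y, z, radius, source };
}

// A precise device (mouse/stylus) maps in with a tiny radius...
const fromMouse = (e) => makePointing(e.clientX, e.clientY, 0, 1, "mouse");

// ...while a coarse one (eye tracker) just reports a larger radius.
const fromGaze = (g) => makePointing(g.x, g.y, 0, g.errorPx, "eye");

// Content never cares which device it was; it only asks whether the
// pointing overlaps a target, given both radii.
function hits(p, target) {
  return Math.hypot(p.x - target.x, p.y - target.y) <= p.radius + target.r;
}
```

Content authors would then only ever test against the radius, not the
device type.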




On 2 September 2010 10:48, Rob Manson <roBman@mob-labs.com> wrote:
> Hey Alex,
>
>> If I understand correctly, you are suggesting that "triggers" should
>> be formulated in a flexible pattern language that can deal with and
>> respond to any form of sensor data.
>
> That's a great summary.  I may re-use that if you don't mind 8)
>
>
>> This would be in contrast to the strictly defined "onClick" type of
>> events in JavaScript or the existing VRML trigger types such as
>> CylinderSensor [1].
>
> Well...I see it more as creating a broader, more flexible super-set that
> wraps around current ECMA-style events, etc.
>
>
>> I think this idea has merit and agree that some significant
>> flexibility in the way authors deal with the multiple visual and
>> mechanical sensors at their disposal is vital to creating compelling
>> AR content.
>> However, the flexibility that this approach would give seems, at first
>> glance, to take away some of the ease of authoring that "baked" in
>> inputs/triggers give.
>
> Well...I think we're generally on the same track here.  But let me
> expand my points a little below.  I hope I can present a case that this
> could make this type of authoring "easier" rather than "less easy".
>
>
>> And it is not obvious to me now how one incorporates more general
>> computation into this model.
>
> Attached is a simple diagram of the type of Open Stack that I think will
> enable this type of standardisation.  However, there is a lot of hidden
> detail in that diagram so I would expect we may need to bounce a few
> messages back and forth to walk through it all 8)
>
> This type of system could easily be implemented in one of the really
> modern browsers purely in javascript; however, to get the full
> benefit of the dynamic, sensor-rich new environment it would be built
> as a natively enhanced browser (hopefully by many vendors).
>
> My underlying assumption is that all of this should be based upon open
> web standards and the existing HTTP related infrastructure.
>
>
>> Take the aforementioned CylinderSensor; how would you describe the
>> behavior of this trigger using patterns of interest?
>
> That is a good question.  I think tangible examples really help our
> discussions.  CylinderSensor binds (at quite a programmatic level)
> pointer motion to 3D object manipulation/rotation.
>
> My proposal would allow you to treat the pointer input as one type of
> sensor data.  With a simple pattern language you could then map this (or
> at least defined patterns within this) to specific URIs.  In many ways
> this could be seen as similar to a standardised approach to creating
> listeners.
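As an illustrative sketch only (the API shape below is an assumption,
nothing in the proposal defines it), such "standardised listeners"
might look like:

```javascript
// Register a sensor pattern the way you would register a DOM listener:
// a predicate over the sensor data, linked to a URI to emit on match.
const listeners = [];

function addPatternListener(matches, uriTemplate) {
  listeners.push({ matches, uriTemplate });
}

// First request: the sensor data event. Response: 0 or more URIs.
function onSensorData(sensors) {
  return listeners
    .filter((l) => l.matches(sensors))
    .map((l) => l.uriTemplate);
}
```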
>
> So the first request is the sensor data event.
> The response is 0 or more URIs.  These URIs can obviously contain
> external resources or javascript:... style resources to link to dynamic
> local code or APIs.  It should also be easy to map values from the
> sensor data into the structure of this URI request.  e.g.
> javascript:do_something($sensors.gps.lat)
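A minimal sketch of that value mapping (the resolver below is an
assumption for illustration; the $sensors.* convention is taken from
the example above, not from any defined format):

```javascript
// Substitute $sensors.path.to.value references in an action URI with
// the matching values from a sensor data snapshot.
function expandUri(template, sensors) {
  return template.replace(/\$sensors\.([\w.]+)/g, (_, path) => {
    const v = path.split(".").reduce((o, k) => (o == null ? o : o[k]), sensors);
    return v == null ? "" : String(v);
  });
}
```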
>
> The processing of these generated URIs is then the second layer of
> requests.  And their responses are the second layer of responses.  These
> responses could be any valid response to a URI.  For standard http:// or
> similar requests the response could be HTML, SVG, WebGL or other valid
> mime typed content.  For javascript: style requests the responses can be
> more complex and may simply be used to link things like orientation to
> the sliding of HTML, SVG or WebGL content in the x dimension to simulate
> a moving point of view.
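As a sketch, the split between the two kinds of second-layer requests
could look like this (the handler names are assumptions; a real client
would sandbox and whitelist javascript: execution):

```javascript
// Dispatch a resolved URI: plain URIs fetch external resources,
// javascript: URIs run as local code.
function dispatch(uri, handlers) {
  if (uri.startsWith("javascript:")) {
    return handlers.run(uri.slice("javascript:".length));
  }
  return handlers.fetch(uri); // response may be HTML, SVG, WebGL, etc.
}
```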
>
> But pointers are just one very simple type of input sensor.  I'm sure
> we'd all agree that eye tracking, head tracking, limb/body tracking and
> other more abstract gestural tracking will soon be flooding into our
> systems from more than one point of origin.
>
>
>> While there may be standards that will eventually support this (i.e.
>> the W3C Sensor Incubator Group [2]), I wonder if this type of "sensor
>> filtering language" is beyond our scope.
>
> This could well be true, however I think it would simply be building on
> top of the work from the SSN-XG.  And I also think that by the time we
> completed this work just for a lat/lon/alt based Point of Interest the
> standard would be outdated, as this space is moving so quickly.  From my
> perspective this window is only a matter of months and not years.
>
> With this simple type of language and the most basic version of this
> Open AR Client Stack a web standards version of any of the current
> Mobile AR apps could easily be built.
>
> 1. lat/lon/alt/orientation are fed in as sensor data
> 2. based on freshness/deltas, the following requests are composed:
>  a - GET http://host/PointsOfInterest?lat=$lat&lon=$lon
>  b - javascript:update_orientation({ z:$alt, x:$x, y:$y })
> 3. The results from 2a are loaded into a local data store (js object)
> 4. The 2b request updates the current viewport using the orientation
> params and the updated data store.
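Those four steps could be sketched as a naive client loop (all names,
the 25 m threshold and the planar distance formula are illustrative
assumptions, not part of the proposal):

```javascript
// Local data store for Points Of Interest and the last GPS fix (step 3).
const store = { pois: [], lastFix: null };

// Step 2's freshness/delta check: only re-query when we have moved.
function movedEnough(fix, last, metres = 25) {
  if (!last) return true;
  // crude planar approximation (~111 km per degree), fine for a sketch
  return Math.hypot(fix.lat - last.lat, fix.lon - last.lon) * 111000 > metres;
}

async function onSensorFrame(fix, orientation, fetchPois, render) {
  if (movedEnough(fix, store.lastFix)) {
    // steps 2a + 3: compose the request and load results into the store
    store.pois = await fetchPois(
      `http://host/PointsOfInterest?lat=${fix.lat}&lon=${fix.lon}`);
    store.lastFix = fix;
  }
  // steps 2b + 4: update the viewport from orientation + data store
  render(orientation, store.pois);
}
```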
>
> NOTE: One key thing is that the current browser models will need to be
> re-thought to be optimised for dynamic streamed data such as
> orientation, video, etc.
>
>
>> The second main point you make is that we should reconsider the
>> request-response nature of the internet in the AR context.
>> Again, this is an important idea and one worth seriously considering.
>> But in a similar fashion to my concerns about pattern of interest
>> filtering, I worry that this circumvents an existing model that has
>> merit.
>> The data-trigger-response-representation model you suggest already
>> happens routinely in rich Web 2.0 applications.
>> The difference is that it happens under the programmatic control of the
>> author where they have access to a multitude of libraries and
>> resources (i.e. jQuery, database access, hardware, user settings,
>> etc.)
>
> I think that's the great opportunity here.  To take the best practices
> and benefits from this type of 2.0 interaction...and abstract this out
> to integrate the growing wave of sensor data AND make it more
> accessible/usable to the common web user.
>
> The type of system outlined in the attached diagram would extend this in
> two ways.  Each of the browser vendors that implement this type of
> solution could compete and innovate at the UI level to make the full
> power of this standard available through simple
> point/click/tap/swipe/etc. style interfaces.
>
> They could also compete by making it easy for developers to create
> re-usable bundles at a much more programmatic level.
>
> Outside of this publishers can simply use the existing open HTML, SVG
> and WebGL standards to create any content they choose.  This leaves this
> space open to re-use the existing web content and services as well as
> benefiting as that space continues to develop.
>
> And the existing HTTP infrastructure already provides the framework for
> cache management, scalability, etc.  But I'm preaching to the choir here
> 8)
>
>
>> (this point is related to another thread about (data)<>-(criteria) [3]
>> where I agree with Jens that we are talking about multiple
>> data-trigger-responses)
>
> I agree.  That's why I propose enabling multiple triggers, with
> overlapping input criteria, that can each create 0 or more linked
> requests.  That delivers just that.
>
>
>> I may need some tutoring on what developing standards means,
>
> Ah...here I just meant SVG, WebGL and the current expansion that's
> happening in the CaptureAPI/Video space.
>
>
>> but in my view, things like ECMA scripting are an unavoidable part of
>> complex interactivity.
>
> I agree...but it would be fantastic if we could open a standard that also
> helped the browser/solution vendors drive these features up to the user
> level.
>
>
>> Perhaps you can give an example where the cutoff between the current
>> request-response model ends and automatic
>> data-POI-response-presentation begins?
>
> In its simplest form it can really just be thought of as a funky form
> of dynamic bookmark.  But these bookmarks are triggered by sensor
> patterns.  And their responses are presented and integrated into a
> standards based web UI (HTML, SVG, WebGL, etc.).
>
>
> I hope my rant above makes sense...but I'm looking forward to bouncing this
> around a lot more to refine the language and knock the rough edges off
> this model.
>
> Talk to you soon...
>
>
> roBman
>
>
>
>>
>> On Aug 20, 2010, at 10:19 AM, Rob Manson wrote:
>>
>> > Hi,
>> >
>> > great to see we're onto the "Next Steps" and we seem to be
>> > discussing
>> > pretty detailed structures now 8)  So I'd like to submit the
>> > following
>> > proposal for discussion.  This is based on our discussion so far and
>> > the
>> > ideas I think we have achieved some resolution on.
>> >
>> > I'll look forward to your replies...
>> >
>> > roBman
>> >
>> > PS: I'd be particularly interested to hear ideas from the linked
>> > data
>> > and SSN groups on what parts of their existing work can improve this
>> > model and how they think it could be integrated.
>> >
>> >
>> >
>> > What is this POI proposal?
>> > A simple extension to the "request-response" nature of the HTTP
>> > protocol
>> > to define a distributed Open AR (Augmented Reality) system.
>> > This sensory based pattern recognition system is simply a structured
>> > "request-response-link-request-response" chain.  In this chain the
>> > link
>> > is a specific form of transformation.
>> >
>> > It aims to extend the existing web to be sensor aware and
>> > automatically
>> > event driven while encouraging the presentation layer to adapt to
>> > support dynamic spatialised information more fluidly.
>> >
>> > One of the great achievements of the web has been the separation of
>> > data
>> > and presentation. The proposed Open AR structure extends this to
>> > separate out: sensory data, triggers, response data and
>> > presentation.
>> >
>> > NOTE1: There are a wide range of serialisation options that could be
>> > supported and many namespaces and data structures/ontologies that
>> > can be
>> > incorporated (e.g. Dublin Core, geo, etc.).  The focus of this
>> > proposal
>> > is purely at a systemic "value chain" level.  It is assumed that the
>> > definition of serialisation formats, namespace support and common
>> > data
>> > structures would make up the bulk of the work that the working group
>> > will collaboratively define.  The goal here is to define a structure
>> > that enables this to be easily extended in defined and modular ways.
>> >
>> > NOTE2: The example JSON-like data structures outlined below are
>> > purely
>> > to convey the proposed concepts.  They are not intended to be
>> > realised
>> > in this format at all and there is no attachment at this stage to
>> > JSON,
>> > XML or any other representational format.  They are purely
>> > conceptual.
>> >
>> > This proposal is based upon the following structural evolution of
>> > devices and client application models:
>> >
>> >  PC Web Browser (Firefox, MSIE, etc.):
>> >    mouse      -> sensors -> dom      -> data
>> >    keyboard   ->                     -> presentation
>> >
>> >  Mobile Web Browser (iPhone, Android, etc.):
>> >    gestures   -> sensors -> dom      -> data
>> >    keyboard   ->                     -> presentation
>> >
>> >  Mobile AR Browser (Layar, Wikitude, Junaio, etc.):
>> >    gestures   -> sensors -> custom app            -> presentation
>> > [*custom]
>> >    keyboard   ->                                  -> data [*custom]
>> >    camera     ->
>> >    gps        ->
>> >    compass    ->
>> >
>> >  Open AR Browser (client):
>> >    mouse      -> sensors -> triggers ->  dom      -> presentation
>> >    keyboard   ->                                  -> data
>> >    camera     ->
>> >    gps        ->
>> >    compass    ->
>> >    accelerom. ->
>> >    rfid       ->
>> >    ir         ->
>> >    proximity  ->
>> >    motion     ->
>> >
>> > NOTE3: The key next step from Mobile AR to Open AR is the addition
>> > of
>> > many more sensor types, migrating presentation and data to open web
>> > based standards and the addition of triggers.  Triggers are explicit
>> > links from a pattern to 0 or more actions (web requests).
>> >
>> > Here is a brief description of each of the elements in this high
>> > level
>> > value chain.
>> >
>> > clients:
>> > - handle events and request sensory data then filter and link it to
>> > 0 or
>> > more actions (web requests)
>> > - clients can cache trigger definitions locally or request them from
>> > one
>> > or more services that match one or more specific patterns.
>> > - clients can also cache response data and presentation states.
>> > - since sensory data, triggers and response data are simply HTTP
>> > responses all of the normal cache control structures are already in
>> > place.
>> >
>> > infrastructure (The Internet Of Things):
>> > - networked and directly connected sensors and devices that support
>> > the
>> > Patterns Of Interest specification/standard
>> >
>> >
>> > patterns of interest:
>> > The standard HTTP request response processing chain can be seen as:
>> >
>> >  event -> request -> response -> presentation
>> >
>> > The POI (Pattern Of Interest) value chain is slightly extended.
>> > The most common Mobile AR implementation of this is currently:
>> >
>> >  AR App event -> GPS reading -> get nearby info request -> Points Of
>> > Interest response -> AR presentation
>> >
>> > A more detailed view clearly splits events into two to create
>> > possible
>> > feedback loops. It also splits the request into sensor data and
>> > trigger:
>> >
>> >                +- event -+               +-------+-- event --+
>> >  sensor data --+-> trigger -> response data -> presentation -+
>> >
>> > - this allows events that happen at both the sensory and
>> > presentation
>> > ends of the chain.
>> > - triggers are bundles that link a pattern to one or more actions
>> > (web
>> > requests).
>> > - events at the sensor end request sensory data and filter it to
>> > find
>> > patterns that trigger or link to actions.
>> > - these triggers or links can also fire other events that load more
>> > sensory data that is filtered and linked to actions, etc.
>> > - actions return data that can then be presented.  As per standard
>> > web
>> > interactions supported formats can be defined by the requesting
>> > client.
>> > - events on the presentation side can interact with the data or the
>> > presentation itself.
>> >
>> > sensory data:
>> > Simple (xml/json/key-value) representations of sensors and their
>> > values
>> > at a point in time.  These are available via URLs/HTTP requests
>> > e.g. sensors can update these files on change, at regular intervals
>> > or
>> > serve them dynamically.
>> > {
>> >  HEAD : {
>> >    date_recorded : "Sat Aug 21 00:10:39 EST 2010",
>> >    source_url : "url"
>> >  },
>> >  BODY : {
>> >    gps : {  // based on standard geo data structures
>> >      latitude : "n.n",
>> >      longitude : "n.n",
>> >      altitude : "n"
>> >    },
>> >    compass : {
>> >      orientation : "n"
>> >    },
>> >    camera : {
>> >      image : "url",
>> >      stream : "url"
>> >    }
>> >  }
>> > }
>> > NOTE: All sensor values could be presented inline or externally via
>> > a
>> > source URL which could then also reference streams.
>> >
>> > trigger:
>> > structured (xml/json/key-value) filter that defines a pattern and
>> > links
>> > it to 0 or more actions (web requests)
>> > [
>> >  HEAD : {
>> >    date_created : "Sat Aug 21 00:10:39 EST 2010",
>> >    author : "roBman@mob-labs.com",
>> >    last_modified : "Sat Aug 21 00:10:39 EST 2010"
>> >  },
>> >  BODY : {
>> >    pattern : {
>> >      gps : [
>> >        {
>> >          name : "iphone",
>> >          id : "01",
>> >          latitude : {
>> >            value : "n.n"
>> >          },
>> >          longitude : {
>> >            value : "n.n"
>> >          },
>> >          altitude : {
>> >            value : "n.n"
>> >          }
>> >        },
>> >        // NOTE: GPS value patterns could have their own ranges
>> > defined
>> >        //       but usually the client will just set its own at the
>> > filter level
>> >        // range : "n",
>> >        // range_format : "metres"
>> >        // This is an area where different client applications can
>> > add their unique value
>> >      ],
>> >      cameras : [
>> >        {
>> >          name : "home",
>> >          id : "03",
>> >          type : "opencv_haar_cascade",
>> >          pattern : {
>> >            ...
>> >          }
>> >        }
>> >      ]
>> >    },
>> >    actions : [
>> >      {
>> >        url : "url",
>> >        data : {..},  // Support for referring to sensor values
>> > $sensors.gps.latitude & $sensors.compass.orientation
>> >        method : "POST"
>> >      },
>> >    ]
>> >  }
>> > ]
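As an illustration only, a client might evaluate this conceptual
trigger structure roughly like this (the matching rule and the 50 m
default range are assumptions, set at the client filter level as the
note above suggests):

```javascript
// Check a sensor snapshot against a trigger's GPS pattern, using a
// crude planar approximation (~111 km per degree) for the sketch.
function matchesGps(sensors, pattern, rangeMetres = 50) {
  const dLat = (sensors.gps.latitude - pattern.latitude.value) * 111320;
  const dLon = (sensors.gps.longitude - pattern.longitude.value) * 111320;
  return Math.hypot(dLat, dLon) <= rangeMetres;
}

// When any GPS pattern matches, each of the trigger's actions becomes
// a web request; returns how many requests were fired.
function fireTrigger(trigger, sensors, send) {
  if (!trigger.pattern.gps.some((p) => matchesGps(sensors, p))) return 0;
  trigger.actions.forEach((a) => send(a.method, a.url, a.data));
  return trigger.actions.length;
}
```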
>> >
>> > data
>> > HTTP Responses
>> >
>> > presentation
>> > client rendered HTML/CSS/JS/RICH MEDIA (e.g. Images, 3D, Video,
>> > Audio,
>> > etc.)
>> >
>> >
>> >
>> > At least the following roles are supported as extensions of today's
>> > common "web value chain" roles.
>> >
>> >        publishers:
>> >        - define triggers that map specific sensor data patterns to
>> >        useful actions (web requests)
>> >        - manage the ACL to drive traffic in exchange for value
>> > creation
>> >        - customise the client apps and content to create compelling
>> >        experiences
>> >
>> >        developers:
>> >        - create sensor bundles people can buy and install in their
>> > own
>> >        environment
>> >        - create server applications that allow publishers to
>> > register
>> >        and manage triggers
>> >        - enable the publishers to make their triggers available to
>> > an
>> >        open or defined set of clients
>> >        - create the web applications that receive the final actions
>> >        (web requests)
>> >        - create the clients applications that handle events and map
>> >        sensor data to requests through triggers (Open AR browsers)
>> >
>> >
>> [1] http://www.web3d.org/x3d/wiki/index.php/CylinderSensor
>> [2] http://www.w3.org/2005/Incubator/ssn/charter
>> [3]
>
Received on Friday, 3 September 2010 11:27:56 GMT
