Re: thoughts towards a draft AR WG charter

On Tue, 2010-08-03 at 15:42 +0100, Phil Archer wrote:
> A location, even one that is defined relative to something else as 
> opposed to a fixed point in space, is not a trigger. The trigger is that 
> a device has arrived at that location (or that a recognisable 
> object/marker has come into view, or that a time has arrived or whatever).

Hi Phil...I think I have quite a different cultural perspective...so
forgive me while I dance around a bit and try to clarify my language.

A location, from my perspective, is an abstract concept that can be
represented in a coordinate system such as lat/lon.  Without other
related information it remains just that: an abstract concept.

The number of "or"'s used in the second part of your sentence is [I
believe] a clear hint that a broader "sensor bundle" model is needed.
It's such early days that I'd hate for a standard to get fixated just on
POIs.
 

> In Web terms, we're talking about events, no? Things like onclick, 
> onchange etc. /Those/ are the triggers. A Web App that's triggered by 
> absolute geolocation might query the GeoLoc API regularly (say every 
> second) and then you'd query a POI database with a machine version of 
> "what do we know about this place?" That could be enriched with 
> directional and temporal info etc. of course. But, as you say, Matt, 
> that's a /query/ against a dataset.

Well...to step back for a second...what I really honestly think AR is,
is a form of "digital sensory perception".  The term "sensory
perception" can be broken down into two clear concepts.

1. sensory data
This is the raw data collected from an environment by sensors/sensory
organs.

2. perception
After the data is processed, a number of "recognised" features,
affordances or "things" are extracted and re-presented.  Perception is
fundamentally a "representational" process that turns raw data into
meaning.  It's also important/relevant to note that in our biological
world "perception" very likely occurs in at least 2 places.

  1. in our sensory organs
  2. in our mind

Some would even say it may happen IN the environment before we sense it
too.  This multi-layered approach maps well to what we're discussing.
Raw data may be turned into perceptible [triggers] either in the
sensors, in the application, in an online service or really anywhere
within the data processing chain.

So I think this is a completely new approach to events.  I would hate to
think we had to keep stacking on different "onVerb" bindings every time
someone wanted to add a new type of sensor interaction/event.
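To make that concrete, here's a rough, purely hypothetical sketch (plain
javascript, with names I've invented just for illustration - this is not
an existing or proposed API) of a single generic registration call
instead of a new "onVerb" binding per sensor type:

  // Purely hypothetical sketch -- not an existing or proposed API.
  // One generic registration call instead of a new onVerb per sensor type.
  function onPerception(pattern, callback) {
    // a real implementation would match incoming [sensor data bundles]
    // against the pattern wherever perception happens (in the sensor,
    // the app or an online service) and call back on each match
  }

  // the classic POI case then becomes just one pattern among many
  onPerception(
    { sensor: "geolocation", lat: 51.508, lon: -0.128, radiusMetres: 50 },
    function (bundle) { /* re-present [this information] to the user */ }
  );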


> The term 'trigger' implies dynamism and these are covered by existing 
> working groups who are in various stages of looking at things like 
> GeoLoc, camera, clock etc.). 

True...however, many of these are not looking at it from the perspective
we are discussing at all (at least that's how it appears from the
outside).  For example, the camera/capture API [1] simply seems to deal
with embedding a raw camera view into a browser.

The API itself has a gaping hole from my perspective.  There's a call to
start capturing video and then a call/callback when that is complete.
From my experience, AR happens almost exclusively BETWEEN those 2 calls.

NOTE: I'm not criticising this group's work, just pointing out our
cultural differences.  This is why I listed them as one of the groups
that I think needs to be intimately engaged in this discussion [2].
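To make that "gaping hole" concrete, here's a purely hypothetical sketch
(the function names are invented stand-ins, NOT the Capture API's actual
interface) of the kind of per-frame hook I mean:

  // Hypothetical stub only -- this is NOT the actual Capture API.
  function startVideoCapture(onFrame, onComplete) { /* ... */ }

  startVideoCapture(
    function (frame) {
      // AR recognition would happen here, on each live frame,
      // i.e. BETWEEN the start call and the completion callback
    },
    function (file) {
      // the current draft effectively only exposes this end point
    }
  );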


> I believe Point of Interest data should be thought of as static. What 
> data you take from that data is triggered by the user's actions, 
> location, time zone or whatever.

I agree with the general point you're making...but perhaps the word
"static" is a bit misleading here.  "Tends to be static"...but not
necessarily.  e.g. a User can just as easily be a POI as a building can
be.  We've done this in systems we've built.  And this User may be
moving, or even be in the past or future (or even through time!).

[1] http://www.w3.org/TR/2010/WD-capture-api-20100401/#uiexamples
[2] http://lists.w3.org/Archives/Public/public-poiwg/2010Jul/0048.html


NOTE: I've included 2 responses in 1 email to reduce my overall SPAM
rating 8)

On Tue, 2010-08-03 at 17:14:54 +0100, Phil Archer wrote:
> I've not heard it used elsewhere (that doesn't mean that it's not used
> elsewhere of course, just that I've not heard it!)

I'm really not tied to the word [trigger].  I've been putting it in []
where possible to denote I'm just using it as a placeholder.


> It's clear to me that a point of interest does not and should not
> necessarily refer to a location on the Earth's surface (or any other
> planet for that matter). Recognising an image (such as a label on a
> bottle of wine) does not imply a geographical location other than
> one's proximity to said bottle.

This is the key point about [sensor data bundle] vs [location].
Location coordinates are just the archetypal AR [sensor data bundle],
but they definitely should not be the only one.

If this point is accepted then POI is relegated to just the archetypal
AR content type with the door being left open to a rich new set of
content/concept types after that.  As I said before, AR that only
supports POIs would be like HTTP that only supports HTML documents.
That's useful/necessary for the web, but not sufficient.
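Just to illustrate what I mean by a [sensor data bundle] (field names and
values here are placeholders, not a proposal), lat/lon is only the
archetypal case:

  // Illustrative only -- field names are placeholders, not a proposal.
  var poiBundle    = { type: "geolocation", lat: 51.508, lon: -0.128,
                       orientation: 270 };              // the archetypal case
  var markerBundle = { type: "image", match: "wine-label-pattern-1234" };
  var rfidBundle   = { type: "rfid",  tagId: "urn:example:tag:5678" };

  // each one can sit on the left-hand side of:
  //   this [sensor data bundle] is a [trigger] for [this information]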


> The trigger here is that an image captured by the device has been
> passed to some sort of image recognition software that has been able
> to associate it with an identifier that can then be used to access the
> (static) data about that particular bottle.

See my point above about "sensory perception".  And again I'd call out
your use of the word "static".


> > You could, I suppose, think of them as "auto triggers". Triggered 
> > by the user moving, or things coming into their FOV rather than a 
> > click which is a more active form of user triggering. As you say, 
> > these would involve querying a database at an interval, but it 
> > would be something automatically done by the browser, and not 
> > something specifically coded for like with onClick style javascript
> > events.
> 
> Not sure I quite agree here. The query (to the browser) might be
> "where am I and which way and I pointing?" That's what GeoLoc
> does/will enable.

This depends upon your cultural perspective.  
You could see this as:

- user behaviour (e.g. movement) drives
- a query to the browser using the GeoLoc API
- that returns lat/lon/orientation

But that's simply the first step in the sensory perception process.
This is extracting raw data from the environment at a point in time.
This raw data then needs to be processed into perception in some way to
make "meaning".  This is essentially what any of the Layar developers do
when they create a service that responds to a getPointsOfInterest()
request.

So I think the GeoLoc API fits perfectly into the chain...but again,
this is necessary for AR but not sufficient on its own.
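As a rough sketch of that chain: watchPosition() is the real GeoLoc API
call, while getPointsOfInterest() below is just a stub standing in for
whatever Layar-style service turns the raw data into "meaning":

  // stub standing in for a Layar-style perception service
  function getPointsOfInterest(raw, callback) { callback([]); }

  // step 1: raw data is extracted from the environment via the GeoLoc API
  navigator.geolocation.watchPosition(function (position) {
    var raw = {
      lat:     position.coords.latitude,
      lon:     position.coords.longitude,
      heading: position.coords.heading   // may be null
    };
    // step 2: the raw data is processed into perception somewhere in the chain
    getPointsOfInterest(raw, function (pois) {
      // step 3: the recognised "things" are re-presented to the user
    });
  });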


> I might like to have a slightly different query that said "alert me
> when we get to Trafalgar Square" or "alert me if a bottle of 2000 St
> Emillion Grand Cru passes within camera range."

I think "alertness" is a great layer on top of what we have discussed so
far that matches "sensory perception" to "goal seeking behaviour".  But
this is definitely something layered on top.  To apply this to the
[sensory data bundle]/[trigger] model discussed so far this would be a
[trigger] defined by the user as opposed to by the content creator.

BTW: I really strongly agree with Andy Braun's point that the term
"publisher" should be defined as broadly as possible and that the
[trigger] creator may be separate from that as well.


> The other one - tell me when object X is seen - is achievable if you 
> have a universal database of things with an agreed identifier
> structure that is used by all image recognition software. The internet
> of things and all that. The browser API could then have two functions
> "can you give me an identifier for the thing the camera is looking
> at?" and "tell me when an object with x identifier comes into view." 

I'm honestly not trying to be argumentative here 8)
Perhaps I'm reading your language too literally, but it seems to be
hardcoding in a form of tunnel vision.  While we could allow people to
set "alertness" for when a certain object is "seen" within a "camera
view"...the near-term real-world likelihood is that object recognition
will be correlated against a number of parallel sensor input streams.

e.g. The stereo cameras on my wearable display (2 cameras) and the CCTV
camera in the Walmart I'm standing in (1 camera) and the RFID sensor in
my belt (yes...I am batman 8) - 1 sensor) and the RFID sensors on the
shelves (n sensors) all collaborate to tell me that the bottle in front
of me is "over priced plonk".

This is exactly how our biological sensory perception works.
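A toy illustration of what I mean (the numbers and the naive fusion rule
are made up purely for the example):

  // observations of the same object arriving from parallel sensor streams
  var observations = [
    { source: "wearable-camera-left",  objectId: "bottle-42", confidence: 0.6 },
    { source: "wearable-camera-right", objectId: "bottle-42", confidence: 0.7 },
    { source: "store-cctv",            objectId: "bottle-42", confidence: 0.4 },
    { source: "shelf-rfid",            objectId: "bottle-42", confidence: 0.9 }
  ];

  // naive fusion: independent agreement across streams raises our belief
  var doubt = 1;
  for (var i = 0; i < observations.length; i++) {
    doubt = doubt * (1 - observations[i].confidence);
  }
  var belief = 1 - doubt;   // roughly 0.99 for the values above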
 
 
> > I think Rob Manson expressed it well as a form of triplet;
> >> "Well...if we did use the "trigger" model then I'd express this 
> >> as the following RDFa style triplet:
> >>       this [location] is a [trigger] for [this information]
> I agree with the sentiment here if not the syntax.
> >> POIs in this format would then become the archetypal AR relationship.
> >> The most critical and common subset of the broader relationship:
> >>       this [sensor data bundle] is a [trigger] for [this information]
> >> In the standard POIs case the minimum [sensor data bundle] is 
> >> "lat/lon" and then optionally "relative magnetic orientation"."
> 
> That doesn't work as a triple as you have a literal (the sensor data)
> as a subject. 

Hrm...first...I was really just using this as a simplified way to convey
the linking concept 8)

Second...isn't that a matter of perspective?  If the location extracted
as sensory data (lat/lon) from the user is seen as a property of the
user (or object), then I agree with you.  In this case it would be:

[user] [is at] [location]

But linguistically it would be equally valid to see this from a sensory
perception perspective, where an idealised [sensory data bundle] IS the
subject.  It is literally turned into a "thing": a pattern to be
recognised, raw data turned into a specific pattern.

I do agree, however, that this would often be a range or a more complex
pattern definition, which is where the triplet analogy probably falls
over.  Anyway...based on where our discussion is, I think it's the
linking concept that's important, not this specific syntax.


> Yes. I agree. We're linking criteria to data. The data is static. The
> trigger is that the criteria have been met.

There's that "s" word again 8)


> Thinking about aggregated streams of triggers might be useful in
> future. i.e. a way to say "tell me when my location is within 200
> metres of a shop that sells 2000 St Emillion Grand Cru for less than
> €50. What's aggregated here is the list of criteria and they might
> only be accessible by different sensors and data sources.

This is an excellent example of distributing the "perception" part of
the "sensory perception" across multiple points in the data processing
chain.
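Sketching Phil's example as data (names invented, purely illustrative),
the aggregated [trigger] is really a list of criteria, and each entry
may need a different sensor or data source to evaluate it:

  var aggregatedTrigger = {
    criteria: [
      { type: "proximity", target: "wine shop", withinMetres: 200 },  // GeoLoc API
      { type: "stock",     item: "2000 St Emilion Grand Cru" },       // retailer data
      { type: "price",     lessThan: 50, currency: "EUR" }            // retailer data
    ],
    // what the aggregated [trigger] is a trigger *for*
    information: "alert the user"
  };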


> I have no problem at all with the word trigger, I think it's useful.
> My only real point is that data about an object, be it geographically
> fixed or otherwise, is not a trigger. The trigger is that the user has
> moved into a context in which a defined subset of the available data
> is relevant and needs to be displayed.

I was with you right up to the point you said "user".  [sensory data
bundle] is flexible and also covers/enables User-less agents that are
just as feasible and relevant.


I see how some people could see my points as a dilution of the concept
of POIs and some may even see this type of discussion as a distraction
from "the goal".  

My simple perspective is that with a little reframing we can get all of
the benefits of the POI model while leaving the door open for really
innovative new concepts that are only just now starting to be explored.

But I would re-iterate my point:

        "AR is digitally mediated sensory perception"

And NOT just:

        "A geolocated or camera based POI system"


Looking forward to hearing people's responses 8)


roBman
