Re: thoughts towards a draft AR WG charter from Phil Archer on 2010-08-04 (public-poiwg@w3.org from August 2010)

From: Phil Archer <phila@w3.org>
Date: Wed, 04 Aug 2010 09:33:35 +0100
To: roBman@mob-labs.com
CC: public-poiwg@w3.org
Message-ID: <4C5925DF.2050204@w3.org>
Very interesting, thanks, Rob.

I think I've said enough for now and am keen to step back into the 
shadows again. All I would emphasise is that there is "violent 
agreement" that POIs are not necessarily geolocated. A POI, be it an 
object, a building or the car in front, can become relevant through any 
number of sensory inputs.

Cheers

Phil

Rob Manson wrote:
> On Tue, 2010-08-03 at 15:42 +0100, Phil Archer wrote:
>> A location, even one that is defined relative to something else as 
>> opposed to a fixed point in space, is not a trigger. The trigger is that 
>> a device has arrived at that location (or that a recognisable 
>> object/marker has come into view, or that a time has arrived or whatever).
> 
> Hi Phil...I think I have quite a different cultural perspective...so
> forgive me while I dance around a bit and try to clarify my language.
> 
> A location from my perspective is an abstract concept that can be
> represented in a coordinate system such as lat/lon.  Without other
> related information this is simply an abstract concept.
> 
> The number of "or"'s used in the second part of your sentence is [I
> believe] a clear hint that a broader "sensor bundle" model is needed.
> It's such early days that I'd hate for a standard to get fixated just on
> POIs.
>  
> 
>> In Web terms, we're talking about events, no? Things like onclick, 
>> onchange etc. /Those/ are the triggers. A Web App that's triggered by 
>> absolute geolocation might query the GeoLoc API regularly (say every 
>> second) and then you'd query a POI database with a machine version of 
>> "what do we know about this place?" That could be enriched with 
>> directional and temporal info etc. of course. But, as you say, Matt, 
>> that's a /query/ against a dataset.
> 
> Well...to step back for a second...what I really honestly think AR is,
> is a form of "digital sensory perception".  The term "sensory
> perception" can be broken down into two clear concepts.
> 
> 1. sensory data
> This is the raw data collected from an environment by sensors/sensory
> organs.
> 
> 2. perception
> After the data is processed a number of "recognised" features,
> affordances or "things" are extracted and re-presented.  Perception is
> fundamentally a "representational" process that turns raw data into
> meaning.  It's also important/relevant to note that in our biological
> world "perception" very likely occurs in at least 2 places.
> 
>   1. in our sensory organs
>   2. in our mind
> 
> Some would even say it may happen IN the environment before we sense it
> too.  This multi-layered approach maps well to what we're discussing.
> Raw data may be turned into perceptible [triggers] either in the
> sensors, in the application, in an online service or really anywhere
> within the data processing chain.
> 
> So I think this is a completely new approach to events.  I would hate to
> think we had to keep stacking on different "onVerb" bindings every time
> someone wanted to add a new type of sensor interaction/event.
> 
> 
>> The term 'trigger' implies dynamism and these are covered by existing 
>> working groups who are in various stages of looking at things (like 
>> GeoLoc, camera, clock etc.). 
> 
> True...however many of these are not looking at it from the perspective
> we are discussing at all (at least that's how it appears from the
> outside).  For example, the camera/capture API [1] simply seems to deal
> with embedding a raw camera view into a browser.  
> 
> The API itself has a gaping hole from my perspective.  There's a call to
> start capturing video and then a call/callback when that is complete.
> From my experience, AR happens almost exclusively BETWEEN those 2 calls.
> 
> NOTE: I'm not criticising this group's work, just pointing out our
> cultural differences.  This is why I listed them as one of the groups
> that I think needs to be intimately engaged in this discussion [2]
> 
> 
>> I believe Point of Interest data should be thought of as static. What 
>> data you take from that data is triggered by the user's actions, 
>> location, time zone or whatever.
> 
> I agree with the general point you're making...but perhaps the word
> "static" is a bit misleading here.  "Tends to be static"...but not
> necessarily.  e.g. a User can just as easily be a POI as a building can
> be.  We've done this in systems we've built.  And this User may be
> moving, or even be in the past or future (or even through time!).
> 
> [1] http://www.w3.org/TR/2010/WD-capture-api-20100401/#uiexamples
> [2] http://lists.w3.org/Archives/Public/public-poiwg/2010Jul/0048.html
> 
> 
> NOTE: I've included 2 responses in 1 email to reduce my overall SPAM
> rating 8)
> 
> On Tue, 2010-08-03 at 17:14:54 +0100, Phil Archer wrote:
>> I've not heard it used elsewhere (that doesn't mean that it's not used
>> elsewhere of course, just that I've not heard it!)
> 
> I'm really not tied to the word [trigger].  I've been putting it in []
> where possible to denote I'm just using it as a placeholder.
> 
> 
>> It's clear to me that a point of interest does not and should not
>> necessarily refer to a location on the Earth's surface (or any other
>> planet for that matter). Recognising an image (such as a label on a
>> bottle of wine) does not imply a geographical location other than
>> one's proximity to said bottle.
> 
> This is the key point about [sensor data bundle] vs [location].
> Location coordinates are just the archetypal AR [sensor data bundle] but
> definitely should not be the only ones.
> 
> If this point is accepted then POI is relegated to just the archetypal
> AR content type with the door being left open to a rich new set of
> content/concept types after that.  As I said before, AR that only
> supports POIs would be like HTTP that only supports HTML documents.
> That's useful/necessary for the web, but not sufficient.
> 
> 
>> The trigger here is that an image captured by the device has been
>> passed to some sort of image recognition software that has been able
>> to associate it with an identifier that can then be used to access the
>> (static) data about that particular bottle.
> 
> See my point above about "sensory perception".  And again I'd call out
> your use of the word "static".
> 
> 
>>> You could, I suppose, think of them as "auto triggers". Triggered 
>>> by the user moving, or things coming into their FOV rather than a 
>>> click which is a more active form of user triggering. As you say, 
>>> these would involve querying a database at an interval, but it 
>>> would be something automatically done by the browser, and not 
>>> something specifically coded for like with onClick style javascript
>>> events.
>> Not sure I quite agree here. The query (to the browser) might be
>> "where am I and which way am I pointing?" That's what GeoLoc
>> does/will enable.
> 
> This depends upon your cultural perspective.  
> You could see this as:
> 
> - user behaviour (e.g. movement) drives
> - a query to the browser using the GeoLoc API
> - that returns lat/lon/orientation
> 
> But that's simply the first step in the sensory perception process.
> This is extracting raw data from the environment at a point in time.
> This raw data then needs to be processed into perception in some way to
> make "meaning".  This is essentially what any of the Layar developers do
> when they create a service that responds to a getPointsOfInterest()
> request.
> 
> So I think GeoLoc API fits perfectly into the chain...but again this is
> only useful/necessary for AR but not sufficient.
> 
> 
>> I might like to have a slightly different query that said "alert me
>> when we get to Trafalgar Square" or "alert me if a bottle of 2000 St
>> Emilion Grand Cru passes within camera range".
> 
> I think "alertness" is a great layer on top of what we have discussed so
> far that matches "sensory perception" to "goal seeking behaviour".  But
> this is definitely something layered on top.  To apply this to the
> [sensory data bundle]/[trigger] model discussed so far this would be a
> [trigger] defined by the user as opposed to by the content creator.
> 
> BTW: I really strongly agree with Andy Braun's point that the term
> "publisher" should be defined as broadly as possible and that the
> [trigger] creator may be separate from that as well.
> 
> 
>> The other one - tell me when object X is seen - is achievable if you 
>> have a universal database of things with an agreed identifier
>> structure that is used by all image recognition software. The internet
>> of things and all that. The browser API could then have two functions
>> "can you give me an identifier for the thing the camera is looking
>> at?" and "tell me when an object with x identifier comes into view." 
> 
> I'm honestly not trying to be argumentative here 8)
> Perhaps I'm reading your language too literally, but it seems to be
> hardcoding in a form of tunnel vision.  While we could allow people to
> set "alertness" for when a certain object is "seen" within a "camera
> view"...the near-term real-world likelihood is that object recognition
> will be correlated against a number of parallel sensor input streams.
> 
> e.g. The stereo cameras on my wearable display (2 cameras) and the CCTV
> camera in the Walmart I'm standing in (1 camera) and the RFID sensor in
> my belt (yes...I am batman 8) - 1 sensor) and the RFID sensors on the
> shelves (n sensors) all collaborate to tell me that the bottle in front
> of me is "over priced plonk".
> 
> This is exactly how our biological sensory perception works.
>  
>  
>>> I think Rob Manson expressed it well as a form of triplet;
>>>> "Well...if we did use the "trigger" model then I'd express this 
>>>> as the following RDFa style triplet:
>>>>       this [location] is a [trigger] for [this information]
>> I agree with the sentiment here if not the syntax.
>>>> POIs in this format would then become the archetypal AR relationship.
>>>> The most critical and common subset of the broader relationship:
>>>>      this [sensor data bundle] is a [trigger] for [this information]
>>>> In the standard POIs case the minimum [sensor data bundle] is 
>>>> "lat/lon" and then optionally "relative magnetic orientation"."
>> That doesn't work as a triple as you have a literal (the sensor data)
>> as a subject. 
> 
> Hrm...first...I was really just using this as a simplified way to convey
> the linking concept 8)
> 
> Second...isn't that a matter of perspective?  If the location extracted
> as sensory data (lat/lon) from the user is seen as a property of the
> user (or object) then I agree with you.  In this case it would be:
> 
> [user] [is at] [location]
> 
> But linguistically it would be equally valid to see this from a sensory
> perception perspective.  Where an idealised [sensory data bundle] IS the
> subject.  It is literally turned into a "thing".  A pattern to be
> recognised.  It is turned from raw data into a specific pattern.
> 
> I do agree however that this would often be a range or a more complex
> pattern definition which is where the triplet analogy probably falls
> over.  Anyway...based on where our discussion is, I think it's the
> linking concept that's important not this specific syntax.
> 
> 
>> Yes. I agree. We're linking criteria to data. The data is static. The
>> trigger is that the criteria have been met.
> 
> There's that s word again 8)
> 
> 
>> Thinking about aggregated streams of triggers might be useful in
>> future, i.e. a way to say "tell me when my location is within 200
>> metres of a shop that sells 2000 St Emilion Grand Cru for less than
>> €50". What's aggregated here is the list of criteria and they might
>> only be accessible by different sensors and data sources.
> 
> This is an excellent example of distributing the "perception" part of
> the "sensory perception" across multiple points in the data processing
> chain.
> 
> 
>> I have no problem at all with the word trigger, I think it's useful.
>> My only real point is that data about an object, be it geographically
>> fixed or otherwise, is not a trigger. The trigger is that the user has
>> moved into a context in which a defined subset of the available data
>> is relevant and needs to be displayed.
> 
> I was with you right up to the point you said "user".  [sensory data
> bundle] is flexible and also covers/enables User-less agents that are
> just as feasible and relevant.
> 
> 
> I see how some people could see my points as a dilution of the concept
> of POIs and some may even see this type of discussion as a distraction
> from "the goal".  
> 
> My simple perspective is that with a little reframing we can get all of
> the benefits of the POI model while leaving the door open for really
> innovative new concepts that are only just now starting to be explored.
> 
> But I would re-iterate my point:
> 
>         "AR is digitally mediated sensory perception"
> 
> And NOT just:
> 
>         "A geolocated or camera based POI system"
> 
> 
> Looking forward to hearing people's responses 8)
> 
> 
> roBman
> 
> 
> 
> 

-- 


Phil Archer
W3C Mobile Web Initiative
http://www.w3.org/Mobile

http://philarcher.org
@philarcher1
Received on Wednesday, 4 August 2010 08:34:19 UTC