Re: Fwd: Splitting vs. Interpreting from David Booth on 2009-06-18 (www-tag@w3.org from June 2009)

From: David Booth <david@dbooth.org>
Date: Thu, 18 Jun 2009 02:45:01 -0400
To: "Sean B. Palmer" <sean@miscoranda.com>
Cc: www-tag@w3.org
Message-Id: <1245307501.6894.2810.camel@dbooth-laptop>

Hi Sean,

Thanks for your comments and observations.  Responses are below.

On Tue, 2009-06-16 at 20:24 +0100, Sean B. Palmer wrote:
> Sorry, looks like I had an old address on file for you.
> 
> ---------- Forwarded message ----------
> From: Sean B. Palmer <sean@miscoranda.com>
> Date: Tue, Jun 16, 2009 at 8:23 PM
> Subject: Splitting vs. Interpreting
> To: David Booth <dbooth@hp.com>
> Cc: www-tag@w3.org
> 
> You write about ambiguous and specific references here:
> 
> http://dbooth.org/2007/splitting/
> 
> When I worked on EARL in 2002, we had to solve httpRange-14, and we
> did it in a practical way which your splitting document reminds me of.
> 
> We might want to evaluate a tool of some kind in EARL, say the W3C
> Validator. But then we didn't know whether validator.w3.org was the
> tool itself or a page about the tool. That's httpRange-14 in a
> nutshell, before it was “solved” with the 303 hack.
> 
> So what we did was this:
> 
> <http://validator.w3.org/> earl:tool _:Validator .
> 
> The clever bit is that the earl:tool property says: if the subject is
> a Document (i.e. an IR), then the object is the Tool described by that
> document; whereas if the subject is a Tool, then the object is simply
> the same thing as the subject.
> 
> And as you can imagine, this is extensible to interpreting ambiguous
> resources in all kinds of ways. Now the TAG finding says that it's
> removed a certain level of ambiguity, but there are other ambiguities
> one might want to resolve when a page 303s and then still doesn't
> define carefully what's at the end of it. So the EARL method is much
> more practical.

Yes, that's one way to side-step the ambiguity issue.  But I want to
point out that it is actually *avoiding* the ambiguity issue rather than
addressing it.  In your example above, there is no tool/document
ambiguity.  The reason is that the same URI is never used to *directly*
denote both a tool and a web page, even though the domain of earl:tool
is the union of Tools and Documents.  If a particular URI denotes a
Document, then it always denotes a document, but the earl:tool property
uses that document to indirectly identify a tool.
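
To make the indirection concrete, here is a rough sketch of how I read
the earl:tool rule described above.  It is only an illustration, not
EARL's actual definition: the Resource class, its "kind" and
"describes" fields, and the urn:example:validator URI are all
hypothetical names that I made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical model of a resource; not part of EARL."""
    uri: str
    kind: str  # assumed to be either "Document" or "Tool"
    describes: Optional["Resource"] = None  # only meaningful for Documents


def earl_tool(subject: Resource) -> Resource:
    """Return the Tool that an earl:tool statement would point to.

    The domain is the union of Tools and Documents, but each URI
    still denotes exactly one kind of thing.
    """
    if subject.kind == "Tool":
        # A Tool maps to itself.
        return subject
    if subject.kind == "Document":
        # A Document indirectly identifies the Tool it describes.
        if subject.describes is None:
            raise ValueError("document does not describe a tool")
        return subject.describes
    raise ValueError("subject is outside the domain of earl:tool")


# Hypothetical data: the page URI denotes a Document, never the Tool.
validator = Resource("urn:example:validator", "Tool")
page = Resource("http://validator.w3.org/", "Document", describes=validator)
```

Both earl_tool(page) and earl_tool(validator) yield the same Tool, yet
neither URI ever denotes two things at once, which is why I say the
ambiguity is avoided rather than addressed.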

> 
> You might also want to think a bit harder about statements such as
> “there is no architectural need for Person and IR to be considered
> disjoint”. 

It occurs to me that I may not have been clear that I was referring to
web architecture as a whole -- not to the architecture of a specific
application.  Some specific applications may indeed need to treat Person
as disjoint from IR; others may not.  And for web architecture as a
whole, there is no architectural need to take a position about whether
or not they are disjoint.

> Consider if you were using Facebook and it started
> conflating people with groups and games and so on. But of course
> people break the rules of the web until they matter, and since there's
> no Semantic Web User Agent this rule doesn't matter.

The key observation behind that statement is that ambiguity depends on
application: what is clear enough to one application may be ambiguous to
a different application that needs finer distinctions.  The choice of
how finely to define a resource is an engineering trade-off.  There is a
cost involved in defining a resource very finely.  Actually, there are
two kinds of cost involved.  One is the extra burden of writing and
processing the additional constraints required to define it very finely.
But the other is that the more finely a resource is defined -- the more
constraints that are imposed on it -- the less flexible it is for reuse.
On the other hand, if the definition is too loose then it may be too
ambiguous for many applications.  For example, if Facebook needs to
distinguish between people and games then it obviously won't work very
well to have a URI that conflates the two.  But this is not an
architectural issue: it is a matter of one application making
engineering choices that are not appropriate for another application.

One may politely ask: "please distinguish between resources and tools,
because *my* application and many others need this distinction".  But
the fact is that there are also many applications that do *not* need
this distinction.  So although one might politely ask, I don't think we
can blame those who decline such a request.

> 
> I'm not saying that the TAG finding should be canned because you can
> use the kind of interpretation properties that I've described as a way
> around it. The point is rendered moot by various architectural
> problems. But you ought to compare the 2002 and 2009 architectural
> solutions carefully.

I'm not suggesting that the httpRange-14 decision should be canned
either.  I actually *agree* with it.  The flaw that I think should be
fixed is the definition of "information resource" (IR) in the AWWW:
http://www.w3.org/TR/webarch/#id-resources
"all of their essential characteristics can be conveyed in a message".
There are two aspects of that definition that I think are flawed: 

1. The text (later on) suggests that the class of IRs is disjoint from
the class of people: "cars and dogs . . . are not information
resources".  As I explained above, there is no architectural need for
this constraint.  Furthermore, trying to define the boundary of exactly
what is and what is *not* an IR is problematic, as the many discussions
around this question have illustrated. For these reasons I think this
constraint should be eliminated.

2. When considering IRs that are based on CGI scripts, I don't think
"all of their essential characteristics can be conveyed in a message" is
quite the right criterion.  What matters architecturally is that IRs can
have "representations" (in the AWWW sense).  In HTTP, this means that an
IR can emit a 200 response.  That's what matters architecturally.  And
the reason it matters architecturally is that there are a number of
specifications and protocols that suddenly come into play if a resource
can have "representations": mime types, character encodings, content
negotiation, etc.
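
As a minimal sketch of the criterion in point 2, assuming we look only
at the HTTP status code of a GET: a 2xx response tells us the resource
has "representations" and so can be treated as an IR, while a 303 or
an error tells us nothing either way, which is how I read the
httpRange-14 decision.  The function name and the use of None for "no
conclusion" are my own choices for illustration.

```python
from typing import Optional


def is_information_resource(status: int) -> Optional[bool]:
    """Infer IR-ness from an HTTP status code alone.

    Returns True when the response shows the resource has
    representations, and None when no conclusion can be drawn.
    It never returns False, because no status code proves that
    a resource is *not* an IR.
    """
    if 200 <= status < 300:
        # A 2xx response carries a representation, so mime types,
        # character encodings, content negotiation, etc. apply.
        return True
    # 303 See Other, 4xx, 5xx and anything else: nature unknown.
    return None
```

Note that nothing here says a person or a dog cannot be an IR; the
test is only whether the resource can have representations.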


-- 
David Booth, Ph.D.
Cleveland Clinic (contractor)

Opinions expressed herein are those of the author and do not necessarily
reflect those of Cleveland Clinic.
Received on Thursday, 18 June 2009 06:45:35 UTC