Re: TAG minutes 2005-05-31 for review [fragmentInXML-28, SchemeProtocols-49, httpRange-14] from Patrick Stickler on 2005-06-01 (www-tag@w3.org from June 2005)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 1 Jun 2005 14:13:07 +0300
To: W3C TAG <www-tag@w3.org>
Message-Id: <2c629aa3191eab3c1a562a39622d5e69@nokia.com>
Hi folks. Just wanted to offer some comments
regarding the minutes recorded below...

Regards,

Patrick


On May 31, 2005, at 23:33, ext Dan Connolly wrote:

>
> TAG Weekly 31 May 2005
>
> ...
> httpRange-14
>
>    <Zakim> DanC, you wanted to swap in unanswered mail from HT
>
>    [24]making progress on httpRange-14 -- yet another suggestion
>
>      [24] http://lists.w3.org/Archives/Public/www-tag/2005May/0010.html
>
>    <ht> DanC: dc:title is the URI that's mentioned in the SWBPG message
>    to us
>
>    <ht> It's a hashless URI for a non-information-resource, i.e. an RDF
>    property
>
>    <ht> But you don't get a 200 if you try to retrieve it.

I don't see that this has any direct bearing on the hash/slash issue.

E.g. the URI
http://sw.nokia.com/id/4f7b6805-47d7-4914-885c-6ef2b487adf6/Series_60_Platform_Designing_XHTML_MP_Content_v1_4_en.pdf
identifies an information resource, yet an HTTP GET results in
a redirection, not a 200 OK response. Surely our use of redirection
here is in no way inappropriate.

In like manner, the URI http://sw.nokia.com/FN-1/published
identifies a property, and an HTTP GET results in a
redirection, not a 200 OK response. Thus, the behavior
is consistent regardless as to whether the resource is or
is not an information resource.

In both cases, we could have just as well simply returned
representations directly; but for various reasons decided
that redirection offered more utility and flexibility (I
won't go into the details as they are not relevant to this
particular discussion).

IMO, a 200 response to an HTTP GET request simply means
"Here you go. Done." whereas a redirection response means
"Can't directly provide a representation, but you can likely
get one via this other URI. Not done yet."

But neither response has anything to do whatsoever with the nature of
the resource identified by the URI. They have to do with
accessing representations of the resource identified by
the URI, regardless of the nature of that resource.
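
To make that reading concrete, here is a minimal Python sketch. The
function name, the set of redirect codes considered, and the wording of
the return values are assumptions made for illustration; they are not
part of any HTTP library. The point is only that no branch needs to know
what kind of resource the request URI identifies.

```python
# Illustrative only: names and return strings are assumptions,
# not part of any HTTP library.
REDIRECTS = {301, 302, 303, 307}


def interpret_status(status: int) -> str:
    """Say what a GET response status tells us about representation access.

    No branch says anything about the nature of the resource that the
    request URI identifies.
    """
    if status == 200:
        return "done: a representation was provided"
    if status in REDIRECTS:
        return "not done yet: a representation is likely available via another URI"
    return "no representation obtained by this request"


for code in (200, 302, 404):
    print(code, "->", interpret_status(code))
```

The same function applies unchanged whether the URI identifies a PDF
document or a property.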

>
>    <ht> you get a redirect. . . They're evidently sensitive to claiming
>    dc:title has representations.

???  I don't think that's the case. You may wish to ask
the DC community about that.

> So a hashless URI is more trouble when
>    it comes to publishing in that way.

Again. ???

We have found that redirection is a very useful and powerful
mechanism for a number of reasons, as have many others.

Is all PURL-like technology then "trouble"? How? Evidence?

> If they didn't set up a redirect,
>    a 200 from a hashless URI is a claim that the web page is identical
>    to the RDF property,

No. This is not correct.

In such a case, the hashless URI would identify *either*
the property *or* the web page -- and if the hashless URI
identifies the property, as it does in the case of dc:title, and
a 200 response was actually provided, returning e.g. an HTML
instance as a representation, then that would be a representation
of the property, and in no way does returning an HTML instance
as a representation of that property obfuscate the meaning of
that URI or equate the resource identified with the representation
returned. It would not equate the property with the web page.

Likewise, neither does a redirection response to an HTTP GET
request for such a URI obfuscate the meaning of the URI or
equate the resource identified with whatever resource is
identified by the redirection URI.

The HTTP response codes have nothing to do with the nature
of the resource identified by a given URI. They concern the
accessibility of representations of those resources. One
cannot directly access resources via HTTP (even if a given
system might change the state of a given resource based on
the interaction with representations, it's still not direct
access) and responses from an HTTP server have nothing to do
with whether an http: URI can or cannot identify a particular
type of resource, but whether representations of some
resource are accessible and how.

HTTP should be resource agnostic.

All HTTP should have to deal with are representations.

On the OFWeb, it is humans utilizing HTTP as a means of communication
who are concerned with which resource is identified by a given URI.
Humans decide which resources particular URIs identify and humans
provide for access to representations of those resources via those
URIs, but HTTP itself need not care about the nature of the actual
URIs, only about the accessibility of representations via
those URIs, and thus every HTTP response code should be considered
to be entirely irrelevant to this debate.

This issue is about whether the range of resources identified
by http: URIs should be constrained or unconstrained, and if
constrained, being the more complex option, there should be
overwhelming justification for imposing such additional
complexity on the system.

I still see no concrete arguments based on real impact to
real applications being put forth to support imposing a
constraint on the range of http: URIs.

> which causes trouble for some consumers.

What trouble? I'm really puzzled how you are coming to these
conclusions. Can you provide any pointers to reports of the trouble you
refer to? If I've missed some key point of failure for either hashless
URIs identifying properties, or PURL-like URIs redirecting to
representations, I certainly want to know about it.

More important is what trouble either approach might
cause for applications, which is a far more tangible and
measurable effect than the subjective impressions and
interpretations of human users. If the applications are
functioning properly, we can address a great deal of any
subsequent human confusion through education and improved
interfaces. Let's focus on the applications, and then
it will be easier to sort out the usability and perception
issues.

>
>    <ht> DanC: When asked how to choose/publish RDF properties, I say --
>    pick a part of webspace, divide it up, slap a hash on the end,
>    that's your name, then put something useful at the URI w/o the #

Dan, the issue is not about what your personal preference or practice
is, or what you recommend others to do, but about whether a common
practice (which appears not to coincide with your personal preference)
is in any way architecturally unsound or detrimental to the industry.

I continue to see arguments in favor of hash and against slash that
are nothing more than preference, with no hard, testable evidence
that the alternative view is in any way detrimental to the functioning
of the web -- yet hard, testable evidence *has* been put forward
regarding non-trivial scalability issues with the hash approach.

Can we please move away from talking about philosophy and preference
and focus on the real world impact to real applications?

>
>    <ht> NM: [missed the question]
>
>    <ht> DanC: leads to confusion about e.g. 'author' assertions about
>    that property vs. 'author' assertions about the document describing
>    it

No more so than confusion will arise from any URI used in communication
between users where it is unclear to any of those users which resource
is actually identified by that URI -- and this includes ambiguity
between multiple *information* resources.

>
>    <ht> NM: Indeed my concern was about 200 codes
>
>    NM: so far we've talked about dividing between InformationResources
>    and others...
>    ... so if I get a 200 response for /noah , that seems kinda fishy,
>    since I didn't really contact Noah, but rather a proxy for [or
>    description of] Noah.

If you have a hashless http: URI that identifies Noah, and you send
an HTTP GET request to the web authority of that URI, and get a 200
response, you have not "contacted Noah" (directly) insofar as the
HTTP interchange is concerned. All that 200 OK response means is that
the web authority was able to successfully provide you with a
representation of the resource identified by
the URI of the GET request, i.e. a representation of Noah.

Now, that representation may very well satisfy a particular need you
have regarding contacting Noah. Great. The web has provided value.
But insofar as HTTP and that http: URI is concerned, all that has
happened is that you have accessed a representation of some resource
identified by that http: URI (whatever that resource might be).

Again, all HTTP need concern itself with are representations. Nothing
about the nature of any resource identified by an http: URI need have
any effect whatsoever on the proper functioning of HTTP, or the
unambiguous interpretation of HTTP responses, all of which concern
access to representations, not resources.

>    ... [missed some...] but consider { ?SOMETHING measures:wieghtInLbs
>    200 } ...
>
>    <Zakim> ht, you wanted to ask what you _get_ with your 200
>
>    NM: consider an actual computer...
>    ... that responds to HTTP GETs about itself
>    ... in the case of a computer, though it's clearly not an
>    InformationResource, the 200 OK response doesn't seem to introduce
>    ambiguity
>
>    <ht> 200 for dc:title amounts to identifying the property with the
>    page, which is a realistic confusion

I'm sorry, but this is simply false. You are confusing the resource
identified by a given URI with the representation accessible via
that URI. These have always been distinct. And hash-vs-slash does
not in any way change that.

E.g. if you have a hashless http: URI which identifies a poem (an
information resource) and you send an HTTP GET request and get
back a "web page" (another information resource) as a
representation of that poem, that does not introduce any
ambiguity between the poem and the web page (insofar as the
HTTP specifications are concerned -- though if the user is
operating under the false presumption that URIs identify what
they GET, then, well, that's just "operator error"). The
URI in question identifies the poem. The web page is a representation
of the poem. If you want to talk about the web page specifically,
it needs its own distinct URI (or at least a distinct node in
an RDF graph, albeit a blank node) as the web page is a distinct
resource from the poem.

But insofar as the HTTP GET request is concerned, there is
no architectural ambiguity. You have asked for a representation
of the resource identified by a URI, and you got it. If you
equate what you got with what is identified by the URI, that's
your error, not a fault with the web architecture.

And the nature of the resource identified by that http: URI
in no way matters in such a case.
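
As a rough illustration of keeping the two apart, the following Python
sketch models statements as plain tuples. The example.org URIs, the
property names and the author names are all hypothetical, chosen only for
this example; the sketch does not use any RDF library.

```python
# Hypothetical URIs and property names, for illustration only.
POEM = "http://example.org/poems/42"       # identifies the poem
PAGE = "http://example.org/poems/42.html"  # identifies one web page about it

# Statements as (subject, predicate, object) tuples.
triples = {
    (POEM, "author", "A. Poet"),
    (PAGE, "author", "W. Master"),
    (PAGE, "representationOf", POEM),
}


def objects(subject: str, predicate: str) -> list:
    """Return the sorted objects of all statements with this subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects(POEM, "author"))
print(objects(PAGE, "author"))
```

Because the poem and the page have distinct identifiers, 'author'
statements about one never collide with 'author' statements about the
other, whatever a GET on the poem's URI happens to return.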

>
>    <ht> [that was DanC]
>
>    <ht> DanC: 200 for computer is not confusing, because everything
>    true about the computer is true about [what]????
>
>    <Zakim> timbl, you wanted to say that a computer is not an
>    information resource, 200 would be inappropriate.
>
>    TBL: to me, it's quite clear: the computer is not an information
>    resource,

Fair enough. I agree. A computer is not an information resource.

> and hence a hashless http URI for it, and a 200 OK response,
>    is inappropriate.

Firstly, as to a hashless http URI being inappropriate for
identifying that computer, you are here simply re-stating
your opinion, without providing any arguments. Exactly *why*
should one conclude that a hashless URI is inappropriate to
identify a computer?

Secondly, as argued above, the 200 OK response has no relevance
whatsoever to the nature of the resource identified by any
http: URI. It is neither appropriate nor inappropriate per
the nature of the resource identified because the nature
of the resource will have no effect on the established usage
and usability of the web, as the access of representations
does not depend on the nature of the resource in question.

--

I see a problem with this entire discussion, namely that
the issue is not being discussed in a sufficiently objective
manner. Various conclusions are being reached which
presuppose a particular answer to the key question -- rather
than arguments being presented which help decide that question.

If a 200 OK response is "inappropriate", state *how* in terms
of what negative impact it has to *applications*, identifying
specifically which concrete problems arise from such a case,
and how excluding that behavior or practice solves those
problems.

If it truly is "inappropriate" for a hashless http: URI
to identify a computer, then demonstrate *how* some
application breaks when it encounters such a URI or
how some process or solution loses value or integrity
in light of such URIs.

This debate has again fallen into the quagmire of philosophy and
personal preference and even though it is circling around
apparently concrete and real machinery such as HTTP response
codes, nothing is being said about actual impact to existing
or future applications -- only about whether one interpretation
is offensive/uncomfortable as opposed to providing a nice
warm fuzzy feeling.

If someone feels that a 200 response to an HTTP GET request
is "inappropriate" if the request URI identifies a property,
then state what the actual, real, significant impact is to
*applications* (not users' perceptions/interpretations), and
preferably, provide links to real, deployed applications and/or
studies which substantiate those concerns.

Otherwise, it's just more spinning wheels in the mud, and I'm
sure I'm not the only one who sincerely wishes to see the
TAG reach closure on this issue and move on to other things.

>
>    NM: ok, so this conversation confirms that there are a couple ways
>    to look at this which are each internally consistent...

Yes, so now can we have a discussion about the real impact to
real applications; as well as practical considerations such
as established practices embodied in deployed solutions and
the impact to the web and sw communities and W3C constituency
by choosing one option over the other?

>
>    <ht> Towers of abstraction are a long-standing problem for
>    AI/Knowledge Representation
>
>    where HT wrote "not confusing" I meant to say "not formally
>    contradictory". I do think it's confusing.
>
>    [missed some...]
>
>    <ht> Right, Roy favours the "far context" approach to
>    disambiguation, i.e. information about the RDF property of the
>    triple in which the URI appears
>
>    NM: what about documents about documents?
>
>    TimBL: sure... <a> and <b>. <a#foo> might denote <b>.
>
>    resuming with [25]HT's msg
>
>      [25] http://lists.w3.org/Archives/Public/www-tag/2005May/0010.html
>
>    <ht> "far context" is from [26]my initial message
>
>      [26] http://lists.w3.org/Archives/Public/www-tag/2005Apr/0086.html
>
>    DC: as to "OK -- why do we need or want to maintain that notion of
>    identity across the SemWeb/OFWeb boundary?" I think webarch speaks
>    to the value of a global space. I'm somewhat conflicted about this; I
>    wonder if the principle has limitations.
>
>    TBL: [missed]
>
>    NM: this is an easy one for me, the traditional
>    Metcalfe/economy-of-scale arguments convince me.
>
>    <Zakim> ht, you wanted to ask about the history
>
>    HT: in some histories of RDF, RDF statements were metadata, i.e.
>    data about documents.
>    ... nowadays, that's less emphasized, and RDF statements are more
>    about things in the world... biotech and such...
>    ... in the "RDF is for metadata" world, yes, it's nutso not to take
>    the identifier spaces the same...
>
>    <Vincent> MarkN is Dave
>
>    <timbl> TBL: We have written about the importance of an unambiguous
>    identifier throughout the OFWeb, and the semantic web depends on it
>    throughout the SemWeb. We could, yes, have an architecture in which
>    the two were separated: the same URI string would identify
>    different things as an OFURI and as a SWURI. That would mean putting
>    a membrane between the two worlds, never mixing them. [I think this
>    would be a major drawback and very expensive]

I personally think that if the OFweb and SemWeb do not share the
same set of identifiers, it will be catastrophic. And I also see
no reason why such a partitioning would be necessary.

1. URIs identify resources.

2. The OFWeb is concerned with access to representations of resources.
    The true identity and nature of the resource identified by a given
    URI is not relevant to the functionality of the OFWeb.

3. The SemWeb is concerned with statements about resources. Whether
    any representation of the resource identified by a given URI is
    accessible via that URI is not relevant to the functionality
    of the SemWeb.

One may combine the functionalities of both the OFWeb and the
SemWeb to access and consume information in a globally
distributed manner, and the shared set of URIs with (presumably)
consistent meaning is the key to that integration between the OFWeb
and SemWeb for such applications.

It's only when one tries to constrain the nature of resources
which can have directly accessible representations on the OFWeb
(a needless constraint seemingly motivated solely by personal
philosophy and preference, not by any technical basis) that we
see any tension in the integration of the OFWeb and SemWeb, which
otherwise would and should be a perfectly clean and resource agnostic
integration based on a shared set of identifiers with generic,
resource agnostic machinery provided at either layer which is
optimized for the goals of each layer.

If a given http: URI identifies an information resource, fine,
just say so on the SemWeb layer, but don't make the OFWeb layer
more complicated than it has to be and thereby disrupt the
otherwise clean, generic functional division between the OFWeb
and SemWeb.
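
The division of labour described in points 1-3 above can be sketched in
a few lines of Python. Everything here is a toy under stated assumptions:
the example.org URIs, the "ex:" type names and the two in-memory stores
are hypothetical stand-ins for a web server and an RDF store, and no real
HTTP or RDF library is involved.

```python
from typing import Optional

# Hypothetical identifiers, for illustration only.
NOAH = "http://example.org/id/noah"
COMPUTER = "http://example.org/id/computer-7"

# "OFWeb" layer: URI -> bytes of an accessible representation.
representations = {NOAH: b"<html>A page about Noah</html>"}

# "SemWeb" layer: (subject, predicate, object) statements about resources.
statements = {
    (NOAH, "rdf:type", "ex:Person"),
    (COMPUTER, "rdf:type", "ex:Computer"),
}


def get_representation(uri: str) -> Optional[bytes]:
    """OFWeb-style access: never inspects what the URI identifies."""
    return representations.get(uri)


def describe(uri: str) -> set:
    """SemWeb-style lookup: never checks whether a representation exists."""
    return {(p, o) for s, p, o in statements if s == uri}


for uri in (NOAH, COMPUTER):
    print(uri, get_representation(uri) is not None, sorted(describe(uri)))
```

Neither function consults the other layer; the only thing they share is
the URI itself, which is what lets an application combine the two.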

>
>    HT: but it's less obvious when you get to lifesci etc.
>    ... have I got the history right?
>
>    TBL: in a sense; to me, RDF was always a generic thing, but the
>    initial motivation and funding was metadata. So yes, the "center of
>    gravity" has shifted.
>
>    <ht> Thanks, that helps
>
>    <noah> From AWWW:
>
>      Software developers should expect that sharing URIs across
>      applications will be useful, even if that utility is not initially
>      evident.
>
>
>     [27]webarch/#identification
>
>      [27] http://www.w3.org/TR/webarch/#identification
>
>    <timbl> But remember that pre RDF, there was MCF and various KR
>    things which were more general KR oriented.
>
>    <noah> I actually believe this.
>
>    <noah> This suggests that SemWeb and OFWeb should share an
>    identification space

Definitely.

--

Patrick Stickler
Senior Architect
Forum Nokia
Hatanpäänkatu 1 A
33900 Tampere Finland

phone:  +358 40 801 9690
fax:    +358 7180 75700
email:  patrick.stickler@nokia.com

Forum Nokia provides a wealth of resources to mobile
developers. For the latest on mobile tools, devices and
technologies, go to http://www.forum.nokia.com
Received on Wednesday, 1 June 2005 11:13:30 UTC