Re: URIs vs. URIviews (was: Agenda for RDFCore WG Telecon 2002-02-15) from Aaron Swartz on 2002-02-15 (w3c-rdfcore-wg@w3.org from February 2002)

From: Aaron Swartz <me@aaronsw.com>
Date: Thu, 14 Feb 2002 23:12:52 -0600
To: Dan Connolly <connolly@w3.org>
CC: RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <B891F4F4.20BF5%me@aaronsw.com>
On 2002-02-14 10:29 PM, "Dan Connolly" <connolly@w3.org> wrote:

>>>>  o The WG resolves that the use of absolute URIs with fragment IDs
>>>>    to identify Web resources is relatively incompatible with current Web
>>>>    architecture.
>>> 
>>> ?????
>>> 
>>> Er.. it's the very heart of web architecture:
>>> 
>>> The principle that anything, absolutely anything, "on the Web"
>>> should [be] identified distinctly by an otherwise opaque string
>>> of characters (A URI and possibly a fragment identifier) is
>>> core to the universality.
>>> 
>>> -- http://www.w3.org/DesignIssues/Architecture
>> 
>> RFC2396 would agree with you except for the "and possibly a fragment
>> identifier" bit.
> 
> RFC2396 does not say that a URI+fragid does not identify anything.
> RFC2396 is silent on what a URI+fragid identifies.

I never said that it did. Even so, RFC2396 is the latest spec in this field.
Feel free to wake me up when you've gotten a new one through the IETF process.
 
>> Same with Roy Fielding's dissertation (which clearly
>> explains the reasons why this was an explicit design decision)
> I was there, and I don't think it was an explicit design decision;
> it was consensus by exhaustion.

Well, that sort of implies that there were folks who wanted it out (why else
would they remove something that was in the previous version?). I'd expect
Roy Fielding was one of those folks, and he seems to believe it was a good
design decision.

> The software for URIs handles fragment identifiers.

Huh? My HTTP client (and every client I know of) does not pass the fragment
to my HTTP server. My server (Apache) throws it away if it's there. That is
widely deployed URI software, and this is a real-world problem. I'm
actually running into it:

I want to create an HTTP proxy which I can host. Users who use this as their
main HTTP proxy will see an RDF-annotated view of the Web, by typing URIs
into their browser like normal. However, they'll get back a pretty version
of all RDF assertions using that URI as their subject. Unfortunately, RDF
allows fragments as subjects, but there's no way for users to specify a
fragment in their query, since the Web browser doesn't send it along (by
design according to TimBL, see previous message). Thus they can't access
all that information without big kludges in my software.
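To make the proxy problem concrete, here is a minimal Python sketch. It assumes a simplified client that talks to a proxy using the absolute-form request target; `proxy_request` is a hypothetical helper, not code from any browser or from Apache. The client splits the fragment off (here with the standard library's `urldefrag`) before sending, so two different URI references produce the identical request and the proxy has no way to tell them apart.

```python
from urllib.parse import urldefrag


def proxy_request(uri_reference: str) -> tuple[str, str]:
    """Hypothetical client step: build the request line sent to a proxy.

    Returns (request_line, fragment). Only the absolute URI goes on the
    wire; the fragment stays with the client ("in its pocket").
    """
    absolute_uri, fragment = urldefrag(uri_reference)
    return f"GET {absolute_uri} HTTP/1.1", fragment


if __name__ == "__main__":
    for ref in ("http://w3.example.org/timXML2000",
                "http://w3.example.org/timXML2000#1TOII"):
        line, kept = proxy_request(ref)
        # Both references yield the same request line; only `kept` differs.
        print(line, "| kept by client:", repr(kept))
```

Because the distinguishing part never leaves the client, any proxy-side lookup keyed on the request line can only ever see the fragment-less URI.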
 
>> the many people who have invested in URI syntax, and don't want to go back
>> and fix their HTTP clients, proxies, servers or other software (and maybe
>> hardware) to support this addition of fragment identifiers.
> All the software works with fragment IDs today. It has since 1989
> or so.

Can you clarify this? It seems to disagree strongly with my experience.

>> a) In the REST architectural model (which the TAG seems to be agreeing
>> about) fragment identifiers only make sense within the context of an HTTP
>> response (a bag of bits).
> 
> I disagree: a URI with a fragment makes sense as an identifier
> in the global scope of the web.

I'd be very interested in seeing your explanation of why this is so (i.e. a
refutation of the relevant portions of REST, TimBL's Web model, and
current implementations).
 
>> They identify parts of a document, not general
>> Resources like full URIs.
> 
> A part of a document is 'something with identity', i.e. a resource.

I'm not disagreeing. All I'm saying is that the Web model makes it pretty
clear what resource these things identify. RDF is trying to say they
identify abstract concepts regardless of content negotiation. The current
Web model (as I understand it) says that they refer to a part of an HTTP
response and are dealt with by the presentation module.
 
>> b) Deployed code doesn't support fragment identifiers as first-class objects
> 
> yes, it does.
> 
>> -- I can't ask an HTTP proxy about them, I can't query an HTTP server about
>> them, etc.
> 
> Why would you expect to be able to? That's not how they work.

Now I'm really confused. What's a first class object?

I expect first-class objects to be things I can ask an HTTP server/proxy
about. (HTTP being one of many possible protocols to get information about
Resources.)
I can't do this with URI-references.
Therefore, I don't think URI-references are first-class objects in the
currently-deployed world.

>> And this is by design...
>> 
>> <MikeM> fragments are client side thing.....
>>  - in #rdfig
>> 
>> Exactly! RDF has created this problem by taking what in Web Architecture is
>> designed to be a client-side thing, the last step of resolution, and using
>> it as a general identifier. TimBL explained this at the first W3C technical
>> plenary: "[an HTTP client] puts the fragment ID in its pocket".
> 
> Yes, and how is that a problem?

Because we've pulled this client-side piece of WebArch, the last step of
resolution, which has always been used to identify portions of a document,
into center stage, using it to identify anything we want (the role that was
reserved for real URIs before RDF). That seems almost like abuse.
 
> Can you give a specific example of a failure mode?

Not sure what a failure mode is in this context. Here's an inconsistency in
your model (as I understand it) that doesn't exist in the REST model:

I define http://w3.example.org/timXML2000#1TOII as identifying the "test of
independent invention". The timXML2000 resource can be downloaded in a
number of different formats:

  There's RDFx, which describes the Resource I've created.
  There's RDF/XML where it identifies an XPath nodeset (according to the
current XPointer spec).
  There's an HTML file in which the identifier "1TOII" is illegal, but if it
were legal, it would identify a section of the HTML transcript of Tim's
speech where he talked about the TOII.
  There's an audio file, in which it identifies a sound clip from Tim's
XML2000 talk where he describes the TOII.

All of these are different Resources. Giving them the same identifier makes
no sense to me and seems inconsistent.
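A rough Python sketch of that inconsistency, assuming RFC2396's rule that the interpretation of a fragment depends on the media type of the retrieved representation. The media types and the `INTERPRETERS`/`resolve` names are hypothetical illustrations, not real parsers: the same URI reference resolves to a different thing depending on which format the server negotiated.

```python
from urllib.parse import urldefrag

# Hypothetical per-media-type fragment interpreters, one per representation
# of timXML2000 listed above. Media types are illustrative assumptions.
INTERPRETERS = {
    "application/rdf+xml": lambda frag: f"XPointer node-set for {frag!r}",
    # Note: an id starting with a digit is not legal in HTML 4.
    "text/html": lambda frag: f"HTML element with id {frag!r}",
    "audio/mpeg": lambda frag: f"audio clip named {frag!r}",
}


def resolve(uri_reference: str, media_type: str) -> str:
    """Client-side last step: interpret the fragment only after a
    representation of the given media type has been retrieved."""
    _, fragment = urldefrag(uri_reference)
    return INTERPRETERS[media_type](fragment)


if __name__ == "__main__":
    ref = "http://w3.example.org/timXML2000#1TOII"
    for media_type in INTERPRETERS:
        # One identifier, a different "resource" for each negotiated format.
        print(media_type, "->", resolve(ref, media_type))
```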

Even worse, the audio file has fragment identifiers for every possible
subsection of audio. There's no way I can possibly map those identifiers
into the other formats I present things in; they don't really make sense in
all of them, and even if they did, it'd be a gargantuan task. So I really
have no choice but to create a whole ton of URIviews which effectively 404
for HTTP requests that prefer other formats over the audio file.

If you want a real-world story, that's above.
-- 
[ "Aaron Swartz" ; <mailto:me@aaronsw.com> ; <http://www.aaronsw.com/> ]
Received on Friday, 15 February 2002 00:12:59 UTC