Being Clear [was: Re: Namespaces Without "#"] from Sean B. Palmer on 2001-12-01 (uri@w3.org from December 2001)

From: Sean B. Palmer <sean@mysterylights.com>
Date: Sat, 1 Dec 2001 22:39:04 -0000
To: "Roy T. Fielding" <fielding@ebuilt.com>
Cc: <www-rdf-interest@w3.org>, <uri@w3.org>
Message-ID: <014801c17ab8$fe409120$3b540150@localhost>
> What you are looking for is a header field that defines the
> relationship between representation and resource in a manner that is
> simple to understand and relatively standard.  You should be able to
> do that with a typed Link to a standard resource that represents
> that relationship, and for which some form of RDF could be a
> reasonable representation of that relationship.

This is a very good idea, especially in that it could solve a handful
of problems at once. It would be possible to define a new header field
that links to a profile syntax, using some canonical form of RDF
(perhaps NTriples).

So we could define something like:-

   ResChar = "Resource-Characteristics" ": " URI

That harks back to the URC days a bit. The URI production above would
be a URI that denotes a resource whose resource characteristics are to
some extent known: a single time-invariant associated representation,
in NTriples format. The alternative would be to actually put the
NTriples in the headers themselves :-)
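
As a rough sketch of how a client might pick such a header out of a
response: the header name follows the ResChar production above, but the
function name parse_reschar and the sample URI in the usage note are
purely illustrative assumptions, and nothing here is a registered HTTP
header.

```python
from typing import Optional
from urllib.parse import urlparse

# Header name taken from the ResChar production; it is not a registered header.
HEADER_NAME = "resource-characteristics"


def parse_reschar(header_line: str) -> Optional[str]:
    """Return the URI from a 'Resource-Characteristics: <URI>' line, or None."""
    name, sep, value = header_line.partition(":")
    if not sep or name.strip().lower() != HEADER_NAME:
        return None
    uri = value.strip()
    # Only absolute URIs (with a scheme) are accepted in this sketch;
    # relative references are simply rejected.
    if not urlparse(uri).scheme:
        return None
    return uri
```

For instance, parse_reschar("Resource-Characteristics: http://example.org/reschar.nt")
would hand back the URI, and any other header line would give None.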

It's not as if this idea doesn't get raised every so often; for
example, Sandro mentioned it to myself, Mark Nottingham, and TimBL:-

[[[
02:27:02 <sandro>         X-Formal-Language-URI:
http://www.w3.org/2001/10/x
[...]
03:16:14 <timbl-lap> Why not boostrap RDF metadat with just
03:16:20 <sbp> is it going to be something that's explicitly
retrivable?
03:16:40 <timbl-lap> RDF-prop:  <http://www.w3.org/2001/FLD>  foo.bfg
03:16:56 <sandro> I'm thinking in terms of retreivability at the
moment, but there's probably a place for 3rd-party information, too.
03:16:57 <mnot> HTTP Headers are problematic; there aren't any good
UIs in Web servers for associating metadata with resources, and often
the authors don't have administrative control
03:17:13 <timbl-lap> (wonder what happens when th econtent-type and
fld don't match - security hole?)
03:17:17 <sbp> ooh, RDF in the headers. Just like CC/PP
03:18:06 <timbl-lap> N3: <http://www.w3.org/2001/FLD>  <foo.bfg>;
<bar.bfg>.
[...]
03:19:45 <timbl-lap> An RDF mapping of HTTP and SMTP headers is well
overdue.  DanC has of course written a bunch of larch about it [n]
]]] - http://ilrt.org/discovery/chatlogs/rdfig/2001-10-25.txt

Sandro's paper [1] is very relevant, and discusses numerous methods
for formally identifying a language, including HTTP headers. This has
also been discussed in the REST dissertation:-

[[[
Data Element | Modern Web Examples
resource | the intended conceptual target of a hypertext reference
resource identifier | URL, URN
representation | HTML document, JPEG image
representation metadata | media type, last-modified time
resource metadata | source link, alternates, vary
control data | if-modified-since, cache-control
]]] -
http://www1.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
 Table 5-1

Coming up with a decent metadata framework for resource and
representation metadata (further to that which HTTP already gives us)
is a good idea; the difference will come when someone actually does
it, and so I'd like to discuss the requirements and deployment
infrastructure, w.r.t. all of the recent URI/Resource/MIME related
discussions.

A ResChar document will be something that states the characteristics
of a resource (independent of the representations), and also of the
relationship between the representation (the entities) and the
resource. It will have to encompass tricky things such as content and
language negotiation, and have a consistent view of Web architecture.

One thing that it can help to address is how fragment identifiers are
defined across the representations. At the moment, according to the
URI RFC, the meaning of a fragment identifier is dependent upon the
MIME type of the representation returned. Some people have taken it
upon themselves to claim that, as the semantics of fragment identifiers
can only be found on dereferencing, they are somehow inconsistent
and broken. IMO, nothing could be further from the truth. Just because
the URI RFC delegates the definition to other specifications, it
doesn't mean that these definitions are going to be inconsistent. The
only inconsistency comes when you (for example) have the same
fragment IDs meaning different things depending upon content
negotiation. In other words, when you serve some HTML with an ID
declared, or some RDF with the same ID declared, from the same URI,
then you're breaking that URI just as much as you would be if you sold
your domain to another company and let them change the page in its
entirety.
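
To make the consistency point concrete, here is a minimal sketch of
the check I have in mind. It assumes we already know, for each
negotiable media type served from one URI, what each declared fragment
ID is meant to denote; the function name and the dictionary shape are
my own invention for illustration, not anything defined by the URI RFC
or HTTP.

```python
from typing import Dict, Set


def conflicting_fragments(denotations: Dict[str, Dict[str, str]]) -> Set[str]:
    """Find fragment IDs that denote different things across representations.

    `denotations` maps media type -> {fragment id -> what it denotes}.
    """
    seen: Dict[str, str] = {}
    conflicts: Set[str] = set()
    for fragments in denotations.values():
        for frag_id, meaning in fragments.items():
            # A fragment is inconsistent if an earlier representation
            # bound the same ID to something else.
            if frag_id in seen and seen[frag_id] != meaning:
                conflicts.add(frag_id)
            seen.setdefault(frag_id, meaning)
    return conflicts
```

A fragment ID that appears in only one representation is not treated
as a conflict here; only the same ID bound to two different things is.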

I'm claiming that URI references are bound to whatever the
representation under jurisdiction of its content type *says* that they
are bound to. URI references can denote anything, they should be used
consistently, and they are bound to things that are defined by the
resource. So I'm going with a liberal interpretation of both Roy and
Tim's views; it's the only way out of the "hash vs. slash" mess, as
far as I'm concerned. I'm way past caring about that issue, I'm just
going to decide namespace use by the flip of a coin. But when I have
decided, I want to be able to assert, in the metadata attached to any
representations of the resource that my server sends back, just what
the resource either is (if I choose slash), and/or what the type of
the resource ranges through (if I choose hash).

A problem emerges from not being able to specify how IDs declared
within different MIME types relate to one another. For example, in any
serialization of RDF, when you point to a FragID within that space,
you can use it to identify anything; any resource. The RDF
specification should say so, the RDF MIME types should say so.
Everybody recognizes that fact. But the problem is that you can't say
how consistent the IDs are... unless, for example, you come up with a
new +rdf suffix for MIME types. That is inconvenient.

We should be able to be very specific when talking about content
types: when we come up with a new serialization of RDF, it should be a
simple matter of coming up with an identifier for the syntax (a new
identifier), and then reusing a common W3C chosen URI for the RDF
model, such that you end up with:-

[ :syntax <http://example.org/#someSerialization>;
   :model <http://www.w3.org/2001/12/RDFModel> ] .

So, in our ResChar file (O.K., I'm using NTriples with prefixes...),
we get:-

@prefix r: <http://example.org/resChar/> .
<> r:contentType _:x .
_:x r:syntax <http://example.org/#someSerialization> .
_:x r:model <http://www.w3.org/2001/12/RDFModel> .
_:x r:tree <http://iana.org/media-types/application> .

Delegating the responsibility for defining aspects of a content type
becomes interesting when you get to XML... Here's an interesting quote
on the subject:-

[[[
14:54:07 <tim-lurk> Sandro, the URI spec is the one which defines the
relationship between a  URI and its meaning.
14:54:28 <tim-lurk> For URIs starting with "http:", it hands off to
the HTTP spec.
14:54:50 <tim-lurk> The HTTP spec allows format negotaiation, and then
hands off to the MIME type registry.
14:55:10 <tim-lurk> The MIME type registry fopr application/xlm hands
of to the XML spec.
14:55:28 <tim-lurk> The XML spec hands off to the namespace URI.
14:55:35 <tim-lurk> Goto 1
]]] - http://ilrt.org/discovery/chatlogs/rdfig/2001-10-24.txt

If namespaces are indeed handled in this way, then it gives us an
extra headache. Questions such as "what does it mean to embed some RDF
in an XHTML document" are rife, confounded by the fact that XHTML is
sometimes seen as being servable as text/html, text/xml, and
application/xhtml+xml. Do we need application/xhtml+xml+rdf? :-)
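
The hand-off chain in the quote above can be written down as a small
table and walked mechanically. The sketch below is only my reading of
that chat log (the labels are informal names, not identifiers from any
spec), and the function name follow is made up for the purpose.

```python
from typing import Dict, List, Tuple

# Who hands off to whom, as described in the quoted log.
HANDS_OFF_TO: Dict[str, str] = {
    "URI spec": "HTTP spec",
    "HTTP spec": "MIME type registry",
    "MIME type registry": "XML spec",
    "XML spec": "namespace URI",
    "namespace URI": "URI spec",  # the "Goto 1" step
}


def follow(start: str, hands_off_to: Dict[str, str]) -> Tuple[List[str], bool]:
    """Walk the delegation chain; return (path, True) if it loops back."""
    path = [start]
    current = start
    while current in hands_off_to:
        current = hands_off_to[current]
        if current in path:
            return path, True  # we have been here before: a cycle
        path.append(current)
    return path, False
```

Starting from "URI spec", the walk visits all five steps and then
reports that it has come back round to where it began.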

The boundaries between content types are not as clear cut as they used
to be. A content type refers to both the syntax and the semantics of
the document, and these are difficult things to pin together. IMO,
content types are starting to lose their edge, and should be replaced
by URIs, so that we can state the relationships between different
types of content, and define new content types, more easily. This,
once again, appears to be an opinion shared by a number of people.

Of course, we end up going full circle, from the representation
characteristics, through to how the representation characteristics
link to the resource characteristics, and on to the resource
characteristics themselves.

So, another requirement of ResChar is that it should let us say in
more direct terms what a URI denotes. TimBL has been stating that
unless a change is made to HTTP, an HTTP URI necessarily identifies a
"document", or a "generic document" or (in the PIM Doc namespace) a
"work". He claims that it is useful for the Semantic Web to be founded
on documents in this fashion. While I vehemently agree with the latter
statement, the former statement has led to some rather bogus
conclusions:-

[[[
A client which understands the http: protocol can immediately conclude
that the fragementid-less URI is a generic document. This is true even
if the publisher (owner of the DNS name) has decided not to run a
server.
]]] - http://www.w3.org/DesignIssues/Fragment

Well, even if HTTP did necessarily identify generic documents,
acknowledging that a change to HTTP could be made means that the above
statement cannot be taken as a fact. I don't think that it is a good
idea to base the Semantic Web on such shaky assumptions, but I do
think that it's a good idea to make sure that what is identified is
clear. By asserting that all HTTP URI identified resources are
"documents", you get a kind of implicit security... but it's not a
good approach because it's not true, and so ResChar may be able to
fill the gap.

So, what kinds of things do we want to be able to say using ResChar?
There is a problem in the fact that it's difficult to taxonomize the
type of things that we want to identify such that it will be of use to
anyone dereferencing the URI and getting a representation back. Let's
take Aaron's URI http://logicerror.com/myWeavingTheWeb as a good
example. This URI, according to Aaron, denotes Aaron's copy of the
book "Weaving The Web". So, the thing identified by that URI is a book
(I'll use ":myWTW" as an abbreviation for Aaron's URI):-

   :myWTW a :PhysicalBook .

Of course, the representation that you get back is certainly not his
copy of Weaving The Web. I think that there are a few different things
that people might want to identify associated (for want of a better
word) with that URI:-

* The resource itself: this is the only thing that the URI denotes,
and it is Aaron's copy of Weaving The Web (according to Aaron)
* The representation on a certain date; a certain representation
depending upon content or language negotiation
* The set of entities that correspond to the resource over a set
period of time

I think that the last one on the list above is synonymous with the thing
that TimBL calls a "document": it is the concept of the set of things
that Aaron will publish at that address over time. He may only ever
publish that one page, or he may publish a set of things. In any case,
one of the main things that he is asserting is that his resource is
not a "work", so:-

   :PhysicalBook daml:disjointWith doc:Work .
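
What that disjointness statement buys us is a check that a processor
could run. The following is a hand-rolled sketch rather than a DAML
reasoner: the type assignments are passed in as a plain dictionary,
and the name disjoint_violations is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Pairs of classes declared disjoint, as in the statement above.
DISJOINT: List[FrozenSet[str]] = [frozenset({":PhysicalBook", "doc:Work"})]


def disjoint_violations(
    types: Dict[str, Set[str]],
    disjoint: List[FrozenSet[str]] = DISJOINT,
) -> List[str]:
    """Return resources typed with both members of any disjoint pair."""
    violations = []
    for resource, classes in types.items():
        # A pair is violated when both of its classes are asserted.
        if any(pair <= classes for pair in disjoint):
            violations.append(resource)
    return sorted(violations)
```

So if someone else asserts that :myWTW is a doc:Work, a processor that
also trusts Aaron's statement can at least notice the contradiction.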

Many of the relationships that we want to state are given in a message
from Daniel LaLiberte:-

   http://lists.w3.org/Archives/Public/uri/2000Sep/0020

Some of these relationships are a bit suspect, since they only apply
to "works", but not to abstract concepts (how can "love" include a work
of art? type mismatch), so it's clear that for some, the domain will
be doc:Work, and for others, rdfs:Resource. ResChar should allow us to
make the distinction, but once again, it will mean providing a strict
taxonomy for "generic documents".

ResChar is useful in letting people know what is identified, but
authors might also be able to use it to provide separate URIs for
representations, or
entity threads. For example, if I want to talk about Aaron's set of
documents that say "this representation corresponds to the resource
that is my copy of Weaving The Web", then all you need is something in
the header to the effect:-

   <> :entitySet :SomeURI .

This is different to the "Content-Location" or other entity type
headers, in that we're still pointing to some non-content, the
abstract work denoted by a set of entities. Note that Aaron defined
the taxonomy thus:-

[[[
Resource (Thing) --> Representation (Object) --> Entity (State) -->
Content (Serialization)
Example: Sean B. Palmer --> A Homepage of Sean B. Palmer --> A
Homepage of Sean B. Palmer on 2001-11-25T00:00Z. --> A Homepage of
Sean B. Palmer on 2001-11-25T00:00Z in XHTML and English
]]] - from http://ilrt.org/discovery/chatlogs/rdfig/2001-11-25.txt

I think that some of the terms used therein are inconsistent with the
terminology used in the URI RFC, and that in fact the following is
true:-

Resource (abstract): Sean B. Palmer
Resource (work): A Homepage of Sean B. Palmer
Entity: A Homepage of Sean B. Palmer on 2001-11-25T00:00Z.
Representation: A Homepage of Sean B. Palmer on 2001-11-25T00:00Z in
XHTML and English
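
The same four levels can be sketched as data types, which at least
shows how each one narrows the one before it. The class and field
names below are my own shorthand for the list above, not terms taken
from the URI RFC or from Aaron's taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    label: str
    kind: str  # "abstract" or "work"


@dataclass(frozen=True)
class Entity:
    work: Resource   # the work this is a state of
    timestamp: str   # when that state was current


@dataclass(frozen=True)
class Representation:
    entity: Entity
    syntax: str      # e.g. "XHTML"
    language: str    # e.g. "English"


def describe(rep: Representation) -> str:
    """Spell out a representation in the style of the list above."""
    entity = rep.entity
    return (f"{entity.work.label} on {entity.timestamp} "
            f"in {rep.syntax} and {rep.language}")


person = Resource("Sean B. Palmer", "abstract")
homepage = Resource("A Homepage of Sean B. Palmer", "work")
snapshot = Entity(homepage, "2001-11-25T00:00Z")
page = Representation(snapshot, "XHTML", "English")
```

Note that the abstract resource (the person) and the work (the
homepage) are both Resources here; which of the two a URI denotes is
exactly the choice the author has to state.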

The author has to choose between the kinds of resource that the page
denotes. Another example is Mark's URI for "brilliance": the resource
denoted by that URI is (according to Mark) the concept of brilliance,
but many people would misinterpret it as being the work that is the
set of entities over a period of time, i.e. a "document describing
brilliance".

Intuitively, this is quite an easy thing to grasp, but on a
specification-strict level, it's going to be very difficult to encode.
Still, I hope that we can clear up the RDF associated identification
issues.

Cheers,

[1] http://www.w3.org/2001/06/blindfold/langIdent

--
Kindest Regards,
Sean B. Palmer
@prefix : <http://webns.net/roughterms/> .
:Sean :hasHomepage <http://purl.org/net/sbp/> .
Received on Saturday, 1 December 2001 17:39:13 UTC