RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Mark Birbeck on 2006-05-04 (public-swbp-wg@w3.org from May 2006)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Thu, 4 May 2006 10:52:54 +0100
To: <public-swbp-wg@w3.org>
Message-ID: <06e101c66f60$83e40460$0e01a8c0@Jan>
David,

> But that is the crucial difference!  Sure, a *single* weather report can
> be conveyed in a message.  But http://weather.example.com/oaxaca is not
> merely identifying a *single* weather report issued at 2005-03-12
> 23:11:36.236 UTC or any other particular time.  It identifies a
> *function* from time to weather reports.  I don't know any way to
> transmit "all of [the] essential characteristics"[10] of that
> particular function in a message or even a finite set of messages.

But a weather report and a web page are two different resources. Hence they
need two different URIs. As I tried to show in my recent post and blog, it's
tempting to conflate the two, but if you do that you can never make
statements about the web page; you can only ever talk about 'the weather'.
(Of course...we British do that anyway...)

The thing I find interesting about this is that at first sight it seems like
it's no big deal to use the same URI for both the document *about* the
weather, and the weather itself. But when you work it through, it actually
quickly falls apart, leading, in my opinion, to the unavoidable conclusion
that the TAG approach is most definitely the correct one.

The reason it falls apart is that any processor reading the metadata in a
web-page would have no way of knowing whether the web-page represents itself
or some other resource, and therefore what to apply the metadata to. For
example, if we put geo:region="London" in the head of the document, do we
mean that the web-page is published in London, or that the weather report
that the page contains is *for* London? (Or even that the company that is
producing the weather report is based in London?) Similarly, should the
dc:publisher value be "BBC" or "God"? ;)

Sure, you can probably guess in the case of the weather report which bit of
metadata applies to the document and which to the weather, but when you do
that you are layering 'implied knowledge' onto something that in reality you
are not able to infer from the raw data. If you took this approach you would
also need to have the same kind of 'implied knowledge' not just for the
weather, but for cars, holidays, flights, planets, chemical compounds,
people, conferences...you get the point. (Using 'implied knowledge' is the
GRDDL/microformats approach, where you need to know what you are going to
process before you process it, but as should be pretty obvious...it just
doesn't scale, and means we can pretty much forget about the semantic web!)

Note by the way that you don't get this problem with RDF/XML, because an
RDF/XML document never represents itself. So if you place an RDF/XML
document at http://a/b/c then all the statements in that document are
'about' the resource called 'http://a/b/c'; it's as if there were no
web-page at that location. (Actually, with RDF/XML you get the opposite
problem to the web-page one...you can't say anything about the RDF document
itself, such as who wrote it or when it was updated, since it is a
phantom!)

The web-page v. the weather problem only actually arises when you want the
web-page to also carry metadata; i.e., you want the web-page to tell you
both where it was published *and* where the weather report applies. For
that you need to ruthlessly keep the two URIs apart (the one for the
web-page and the one for the weather). This is a growing practice, but the
only technique that allows you to do this without using 'implied knowledge'
(i.e., the GRDDL/microformats approach) or having two separate documents
(the RDF/XML approach) is RDFa.
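As a rough illustration of keeping the two URIs apart, the sketch below gives the page and the weather separate subjects, again as plain Python tuples rather than any real RDFa or RDF library. Both URIs are hypothetical; the `#weather` fragment is only one illustrative way of minting a second name that cannot be mistaken for the page. Each statement now has an unambiguous subject, so the page's place of publication and the forecast's region no longer collide.

```python
# Two deliberately separate (hypothetical) URIs.
PAGE = "http://example.org/forecast/oaxaca"             # the web-page itself
WEATHER = "http://example.org/forecast/oaxaca#weather"  # the weather it describes

triples = {
    (PAGE, "dc:publisher", "BBC"),       # who published the page
    (PAGE, "geo:region", "London"),      # where the page was published
    (WEATHER, "geo:region", "Oaxaca"),   # where the forecast applies
}


def describe(triples, subject):
    """Collect the (predicate, object) pairs recorded for one subject, sorted."""
    return sorted((p, o) for s, p, o in triples if s == subject)


print(describe(triples, PAGE))     # [('dc:publisher', 'BBC'), ('geo:region', 'London')]
print(describe(triples, WEATHER))  # [('geo:region', 'Oaxaca')]
```

The same predicate, `geo:region`, can now be used for both resources without any need for 'implied knowledge' about which value belongs to which.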

NOTE: For ease of explanation I've avoided using the term 'information
resource' in favour of a more specific resource type--'web-page'. I totally
agree with the term 'information resource', but I think the key concepts can
be understood just by looking narrowly at web-pages. I feel this is
important because, in my opinion, none of the recent discussion has got to
the root of the issue, and without this, talk of trying to 'clarify' what
'information resource' means leads to a lot of confusion. Issues like
whether the *content* of the resource changes over time, etc., are, as far
as I can see, irrelevant to the issue. However, the term *is* used and
discussed in the blog and tutorial that I posted references to recently.

Regards,

Mark


Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
b: http://internet-apps.blogspot.com/
w: http://www.formsPlayer.com/

Download our XForms processor from
http://www.formsPlayer.com/
Received on Thursday, 4 May 2006 09:54:12 UTC