RE: Site metadata; my preference from Patrick.Stickler@nokia.com on 2003-02-20 (www-archive@w3.org from February 2003)

From: <Patrick.Stickler@nokia.com>
Date: Thu, 20 Feb 2003 10:27:44 +0200
To: <paul@prescod.net>
Cc: <timbl@w3.org>, <distobj@acm.org>, <www-archive@w3.org>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B5FBB33@trebe006.europe.nokia.com>
> A semantic description of a thing _is_ a representation of the thing.

I completely disagree. And I see this view as a major obstacle to
SW agents.

It's one thing to talk about a representation which is machine
readable, such as a meeting event document with representations
both in HTML as well as RDF/XML.

But the *description* of that document is *not* part of any
representation of that document. The size in bytes is not
part of that meeting document and should not be inherent in
any representation of that meeting document.

We *are* talking about METAdata here. And yes, metadata
describing data is also data, and there can be metadata
about the metadata about the metadata, etc.

But the distinction between data and metadata, for a given
resource is clear and must be preserved. Treating metadata
as a representation of the data is simply wrong.

> > It is for *anything*, whether there is a web-accessible 
> representation
> > or not.
> 
> There should always be a web-accessible representation.

Should != Must

> > Binding the SW to mechanisms optimized for accessing representations
> > is simply lopsided.
> 
> "Dictionaries should not go in the library!"

No. You misunderstand me. I'm not at all saying that.

Rather, I'm saying cards from the card catalog do not belong
on the shelves of the library.

If you did not understand that that was what I am saying,
then perhaps you should re-read my posting about books,
dictionaries and cards...

> >>Furthermore, there is a big problem with the MGET proposal. 
> >>It does not 
> >>provide a way for a browser to be told that there is metadata 
> >>available 
> >>with a particular URI. This implies you need a header that says 
> >>"MGET-available: true".
> > 
> > Eh? Why?! 
> > 
> > How does a browser know that a particular representation is 
> available?
> > It asks for it.
> 
> It can. Or the server can announce what representations are 
> available: 
> "With agent-driven negotiation, selection of the best 
> representation for 
> a response is performed by the user agent after receiving an initial 
> response from the origin server. Selection is based on a list of the 
> available representations of the response included within the header 
> fields or entity-body of the initial response, with each 
> representation 
> identified by its own URI. Selection from among the 
> representations may 
> be performed automatically (if the user agent is capable of 
> doing so) or 
> manually by the user selecting from a generated (possibly 
> hypertext) menu"

True, and the server can also announce that a description
is available via MGET.

Again, you continue to levy criticisms against MGET for
problems not inherent in MGET, but in larger processes or
server architecture.

> > And the browser doesn't have to be given some different URI 
> from that
> > of the resource to get the metadata. It just does an MGET 
> on the *same*
> > URI, and if it gets something, great. If not, so what.
> 
> So browsers should go around "testing" for whether metadata is 
> available? Now you're doubling the amount of traffic going over the 
> _original_ Web. I can guarantee you that this is more expensive than 
> doubling the amount of traffic on the semantic web (which is not the 
> case, regardless)

Eh? What? There is no greater waste for either case. If one does
a HEAD to get the URI denoting the description, and there is no
URI specified, then that's one wasted system call. If one does
an MGET to get the description, and gets nothing, that's one
wasted system call. *BUT* if there is a description available,
the agent using MGET already has it. But the agent using GET
*still* has to make one system call to get the description.

MGET is no more inefficient than GET when no description is available
and is twice as efficient than GET when a description is available.

> >...
> > On the contrary. I think that very quickly metadata will live
> > in metadata management systems, not in individual files, and any
> > solution that presumes that there will be a *separate* file for
> > *every* resource named on a given site is kidding themselves.
> 
> Yeah, I've been hearing that for years. I'm sorry, but the 
> solution for 
> robots.txt-type stuff should not depend on science fiction 
> semantic web 
> data management systems that nobody has deployed.
> 
> > The header tag approach is strongly biased towards large, monolithic
> > files describing large sets of resources. If I ask about some
> > resource, I don't want 8000% of the information I need. But that's
> > what I'm going to get when each tag for e.g. a DC term points to
> > a single massive RDF Schema defining the entire vocabulary.
> 
> What prevents you from having a separate file for each 
> schema? 
> Or from 
> having your metadata management system generate separate 
> URIs. 

Nothing at all. But should it really be a *requirement*.

> Are you 
> really serious that you think that this is a challenge?

Of course not. I've never said it was a challenge. It's a
trivial hack. It's a matter of whether (a) the architecture
should *require* it, and (b) whether the architecture would
*depend* on it, at the cost of reduced efficiency, requiring
two system calls rather than one to obtain resource descriptions.

I consider the header tag approach to be a kludge. Do we want
a kludge as the foundation of the future SW? I hope not.

> Surely it is 
> logical to make life a tiny bit harder for the sophisticated metadata 
> server implementor if it makes life easier for the unsophisticated 
> Apache user.

Well, if you mandated that all descriptions be given distinct URIs
by the server then there is no difference between MGET and HEAD+Meta: 
in that regard, as both will identify the body of knowledge describing the
resource in their response.

However, MGET will be twice as efficient as HEAD+Meta: since the requesting
agent will already have what it needs whereas with HEAD+Meta: it still
has to go GET the description.

And there is no reason why content negotiation can't work with MGET
and friends. In fact, I've got it in my prototype.

And MPUT, MDELETE, etc. will work properly without semantic conflict and
risk of data corruption whereas PUT+Meta: and DELETE+Meta: will
have potentially catastrophic results if used with servers that don't
grok the Meta: tag as they will then replace or delete the data
rather than the metadata!

Patrick
Received on Thursday, 20 February 2003 03:27:57 UTC