Re: Site metadata; my preference from Paul Prescod on 2003-02-19 (www-archive@w3.org from February 2003)

From: Paul Prescod <paul@prescod.net>
Date: Wed, 19 Feb 2003 12:36:18 -0800
To: Patrick.Stickler@nokia.com
CC: timbl@w3.org, distobj@acm.org, www-archive@w3.org
Message-ID: <3E53EAC2.8080602@prescod.net>
Patrick.Stickler@nokia.com wrote:
> 
> ....
> But if I want a DESCRIPTION of the resource denoted by a URI, it
> takes *two* system calls. One to get a HEAD that has the URI
> denoting the metadata, and one to get the metadata.
> 
> If all I care about is gathering descriptions of resources (and
> many SW agents will be doing just that for many applications)
> it is twice as expensive for them to navigate the SW as it is
> for web agents to navigate the Web.
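
To make the cost in the quote concrete, here is a minimal sketch of the two-request pattern it describes: a HEAD whose response names the metadata URI, then a GET for the metadata itself. It runs against a throwaway local server on 127.0.0.1. The `Link` header, the `rel="meta"` value and the `.meta` URI suffix are assumptions for illustration; neither proposal in this thread fixes them.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical site layout: each page's description lives at "<page>.meta".
DESCRIPTIONS = {
    "/page.meta": b'<http://example.org/page> '
                  b'<http://purl.org/dc/elements/1.1/title> "A page" .\n',
}


class Handler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        # Announce where the metadata lives (header name and rel are assumed).
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Link", f'<{self.path}.meta>; rel="meta"')
        self.end_headers()

    def do_GET(self):
        body = DESCRIPTIONS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_description(host, port, path):
    """Return (description, requests_made) using HEAD, then GET."""
    conn = http.client.HTTPConnection(host, port)
    requests_made = 0

    conn.request("HEAD", path)  # round trip 1: discover the metadata URI
    requests_made += 1
    head = conn.getresponse()
    head.read()
    link = head.getheader("Link")  # e.g. '</page.meta>; rel="meta"'
    if link is None:
        conn.close()
        return None, requests_made
    meta_path = link.split(";")[0].strip()[1:-1]  # naive parse, sketch only

    conn.request("GET", meta_path)  # round trip 2: fetch the metadata
    requests_made += 1
    body = conn.getresponse().read().decode()
    conn.close()
    return body, requests_made


def demo():
    """Start a local server on a free port, run one lookup, shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_description("127.0.0.1", server.server_address[1], "/page")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the description text together with a request count of 2, which is the doubling the quote objects to.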

The semantic web is not the web of descriptions of web pages. The 
semantic web is the web of documents that are machine readable. Some of 
those documents will be metadata. Most probably will not be.

If an agent really is interested only in metadata, it will probably care 
about the metadata embedded in the resources themselves, such as their 
TITLEs, LINK elements, etc.
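
A minimal sketch of reading that embedded metadata with Python's standard `html.parser`. The sample page and its `/about.rdf` link are made up for illustration.

```python
from html.parser import HTMLParser


class EmbeddedMetadata(HTMLParser):
    """Collect the <title> text and the attributes of each <link> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "title":
            self._in_title = True
        elif tag == "link":
            self.links.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def embedded_metadata(html):
    """Return (title, list of link-attribute dicts) for an HTML string."""
    parser = EmbeddedMetadata()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


page = """<html><head><title>Site metadata</title>
<link rel="meta" type="application/rdf+xml" href="/about.rdf">
</head><body>...</body></html>"""

title, links = embedded_metadata(page)
print(title)  # Site metadata
print(links)  # [{'rel': 'meta', 'type': 'application/rdf+xml', 'href': '/about.rdf'}]
```

No extra request is needed: the agent reads this from the representation it already fetched.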

>...
> I frequently care about the metadata without caring about the
> representations -- especially when I am dealing with resources
> that have *no* representation, such as abstract or non-web-accessible
> resources.

There are no resources that have no logical representation. "I am a 
walrus" is a representation and a damn useful one.

> Folks need to stop thinking that the SW is limited to web resources.

A semantic description of a thing _is_ a representation of the thing.

> It is for *anything*, whether there is a web-accessible representation
> or not.

There should always be a web-accessible representation.

> Binding the SW to mechanisms optimized for accessing representations
> is simply lopsided.

"Dictionaries should not go in the library!"

>>Furthermore, there is a big problem with the MGET proposal. It does not
>>provide a way for a browser to be told that there is metadata available
>>with a particular URI. This implies you need a header that says
>>"MGET-available: true".
> 
> Eh? Why?! 
> 
> How does a browser know that a particular representation is available?
> It asks for it.

It can. Or the server can announce which representations are available.
From RFC 2616, section 12.2:
"With agent-driven negotiation, selection of the best representation for 
a response is performed by the user agent after receiving an initial 
response from the origin server. Selection is based on a list of the 
available representations of the response included within the header 
fields or entity-body of the initial response, with each representation 
identified by its own URI. Selection from among the representations may 
be performed automatically (if the user agent is capable of doing so) or 
manually by the user selecting from a generated (possibly hypertext) menu."
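
A sketch of the automatic selection step in the passage quoted above, assuming the server's initial response has already been reduced to a list of `(uri, media_type)` pairs; that list format and the `/doc.*` URIs are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# (uri, media_type) pairs, as a server might list them in its initial
# response; this format is assumed for the sketch.
Alternative = Tuple[str, str]


def choose_representation(alternatives: List[Alternative],
                          preferences: List[str]) -> Optional[str]:
    """Return the URI of the alternative whose media type ranks highest
    in the agent's preference list, or None if none is acceptable."""
    for wanted in preferences:
        for uri, media_type in alternatives:
            if media_type == wanted:
                return uri
    return None  # no automatic choice; a browser could show a menu instead


alternatives = [("/doc.html", "text/html"),
                ("/doc.rdf", "application/rdf+xml")]
print(choose_representation(alternatives, ["application/rdf+xml", "text/html"]))  # /doc.rdf
print(choose_representation(alternatives, ["text/html"]))  # /doc.html
```

An RDF-hungry agent and an ordinary browser read the same announcement and pick different URIs, without either of them guessing.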

> And the browser doesn't have to be given some different URI from that
> of the resource to get the metadata. It just does an MGET on the *same*
> URI, and if it gets something, great. If not, so what.

So browsers should go around "testing" whether metadata is 
available? Now you're doubling the amount of traffic going over the 
_original_ Web. I can guarantee you that this is more expensive than 
doubling the amount of traffic on the semantic web (which is not what 
happens anyway).
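
A back-of-the-envelope sketch of that cost, assuming a hypothetical site of 1,000 pages of which only 50 carry metadata, and assuming the browser fetches metadata whenever it is announced. `MGET` is the method proposed in this thread; the announcing header is hypothetical.

```python
from typing import List


def requests_needed(has_metadata: List[bool], strategy: str) -> int:
    """Count the requests a browser makes to load each page and learn
    about its metadata.

    "probe":    GET the page, then blindly send MGET to the same URI.
    "announce": GET the page; follow up only if a (hypothetical)
                response header announced that metadata exists.
    """
    total = 0
    for has_meta in has_metadata:
        total += 1  # the ordinary GET the browser makes anyway
        if strategy == "probe":
            total += 1  # the MGET is sent whether or not metadata exists
        elif strategy == "announce":
            if has_meta:
                total += 1  # fetch only the metadata that was announced
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return total


# Hypothetical site: 1000 pages, of which only the first 50 have metadata.
site = [i < 50 for i in range(1000)]
print(requests_needed(site, "probe"))     # 2000
print(requests_needed(site, "announce"))  # 1050
```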

>...
> On the contrary. I think that very quickly metadata will live
> in metadata management systems, not in individual files, and any
> solution that presumes that there will be a *separate* file for
> *every* resource named on a given site is kidding themselves.

Yeah, I've been hearing that for years. I'm sorry, but the solution for 
robots.txt-type stuff should not depend on science-fiction semantic web 
data management systems that nobody has deployed.
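
For comparison, the robots.txt convention needs nothing more than one plain file at a fixed location on the site, and Python's standard `urllib.robotparser` already reads it. A minimal sketch; the file contents, the `ExampleBot` agent name and the example.org URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one static file, no metadata server required.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.org/private/notes.html"))  # False
print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.org/index.html"))  # True
```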

> The header tag approach is strongly biased towards large, monolithic
> files describing large sets of resources. If I ask about some
> resource, I don't want 8000% of the information I need. But that's
> what I'm going to get when each tag for e.g. a DC term points to
> a single massive RDF Schema defining the entire vocabulary.

What prevents you from having a separate file for each schema? Or from 
having your metadata management system generate separate URIs? Do you 
really think that this is a challenge? Surely it is logical to make life 
a tiny bit harder for the sophisticated metadata server implementor if it 
makes life easier for the unsophisticated Apache user.
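
A sketch of the "generate separate URIs" idea: a metadata server keeps one vocabulary internally but publishes one small document per term, so a question about dc:title need not return the whole schema. The Dublin Core namespace and the rdfs:label property are real; the `http://example.org/meta/dc/` layout, the `.nt` suffix and the three-term vocabulary are hypothetical stand-ins.

```python
from typing import Dict

DC = "http://purl.org/dc/elements/1.1/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Tiny stand-in for a large vocabulary: term name -> label.
VOCABULARY = {"title": "Title", "creator": "Creator", "date": "Date"}


def per_term_documents(namespace: str, vocabulary: Dict[str, str],
                       base: str) -> Dict[str, str]:
    """Split one vocabulary into one small N-Triples document per term.

    Returns {document_uri: document_text}; the URI layout under `base`
    is hypothetical.
    """
    documents = {}
    for term, label in vocabulary.items():
        subject = namespace + term
        triple = f'<{subject}> <{RDFS_LABEL}> "{label}" .\n'
        documents[f"{base}{term}.nt"] = triple
    return documents


docs = per_term_documents(DC, VOCABULARY, "http://example.org/meta/dc/")
print(docs["http://example.org/meta/dc/title.nt"], end="")
```

The same loop could just as well write static files for a plain Apache server to serve.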

  Paul Prescod
Received on Wednesday, 19 February 2003 15:36:47 UTC