RE: MGET Again [was: web proper names] from Patrick.Stickler@nokia.com on 2004-09-22 (www-rdf-interest@w3.org from September 2004)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 22 Sep 2004 10:19:00 +0300
To: <jon@hackcraft.net>, <zednenem@psualum.com>, <daniel.oconnor@gmail.com>
Cc: <www-rdf-interest@w3.org>
Message-ID: <1E4A0AC134884349A21955574A90A7A50ADCCB@trebe051.ntc.nokia.com>
> -----Original Message-----
> From: ext Jon Hanna [mailto:jon@hackcraft.net]
> Sent: 21 September, 2004 17:41
> To: Stickler Patrick (Nokia-TP-MSW/Tampere); zednenem@psualum.com;
> daniel.oconnor@gmail.com
> Cc: www-rdf-interest@w3.org
> Subject: MGET Again [was: web proper names]
> 
> 
> > But that doesn't mean that the representation you
> > get back is a *description* of the thing denoted.
> > 
> > The resource denoted could be, e.g., a particular
> > ontology, and the RDF/XML returned is an expression
> > of (representation of) that ontology, *not* a description
> > of that ontology.
> > 
> > Eh?
> 
> The RDF/XML is a representation of that ontology, and as such a
> description of the terms it defines. There is no reason why the
> ontology should not contain a description of the ontology itself in
> the RDF/XML; indeed this is both common practice and IMHO good practice.

It's true that there's no reason why the RDF/XML representation of
an ontology can't include statements about the ontology itself, and
in fact, we follow that practice in Nokia for most ontologies and
other RDF/XML instances.

However, there are many reasons why such a "best practice" cannot
be considered a component of the foundational architecture of the
SW.

1. Size. It may be that the RDF/XML representation of that ontology
is many megabytes in size (e.g. WordNet, Cyc) and so if all one
wants/needs is a description of the ontology itself, it's pretty 
impractical to ask for the whole enchilada.

I.e., while one may consider the entire set of descriptions of all
terms (resources) in a given ontology a valid representation of 
that ontology, it would not correspond to a concise bounded description
of the ontology alone.

2. Ownership/Management. The publisher of a given representation may
not be the owner, but merely have rights to publish via a particular
URI, and thus it is impractical or even impossible to introduce or
augment a description of the resource in question into the RDF/XML
representation. Even if the publisher owns the representation, it
may still be infeasible to modify the description published via a
particular URI, due to a complex and distributed content management
infrastructure.

3. A robust, efficient, and globally ubiquitous semantic web needs
a more precise, well-engineered foundation, providing concise bounded
descriptions of resources with clear determination of success or failure,
rather than a crap shoot with representations, simply hoping for the best.

Sifting through representations obtained via a URI for information about 
the resource denoted by that URI, hoping that folks have employed reasonably
good practices and hoping that something useful can be gleaned is IMO a
pretty sloppy way to go about things, from an engineering perspective.
IMO, much better to be able to ask *exactly* for what is needed, and
know *explicitly* from the response whether it has been provided.

Yes, the approach you outline *can* be made to work in certain contexts
where there is total ownership and control of all components of the
solution, but it breaks down in other critical application areas, and
thus is IMO unsuitable as a part of the foundational architecture of
the semantic web which we will have to live with for many, many years 
to come.

For a real world example, compare the response to

GET /schemas/nokia/MARS-3.1.rdf
Host: sw.nokia.com
Accept: application/rdf+xml

e.g. curl -H "Accept: application/rdf+xml" -L http://sw.nokia.com/schemas/nokia/MARS-3.1.rdf

with

MGET /schemas/nokia/MARS-3.1.rdf
Host: sw.nokia.com

e.g. curl -X MGET -L http://sw.nokia.com/schemas/nokia/MARS-3.1.rdf

which I think very well illustrates my points above.
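
A minimal sketch of the same comparison in Python, assuming only the
standard library. The helper names (build_get, build_mget, body_size)
are hypothetical, and the host from the message may no longer answer,
so nothing here is a recorded result.

```python
import urllib.request

URL = "http://sw.nokia.com/schemas/nokia/MARS-3.1.rdf"  # URI from the message


def build_get(url):
    """Ordinary GET asking for an RDF/XML representation."""
    return urllib.request.Request(
        url, headers={"Accept": "application/rdf+xml"}, method="GET"
    )


def build_mget(url):
    """URIQA MGET asking for a description of the resource denoted by url."""
    return urllib.request.Request(url, method="MGET")


def body_size(request, timeout=10):
    """Send the request and return the size of the response body in bytes.

    urllib raises urllib.error.HTTPError when the server answers with an
    error status, for instance because it does not implement MGET.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return len(response.read())
```

Comparing body_size(build_get(URL)) with body_size(build_mget(URL))
against a URIQA-enabled server would show how much smaller the
description is than the full representation.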

Note that <http://sw.nokia.com/schemas/nokia/MARS-3.1.rdf> denotes
an RDF/XML instance (a representation, a document) and not an ontology,
even though its RDF/XML representation happens to describe (partially)
many resources comprising an ontology. The description of the RDF/XML 
document is also included within the RDF/XML representation, reflecting 
your suggested "good practice", but the description of the RDF/XML 
document itself is a very, very small fraction of the total content 
embodied by its RDF/XML representation.

Thus, clearly, a solution such as URIQA does not in any way lessen
the utility and "goodness" of the practice of describing resources
within RDF/XML representations of those resources. In fact, that is one
very good way for a server to obtain concise bounded descriptions of
those resources (without having to potentially force-feed the client 
megabytes of data to be sifted through on the client end) and how
we do it on the Nokia Semantic Web Server in many cases.
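
As a rough illustration of how a server might carve a concise bounded
description out of a larger graph it already holds, here is a sketch
that assumes triples are plain (subject, predicate, object) tuples of
strings and that blank nodes are written with a "_:" prefix. It is
simplified: it follows blank-node objects but leaves out the
reification handling that the full CBD definition includes. The
function name is hypothetical and this is not the Nokia server's code.

```python
def concise_bounded_description(triples, start):
    """Return the triples describing `start`, following blank nodes only.

    Simplified CBD: every statement whose subject is `start`, plus,
    recursively, every statement whose subject is a blank node reached
    as an object. Reified statements are NOT handled in this sketch.
    """
    by_subject = {}
    for triple in triples:
        by_subject.setdefault(triple[0], []).append(triple)

    result = []
    seen = set()
    pending = [start]
    while pending:
        node = pending.pop()
        if node in seen:
            continue  # guard against blank-node cycles
        seen.add(node)
        for triple in by_subject.get(node, []):
            result.append(triple)
            obj = triple[2]
            if obj.startswith("_:"):
                pending.append(obj)
    return result
```

Run against the full RDF/XML of a large schema, this would return only
the handful of statements about the document itself, which is the
fraction a client asking for a description actually needs.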

> The only reason I can see for not including triples with the URI of
> the ontology itself in the ontology is that you don't care to
> describe it.

As I pointed out above, in large publication environments, where there
are complex legal agreements regarding content (which may very well
include RDF/XML representations) the publisher may simply not have
the ability to insert into the representation itself what it considers 
essential information about the resource, insofar as publication via 
their URI is concerned.

I also consider it a very "good practice" to keep metadata about
resources and the resources (or representations) clearly distinct,
and most large scale CM systems I've either built, used, or reviewed
also embrace such a practice, and in fact employ different kinds
of metadata about the same resources at different layers with
differing visibility/access.

Requiring inclusion of representation descriptions within RDF/XML
representations simply won't scale to a globally ubiquitous solution.

URIQA allows for either approach, and favors neither over the other.

> > Content negotiation *cannot* be used to reliably
> > accomplish what URIQA seeks to provide.
> > 
> > And when you want a description in N-Triples, XTM,
> > TriX, N3, etc. how will you ask for it, if conneg
> > is already (improperly) busy doing something else?
> 
> The parenthetical "(improperly)" is where we disagree, I think. Still, I
> thank you for the CBD; I think it's a very useful concept in deciding
> what goes into an RDF/XML document (put in the CBD first, then think
> about what else, if anything, is justified).

By "improperly" I mean that content negotiation is intended to provide
informationally equivalent representations in alternative encodings.

I do not see the distinction between an arbitrary RDF/XML representation
and a concise bounded description as falling within that scope.

I played around with using a distinct MIME type and conneg in a very
early incarnation of URIQA, where one could ask for something like
application/rdf+xml+cbd, as distinct from application/rdf+xml, to
request a concise bounded description rather than just some arbitrary
(to the client) RDF/XML instance, but concluded that it conflicted
with the intended purpose of MIME/conneg (and that the distinct
HTTP methods were much cleaner from an engineering perspective).

Thus, e.g.

GET /foo/bar HTTP/1.1
Host: www.example.com
Accept: application/rdf+xml

would return whatever RDF/XML representation the publisher wanted
to provide, which may very well be huge, and describe a lot more
resources than just http://www.example.com/foo/bar; whereas

GET /foo/bar HTTP/1.1
Host: www.example.com
Accept: application/rdf+xml+cbd

would be synonymous with

MGET /foo/bar HTTP/1.1
Host: www.example.com
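
The equivalence that early design would have required of a server can
be sketched as a small dispatch check. The function names are
hypothetical, and application/rdf+xml+cbd is only the experimental
type mentioned above, not something this sketch assumes any server
supports.

```python
CBD_TYPE = "application/rdf+xml+cbd"  # experimental type from the message


def accepted_types(accept_header):
    """Media types listed in an Accept header, with parameters (q=...) dropped."""
    types = []
    for item in accept_header.split(","):
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type:
            types.append(media_type)
    return types


def wants_cbd(method, accept_header=""):
    """True when the request asks for a concise bounded description.

    Under the early design, a GET naming the CBD media type would be
    treated the same as an MGET.
    """
    if method == "MGET":
        return True
    return method == "GET" and CBD_TYPE in accepted_types(accept_header)
```

Note that the method check needs no header parsing at all, whereas the
conneg variant only works if every server inspects Accept this way.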


I have my doubts, though, about proper/reliable failure of

GET /foo/bar HTTP/1.1
Host: www.example.com
Accept: application/rdf+xml+cbd

by servers which do not implement conneg, or do so in too
"helpful" a manner and, while not recognizing the specialized
content type, nevertheless return a representation that is
not a CBD.

I know at least that if a server has not implemented the
explicit URIQA methods, the request

MGET /foo/bar HTTP/1.1
Host: www.example.com

will result in a clear failure, and if it has implemented
the URIQA methods, a successful response should be reasonably
trustworthy as a CBD. That's, to me, far more satisfying from a
large scale systems engineering perspective.
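
The difference in failure behaviour can be reproduced locally. The
sketch below assumes Python's standard http.server, with a hypothetical
handler (PlainHandler) that implements neither conneg nor the URIQA
methods. Python's server answers an unknown method with 501; other
servers may choose a different error status, but not a 200.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PlainHandler(BaseHTTPRequestHandler):
    """Hypothetical server: no conneg, no URIQA methods."""

    def do_GET(self):
        # Ignores the Accept header and always sends the same RDF/XML.
        body = b"<rdf:RDF><!-- large, arbitrary RDF/XML --></rdf:RDF>"
        self.send_response(200)
        self.send_header("Content-Type", "application/rdf+xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def probe(port, method, headers=None):
    """Send one request to the local server; return (status, content type)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/foo/bar", headers=headers or {})
        response = conn.getresponse()
        response.read()
        return response.status, response.getheader("Content-Type")
    finally:
        conn.close()


def run_demo():
    """Start the server on a free port and try both request styles."""
    server = HTTPServer(("127.0.0.1", 0), PlainHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        conneg = probe(port, "GET", {"Accept": "application/rdf+xml+cbd"})
        mget = probe(port, "MGET")
        return conneg, mget
    finally:
        server.shutdown()
        server.server_close()
```

Here the conneg request comes back as a success carrying something
that is not a CBD, so the client has to inspect the content type and
still guess, while the MGET request fails outright.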

The semantic web is complicated enough without having to make
our agents guess about and sift through arbitrary representations.

Patrick
Received on Wednesday, 22 September 2004 07:27:55 UTC