
Re: Server and client burden for URIQA vs. Link:

From: <Patrick.Stickler@nokia.com>
Date: Sat, 28 Feb 2009 17:10:30 +0100
To: <richard@cyganiak.de>
CC: <jar@creativecommons.org>, <www-tag@w3.org>
Message-ID: <C5CF2E96.E178%patrick.stickler@nokia.com>



On 2009-02-27 20:41, "ext Richard Cyganiak" <richard@cyganiak.de> wrote:

> 
> 
> On 27 Feb 2009, at 16:20, <Patrick.Stickler@nokia.com> wrote:
>> Sorry Richard. For some reason, this particular message got trapped
>> by my
>> spam filter. No clue why.
>> 
>> Our server sw.nokia.com was undergoing some maintenance upgrades and
>> not all
>> of the nodes in the farm were fully configured.
>> 
>> It should be working fine now.
> 
> It does. Nice! Here are two different ways of querying the server with
> off-the-shelf Unix command line tools:
> 
> richard@cygri:~$ telnet sw.nokia.com 80
> MGET /MARS-3 HTTP/1.0
> Host: sw.nokia.com
> 
> richard@cygri:~$ curl -X MGET http://sw.nokia.com/MARS-3
> 
> 
> I get the following response. I like how the server gives me a normal
> GETable URI for the description via Content-Location:
> 
> 
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Connection: Close
> Content-Location:
> http://sw.nokia.com/uriqa?uri=http%3a%2f%2fsw%2enokia%2ecom%2fMARS%2d3
> Content-Type: application/rdf+xml; charset=UTF-8
> Date: Fri, 27 Feb 2009 18:17:29 GMT
> Set-Cookie: S_ID=B16BCD05F74E44500D95FE2C; path=/
> URIQA-authority: http://sw.nokia.com/uriqa
> Server: rdfgateway/3.000 SI
> Content-Length: 3976
> 
> <?xml version="1.0" encoding="utf-8"?>
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
>           xmlns:owl="http://www.w3.org/2002/07/owl#"
>           xmlns:dc="http://purl.org/dc/elements/1.1/"
>           xmlns:dcterms="http://purl.org/dc/terms/"
>           xmlns:rss="http://purl.org/rss/1.0/"
>           xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
>           xmlns:voc="http://sw.nokia.com/VOC-1/"
>           xmlns:web="http://sw.nokia.com/WebArch-1/"
>           xmlns:sw="http://sw.nokia.com/SWArch-1/"
>           xmlns:uriqa="http://sw.nokia.com/URIQA-1/"
>           xmlns:mars="http://sw.nokia.com/MARS-3/"
>           xmlns:nc="http://sw.nokia.com/NC-1/"
>           xmlns:dp="http://sw.nokia.com/DP-1/"
>           xmlns:fn="http://sw.nokia.com/FN-1/">
> <rdf:Description rdf:about="http://sw.nokia.com/MARS-3">
>        <voc:term rdf:resource="http://sw.nokia.com/MARS-3/Actor"/>
> [...snip...]
> 
> 
> In practical terms, from a client's point of view, I don't see much
> difference between
> 
>     curl -X MGET
> 
> and
> 
>     curl -H "Accept: application/rdf+xml"
> 
> and I would assume that both are about equally easy or hard to do from
> the client side. On the server side it might be a somewhat different
> story, I believe; most web development frameworks are quite opinionated
> about request methods.

A very significant practical problem I see with using content negotiation to
ask for formal descriptions of resources (where "description" is meant in the
narrow sense of something from which an RDF graph useful to a semantic web
agent can be derived) is this: what if the resource in question is, e.g., an
ontology?

Yes, an RDF/XML instance can include statements about the ontology as well
as the statements that comprise the ontology proper, but think of a poor
mobile client (or a desktop client having to deal with limited bandwidth)
which wants some information about an ontology that itself has thousands of
terms and tens of thousands of statements.

If it simply asks for the RDF/XML-encoded representation of the ontology, in
the hope that it will contain some information about the ontology, it can
easily choke on what is returned, since the server has no idea that the
client is only looking for facts about the ontology and not the complete
RDF/XML representation of the ontology. There *must* be a precise way for
clients to efficiently ask for the facts, and nothing but the facts, about a
particular resource, rather than playing "semantic web roulette" and hoping
they aren't served orders of magnitude more information than they want or
need.
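To make the contrast concrete, here is a minimal sketch (hypothetical helper
names, not part of any URIQA implementation) of the two raw requests a client
might build. The host and path are the ones from the example above:

```python
def uriqa_request(host: str, path: str) -> str:
    """Ask for the authoritative description of a resource via MGET.

    The response is bounded to facts *about* the resource, no matter
    how large the resource's own representation is.
    """
    return f"MGET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def conneg_request(host: str, path: str) -> str:
    """Ask for an RDF/XML representation via content negotiation.

    For an ontology, this may return the entire schema -- thousands of
    terms -- rather than just facts about it: the ambiguity described above.
    """
    return (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
            f"Accept: application/rdf+xml\r\n\r\n")


print(uriqa_request("sw.nokia.com", "/MARS-3").splitlines()[0])
# MGET /MARS-3 HTTP/1.1
```

From the client's side the two requests are indeed about equally easy to
construct; the difference lies entirely in what the server is being asked for.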

This is not to say that including statements about an ontology in the actual
schema representations is not a good thing. Or that embedding triples using
RDFa or other microformats in larger information contexts is not a good
thing. Choosing any particular methodology must take into account not only
the processing burden, but also the authoring burden
(and in my experience in implementing semantic web solutions, the authoring
and maintenance burden nearly always trumps the processing burden).

Too often, a great idea/proposal/solution is technically and philosophically
attractive, but woefully impractical in the "real world" because it fails to
fully address (or address at all) the "human element" of the design.

URIQA is designed to provide a means for resource owners to consistently,
efficiently, and unambiguously serve a formal description of any resource,
independently of how any of its Web-published representations are created,
managed, or published (since not all resources will necessarily be created,
managed, or published following the same process, nor will the same persons
necessarily be involved in more than one of those processes). Whether the
description is embedded within a particular representation of the resource,
or defined separately from any representation intended to be served via GET,
is irrelevant to the agent asking about the resource. Whether the formal
description is created at the same time as one or more representations, or
defined later, is irrelevant to the agent. Whether a resource has many
variant representations, only one, or none at all is irrelevant to the
agent.
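As a rough illustration of that separation, here is a hypothetical dispatch
sketch (not the actual rdfgateway code): the description is served from a
metadata store keyed by URI, entirely independent of how any GETable
representation is produced or managed.

```python
# Representations and descriptions live in separate stores; the stores and
# data below are placeholders for illustration only.
REPRESENTATIONS = {"/MARS-3": ("<html>...</html>", "text/html")}
DESCRIPTIONS = {"/MARS-3": ("<rdf:RDF>...</rdf:RDF>", "application/rdf+xml")}


def handle(method: str, path: str) -> dict:
    """Route MGET to the description store, everything else to representations."""
    if method == "MGET":
        # The agent gets the formal description, however it was authored:
        # embedded, extracted, or maintained by a separate tool.
        body, ctype = DESCRIPTIONS.get(path, ("", "application/rdf+xml"))
    else:
        body, ctype = REPRESENTATIONS.get(path, ("", "text/html"))
    return {"status": 200, "content-type": ctype, "body": body}
```

The point of the sketch is only that neither store needs to know how the
other is populated; the agent's view through MGET stays the same regardless.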

At Forum Nokia, we employ a number of different tools, processes, and
methodologies to define the formal metadata descriptions about resources,
depending on the type of resource, who "owns" it, the nature of its
representations, etc. Sometimes, the metadata is embedded. Sometimes it is
defined via a separate tool. But in all cases, it is accessible to agents
without any distinction of how it was defined.


> 
> So much for the technical side. On the architectural side, I have to
> say that I see the appeal of URIQA in scenarios such as metadata for
> media files (images, videos etc) where it's really hard or awkward to
> add links to metadata or embed the metadata into the file.

Exactly.

And if you can't easily add them to the representation, you will need a
solution for associating them at the server level; thus, in the end, the
amount of implementation and management effort imposed by a link-based
solution as the *primary* way agents request descriptions of resources will
be equal to or greater than that for URIQA.

But if folks want to use links to redirect to descriptions, fine. URIQA is
fully compatible with the use of links to associate descriptions with
resources. It's just one more way that descriptions can be discovered,
harvested, and syndicated into a knowledgebase which a server can use to
respond to semantic web queries.
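A server could harvest link-associated descriptions into its knowledgebase
with something as simple as the following sketch (a deliberately simplistic
regex over a Link header value; the "describedby" relation and example URLs
are assumptions for illustration):

```python
import re

# Matches <url>; rel="describedby" (quotes around the relation optional).
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?describedby"?')


def describedby_urls(link_header: str) -> list:
    """Pull description URLs out of a Link: header value for harvesting."""
    return LINK_RE.findall(link_header)


header = ('<http://example.org/desc.rdf>; rel="describedby", '
          '<http://example.org/style.css>; rel="stylesheet"')
print(describedby_urls(header))
# ['http://example.org/desc.rdf']
```

The harvested descriptions then feed the same knowledgebase that answers
MGET requests, so agents never need to know which discovery method was used.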


> But the web
> is held together by HTML documents (and, in a hypothetical future I
> sort of hope for, RDF documents), and it is easy to embed metadata
> directly into them. Even access paths to media files usually lead
> through some HTML page that can provide the metadata about the target
> image or video, in prose or embedded microformats or RDFa.

Yes, but just because one can embed RDF triples within representations does
not mean that the most efficient and scalable approach to serving
descriptions to semantic web agents is via those representations.

Another practical issue with approaches such as Link: that seems to get
overlooked is that they place on every semantic web agent the burden of
knowing how to locate and extract the triples of interest for every commonly
used method of publishing such descriptions. Yet the particular methods used
to associate descriptions with resources are a choice of the resource owner
(and there may be many different resource owners with different needs and
processes all publishing on the same server, as in our case), and will vary
widely from case to case, even on the same site, and even more from site to
site. Why expect agents to deal with that variety? Rather, employ harvesting
techniques at the server level so that semantic web agents can ask about
resources using a simple, efficient, consistent protocol like URIQA that is
independent of any metadata publication method.

Then, as various representation encodings and microformat-embedding
methodologies change and evolve over time, the world's semantic web agents
won't all have to change along with them.

> 
> So, proposed addition to the (nicely thorough) FAQ section at [1]:
> 
> Why not embed the metadata in the document described by metadata? Or
> in the document where the agent found the URIQA-enabled URI?

I may add some wording capturing the thoughts above which answer that
question.

... 
>>> (Let me tell you that URIQA might be more successful if its
>>> proponents
>>> were producing more demos and running code examples and less words!)

Missed this in your earlier post...

Not sure if you're aware that Nokia released our entire semantic web server
code base as open source some time ago. The open source version is a bit
old, but the core functionality of our semantic web server component has not
changed much since then.

Cf. http://sourceforge.net/projects/sws-uriqa/

Regards,

Patrick
 
Received on Saturday, 28 February 2009 16:08:34 GMT
