Re: SWBPD WG Resolution Regarding httpRange-14

Hi Tim,

My personal responses to your technical questions are inline,
perhaps in rather too much detail (sorry).

Summary:
Answering questions about the resources identified by URIs (rather than 
the representations returned) is in scope of the Semantic Web. We 
identify many, possibly contradictory, claims about these resources from 
any number of available SemWeb sources, of varying credibility. We then 
choose between these claims, as to which ones we will treat as facts, 
depending on our particular task and application.

Jeremy

Tim:
> Clearly the SWBPWG has an architecture in mind.
> Could the SWBPWG, in proposing an architecture, like to
> propose an ontology of Web architecture?
> 

I initially wanted to decline this, but I think this e-mail reflects 
various thoughts of an architectural nature, although I wouldn't 
describe them as an ontology of Web architecture.

> Could they for example please explain, in their
> ontology, semantics of an HTTP 200 response?
> 

Looking at the RFC (RFC 2616, section 10.2.1) I read:
[[
  GET an entity corresponding to the requested resource is sent in the 
response;
]]

My understanding of 'corresponding' is that a representation of the 
resource identified by the URI is returned.

> Could the SWBPWG please answer also answer the following:

In answering these questions, which are about metadata, I will consider 
how a semantic web agent might answer them. My ideal model uses the 
semantic web as a distributed knowledge base, with a trust architecture 
following Chris Bizer's ideas, as described, for example, in our paper 
with Pat Hayes and Patrick Stickler.

http://www.hpl.hp.com/techreports/2004/HPL-2004-57.html

(to be presented at WWW 2005)

I believe Chris has been chatting with you recently about his work on trust.

Since these questions are about metadata it seems appropriate to think 
of them from a SemWeb point of view.


> 
> 1. Who was the creator <http://www.w3.org/2005/moby/dick> ?
> 

My agent would first look in its knowledge base from trusted sources to 
answer this question. Let us assume there is nothing.

We could then try an HTTP GET asking for the application/rdf+xml mime 
type ... that appears not to return anything useful.

We can retrieve an HTML page from a GET. My agent would look at it and 
see whether metadata is encoded using techniques such as:
   a link to an RDF/XML document, as described in RDF Syntax
   RDF/A encoding of metadata in XHTML
   GRDDL

We draw a blank again.
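
As a rough illustration of the first of those checks, here is a minimal
Python sketch that scans an HTML page for <link> elements whose type is
application/rdf+xml. The class name, the function name and the two sample
pages are hypothetical, made up for this example; a real agent would also
need the RDF/A and GRDDL checks, which are not shown.

```python
from html.parser import HTMLParser


class RdfLinkFinder(HTMLParser):
    """Collects href values of <link> elements that point at RDF/XML."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # Only links explicitly typed as RDF/XML count as metadata pointers.
        if attr.get("type") == "application/rdf+xml" and attr.get("href"):
            self.rdf_links.append(attr["href"])


def find_rdf_links(html_text):
    """Return the hrefs of all RDF/XML links found in the page."""
    finder = RdfLinkFinder()
    finder.feed(html_text)
    return finder.rdf_links


# Hypothetical sample pages, not the real content of any W3C page.
page_with_link = (
    '<html><head><link rel="alternate" type="application/rdf+xml" '
    'href="dick.rdf"></head><body></body></html>'
)
page_without_link = "<html><body><address>Tim BL, 2005</address></body></html>"

print(find_rdf_links(page_with_link))     # ['dick.rdf']
print(find_rdf_links(page_without_link))  # []
```

An empty list is the "draw a blank" case described above.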

Realistically, at this point my agent would give up, but for the sake of 
this thought experiment, we can assume that it has a natural language 
component that manages to make some sense of the HTML page.

Maybe, looking at the <address> element, it could conclude something:
<address>
   Tim BL, 2005
</address>

Maybe it would recognize Tim BL as Tim Berners-Lee, maybe not.
Maybe it would conclude that Tim Berners-Lee was the 
http://purl.org/dc/elements/1.1/creator of the page.
i.e.
G1: {
<http://www.w3.org/2005/moby/dick>
   <http://purl.org/dc/elements/1.1/creator>
   "Tim Berners-Lee" .
}

However, since this is based on guesswork, within the trust 
architecture, this claim would be treated as not very dependable, and 
would not be used, for example, as the basis of a financial transaction.

Perhaps, the natural language component would read the text:

<p>The URI  "http://wwww.w3.org/2005/moby/dick" identifies a book, "Moby
Dick", written by Herman Melville. The book starts as follows.</p>

and translate this into RDF say as

G2: {
<http://www.w3.org/2005/moby/dick> rdf:type eg:Book .
<http://www.w3.org/2005/moby/dick>
   <http://purl.org/dc/elements/1.1/title>
   "Moby Dick" .
<http://www.w3.org/2005/moby/dick>
   <http://purl.org/dc/elements/1.1/creator>
   "Herman Melville" .
}

Again, this would be marked as not very reliable, and not suitable for 
use by high value applications.

> 2. What is the year of creation of <http://www.w3.org/2005/moby/dick> ?

Following 1., the natural language analysis component may analyze the 
<address> element and make the following claim:

G3: {
<http://www.w3.org/2005/moby/dick>
    <http://purl.org/dc/elements/1.1/date>
    "2005" .
}

This claim is also supported by analyzing the URL itself, being aware of 
some W3C policies, being aware that the current year is 2005, and hence 
concluding that the URL was coined in 2005, but that doesn't really tell 
us about the resource identified by the URI. Also Web Architecture tells 
us that inspecting URIs is not a good thing to do.

http://www.w3.org/TR/2004/REC-webarch-20041215/#uri-opacity

(Aside: why does this not apply to matching /http:.*#.*/ ?)

On the other hand, having hypothesized G2 (above)
we may look this up in a bibliographic database and conclude that:

G4: {
<http://www.w3.org/2005/moby/dick>
    <http://purl.org/dc/elements/1.1/date>
    "1851-10-18" .
}

The process of bibliographic look up is likely to be fairly reliable, so 
the claim in G4 is about as reliable as that in G2.

Since G2 and G4, set against G1 and G3, are at least surprising, if not 
simply contradictory, our level of trust in the natural language agent 
is getting fairly low by this point. The trust architecture allows the 
application to choose between:

a) trusting to some extent G1 and G3
b) trusting to some extent G2 and G4
c) trusting neither enough to use

The essence of the problem here is that the representation chosen for 
the information resource, the book called "Moby Dick", seems not to be 
a very good one, and is in some ways quite misleading. I prefer:

http://etext.lib.virginia.edu/etcbin/toccer-new2?id=Mel2Mob.sgm&images=images/modeng&data=/texts/english/modeng/parsed&tag=public&part=all


> 
> 3. Who was the creator <http://www.w3.org/2005/moby/xyz> ?
> 
Going through a similar process to 1. we conclude, with low confidence,

G6: {
<http://www.w3.org/2005/moby/xyz>
   <http://purl.org/dc/elements/1.1/creator>
   "Tim Berners-Lee" .
}

There is no analogue to G2.

> 4. What is the year of creation of <http://www.w3.org/2005/moby/xyz> ?

Again we might use a process as under 2. to get to

G7: {
<http://www.w3.org/2005/moby/xyz>
   <http://purl.org/dc/elements/1.1/date>
   "2005" .
}

again with low confidence.

However, the absence of contradictory information may, in practice, 
lead us to go with G6 and G7 when we need to make a guess in order to 
do something, in cases where we would not go with G1, G2, G3 or G4.

Maybe the agent's knowledge base of trusted facts will contain the 
following:

G0: {
<http://www.w3.org/2005/moby/xyz>
    rdf:type
    eg:AcademicExample .
<http://www.w3.org/2005/moby/dick>
    rdf:type
    eg:AcademicExample .
}

The agent may also have a rule that any claims made about URIs known to 
be of type eg:AcademicExample should be ignored, so that it will know 
it is not really worth answering the questions you pose above, at least 
not for any real application function. But of course *my* agent wouldn't 
have such a rule, because I'm always playing with academic examples.
The resources <http://www.w3.org/2005/moby/xyz> and 
<http://www.w3.org/2005/moby/dick> are useful and have meaning for this 
discussion thread, but other applications are likely to find greater 
utility in other resources and other URIs.
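
A rule of that kind could be sketched in Python as below. This is only
an illustration: the expansion of the eg: prefix, the third example URI
and its author, and the function names are all hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EG = "http://example.org/ns#"  # hypothetical expansion of the eg: prefix
DC = "http://purl.org/dc/elements/1.1/"
XYZ = "http://www.w3.org/2005/moby/xyz"
DICK = "http://www.w3.org/2005/moby/dick"
OTHER = "http://example.org/some/real/document"  # hypothetical resource

# Trusted facts, as (subject, predicate, object) triples.
trusted = {
    (XYZ, RDF_TYPE, EG + "AcademicExample"),
    (DICK, RDF_TYPE, EG + "AcademicExample"),
}


def academic_examples(trusted_triples):
    """Subjects that the trusted facts type as eg:AcademicExample."""
    return {s for s, p, o in trusted_triples
            if p == RDF_TYPE and o == EG + "AcademicExample"}


def usable_claims(claims, trusted_triples):
    """Drop every claim whose subject is a known academic example."""
    ignored = academic_examples(trusted_triples)
    return [claim for claim in claims if claim[0] not in ignored]


claims = [
    (XYZ, DC + "creator", "Tim Berners-Lee"),
    (DICK, DC + "date", "1851-10-18"),
    (OTHER, DC + "creator", "A. N. Author"),  # hypothetical claim
]

# Only the claim about the non-example resource survives.
print(usable_claims(claims, trusted))
```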

If my agent is particularly aware of my current task, trying to 
articulate a position on httpRange-14, it may choose G2 and G4 over G1 
and G3, since these are more consistent with my position.


> 
> This is not to say that the is issue is simple, or that the present 
> practice
> does not include that the SWBP describes. It asks for a consistent
> and worked out alternative.

I think the alternative I'm groping towards here is one that handles 
inconsistency, rather than seeing the Web as even asymptotically consistent.

> 
> I had the hope, after the face-face meeting at the TP, that the
> task the group was taking on was to lay out that architecture.
> 
> Tim BL

Jeremy

Received on Friday, 1 April 2005 14:41:17 UTC