- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Fri, 01 Apr 2005 15:41:08 +0100
- To: Tim Berners-Lee <timbl@w3.org>
- Cc: David Wood <dwood@mindswap.org>, www-tag@w3.org, public-swbp-wg@w3.org
Hi Tim,

my personal responses to your technical questions ... inline, perhaps in rather too much detail (sorry).

Summary:

Answering questions about the resources identified by URIs (rather than the representations returned) is in scope of the Semantic Web. We identify many, possibly contradictory, claims about these resources from any number of available SemWeb sources, of varying credibility. We then choose between these claims, as to which ones we will treat as facts, depending on our particular task and application.

Jeremy

Tim:
> Clearly the SWBPWG has an architecture in mind.
> Could the SWBPWG, in proposing an architecture, like to
> propose an ontology of Web architecture?
>

I initially wanted to decline this, but I think this e-mail reflects various thoughts of an architectural nature, although I wouldn't describe them as an ontology of Web architecture.

> Could they for example please explain, in their
> ontology, semantics of an HTTP 200 response?
>

Looking at the RFC I read:

[[
GET an entity corresponding to the requested resource is sent in the response;
]]

My understanding of 'corresponding' is that a representation of the resource identified by the URI is returned.

> Could the SWBPWG please answer also answer the following:

In answering these questions, which are about metadata, I will think about how a semantic web agent might answer them. My ideal model uses the semantic web as a distributed knowledge base, with a trust architecture following Chris Bizer's ideas, for example, as described in our paper with Pat Hayes and Patrick Stickler:

http://www.hpl.hp.com/techreports/2004/HPL-2004-57.html
(to be presented at WWW 2005)

I believe Chris has been chatting with you recently about his work on trust.

Since these questions are about metadata it seems appropriate to think of them from a SemWeb point of view.

>
> 1. Who was the creator <http://www.w3.org/2005/moby/dick> ?
>

My agent would first look in its knowledge base from trusted sources to answer this question. Let us assume there is nothing.

We could then try an HTTP GET asking for the application/rdf+xml MIME type ... that appears not to return anything useful.

We can retrieve an HTML page from a GET. My agent would look at it, and see if metadata is encoded using techniques such as:
- a link to an RDF/XML document, as described in RDF Syntax
- RDF/A encoding of metadata in XHTML
- GRDDL

We draw a blank again.

Realistically, at this point my agent would give up, but for the sake of this thought experiment, we can assume that it has a natural language component that manages to make some sense of the HTML page.

Maybe, looking at the <address> element, it could conclude something:

<address>
Tim BL, 2005
</address>

Maybe it would recognize Tim BL as Tim Berners-Lee, maybe not. Maybe it would conclude that Tim Berners-Lee was the http://purl.org/dc/elements/1.1/creator of the page, i.e.

G1:
{ <http://www.w3.org/2005/moby/dick>
     <http://purl.org/dc/elements/1.1/creator>
        "Tim Berners-Lee" .
}

However, since this is based on guesswork, within the trust architecture this claim would be treated as not very dependable, and would not be used, for example, as the basis of a financial transaction.

Perhaps the natural language component would read the text:

<p>The URI "http://www.w3.org/2005/moby/dick" identifies a book, "Moby Dick", written by Herman Melville. The book starts as follows.</p>

and translate this into RDF, say as

G2:
{ <http://www.w3.org/2005/moby/dick>
     rdf:type eg:Book .
  <http://www.w3.org/2005/moby/dick>
     <http://purl.org/dc/elements/1.1/title>
        "Moby Dick" .
  <http://www.w3.org/2005/moby/dick>
     <http://purl.org/dc/elements/1.1/creator>
        "Herman Melville" .
}

Again, this would be marked as not very reliable, and not suitable for use by high value applications.

> 2. What is the year of creation of <http://www.w3.org/2005/moby/dick> ?

Following 1.
the natural language analysis component may analyze the <address> field and make the following claim:

G3:
{ <http://www.w3.org/2005/moby/dick>
     <http://purl.org/dc/elements/1.1/date>
        "2005" .
}

This claim is also supported by analyzing the URL itself, being aware of some W3C policies and that the current year is 2005, and hence concluding that the URL was coined in 2005; but that doesn't really tell us about the resource identified by the URI. Also, Web Architecture tells us that inspecting URIs is not a good thing to do.

http://www.w3.org/TR/2004/REC-webarch-20041215/#uri-opacity

(Aside: why does this not apply to matching /http:.*#.*/ ?)

On the other hand, having hypothesized G2 (above), we may look this up in a bibliographic database and conclude that:

G4:
{ <http://www.w3.org/2005/moby/dick>
     <http://purl.org/dc/elements/1.1/date>
        "1851-10-18" .
}

The process of bibliographic lookup is likely to be fairly reliable, so the claim in G4 is about as reliable as that in G2.

Since G4 and G2 together are at least surprising, if not simply contradictory, our level of trust in the natural language agent is getting fairly low by this point.

The trust architecture allows the application to choose between:
a) trusting to some extent G1 and G3
b) trusting to some extent G2 and G4
c) trusting neither enough to use

The essence of the problem here is that the chosen representation of the information resource, the book called "Moby Dick", seems not to be a very good one, and is in some ways quite misleading. I prefer:

http://etext.lib.virginia.edu/etcbin/toccer-new2?id=Mel2Mob.sgm&images=images/modeng&data=/texts/english/modeng/parsed&tag=public&part=all

>
> 3. Who was the creator <http://www.w3.org/2005/moby/xyz> ?
>

Going through a similar process to 1. we conclude, with low confidence,

G6
{ <http://www.w3.org/2005/moby/xyz>
     <http://purl.org/dc/elements/1.1/creator>
        "Tim Berners-Lee" .
}

There is no analogue to G2.

> 4.
What is the year of creation of <http://www.w3.org/2005/moby/xyz> ?

Again we might use a process as under 2. to get to

G7
{ <http://www.w3.org/2005/moby/xyz>
     <http://purl.org/dc/elements/1.1/date>
        "2005" .
}

again with low confidence.

However, the absence of contradictory information may cause us, in practice, if we need to make a guess in order to do something, to go with G6 and G7 in cases where we would not go with G1 G2 G3 G4 and G5.

Maybe the agent's knowledge base of trusted facts will contain the following:

G0
{ <http://www.w3.org/2005/moby/xyz>
     rdf:type eg:AcademicExample .
  <http://www.w3.org/2005/moby/dick>
     rdf:type eg:AcademicExample .
}

together with rules saying that any claims made about URIs known to be of type eg:AcademicExample should be ignored, so the agent will know that it is not really worth answering your questions posed above, at least not for any real application function. But of course *my* agent wouldn't have such a rule, because I'm always playing with academic examples.

These resources
<http://www.w3.org/2005/moby/xyz>
<http://www.w3.org/2005/moby/dick>
are useful and have meaning for this discussion thread, but other applications are likely to find greater utility in other resources and other URIs.

If my agent is particularly aware of my current task, trying to articulate a position on httpRange-14, it may choose G2 and G4 over G1 and G3, since these are more consistent with my position.

>
> This is not to say that the is issue is simple, or that the present
> practice
> does not include that the SWBP describes.  It asks for a consistent
> and worked out alternative.

I think the alternative I'm groping towards here is one that handles inconsistency, rather than seeing the Web as even asymptotically consistent.

>
> I had the hope, after the face-face meeting at the TP, that the
> task the group was taking on was to lay out that architecture.
>
> Tim BL

Jeremy
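
As a rough illustration of the choice between a), b) and c) described in the answer to question 2, here is a minimal Python sketch. The trust scores, the source labels, the threshold and all function and class names are assumptions made purely for illustration; none of them come from this message or from the cited paper. The sketch only shows the idea that claims sit in named graphs and the application decides which graphs to treat as facts.

```python
from dataclasses import dataclass

DC = "http://purl.org/dc/elements/1.1/"
MOBY = "http://www.w3.org/2005/moby/dick"


@dataclass(frozen=True)
class NamedGraph:
    name: str
    source: str     # hypothetical label for where the claims came from
    trust: float    # hypothetical score in [0, 1], chosen for illustration
    triples: tuple  # (subject, predicate, object) tuples


# The claims G1..G4 from the discussion, all with an assumed low trust score.
GRAPHS = [
    NamedGraph("G1", "nl-guess", 0.2,
               ((MOBY, DC + "creator", "Tim Berners-Lee"),)),
    NamedGraph("G2", "nl-guess", 0.2,
               ((MOBY, DC + "title", "Moby Dick"),
                (MOBY, DC + "creator", "Herman Melville"))),
    NamedGraph("G3", "nl-guess", 0.2,
               ((MOBY, DC + "date", "2005"),)),
    NamedGraph("G4", "bibliographic-lookup", 0.2,
               ((MOBY, DC + "date", "1851-10-18"),)),
]

# Options a) and b); option c) is represented by an empty result.
OPTIONS = {"a": ("G1", "G3"), "b": ("G2", "G4")}


def choose(graphs, preferred, threshold):
    """Return the triples of the preferred option, or [] (option c)
    if any graph in that option falls below the trust threshold."""
    by_name = {g.name: g for g in graphs}
    chosen = [by_name[n] for n in OPTIONS[preferred]]
    if any(g.trust < threshold for g in chosen):
        return []
    return [t for g in chosen for t in g.triples]


def answer(triples, subject, predicate):
    """List the objects claimed for (subject, predicate) in the accepted triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    # A low-stakes task that prefers option b) and accepts weak claims.
    facts = choose(GRAPHS, "b", threshold=0.1)
    print(answer(facts, MOBY, DC + "creator"))
    # A high-value task with a strict threshold ends up with option c).
    print(choose(GRAPHS, "b", threshold=0.9))
```

The preferred option is passed in by the caller because, as argued above, the choice depends on the particular task and application rather than on the claims alone.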
Received on Friday, 1 April 2005 14:41:18 UTC