Re: Is 303 really necessary? (dealing with ambiguity) from Tore Eriksson on 2010-11-09 (public-lod@w3.org from November 2010)

From: Tore Eriksson <tore.eriksson@po.rd.taisho.co.jp>
Date: Tue, 09 Nov 2010 10:51:45 +0900
To: David Booth <david@dbooth.org>
Cc: public-lod@w3.org
Message-Id: <20101109105145.8BA3.9D98B4E7@po.rd.taisho.co.jp>
Hi again David,

> On Mon, 2010-11-08 at 16:18 +0900, Tore Eriksson wrote:
> > David Booth wrote:
> [ . . . ]
> > > And others may well make statements
> > > about that web page.  For example, someone crawling the web may make a
> > > statement saying that <http://iandavis.com/2010/303/toucan> returned
> > > 1027 bytes in response to a GET request.  They may not say it in RDF --
> > > they might say it in XML or any other language.
> > 
> > As long as they are aware that they are talking about a specific
> > representation of this resource I can't see any problem with this. If
> > they think they are stating something about the resource itself, well
> > they would be wrong even if the current URI was an "information
> > resource". They apparently need to learn more about web technology -
> > representations, caching, con-neg, &c.
> 
> How about:
> 
>   "Ian Davis owns web page <http://iandavis.com/2010/303/toucan>."
> 
>   "The content at <http://iandavis.com/2010/303/toucan> was last updated
> 7-Nov-2010."
> 
>   "<http://iandavis.com/2010/303/toucan> has a page rank of
> 123,456,789."
> 
> Those statements are not talking about any specific representations, nor
> are they talking about the toucan.  All are completely reasonable
> statements for someone knowing nothing about RDF to make.

In order:

1. "Ian Davis owns web page <http://iandavis.com/2010/303/toucan>."

I'm not sure about this one, since I don't have a good intuition about
what it means to "own" a resource. I'd rather say that he owns the URI.
If we learned that this resource is owl:sameAs another, say
<http://dbpedia.org/resource/Toucan>, this assertion would be wrong. How
about:
1'. 'Ian Davis owns "http://iandavis.com/2010/303/toucan"^^xsd:anyURI'
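
A minimal sketch of why (1) and (1') behave differently once an owl:sameAs
link shows up. Everything here is an assumption made for illustration: the
Resource, Literal and Triple classes, the apply_same_as helper, the
"ex:owns" predicate and the example.org URI for Ian Davis are hypothetical,
and the substitution is a naive stand-in for real OWL reasoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str  # the term denotes whatever the URI identifies


@dataclass(frozen=True)
class Literal:
    value: str     # the term denotes the string itself
    datatype: str


@dataclass(frozen=True)
class Triple:
    s: object
    p: str
    o: object


def apply_same_as(triples, a, b):
    """Naive owl:sameAs substitution: for each triple, also assert the
    copy in which resource `a` is replaced by resource `b`."""
    def swap(term):
        return b if term == a else term

    result = set(triples)
    for t in triples:
        result.add(Triple(swap(t.s), t.p, swap(t.o)))
    return result


toucan = Resource("http://iandavis.com/2010/303/toucan")
dbpedia_toucan = Resource("http://dbpedia.org/resource/Toucan")
ian = Resource("http://example.org/people/IanDavis")  # hypothetical URI

# (1): the object is the resource, so sameAs carries ownership across.
stmt_resource = Triple(ian, "ex:owns", toucan)
# (1'): the object is the URI as a typed literal, untouched by sameAs.
stmt_literal = Triple(
    ian, "ex:owns",
    Literal("http://iandavis.com/2010/303/toucan", "xsd:anyURI"),
)

closure_resource = apply_same_as({stmt_resource}, toucan, dbpedia_toucan)
closure_literal = apply_same_as({stmt_literal}, toucan, dbpedia_toucan)
```

In this sketch closure_resource also contains "Ian owns the DBpedia toucan",
which is the wrong conclusion mentioned above, while closure_literal stays
exactly as it was stated.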

2. "The content at <http://iandavis.com/2010/303/toucan> was last
updated 7-Nov-2010."

Sounds like a statement talking about a representation (the content) to
me - the only representation available maybe, but still not the resource.

3. "<http://iandavis.com/2010/303/toucan> has a page rank of 123,456,789."

Since the TAG has decided that anything can have a URI, it follows that
anything can have a page rank. As in (1), it's probably the URI that has
a page rank though. Redirecting by 303 doesn't affect linking anyway so
this statement is irrelevant to the current question.

Of course, people will make mistakes in modelling and producing RDF. In
a perfect world people would have a good grasp of RDF theory before
producing it, but internet culture isn't that strict so in the end we
have to deal with bad data like the examples you show above. If httpRange-14
had any chance of improving the situation I would be the first to join
in, but I can't really see any such effect.

> > [ . . . ]
> > > So I don't think it is reasonable or realistic to think that we can
> > > *avoid* creating an ambiguity by returning additional RDF statements
> > > with the 200 response.  Rather, the heuristic that you propose is a way
> > > for applications to *deal* with that ambiguity by tracking the
> > > provenance of the information: if one set of assertions was derived from
> > > an HTTP 200 response code, and another set of assertions was derived
> > > from an RDF document that you trust, then ignore the assertions that
> > > were derived from the HTTP 200 response code.
> > 
> > By not drawing ill-founded conclusions about the nature of the resource
> > through the response code, ambiguity could have been avoided in the
> > first place.
> 
> Apparently you and I disagree about what it means to be a web page.  I
> personally know of no better qualification criterion for something being
> a web page than if that thing returns a 200 status code in response to a
> GET request.  Perhaps one would characterize this as duck typing:
> http://en.wikipedia.org/wiki/Duck_typing
> What other criteria would you use?  

I don't think that your definition of a web page - or is it the
definition of an IR? - is wrong per se, I just don't see how it is
relevant for the semantic web since it is too broad. In no modelling I
have done, nor in any ontologies I have used, has the IR/non-IR
distinction ever been an issue, and I think that is what Ian's proposal
boils down to.
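
For reference, the provenance heuristic quoted above (ignore what was
inferred from the 200 response when a trusted RDF document is available)
could be sketched roughly as follows. This is only an illustration under
stated assumptions: the Assertion class, the two source labels, the "ex:"
names and the per-subject scoping are hypothetical choices, not something
specified in the thread.

```python
from dataclasses import dataclass

# Hypothetical provenance labels.
HTTP_200 = "inferred-from-http-200"
TRUSTED_RDF = "trusted-rdf-document"


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str
    source: str  # where the assertion was derived from


def resolve(assertions):
    """Drop assertions inferred from a 200 response for every subject
    that a trusted RDF document also describes (assumed scoping)."""
    described = {a.subject for a in assertions if a.source == TRUSTED_RDF}
    return [
        a for a in assertions
        if not (a.source == HTTP_200 and a.subject in described)
    ]


toucan = "http://iandavis.com/2010/303/toucan"
facts = [
    Assertion(toucan, "rdf:type", "ex:InformationResource", HTTP_200),
    Assertion(toucan, "rdf:type", "ex:Toucan", TRUSTED_RDF),
]
kept = resolve(facts)
```

Here kept retains only the assertion from the trusted document; without
such a document the 200-derived assertion would be left in place.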

Regards,
Tore Eriksson

_______________________________________________________________
Tore Eriksson [tore.eriksson at po.rd.taisho.co.jp]
Received on Tuesday, 9 November 2010 01:52:20 UTC