Re: another crack at 'information resource'

To date the main effect of the httpRange-14 resolution has been a
cascade of messages on various LOD lists bullying newcomers to RDF
with taunts like "you're not a web page, are you?"  We have the stick
of public censure on pedantic-web and such places, but, as has been
pointed out with some justification, no carrot, no clear practical
benefit that ensues from adhering to the
2xx-means-information-resource rule. Furthermore the rule gets
rightfully ridiculed for not being accompanied by any sensible
definitions of "information resource" or "is a representation of". It
feels arbitrary and pointless to those who have not already bought
into it - a sort of cult.

The carrot I want to offer is that there *are* valuable things that
you can do if the rule is followed, but that you can't do in the
intended way and with confidence if it's not. And in fact these are
things that are already being done on the web, with no particular
normative justification. So, we can think of the rule as being
something that preserves and encourages a particular existing salutary
practice.

What I have in mind are metadata assertions of the kind that RDF (the
'Resource Description Framework' of the W3C 'Metadata Initiative',
remember?  back when 'resource' meant what we now call 'information
resource'?) was originally designed to encode, things like Dublin Core
and BIBO and FRBR properties, or more generally any property that is
true or false on the basis of a resource's "representations". TimBL's
genont ontology has properties of this sort too, even though they're
not what one would usually call metadata.

We can talk in practical terms about what happens with and without the
2xx/IR rule. With the rule, I can use my knowledge of a resource's
representations to make assertions about the resource. I can encode my
knowledge of GET U/200 Z exchange patterns as RDF statements with
subject <U>, just as the Metadata Initiative meant for me to do. I can
write such statements with confidence regardless of whether the URI
owner has ever heard of RDF or cares about ontology or
"identification" or anything else. Without the rule, I am constantly
in doubt - I have to look over my shoulder and ask, does this URI mean
the thing that we observe via HTTP, or does it mean some other entity?
How would I even find out? Yes, you can invent answers to these
questions, but the answers are ad hoc, complex, brittle, unreliable,
and incompatible with current metadata practice.  The effect of
detaching the use of the subject <U> from HTTP would be a chill, the
injection of FUD, in our declarative treatment of what's on the web.
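To make "encode my knowledge of GET U/200 Z exchange patterns as RDF statements with subject <U>" concrete, here is a minimal sketch. It assumes a hypothetical log of observed exchanges and a hypothetical property name ':mediaTypeAlways'. A property of the representations is lifted to the resource U only when every observed 200 response agrees, which is the universal-quantification idea from the earlier message. Observations only sample the responses, so this is evidence for "always", not proof of it.

```python
from collections import defaultdict

# One observed HTTP exchange: (request URI, status code, media type).
# This log format is a hypothetical stand-in for whatever a curator records.
Exchange = tuple[str, int, str]
Triple = tuple[str, str, str]


def media_type_assertions(exchanges: list[Exchange]) -> set[Triple]:
    """Lift a representation property to the resource named by U.

    For each URI, collect the media types of its 200 responses. A triple
    with subject U is emitted only if every observed 200 response agrees.
    ':mediaTypeAlways' is a hypothetical property name.
    """
    types_by_uri: dict[str, set[str]] = defaultdict(set)
    for uri, status, media_type in exchanges:
        if status == 200:  # only GET U/200 Z exchanges count as evidence
            types_by_uri[uri].add(media_type)
    return {
        (uri, ":mediaTypeAlways", next(iter(types)))
        for uri, types in types_by_uri.items()
        if len(types) == 1  # all observed representations agree
    }


log = [
    ("http://example.org/doc", 200, "application/rdf+xml"),
    ("http://example.org/doc", 200, "application/rdf+xml"),
    ("http://example.org/page", 200, "text/html"),
    ("http://example.org/page", 200, "application/rdf+xml"),
    ("http://example.org/gone", 404, "text/html"),
]
print(media_type_assertions(log))
```

Only .../doc gets an assertion. The representations of .../page disagree, and the 404 from .../gone is not evidence about any representation.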

For example, suppose I know that 200 responses from URI U will always
carry media type RDF/XML (application/rdf+xml). I might say so using
something like <U> :mediaTypeAlways media:application-rdf-xml. In any
reasonable interpretation this would contradict <U> rdf:type
foaf:Person because it's nonsense (i.e. highly undesirable from an
engineering viewpoint) to say that a person has a media type. If the
Person assertion were found in the RDF delivered at U and given
credence, we'd have a contradiction, and a perfectly good metadata
assertion would be under siege.

If you find the Person example unconvincing, consider the more direct
case where the fetched RDF uses <U> to designate an "information
resource" that is observably different from the one that has the RDF
representation, e.g. <U> :mediaTypeAlways media:text-html.  Since
:mediaTypeAlways is functional (a resource has at most one value for
it), this would be a contradiction not requiring a judgment of
nonsensicalness. (This is pretty much the same as the Communist
Manifesto vs. Wikipedia article example I gave before.)
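A curator could detect this kind of clash mechanically. The following is a minimal sketch, not a reasoner. Triples are plain tuples, and ':mediaTypeAlways' is the hypothetical functional property from the example above. The check treats distinct object names as denoting distinct things. OWL itself does not make that unique-name assumption: under OWL semantics the two media-type individuals would also have to be declared different before the clash becomes a formal inconsistency.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

# Predicates treated as functional: a subject may have at most one value.
# ':mediaTypeAlways' is the hypothetical property from the example text.
FUNCTIONAL = {":mediaTypeAlways"}


def functional_conflicts(triples: list[Triple]) -> dict[tuple[str, str], set[str]]:
    """Return each (subject, predicate) pair that has more than one value
    for a functional predicate. Distinct object names are assumed to denote
    distinct things (a unique-name assumption OWL does not make)."""
    values: dict[tuple[str, str], set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in FUNCTIONAL:
            values[(subject, predicate)].add(obj)
    return {key: objs for key, objs in values.items() if len(objs) > 1}


curated = [("<U>", ":mediaTypeAlways", "media:application-rdf-xml")]
fetched_from_u = [("<U>", ":mediaTypeAlways", "media:text-html")]
print(functional_conflicts(curated + fetched_from_u))
```

Run on the curator's assertion alone, the check finds nothing. Merging in the RDF fetched from U produces the conflict, and the perfectly good metadata assertion is now "under siege".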

If you find media type unconvincing, it can be replaced by any other
similar property such as dc:creator or dc:title.

Note that I'm not presupposing any particular meaning for "information
resource" or "representation of", but would rather try to figure out
what definitions these words would need to have in order to make this
kind of use case work.

The 200/IR rule is not logically necessary. There are other possible
architectures. E.g. we could have - and I think we probably should
have - a property "isLocatedAt" that connects an IR to a URI where
it's deployed, and instead of saying <U> we could say [:isLocatedAt
"U"], the IR located at U, and use that as the subject in metadata
assertions. But this is not how people currently write DC and BIBO and
genont (etc), and it is so awful that no one ever would. We've already
gone down the 200/IR path; I think it's just a matter of publicizing
the reason why. It's not that the rule necessarily benefits the URI
owner. It's that general respect for the rule benefits those who
are doing metadata curation, and those who would make use of such
metadata.
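To show what the isLocatedAt alternative would cost a metadata consumer, here is a minimal sketch comparing the two encodings. Triples are plain tuples. The blank node "_:ir", the title string, and the ':isLocatedAt' property are hypothetical illustrations, since no such property exists in current vocabularies. Under the 200/IR rule, a title is read straight off <U>. Under the alternative, every lookup first has to find the node(s) located at "U".

```python
Triple = tuple[str, str, str]

# Under the 200/IR rule the URI itself is the subject of metadata.
direct: list[Triple] = [("<U>", "dc:title", "An Example Document")]

# Hypothetical alternative: a blank node _:ir stands for "the IR located
# at U", tied to the URI string by an assumed :isLocatedAt property.
indirect: list[Triple] = [
    ("_:ir", ":isLocatedAt", '"U"'),
    ("_:ir", "dc:title", "An Example Document"),
]


def titles_direct(triples: list[Triple], uri: str) -> list[str]:
    # One pattern: read dc:title straight off the URI.
    return [o for s, p, o in triples if s == uri and p == "dc:title"]


def titles_indirect(triples: list[Triple], uri_string: str) -> list[str]:
    # Two patterns: find the node(s) located at the URI, then read their titles.
    nodes = {s for s, p, o in triples if p == ":isLocatedAt" and o == uri_string}
    return [o for s, p, o in triples if s in nodes and p == "dc:title"]


print(titles_direct(direct, "<U>"))
print(titles_indirect(indirect, '"U"'))
```

Both lookups return the same title. The indirect form needs an extra join on every query, and existing DC, BIBO and genont data is not written that way.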

I'm looking for general encouragement or other feedback, not on the
details but on the general approach.

Jonathan

On Wed, Nov 17, 2010 at 4:51 PM, David Booth <david@dbooth.org> wrote:
> On Thu, 2010-11-11 at 17:23 -0500, Jonathan Rees wrote:
>> [ . . . ] could enumerate lots of properties ("content properties") that I
>> think should follow the  universal quantification rule - e.g. most or
>> all of the DC and BIBO properties - but (in light of the negations of
>> these properties) I don't know how to enable someone else to
>> generalize to additional properties, i.e. what the boundaries of the
>> meta-category of content properties are.
>>
>> Before spending a lot of time trying to figure that out, though, I
>> want to convince at least one other person that this idea (universal
>> quantification to extend representation properties to IR properties)
>> has promise as a possible way to motivate the httpRange-14 rule.
>
> I don't follow this at all.  How would it motivate the httpRange-14
> rule?
>
>
>
> --
> David Booth, Ph.D.
> Cleveland Clinic (contractor)
> http://dbooth.org/
>
> Opinions expressed herein are those of the author and do not necessarily
> reflect those of Cleveland Clinic.
>
>

Received on Friday, 19 November 2010 14:25:30 UTC