Re: Squaring the HTTP-range-14 circle [was Re: Schema.org in RDF ...] from Pat Hayes on 2011-06-14 (public-lod@w3.org from June 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 14 Jun 2011 08:55:09 -0700
To: William Waites <ww@styx.org>
Cc: Danny Ayers <danny.ayers@gmail.com>, Richard Cyganiak <richard@cyganiak.de>, Alan Ruttenberg <alanruttenberg@gmail.com>, Linked Data community <public-lod@w3.org>, Michael Hausenblas <michael.hausenblas@deri.org>
Message-Id: <87C78DC1-9DA5-489B-BC45-8281EBC8FCE1@ihmc.us>
On Jun 14, 2011, at 1:54 AM, William Waites wrote:

> * [2011-06-13 20:33:47 -0700] Pat Hayes <phayes@ihmc.us> wrote:
> 
> ] > So there is some relationship between a description of the Eiffel
> ] > tower and the tower itself. The relationship is akin to similarity in
> ] > a very specific way - they are similar enough that someone thought it
> ] > made sense to write down that the tower was 356m tall.
> ] 
> ] What has that got to do with the tower being similar to its description? 
> 
> Simply that they are similar enough (in the relevant respects etc)
> that one can write ":eiffel :height 324" for either and (reasonably?)
> expect the reader not to be confused.

Well, you have got me confused. Are you saying here that it does in fact make sense to say that a description of the Eiffel Tower is 356m tall? So that your triple here is actually ambiguous, but one can rely on the reader's common sense to figure out which one is meant? I had always thought that when people used a name of a name instead of the name of a thing, they were usually just blurring the use/reference distinction, not that they genuinely weren't sure whether they were talking about things or names.

> 
> ] First, you seem to be assuming here that the tower and its description
> ] are NOT similar, contrary to what you said earlier and Danny seems to 
> ] be insisting upon. Second, this hypothetical person is, we both agree,
> ] confused. They made a mistake, what they said was wrong. Correct? I ask,
> ] because many people seem to want to say that they were NOT confused or
> ] wrong, just kind of less correct than if they used the right URI. 
> 
> Confused or speaking loosely, not bothering to make the distinction
> because it seems to them that they are being clear enough that any
> reader will understand what they mean. If you call them on it they
> will probably agree that, yes, "what I really meant was ... but to
> have written that out would have seemed excessively pedantic" in
> exactly the same way that I wasn't confused when I wrote "confused"
> but I admit to being inexact :)

Maybe we just produced the Web situation in miniature, because whether or not you were confused, I certainly was (and still am) trying to figure out what you are saying here. 

> 
> So I agree with these many people who want to say that there are a lot
> of inexact statements that are not made by confused people just by
> people with perhaps unreasonably high expectations that the readers of
> their statements will be able to figure out what they meant if not
> strictly what they said.
> 
> ] Third, and most important, anyone interested is unlikely to be confused,
> ] yes indeed. But any piece of software or inference engine is not
> ] unlikely to be confused. 
> 
> So this is the mismatch. Publishers write things down with some
> assumptions of what is likely to cause confusion that are probably
> based largely on their interactions with other humans, not with
> inference engines.

If they do this when writing, say, JavaScript or Perl, things will go badly wrong. If they do it when writing RDF, things will also go wrong. Even when writing English in a non-conversational situation where reading is separated from writing (e.g. road signs, email), things will go wrong surprisingly often. I am not sure how much we can expect to be responsible for people saying garbage because they are too lazy or incompetent to learn how to use a language.

> Writing things down exactly is incredibly difficult. A very large part
> of almost every discussion or disagreement usually comes down to
> someone understanding what was said differently than the person who
> said it meant. It can often take a lot of discussion before this
> becomes apparent. And that's between humans!

You seem to be making my point for me here :-)

> 
> So we want to get people to publish linked or structured data that is
> as exact as possible. Each step in that direction is a little bit more
> burdensome for the publisher, feels a little bit more pedantic and
> verbose to write down, means the publisher needs to know a little more
> about the kinds of things a reader can handle, but at the same time is
> easier to write software that can use it using simpler and more
> general algorithms that we know.
> 
> Some people seem to be saying that range-14 is a step too far. Other
> people seem to be saying that without that step it's impossible to
> write software in a general way to work with the data. If both are
> correct then we're stuck.

No, what will happen is that a class of people will arise who *do* understand http-range-14 (and other issues that are perceived as 'hard') and they will for a short while be able to earn a living writing (or writing code which generates) this stuff properly. This situation will last at most a decade, because by then a new generation of people will have educated themselves to 'speak' correctly in this new style without apparent effort, and all the whining about how terribly hard it was will be the stuff of nerdish jokes on XKCD.

> 
> The perception of RDF as complicated, verbose and pedantic is common
> and is something we cannot afford.

All we really need is enough people who can see through this mist of fear and actually get RDF written. On the whole, looking at the way the linked data is being created, I think we are doing quite well. Once stuff starts working and doing something useful, all this fear of formalism will melt away.

> Personally I don't think the range-14
> arrangement is too burdensome but outside this community this is a
> minority viewpoint. We cannot throw up extra barriers to publishers.
> So we need better software that can handle this kind of inexact data.

Sure, and to solve global warming, we need better power sources that don't emit CO2. Your move. We aren't going to get this magic software any time soon. The inference software in the semantic web engines behind RDF and OWL and RIF is the state of the art. If people can't write data that doesn't break these, we are in trouble. But I think they can: after all, they write RDB data out the wazoo. Anyone who can understand SQL can surely get their head around the distinction between the Eiffel Tower and a web page.

> 
> ] When you are the agent who is using this information, sure. But when
> ] you are the one publishing it or asserting it, you cannot do this.
> ] And when you are the one writing the rules to determine a globally
> ] accepted notion of entailment, you cannot do it.
> 
> Publishers will always make assumptions about how the information
> will be used. The assumptions will usually not be explicit. Even
> humans don't have a globally accepted notion of entailment

Ah, but they do. That is exactly why inference engines work.

> , it's
> all about context

No, it's not all about context. There really are non-contextual logics. If it really were ALL about context, the Web itself would not work.

> and intent on the part of the agent doing the
> reasoning. They will just have to deal with the fact that the
> publisher may not have anticipated their use.
> 
> Since range-14 seems to be a sticking point, we can try to address
> that particular kind of ambiguity with guidance about how to reason
> about information and non-information resources, and this guidance
> won't be general

It will be (well, it can be) general for the entire Web. Why not? The http-range-14 rule is actually pretty simple and intuitive. In my experience, most people kind of assume it without thinking about it, actually. (**Of course** the URI of a web page is the name of the web page...) So why not just say it, loud and clear, until people get it? 
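
As a rough sketch of how little there is to the rule, here is the gist of the W3C TAG's httpRange-14 resolution written out as a function over the HTTP status code returned by a GET on an http URI. The function name and the return labels are made up for illustration; they are assumptions, not part of any spec or library.

```python
def classify_by_status(status: int) -> str:
    """Rough reading of the httpRange-14 rule for a dereferenced http URI.

    The name and labels are hypothetical; only the three cases come
    from the TAG resolution.
    """
    if 200 <= status < 300:
        # A 2xx answer: the URI names an information resource (a web page, say).
        return "information resource"
    if status == 303:
        # A 303 See Other: the URI could name anything, e.g. a tower.
        return "any kind of resource"
    if 400 <= status < 500:
        # A 4xx error: nothing can be concluded about what the URI names.
        return "unknown"
    # Other status codes are not addressed by the rule.
    return "not covered by the rule"


if __name__ == "__main__":
    for code in (200, 303, 404):
        print(code, classify_by_status(code))
```

So a URI that answers 200 is the name of the page you got back, and a URI meant to name the tower itself has to answer with a 303 (or be a hash URI, which this sketch does not cover).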

Pat


> , it will have to do with particular classes and 
> predicates and how they should be interpreted in the local (graph)
> context.
> 
> ] Well, now you are stepping into an ocean of cans of worms. 
> 
> Oh, well aware of that :)
> 
> Cheers,
> -w
> 
> -- 
> William Waites                <mailto:ww@styx.org>
> http://river.styx.org/ww/        <sip:ww@styx.org>
> F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 14 June 2011 15:56:05 UTC