Re: Squaring the HTTP-range-14 circle [was Re: Schema.org in RDF ...] from Henry Story on 2011-06-19 (public-lod@w3.org from June 2011)

From: Henry Story <henry.story@bblfish.net>
Date: Sun, 19 Jun 2011 13:44:03 +0200
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: Pat Hayes <phayes@ihmc.us>, Danny Ayers <danny.ayers@gmail.com>, David Booth <david@dbooth.org>, Linked Data community <public-lod@w3.org>, Jason Borro <jason@openguid.net>, Tim Berners-Lee <timbl@w3.org>
Message-Id: <0449F89D-D983-4BC3-904B-584557999566@bblfish.net>
On 19 Jun 2011, at 13:05, Hugh Glaser wrote:

> "A step too far"?
> 
> Hi.
> I've sort of been waiting for someone to say:
> "I have a system that consumes RDF from the world out there (eg dbpedia), and it would break and be unfixable if the sources didn't do 303 or #."
> Plenty of people saying they can't express what they want without it.
> And plenty of people saying that code they write might not be able to understand some RDF it receives properly.
> But no actual examples in the wild (at least as far as I can tell in a lot of messages).
> 
> This might be for quite a few reasons, such as:
> 1) There are no such consuming systems;
> 2) The existing consuming systems would not break.
> 
> Number (1) would be too embarrassing, and is wrong because I have some, so I'll think about number (2).

As you point out, there are some consuming systems, but they are not very distributed: you know ahead of time what you will find there, and so you can adapt your parsing for the few special cases. At that level the XML crowd/JSON crowd are right - RDF does not give you much. In fact it makes it easier to do things wrong. So we should be supporting more RESTful XML that can be GRDDLed with XSPARQL.

The semweb gives you a lot more when things get even more distributed, such as when everyone starts having foaf files on billions of computers. At that point nobody will want to tweak their app for the specific data at one site. Also one will want to be careful of the difference between documents and things, for the same reason I pointed out with the "like" button in Facebook. So for the moment the errors don't appear, because we are few consumers and few producers, and we can work around mistakes manually on a case by case basis.
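
To make the document/thing distinction concrete, here is a minimal sketch of the rule a consumer could apply under the httpRange-14 resolution: a 2xx answer means the URI names an information resource (a document), a 303 means it may name anything, and a hash URI may also name anything. The function name `classify` and the returned labels are assumptions made for this illustration, not part of any existing library, and the HTTP status is passed in rather than fetched so the sketch stays self-contained.

```python
from urllib.parse import urldefrag


def classify(uri: str, status: int) -> str:
    """Say what a URI may denote, given the HTTP status of a GET on it.

    `status` is the status code returned for the URI with any fragment
    stripped (fragments are never sent to the server).
    """
    _, fragment = urldefrag(uri)
    if fragment:
        # Hash URI: by convention it may name anything (a person, a dog, ...).
        return "any resource"
    if 200 <= status < 300:
        # httpRange-14: a 2xx response means an information resource.
        return "information resource"
    if status == 303:
        # httpRange-14: a 303 (See Other) means the URI could name anything.
        return "any resource"
    # 4xx and everything else: the nature of the resource is unknown.
    return "unknown"
```

A consumer that stores this label next to each URI it dereferences can at least notice when a URI that answered 200 is later described as a person.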

To get a real linked data application you need:
 1- data that is produced in a completely decentralised way
 2- data that is linked between those decentralised nodes
 3- data that is consumed, and where the consumption has real world effects
  
 
Number 3 is the recursive feedback piece that will make 1 and 2 come to a point of stability, or meta-stability, as we are dealing with self organising systems here.
 
This can be done with the social web. We need systems where your publishing data means that I can do something, learn something about you, and so on... but without you ever knowing ahead of time what software or services we are using.
(( The Twitters and other Web2.0 folks have made their life easy by centralising data publishing and consumption as much as possible. For systems like these there is no real communication problem: there is a central dictator and he says what the terms mean. As things evolve even that part escapes him - the way office document formats escaped M$ - because of the huge number of people and software dependent on the initial meaning produced.))

If I write things out wrong, your software should be able to let me know about it. In the same way, if we organise to meet but give each other the wrong address, we will end up missing the meeting. If this were not so then giving out addresses and organising meetings would be a very different exercise.



> There seem to be some axes in the discussion:
> publish / consume
> long/medium term / shorter term
> ideal / pragmatic
> Interestingly, we don't seem to have a strong theory / practice axis, which is great.

Yes, my point has been that we need to work on small vocabularies, widely distributed and widely used, to kick-start the rest of the system.

> 
> As a publisher, I/we have had to work pretty hard to conform to really quite complex requirements for publishing RDF as Linked Data; not just Range-14, but voiD, sitemaps and various bits and pieces that Kingsley always tells me to do in the RDF.
> As a consumer, it has been pretty simple: "Well guv, thanks for the URI, here's some RDF."
> It has always been something of a source of angst (if not actual pain) to me that none of the extra work I put into publishing RDF is ever used by me or anyone else, as far as I know.
> In fact, some of the sites I consume actually don't do things "properly" - I might have had to change my consuming systems to cope with this, but I don't, because they already cope fine.
> Why is it not a problem? One obvious reason is that the consuming application is actually looking for specific knowledge about things.

And as pointed out above, they are not that distributed, and the consequences of things going wrong on a lot of the open data stack are not that big yet.

> I don't have a consuming system that is considering both lexical and animal subjects, and so confusion does not arise.

Also you are probably not putting up reasoners yet. 

> In fact, it is the predicates that tend to distinguish satisfactorily for me (as has been pointed out by some people).
> Thus, if I get a triple that says the URI that would resolve to my Facebook page foaf:knows the URI that would resolve to your Facebook page, I (my system) will happily interpret that as one person (or whatever) foaf:knows the other. I certainly don't want to go and resolve these to find out to what the URIs actually resolve. And if I did, what would I do about it? Ignore it?

Yes, one can do a lot with incoherent data if one ignores the incoherence, or just follows through some networks like that. I try to follow these guidelines more for reasons of sanity. They are simple to follow, and help one think about the issues.
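
As a rough illustration of what happens once a reasoner is involved, the sketch below infers types from the domain and range of foaf:knows (both foaf:Person in the FOAF vocabulary) and reports any node that ends up typed with two classes the consumer treats as disjoint. The disjointness of foaf:Person and foaf:Document is an assumption of this sketch, as are the function names and the example.org URIs; triples are plain Python tuples rather than any particular RDF library's objects.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# foaf:knows relates a foaf:Person to a foaf:Person.
DOMAIN = {FOAF + "knows": FOAF + "Person"}
RANGE = {FOAF + "knows": FOAF + "Person"}

# Assumption of this sketch: this consumer treats people and documents as disjoint.
DISJOINT = [frozenset({FOAF + "Person", FOAF + "Document"})]


def infer_types(triples):
    """Collect asserted types plus those implied by domain and range."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    return types


def incoherent(triples):
    """Return the nodes whose inferred types include a disjoint pair."""
    clashes = set()
    for node, classes in infer_types(triples).items():
        if any(pair <= classes for pair in DISJOINT):
            clashes.add(node)
    return sorted(clashes)


# Hypothetical data: a page URI used as the subject of foaf:knows.
page_a = "http://example.org/pages/alice"
page_b = "http://example.org/pages/bob"
triples = [
    (page_a, FOAF + "knows", page_b),
    (page_a, RDF_TYPE, FOAF + "Document"),
]
clashing_nodes = incoherent(triples)
```

Without the second triple nothing clashes, which is why a consumer that never learns the URI is a page carries on happily.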

> In fact, as has also been mentioned, you can define domains, ranges and restrictions for as long as you like, but it is quite possible and likely that the users of URIs will continue blissfully unaware of any of this, in exactly the same way that they continue unaware that there might be something ambiguous about the URIs they are using.

That is a problem that will only appear if people don't consume the data, or if the data is known ahead of time to be pretty inconsistent, as dbpedia data probably is.

> 
> By the way, as is well-known I think, a lot of people use and therefore must be happy with URIs that are not Range-14 compliant, such as http://www.w3.org/2000/01/rdf-schema .
> 
> When we help people publish, it really is tough to engage them long enough to care about the complex issues, and they often get it wrong - I am engaged with quite a few people who are now publishing serious amounts of interesting RDF where I have contacted them to try to help. The status of the conversations is that they have fixed what they can, and are now thinking (for a long time) about how they might configure their systems to do it properly - but they may never get there. I will still want to use their RDF.

Yes, in these case-by-case scenarios it is easy for you to write special-case filters. And we could do the same thing with HTML whenever we browse the web too. But the web had an application, the browser, that led to feedback effects that increased the coherence of the system.

> So, trying to be a little brief:
> I have always felt that the full Range-14 distinction was in danger of being a Step Too Far.
> Yes, it does matter, and it is likely (or at least possible) we will pay a price in the end.
> But the world is trying to pass us by - it has at least pulled alongside.

It has not passed us by: it is not building for distributed data. The big players are creating silos of information and getting rich off that. But the value of distributed information is much greater than what they are building - even if it is hard to believe. In any case we have no choice: the big guys are rich already. We can either be their slaves or be free by working together, and grow so big together that we tie them into our much larger system :-)

> We must work out why we seem to have lost any lead we had, because it is likely to be the same reason we will get left behind.

We are not behind. We are way ahead. The arrows in your back are a testament to that :-)

> And I happen to believe that what we have can be better than the alternatives.
> 
> Sorry Pat, I don't actually have a proposal.
> But I do know we need to be liberal in what we consume.
> And we might need to be a bit more liberal in what we praise, or at least be nicer to people who want to publish RDF and don't do Range-14.
> 
> Best
> Hugh
> 
> On 19 Jun 2011, at 05:05, Pat Hayes wrote:
> 
>> Really (sorry to keep raining on the parade, but) it is not as simple as this. Look, it is indeed easy to not bother distinguishing male from female dogs. One simply talks of dogs without mentioning gender, and there is a lot that can be said about dogs without getting into that second topic. But confusing web pages, or documents more generally, with the things the documents are about, now that does matter a lot more, simply because it is virtually impossible to say *anything* about documents-or-things without immediately being clear which of them - documents or things - one is talking about. And there is a good reason why this particular confusion is so destructive. Unlike the dogs-vs-bitches case, the difference between the document and its topic, the thing, is that one is ABOUT the other. This is not simply a matter of ignoring some potentially relevant information (the gender of the dog) because one is temporarily not concerned with it: it is two different ways of using the very names that are the fabric of the descriptive representations themselves. It confuses language with language use, confuses language with meta-language. It is like saying giraffe has seven letters rather than "giraffe" has seven letters. Maybe this does not break Web architecture, but it certainly breaks **semantic** architecture. It completely destroys any semantic coherence we might, in some perhaps impossibly optimistic vision of the future, manage to create within the semantic web. So yes indeed, the Web will go on happily confusing things with documents, partly because the Web really has no actual contact with things at all: it is entirely constructed from documents (in a wide sense). But the SEMANTIC Web will wither and die, or perhaps be still-born, if it cannot find some way to keep use and mention separate and coherent. So far, http-range-14 is the only viable suggestion I have seen for how to do this. If anyone has a better one, let us discuss it. 
>> But just blandly assuming that it will all come out in the wash is a bad idea. It won't.
>> 
>> Pat
>> 
>> On Jun 18, 2011, at 1:51 PM, Danny Ayers wrote:
>> 
>>> On 17 June 2011 02:46, David Booth <david@dbooth.org> wrote:
>>> 
>>>> I agree with TimBL that it is *good* to distinguish between web pages
>>>> and dogs -- and we should encourage folks to do so -- because doing so
>>>> *does* help applications that need this distinction.  But the failure to
>>>> make this distinction does *not* break the web architecture any more
>>>> than a failure to distinguish between male dogs and female dogs.
>>> 
>>> Thanks David, a nice summary of the most important point IMHO.
>>> 
>>> Ok, I've been trying to rationalize the case where there is a failure
>>> to make the distinction, but that's very much secondary to the fact
>>> that nothing really gets broken.
>>> 
>>> Cheers,
>>> Danny.
>>> 
>>> http://danny.ayers.name
>>> 
>>> 
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>> 
> 
> -- 
> Hugh Glaser,  
>              Intelligence, Agents, Multimedia
>              School of Electronics and Computer Science,
>              University of Southampton,
>              Southampton SO17 1BJ
> Work: +44 23 8059 3670, Fax: +44 23 8059 3045
> Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
> http://www.ecs.soton.ac.uk/~hg/
> 
> 
> 

Social Web Architect
http://bblfish.net/
Received on Sunday, 19 June 2011 11:44:47 UTC