Re: Squaring the HTTP-range-14 circle from Giovanni Tummarello on 2011-06-16 (public-lod@w3.org from June 2011)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Thu, 16 Jun 2011 19:53:23 +0200
To: Tim Berners-Lee <timbl@w3.org>
Cc: Richard Cyganiak <richard@cyganiak.de>, public-lod@w3.org, Christopher Gutteridge <cjg@ecs.soton.ac.uk>
Message-ID: <BANLkTi=JgojRfW+xjLOZs8mwCG6BaRL5Kw@mail.gmail.com>
Hi Tim,

"Documents" per se (à la HTTP 200 response) on the web are less and less
relevant compared with the "conceptual entities" that are represented by
these documents and held, e.g., as DB records inside CMSs, social networks,
etc.

E.g. a social network is about "people": those are the important entities.
Then there might be 1000 different HTTP documents that you can get, e.g. if
you're logged in, if you're not logged in, if you have one cookie or another,
if you add &format=print. Specific URLs are pretty irrelevant, as they
contain all sorts of extra information.

Layouts of CMSs or web apps change all the time (and so do the HTML docs),
but the entities don't.

That's why there is so little ambiguity, really, with "HTTP response
200"-level annotations. You say you have so many annotations about
documents; I honestly don't understand what you're referring to. Are these
HTTP-retrievable documents? Where are the annotations? Are we talking about
the HTTP headers? About the "meta" tags in the <head>? Those are about the
subject of the page most of the time too, not the page itself.

And this is the idea behind schema.org (OpenGraph, whatever), which, sorry
Tim, you have to live with and we have to make the most of.

When someone refers to a URL that embeds an OpenGraph or schema.org
annotation, it is 99.+% certain (with the number of 9s growing as the web
evolves into a rich app platform) that they refer to the entity described in
it and not to the web document itself (which can and does change all the
time and is overall of no conceptual relevance).

With respect to schema.org, we (as the semantic web community) have not been
ignored: our work and proposals were very well considered and then
disregarded altogether, for several reasons: 12 years of work without
agreement on an ontology, no easy way for people to publish data (the 303
thing is complete, total, utter insanity, as I have said in vain so many
times), etc.
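For readers who have not met "the 303 thing": under the W3C TAG's httpRange-14 resolution, a 200 OK response signals that the URI names the document that was returned, so a plain slash URI meant to name a non-document thing is expected to answer 303 See Other, pointing at a document about the thing. The alternative is a hash URI such as .../alice#this, whose fragment the client strips before it makes any request. The minimal Python sketch below is purely illustrative: the /id/ and /doc/ path layout and the functions server_response and dereference are hypothetical, not taken from any real server or library. It shows the extra round trip the 303 pattern costs compared with a hash URI.

```python
from urllib.parse import urldefrag

# Hypothetical URI layout for one publisher (illustrative assumption only):
#   /id/<name>   names the thing itself (a person, a product, ...)
#   /doc/<name>  names the document that describes it
THING_PREFIX = "/id/"
DOC_PREFIX = "/doc/"


def server_response(path: str) -> tuple[int, dict[str, str]]:
    """Status and headers a 303-style publisher might send for a request path."""
    if path.startswith(THING_PREFIX):
        # Not a document: redirect with 303 See Other to a document about it.
        return 303, {"Location": DOC_PREFIX + path[len(THING_PREFIX):]}
    if path.startswith(DOC_PREFIX):
        # 200 OK: the URI names the document that is being returned.
        return 200, {"Content-Type": "text/html"}
    return 404, {}


def dereference(uri: str, max_hops: int = 5) -> tuple[str, int, int]:
    """Fetch a URI as a client would; return (final path, status, request count)."""
    # The fragment (#this, #product, ...) never goes over the wire, so a hash
    # URI is resolved by requesting only the part before '#'.
    path, _fragment = urldefrag(uri)
    for requests in range(1, max_hops + 1):
        status, headers = server_response(path)
        if status != 303:
            return path, status, requests
        # Each 303 costs one more round trip before any data arrives.
        path = headers["Location"]
    raise RuntimeError("too many redirects")


# Hash URI: the thing is /doc/alice#this; one request fetches its document.
print(dereference("/doc/alice#this"))  # ('/doc/alice', 200, 1)
# Slash URI for a thing: 303 to the document, then 200; two requests.
print(dereference("/id/alice"))        # ('/doc/alice', 200, 2)
```

The trade-off is where the disambiguation lives: the hash pattern puts it in the URI syntax at no network cost, while the 303 pattern puts it in server configuration and adds a request per lookup, which is the publishing burden objected to here.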

So, think of how browsers work: they fix all the broken HTML markup, doing
what it takes to understand, more or less, the intention behind it.

Exactly the same will happen with applications that work on semantic markup
at web scale: they will do the specific cleanups and adaptations they need.

*The UPSIDE* of this is that RDF is a totally cool technology which can, most
of the time, "rule them all".

Sindice is entirely RDF-based, but it reads and processes microformats, RDF,
RDFa and, from next week, schema.org microdata too. So long live them all,
really.

Fights worth fighting: having RDFa play well alongside schema.org, so that
schema.org tags can be written in RDFa and search engines will still read
them. This will allow people to keep using rich representations and
vocabularies without losing compatibility with the mainstream apps that will
be developed for schema.org-compatible pages.

Gio

On Thu, Jun 16, 2011 at 7:04 PM, Tim Berners-Lee <timbl@w3.org> wrote:

> I disagree with this post very strongly, and it is hard to know where to
> start, and I am surprised to see it.
>
> On 2011-06-13, at 07:41, Richard Cyganiak wrote:
>
> > On 13 Jun 2011, at 09:59, Christopher Gutteridge wrote:
> >> The real problem seems to me that making resolvable HTTP URIs for real
> >> world things was a clever but dirty hack and does not make any semantic
> >> sense.
> >
> > Well, you worry about *real-world things*, but even people who just worry
> > about *documents* have said for two decades that the web is broken because
> > it conflates names and addresses.
>
> No, some people didn't get the architecture, in that they had learned
> systems where there was a big distinction between names and addresses,
> which had different properties, and then they came across URIs, which had
> properties of both.
>
>
> > And they keep proposing things like URNs and info: URIs and tag: URIs and
> > XRIs and DOIs to fix that and to separate the naming concern from the
> > address concern. And invariably, these things fizzle around in their little
> > niche for a while and then mostly die, because this aspect that you call a
> > “clever but dirty hack” is just SO INCREDIBLY USEFUL. And being useful
> > trumps making semantic sense.
>
> I agree... except that the URI architecture being like names and like
> addresses isn't a "clever but dirty hack".
>
> You then connect this with the idea of using HTTP URIs for real-world
> things, which is a separate question.
> This again is a question of architecture. Of design of a system.
> We can make it work either way.
> We have to work out which is best.
>
> I don't think 303 is a quick and dirty hack.
> It does mean a large extension of HTTP to be used with non-documents.
> It does have efficiency problems.
> It is an architectural extension to the web architecture.
>
> >
> > HTTP has been successfully conflating names and addresses since 1989.
>
> That is COMPLETELY irrelevant.
> It is not a question of the web being fuzzy or ambiguous and getting away
> with it.
> It is a clean architecture where the concepts of "name" and "address" don't
> connect directly with those of people or files on a disk or IP hosts.
>
>
> >
> > There are a trillion web pages out there, all named with URIs. And even if
> > just 0.1% of these pages are unambiguously about a single specific thing,
> > that gives us a billion free identifiers for real-world entities, all
> > already equipped with rich *human-readable* representations, and already
> > linked and interconnected with *human-readable*, untyped, @href links.
> >
> > And these one billion URIs are plain old http:// URIs. They don't have a
> > thing:// in the beginning, nor a tdb://, nor a #this or #that in the end,
> > nor do they respond with 303 redirects or to MGET requests or whatever other
> > nutty proposals we have come up with over the years to disambiguate between
> > page and topic. They are plain old http:// URIs. A billion.
> >
> > Then add to that another huge number that already responds with JSON or
> > XML descriptions of some interesting entity, like the one from Facebook that
> > Kingsley mentioned today in a parallel thread. Again, no thing:// or tdb://
> > or #this or 303 or MGET on any of them.
> >
> > I want to use these URIs as identifiers in my data, and I have no
> > intention of redirecting through an intermediate blank node just because the
> > TAG fucked up some years ago.
>
> If you want to give yourself the luxury of being able to refer to the
> subject of a webpage without having to add anything to disambiguate it from
> the web page, so that you can use the billion web pages for your purposes,
> then you stop others like me from using semantic web systems to refer to
> those web pages, or in fact to the other hundred million web pages.
>
> Maybe you should find an efficient way of doing what you want without
> destroying the system (which you yourself have done so much to build).
>
>
>
> >
> > I want to tell the publishers of these web pages that they could join the
> > web of data just by adding a few @rels to some <a>s, and a few @properties
> > to some <span>s, and a few @typeofs to some <div>s (or @itemtypes and
> > @itemprops). And I don't want to explain to them that they should also
> > change http:// to thing:// or tdb:// or add #this or #that or make their
> > stuff respond with 303 or to MGET requests because you can't squeeze a dog
> > through an HTTP connection.
>
> Well actually I really want them to put metadata about BOTH the document
> and its subject.
>
> There are masses of metadata already about documents.
>
> Now you want to make it ambiguous so I don't know whether it is about the
> document or its subject?
>
> I don't think something like about="#product" is rocket science or
> unnatural.
>
> I really want people to be able to use RDF or microdata to say things about
> more than one thing on the same page.
>
> >
> > And here you and Pat and Alan (and TimBL, for that matter) are preaching
> > that we can't use these one billion fantastic free URIs to identify things
> > because it wouldn't make semantic sense.
>
> We are saying that actually we already are using them to refer to the web
> pages and that that is very important and so is all the existing web.
>
> >
> > Being useful trumps making semantic sense.
>
> That is romantic nonsense. To be useful you need clean, extensible
> architecture and well-defined concepts.
>
> > The web succeeded *because* it conflates name and address.
>
> That is completely irrelevant nonsense.
>
>
> It succeeded with a clean architecture using URIs for web pages,
> and the # as punctuation syntax between the identifier of the page and the
> local identifier within the page.
>
>
> > The web of data will succeed *because* it conflates a thing and a web
> > page about the thing.
> >
> > <http://richard.cyganiak.de/>
> >    a foaf:Document;
> >    dc:title "Richard Cyganiak's homepage";
> >    a foaf:Person;
> >    foaf:name "Richard Cyganiak";
> >    owl:sameAs <http://twitter.com/cygri>;
> >    .
> >
> > There.
> >
> > If your knowledge representation formalism isn't smart enough to make
> > sense of that, then it may just not be quite ready for the web, and you may
> > have some work to do.
>
> Formalisms aren't smart.
> Sure, I can make a program to make sense of that.
> But I'm not going to just to save you the effort of getting it right.
>
> Disappointed by the intensity of your posting.
> Systems have managed for a long time to distinguish between library card and
> book,
> between message header and message,
> between a book and its subject.
>
> Now we have masses of information about many books
> and about many other things, and there is great value in it.
> Let's not mess it up.
>
> If you want an ambiguous source of information, use natural language.
> The power of data is that it is a whole lot less ambiguous.
>
> Tim
>
> >
> > Best,
> > Richard
> >
>
>
>
Received on Thursday, 16 June 2011 17:54:12 UTC