Re: SemWeb Non-Starter -- Distributed URI Discovery

On Mon, 11 Apr 2005 04:02:28 +1000, Patrick Stickler  
<patrick.stickler@nokia.com> wrote:

>
>
> On Apr 10, 2005, at 11:16, ext Charles McCathieNevile wrote:
>
>> On Mon, 04 Apr 2005 19:04:28 +1000, Patrick Stickler  
>> <patrick.stickler@nokia.com> wrote:
>>
>>>
>>>
>>> On Apr 4, 2005, at 11:31, ext Jeremy Carroll wrote:
>>>
>>>> Al Miles wrote:
>>>>>> Second, there is a more general discovery requirement which can be  
>>>>>> loosely phrased as, 'I want to find out what x said about y,' or,  
>>>>>> 'who said what about what?'  I have no ideas for how to solve that.
>>>
>>> Hmmm...  I'm trying to grok how I might best rephrase the
>>> question in a more practical form to avoid philosophical
>>> nuances.
>>>
>>> Perhaps "what is the core, authoritative body of knowledge provided by
>>> the owner of this URI which describes the resource identified by the  
>>> URI?"

>> Actually Alistair poses two questions, one of which strikes me as too  
>> general to be one of the "most right" ones :-). I think that Patrick's  
>> question is also somewhat general - "what does the author of X have to  
>> say about X" is a pretty unconstrained question. It strikes me as  
>> something that is often interesting, but is the kind of question I  
>> would avoid asking if I were trying to do any specific work.
>
> So you are not interested in using any authoritative
> assertions in any of your work?

I didn't say that, and it would not be true as an assertion.

What I said is that "the core, authoritative knowledge provided by the  
author" strikes me as a fairly ill-specified collection, whose usefulness  
lies somewhere above "everything everyone said", but which in many of the  
cases I can think of is nowhere near as useful as the response to a  
particular set of questions.

>>>> A google like system seems to be a plausible answer, we just need an  
>>>> economic model for it.
>>>
>>> A google like system is certainly a part of the answer, but we
>>> also need access to authoritative descriptions of resources in
>>> an analogous manner to how we now have access to authoritative
>>> representations.
>>
>> I'm not sure this is true,

> The issue is ... about
> depending on centralized repositories to serve us data rather
> than going to the authoritative sources directly.
>
> Both approaches will be useful. And both will be necessary.

As steps towards implementation, I can certainly see a case for something  
like URIQA. It strikes me as a relatively simple bootstrap, but one that  
also has some clear limits.

I am not sure, in the long run, that it is necessary. (So a question  
underlying this whole thread for me is "what is the long term value of  
URIQA?") If a more general query service can answer queries along the  
lines of "what statements about this document exist with the same  
signature as the document itself?", then the value of a particular service  
that answers only that question seems far less (although clearly non-zero).
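
Roughly, the sort of general query I have in mind looks like the sketch  
below: ask some aggregating SPARQL service for every statement made about  
a document in a graph that carries the same signature as the document  
itself. The endpoint URL, the document URI and the ex:signedWithKey  
property are all invented for illustration, and SPARQL is still a draft,  
so treat the details loosely:

  from urllib.parse import urlencode
  from urllib.request import Request, urlopen

  ENDPOINT = "http://aggregator.example.org/sparql"  # hypothetical aggregator
  DOC = "http://example.org/some/document"           # hypothetical document URI

  # Statements about DOC, drawn only from graphs signed with the same key
  # as DOC itself; the ex: provenance vocabulary is made up.
  query = f"""
  PREFIX ex: <http://example.org/provenance#>
  SELECT ?g ?p ?o
  WHERE {{
    <{DOC}> ex:signedWithKey ?key .
    GRAPH ?g {{ <{DOC}> ?p ?o . }}
    ?g ex:signedWithKey ?key .
  }}
  """

  req = Request(ENDPOINT + "?" + urlencode({"query": query}),
                headers={"Accept": "application/sparql-results+xml"})
  with urlopen(req) as resp:
      print(resp.read().decode("utf-8"))

A service that answers only the fixed URIQA question is then just one  
pre-canned instance of this.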

> And
> it will be the direct, distributed access via web authorities
> that feeds/builds those centralized repositories, while also
> providing checks and balances against fraud and (data) corruption.

In order to "guarantee" the reliability of the data, I am more likely to  
rely on a reasonably well-secured framework for general queries than on a  
system which relies on restricted access but doesn't have any further  
security infrastructure. Although, as you have noted, this is likely to  
arrive in order to support security for high-value transactions, it  
strikes me that we are well on the way there already.

>> After all, RDF should be capable of defining various "authority"  
>> relations, so you just describe the kind of "authoritative" that you  
>> mean as part of your query, no?
>
> Yes, but this is entirely separate from where that knowledge is
> obtained. No matter where the knowledge is obtained, we will need
> to be able to authenticate it.

Agreed. I think that is almost orthogonal to our discussion, and I believe  
that it is a critical requirement whose current lack prevents the semantic  
web from growing in a whole lot of places...

> (though, note that knowledge obtained directly from the web
> authority brings with it a de facto form of authentication
> and validity -- even if that will not be sufficiently robust
> for some applications, e.g. financial transactions, etc.)

Sure, but I think this is about getting over an initial implementation  
hurdle.

>>> One reason why the web is a success is because it is distributed,
>>> not centralized. One does not have to be aware of third party
>>> centralized repositories of representations in order to ask for
>>> one, given a particular URI. One just asks the web authority of
>>> the URI. Yes, centralized repositories (or indexes) of representations
>>> such as google are tremendous tools, but they simply augment the
>>> fundamental architecture of the web.

Actually I think that at Web scale, they are a fundamental part of the  
architecture - or of the furniture, if you will - replaceable, just as  
browsers are, so long as you have one. For many or most serious web users,  
I think that "given a URI" is not a condition that arises until you poke  
into an index or search engine.

>>> GET is to MGET as GOOGLE is to SPARQL

I am not at all sure that I understand the analogy.
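
To the extent that I do follow it, my reading is roughly the sketch below:  
MGET (URIQA's method) is addressed to the URI's own web authority, just as  
GET is, while a SPARQL query, like a Google search, is put to some third  
party's index of many sources. The host names and the aggregator endpoint  
are made up for illustration:

  import http.client
  from urllib.parse import urlencode

  def ask(host, method, path):
      """Send one HTTP request and return (status, content type)."""
      conn = http.client.HTTPConnection(host)
      conn.request(method, path)
      resp = conn.getresponse()
      status, ctype = resp.status, resp.getheader("Content-Type")
      resp.read()
      conn.close()
      return status, ctype

  # "GET is to MGET": both are addressed to the URI's own authority.
  print(ask("example.org", "GET", "/some/resource"))    # a representation
  print(ask("example.org", "MGET", "/some/resource"))   # URIQA: a description (RDF)

  # "as GOOGLE is to SPARQL": the same question put to a third-party index.
  query = "SELECT ?p ?o WHERE { <http://example.org/some/resource> ?p ?o }"
  print(ask("aggregator.example.org", "GET",
            "/sparql?" + urlencode({"query": query})))

If that is the intended reading, then the analogy is about who you address  
rather than what you can ask, which is more or less where my doubt lies.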

>> If I could rely on that data to answer a handful of chosen questions  
>> then I can see it being more useful, and if I could know in advance  
>> when it would be useless that would be even better.
>
> It may be more useful for some applications to query
> third party sources -- but where shall those third
> party sources get their knowledge, in a manner that
> is traceable to its authoritative source?

By being the source of the knowledge. People or agents producing new and  
original data, signing it, publishing it, and making enough pointers that  
people can find and harvest it.

As near as I understand the process, MGET relies on a combination of the  
convenience that many RDF URIs happen to be resolvable through HTTP GET to  
something related, and the fact that there are services crawling the HTML  
web and creating maps of its URIs that can be fed to a URIQA server.

> Again, it's not either-or. Rather, google'esque crawlers
> can harvest authoritative knowledge from the web authorities
> of URIs, recording the source/authority of the knowledge
> harvested using e.g. signed graphs, and make that knowledge
> available in a centralized knowledge base.

OK, but I don't see that this scenario is particular to URIQA - it applies  
to any RDF harvesting process. Perhaps the difference is that MGET allows  
you, in many cases, to use the existing maps of the document Web as a  
source of things to navigate.
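
For what it's worth, the harvesting scenario you describe seems to amount  
to something like the sketch below: a crawler takes URIs found on the  
document web, MGETs each one from its own authority, and records what came  
back together with where it came from. The seed list and storage are  
invented, and the signing/verification step is omitted entirely:

  import http.client
  from urllib.parse import urlparse

  seeds = ["http://example.org/some/resource",
           "http://example.net/another/resource"]   # hypothetical URIs from a crawl

  knowledge_base = {}   # source URI -> (web authority, RDF text)

  for uri in seeds:
      parts = urlparse(uri)
      conn = http.client.HTTPConnection(parts.netloc)
      conn.request("MGET", parts.path or "/")        # ask the authority directly
      resp = conn.getresponse()
      body = resp.read()
      conn.close()
      if resp.status == 200:
          knowledge_base[uri] = (parts.netloc, body.decode("utf-8"))

  # A third party can now serve the aggregate centrally, and anyone who
  # doubts an entry can repeat the MGET against the authority itself.
  for uri, (authority, rdf) in knowledge_base.items():
      print(uri, "from", authority, len(rdf), "bytes")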

> If anyone questions the validity, freshness, or completeness
> of that third-party-maintained knowledge, they can check
> with the authority directly.

Sure. If the source information is maintained. Obviously this is something  
that URIQA makes easy, since you just use a different method.

>>> But a centralized solution to knowledge discovery cannot be
>>> the foundational or primary solution if we are to see
>>> global, ubiquitous scalability achieved for the SW in the
>>> same manner as has been achieved for the web.
>>
>> Right. Of course a lot depends on what you mean by "a centralised  
>> solution"...
>
> By 'centralized' I mean that (efficient) access to knowledge
> must be via third parties, not directly from the web authority
> of the URI identifying the resource in question.

Oh. We have quite a different use of the term, then. (My default meaning is  
not quite diametrically opposite, but the idea that the publisher of any  
URI is automatically regarded as the authoritative source of information  
about that URI seems pretty centralised to me, even if only locally...) I  
have tried above to be consistent with your usage as I understand it...

cheers

Chaals

-- 
Charles McCathieNevile                      Fundacion Sidar
charles@sidar.org   +61 409 134 136    http://www.sidar.org

Received on Monday, 11 April 2005 11:39:25 UTC