Re: Named graphs etc from Patrick Stickler on 2004-03-09 (www-archive@w3.org from March 2004)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Tue, 9 Mar 2004 10:12:03 +0200
To: "ext Chris Bizer" <chris@bizer.de>
Cc: <www-archive@w3.org>, "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, "ext Pat Hayes" <phayes@ihmc.us>
Message-Id: <72B7DF40-71A1-11D8-B899-000A95EAFCEA@nokia.com>
On Mar 08, 2004, at 16:06, ext Chris Bizer wrote:

> Hi Patrick,
>
>> That said, I'm starting to appreciate some of Chris' arguments about
>> all statements being asserted, no matter what.
>>
>
> The argument isn't that they are all asserted, but that they are uncertain
> until the user applies a trust function to them. I think it is a three-step
> process:
> 1. Graphs published on the Semantic Web are not asserted but uncertain to
> the user.
> 2. Before a user does something with the information, he applies a
> subjective and task-specific trust function (or policy) to the information.
> There is a wide range of possible functions, which take provenance,
> the author's reputation, and related information published by other authors
> into account.
> 3. After applying the trust function, the user treats the information as
> asserted, keeping in mind that there is still the risk that it is wrong.

This seems to blur two distinctions that I've held as important: (1) whether
some statement is intended to represent a fact, or is simply in the form
of a statement for whatever purpose, and (2) whether a given statement is
trusted, based on the source of the statement.

In the case of non-asserted statements, i.e. statements that simply are
in the form of a statement but not intended to represent an actual fact,
trust is not an issue (though it is for asserted statements about
unasserted statements).

What I thought you were proposing, and what seemed like a useful
generalization, is to presume that all triples are asserted (requiring
other forms of expression to capture unasserted statements, such as via
RDF reification), and focus on qualification of the asserted statements
(either individually or per an entire graph).

But basing the assertiveness of statements on functions of trust seems an
odd approach and one that doesn't reflect the way people behave. People
simply don't care about whether they trust unasserted statements (even if
they care about whether they trust asserted statements about those
unasserted statements).

I saw some utility in presuming that all statements/graphs are asserted
(which I think has some support in the RDF specs) and letting folks use
reification and even RDF/XML serialization to refer to forms of statements
which are not asserted.

>
>> I still have some questions about how to "bootstrap" trust, such that
>> it seems there must be some requirement for each graph to contain
>> statements reflecting its source/authority (a signature perhaps?)
>> otherwise, how do you anchor your trust in terms of a given graph?
>>
>
> Not a strict requirement. I think a trust architecture shouldn't strictly
> require anything but use all trust relevant information it can get.
>
> There are different ways provenance information could be attached to
> graphs:
> 1. The author of the graph attaches provenance information and might also
> sign the graph.
> 2. The crawler (or other information access architecture) that collects
> published information adds information about where it found the data.

Right. But in both cases, you have bootstrapped the trust architecture
with external or atomic machinery: either the signature or
system-maintained provenance information.

One thought that occurred to me is that it might be cleaner/better to
bootstrap the trust architecture using a specialized vocabulary which
applies *only* to the graph in which statements using that vocabulary
occur.

This could also serve to allow folks to specify whether a graph is
asserted or not.

Thus, the name of the graph is used in statements within that graph, and
statements using a special property, e.g. x:isAsserted, would
only be interpretable if the subject of the statement is the
same graph containing that statement. E.g.

:graphA {
      :graphA x:isAsserted x:true .
}

would mean that :graphA is asserted. But the statement in the
following :graphB

:graphB {
      :graphA x:isAsserted x:true .
}

would simply have no interpretation regarding assertion because
the semantics of the predicate are only defined iff the subject
of the statement is the graph in which the statement occurs.
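
A minimal sketch of how a system might apply that rule, assuming graphs
are stored as (graph, subject, predicate, object) quads. The Quad type
and the asserted_graphs function are hypothetical names used only for
illustration; x:isAsserted and x:true are the terms from the example above.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    graph: str      # name of the graph containing the statement
    subject: str
    predicate: str
    obj: str


IS_ASSERTED = "x:isAsserted"
TRUE = "x:true"


def asserted_graphs(quads):
    """Return the names of graphs that declare themselves asserted.

    A statement using x:isAsserted is interpreted only when its subject
    is the graph containing it; any other use is silently ignored.
    """
    return {
        q.graph
        for q in quads
        if q.predicate == IS_ASSERTED
        and q.obj == TRUE
        and q.subject == q.graph
    }


quads = [
    Quad(":graphA", ":graphA", IS_ASSERTED, TRUE),  # self-description: counts
    Quad(":graphB", ":graphA", IS_ASSERTED, TRUE),  # about another graph: ignored
]
print(asserted_graphs(quads))
```

Here only :graphA ends up in the result; the statement inside :graphB
contributes nothing either way.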

Similar properties, constrained to having interpretations only when
the subject is the graph containing the statement, could be defined
to express source, authority, signatures, etc.

All of these special properties could be members of a special class,
e.g. x:GraphProperty, which would be the focal point for defining the
constraining semantics requiring the subject to be the graph containing
the statement in which the property is used.
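
As a rough sketch of what that shared constraint could look like in
code, assume membership in x:GraphProperty is available as a plain set of
property names. The names x:source, x:authority and x:signature are
hypothetical stand-ins (only x:isAsserted appears in this message), as is
the self_descriptions function.

```python
from collections import defaultdict

# Hypothetical members of x:GraphProperty; only x:isAsserted is from the message.
GRAPH_PROPERTIES = {"x:isAsserted", "x:source", "x:authority", "x:signature"}


def self_descriptions(quads):
    """Map each graph name to the graph-property statements it makes about itself.

    quads is an iterable of (graph, subject, predicate, object) tuples.
    A graph-property statement whose subject is not the containing graph
    has no interpretation here and is skipped, as are ordinary statements.
    """
    described = defaultdict(dict)
    for graph, subject, predicate, obj in quads:
        if predicate in GRAPH_PROPERTIES and subject == graph:
            described[graph][predicate] = obj
    return dict(described)


quads = [
    (":graphA", ":graphA", "x:isAsserted", "x:true"),
    (":graphA", ":graphA", "x:authority", "ex:patrick"),
    (":graphA", "ex:doc", "ex:title", "Named graphs"),    # ordinary statement
    (":graphB", ":graphA", "x:authority", "ex:mallory"),  # wrong graph: skipped
]
print(self_descriptions(quads))
```

The one check (subject == graph) covers every property in the class,
which is the point of giving them a common class in the first place.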

Such a vocabulary/semantics would then allow folks to bootstrap both
the assertion and trust machinery, and systems employing that machinery
would have simple and consistent implementational requirements.

It may also be convenient (or even necessary?) to define a special URI
that has an interpretation similar to that of 'localhost', e.g.
x:thisGraph, so that folks can assert, sign, and specify provenance
information without having to actually name the graph -- and the special
URI is implicitly bound to a distinct blank node for each graph to
keep its interpretation distinct between graphs.
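
One way the implicit binding might work, sketched under the assumption
that each unnamed graph arrives as a list of (subject, predicate, object)
triples. The bind_this_graph function and the _:gN labels are
hypothetical, and a real system would need to make sure the generated
labels don't collide with blank nodes already present in the graphs.

```python
from itertools import count

THIS_GRAPH = "x:thisGraph"


def bind_this_graph(graphs):
    """Replace x:thisGraph with a blank node label that is unique per graph.

    graphs is a list of graphs, each a list of (s, p, o) triples.
    Returns new lists; the input is not modified.
    """
    fresh = count(1)
    bound = []
    for triples in graphs:
        node = f"_:g{next(fresh)}"  # one fresh label per graph
        bound.append([
            tuple(node if term == THIS_GRAPH else term for term in triple)
            for triple in triples
        ])
    return bound


graphs = [
    [(THIS_GRAPH, "x:isAsserted", "x:true")],
    [(THIS_GRAPH, "x:isAsserted", "x:true")],
]
first, second = bind_this_graph(graphs)
print(first, second)
```

The two graphs contain textually identical statements, but after binding
their subjects differ, so neither one says anything about the other.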

???

If a system scraping knowledge from various sources encounters graphs
without any qualification (source, authority, signature, etc.) whatsoever,
then it can in another graph make qualifying statements about the
scraped graph using another vocabulary -- and then employ both
vocabularies together to ultimately decide about trust.
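
A small sketch of employing both vocabularies together, again over
(graph, subject, predicate, object) quads. Everything named here is an
assumption for illustration: x:authority as a self-describing property,
crawl:foundAt as the scraper's own vocabulary, and the trusted function
and its parameters.

```python
def trusted(graph, quads, crawler_graph, known_authors, trusted_sites):
    """Decide whether to trust a graph from either kind of qualification.

    1. A self-description: x:authority stated inside the graph, about itself.
    2. A qualifying statement made by the scraping system in its own graph,
       using its own vocabulary (crawl:foundAt).
    Statements about the graph found in any other graph are ignored.
    """
    for g, subject, predicate, obj in quads:
        if subject != graph:
            continue
        if g == graph and predicate == "x:authority" and obj in known_authors:
            return True
        if g == crawler_graph and predicate == "crawl:foundAt" and obj in trusted_sites:
            return True
    return False


quads = [
    (":graphA", ":graphA", "x:authority", "ex:patrick"),
    (":crawl", ":graphC", "crawl:foundAt", "http://example.org/data"),
    (":graphD", ":graphE", "crawl:foundAt", "http://example.org/data"),
]
policy = dict(
    crawler_graph=":crawl",
    known_authors={"ex:patrick"},
    trusted_sites={"http://example.org/data"},
)
print(trusted(":graphA", quads, **policy))  # qualified by itself
print(trusted(":graphC", quads, **policy))  # qualified by the scraper
print(trusted(":graphE", quads, **policy))  # qualified only by a third party
```

The third case comes out untrusted because the only statement about
:graphE sits in a graph that is neither :graphE itself nor the scraper's.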

>
> This information can afterwards be used in trust evaluations like "Use only
> data that has been signed by authors I know" or "Use all information, no
> matter if it is signed and no matter from which source or author it
> originates".  The first policy is obviously stricter.
>
> The attached WWW2004 poster describes these ideas in more detail.

I'll have a look. Thanks.

Patrick



>
> Chris
>
> Jeremy: You will get the paper outline this afternoon.
> <bizer-www2004.pdf>

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
Received on Tuesday, 9 March 2004 03:12:41 UTC