Re: Named graphs etc from Patrick Stickler on 2004-03-12 (www-archive@w3.org from March 2004)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Fri, 12 Mar 2004 11:29:06 +0200
To: "ext Pat Hayes" <phayes@ihmc.us>
Cc: <www-archive@w3.org>, "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <chris@bizer.de>
Message-Id: <B5943B66-7407-11D8-BD2B-000A95EAFCEA@nokia.com>
On Mar 11, 2004, at 20:27, ext Pat Hayes wrote:

>
>> On Mar 10, 2004, at 19:00, ext Pat Hayes wrote:
>>
>>>>> You can't bootstrap assertion/trust/etc. by putting stuff in 
>>>>> graphs. These are all ABOUT graphs, so for example if you don't 
>>>>> trust graphs, then having them say 'trust me!!' isn't going to 
>>>>> make you trust them more.
>>>>
>>>> I would consider the same argument to then be valid against an XML 
>>>> attribute.
>>>
>>> No, because XML attributes aren't required to have truth-functional 
>>> semantics, as RDF is. They can 'mean' anything.
>>
>> See my recent discussion about a distinct "bootstrapping 
>> interpretation" phase
>> prior to a full RDF interpretation.
>>
>> I look at this bootstrapping phase as analogous to processing of XML 
>> attributes
>> prior to full RDF interpretation, but using statements.
>
> But this isnt feasible. That is, it might be OK to use statements to 
> express this, except that once you put stuff in RDF statements, it has 
> to obey the very generic 'logical rules' of RDF-style assertions. So 
> for example suppose you can *infer* such a triple from some exotic 
> piece of RDF/OWL reasoning? There should be no difference in meaning 
> between this and having it written right into the graph.

But such reasoning would not be licensed by the semantics of the 
bootstrapping
interpretation, and if later such statements are inferred, they are 
innocuous
because if you infer *new* statements you have a *different* graph!

The bootstrapping interpretation employs *no* closure rules. It 
examines the
graph at face value and determines if the graph is explicitly asserted, 
per
the bootstrapping vocabulary, and what other qualifications it has, such
as authority, signature, etc. That's *it*. If later some OWL reasoner 
infers
things about a graph, so what, those things are probably right, but the
bootstrapping interpretation will not take them into consideration since
you no longer are considering that original graph, but some derivative
or superset of it which is *another* graph.

This *really* could work. I think you are reading far too much into 
what I
am actually proposing.

>
> Talking about 'phases' sounds to me like wanting RDF to be more like a 
> programming language. Sorry, too late for that. If its not part of RDF 
> then its not part of RDF, even if it looks like RDF.

No, no, no.

The bootstrapping interpretation is just a special interpretation of
a literal graph based on a special vocabulary. Period. Take any graph
whatsoever, which is the product of any amount of merging, selection,
inference, etc. and you can apply that interpretation, because that
interpretation is applied to a static graph independent of any RDF
or OWL reasoning, closure rules, etc.

It uses a subset of the RDF MT to test particular triples in a 
particular
way to allow an agent to make a dermination about the assertiveness and
authenticity of that *particular* graph -- not any other potential graph
that might be obtained by reasoning about the graph in question.

>
>> If the boostrapping interpretation phase succeeds, then those 
>> bootstrapping
>> statements should still be true and valid for the full RDF 
>> interpretation.
>>
>>>
>>>>
>>>> The point is that it is not the graph saying "trust me", but the 
>>>> creator/owner/publisher
>>>> of the graph -- presuming there is the machinery to bootstrap a 
>>>> layer of authentication.
>>>
>>> But the whole RDF design is predicated on the possibility of triples 
>>> being freely transmittable, mergeable with other graphs, etc. etc. 
>>> So what happens to the triple "I assert foo" when it gets imported 
>>> from my graph into your graph? It no longer refers to the owner of 
>>> THAT graph.
>>
>> I think we need to require explicit graph naming, possibly even using 
>> URIs,
>> to deal with issues of merging.
>>
>> If some bootstrapping statement has a particular graph as its 
>> subject, then
>> if that graph is merged into another graph, (a) the statement will no 
>> longer
>> be relevant for bootstrapping of the new graph since its subject 
>> won't be
>> the same as the new graph, and (b) if the resultant graph is deemed 
>> to be asserted
>> and trustworthy by the agent, then the statement should in any case 
>> remain
>> valid insofar as its qualification of the earlier merged graph.
>>
>> The bootstrapping tests are done at points of entry into an agent's 
>> knowledge
>> base. If the bootstrapping succeeds, then they are later innocuous if 
>> merged
>> into some other graph. Yet if the agent has a knowledge base that 
>> preserves
>> graph membership of statements (quads, quintuples, whatever) then 
>> specialized
>> trust machinery can continue to use those statements to filter its 
>> views
>> of what is or is not relevant to a particular application.
>
> Maybe. Im not convinced: and in any case this is a lot more of a 
> change to RDf then, say, adding an XML property.

It's *no* change to RDF, IMO. It is a separate, distinct interpretation 
of
a static graph per a particular vocabulary.

Yes, the semantics of that special bootstrapping interpretation are 
over and
above the RDF and OWL MTs, but they are not contrary nor require any 
change
to the RDF or OWL MTs.

In essence, this specialized bootstrapping vocabulary and 
interpretation is
equivalent in functionality to XML attributes -- i.e. they are 
interpreted
in a special way *before* normal RDF/OWL interpretation -- but in a way
that is compatible with (a subset of) RDF interpretation.

>
>>
>> I agree, though, that having any kind of wildcard such as x:thisGraph 
>> would
>> add alot of difficulties without direct modification to the presently 
>> defined
>> RDF merge operation and probably is best avoided.
>>
>>>
>>> If we are going to put the 'asserting' into the RDF itself, then we 
>>> need a way to refer to the asserter that is reliably preserved when 
>>> triples (not just whole graphs) are transmitted across the Web. 
>>> Seems like we need to use URIs.
>>
>> Perhaps. I'd think, though, that blank nodes should also suffice. The
>> key is that the identity of each graph remains distinct no matter how 
>> many
>> merges/extractions occur.
>>
>> That *should* be the case for blank nodes, yes?
>
> No. Taking a blank node outside its 'home' graph is either meaningless 
> or has to be regarded as taking a re-named blank node into the other 
> graph. Its a bare existential statement.

Hmmmm....

If we have some resource that is denoted by a blank node, and statements
about that resource migrate from graph to graph to graph by 
merges/selects/etc.
and the denotation of that blank node remains intact and the 
interpretation
of those statements remains consistent, then it seems to me that 
(regardless
of how each system keeps track of the unique identity of that blank 
node)
the node itself has moved from graph to graph to graph.

Eh?


>
>>
>>>
>>>> If you have some statement, the interpretation of which is 
>>>> constrained to the graph
>>>> in which it occurs (and there are then some issues relating to 
>>>> graph merging that
>>>> need to be addressed, of course, see below) then that is, I think, 
>>>> functionally
>>>> equivalent to an XML attribute value expressed in the serialization 
>>>> of the graph.
>>>>
>>>> Essentially you are qualifying the graph with statements about the 
>>>> graph which are
>>>> only relevant for that graph.
>>>
>>> Which breaks the Web architecture assumed for RDF, seems to me.
>>
>> Not at all. See below.
>>
>>> Graphs are only abstractions, and sets of triples can be merged or 
>>> cut up at will by applications.
>>
>> Yes, but if a particular graph is modified in any way, it is no longer
>> the same graph and any new graph should have an identity distinct from
>> the original graph and thus any statements qualifying the original 
>> graph
>> are irrelevant for any other graph and thus are innocuous if included 
>> in a
>> new graph and represent no loss of information if omitted from the 
>> new graph.
>
> OK, fine as long as we have URI-style names that really do identify 
> graphs, and don't use indexical-style constructions like 'this 
> graph...'

Agreed.

And in fact, I'd even be happy to restrain the machinery to URI denoted
graphs (though I'm wondering how to actually do that in RDF since 
rdfs:range
seems incapable of excluding class members denoted by blank nodes rather
than URIs...)

>
>>
>> If some agent syndicates knowledge from multiple graphs, it will 
>> perhaps
>> test the assertiveness and authenticity/trust of those input graphs, 
>> do
>> its modifications, and when passing along a new graph to some other 
>> agent,
>> will qualify that *new* graph per it's own identity/credentials.
>>
>> So if graph :X has a statement
>>
>>    :X trix:asserted "true"^^xsd:boolean .
>>
>> and that statement is merged into graph :Y, so what. An agent 
>> determining
>> the assertiveness of :X will not be looking at graph :Y. And an agent
>> determining the assertiveness of graph :Y will disregard statements
>> about other graphs.
>
> But why would it?

Because the semantics of trix:asserted *says* that it has no meaning
if the subject is not the graph containing the statement. I.e. if the
agent understands the vocabulary (and no agent should do things if the
don't grok the vocabulary) then it simply *won't* be able to interpret
a statement using trix:asserted which is qualifying some other graph.

> That is, the claim in this triple - that X asserted a graph - is still 
> TRUE even if its in another graph asserted by someone else. And if its 
> true, and someone is asserting it, what's wrong with that?

Even if we allow for such a statement to be considered true, the trust
machinery may choose to disregad any such third-party statements and
rely solely on the bootstrapping interpretation applied to each 
individual
graph it encounters before syndicating that knowledge to whatever 
purpose.

> Why should it be required to be ignored? I might want to reason about 
> who is asserting what. Trust reasoners will probably have to reason 
> about stuff like that; the agent-policy reasoners people are writing 
> are doing stuff like this in OWL already.

Again, this is about authoritative assertion/authenticity, not third 
party.

Fine, let third parties say that other graphs are asserted. Let trust 
policies take
such statements into account. But without the means of determining 
assertiveness
and authenticity of each graph in isolation, I think we will have 
failed to
properly ground the architecture.

>
> I don't think it is feasible to say that a triple is true in one graph 
> but not true in another.

OK. Say it is true in both graphs, but not authoritative in any graph
but the one it qualifies.

If it's in some other graph than that it qualifies, it's a third-party 
statement
and may be disregarded on that basis (or not).

>
>> Thus, the inclusion of the statement in graph :Y
>> is innocuous.
>
> OK, but its still meaningful, so should not be required to be ignored.

OK, let's say that a trust policy has the ability and right to 
ignore/disregard
third party statements about other graphs than the one in which they 
occur
themselves.

:X { :X trix:asserted "true"^^xsd.boolean }  -> authoritative statement
:Y { :X trix:asserted "true"^^xsd.boolean }  -> third party statement

Both can be taken as 'true' per the RDF MT, but one may be 
trusted/accepted
and the other rejected by a particular trust policy.

And it is this distinction between authoritative and third party 
assertion
that I consider to be crucial to properly ground the architecture.

You could even have the following:

:X { :X trix:asserted "false"^^xsd.boolean }  -> authoritative statement
:Y { :X trix:asserted "true"^^xsd.boolean }   -> third party statement\

and because you trust the authority of graph Y more than the authority
of graph X, or because the authority of graph Y recommends presuming
graph X is asserted for a given application/reason, you may in fact
treat graph X as asserted. But it is then clear that you are acting
contrary to the intentions of the authority of graph X (which may be
OK in that particular case).

But because the authoritative vs. third-party distinct is maintained,
we have the foundation we need for determining authenticity and
accountability, which I consider to be essential components of any
trust model.


>
> Look, there's a basic point here. If we say that what makes an 
> assertion genuine is that we can find out who is making it, and if we 
> can find out who is asserting an "X asserts.." triple by looking at 
> who owns the graph that it occurs in, then we don't need the "X 
> asserts..." triple in the first place: it's redundant. We can take any 
> graph and see who owns it, and they are doing the asserting: end of 
> story.

I agree. What I thought we have been discussing is *how* to determine 
who has asserted
a given graph, and how we know that they actually asserted it.

> (We might want to allow a non-assert form of publication for things 
> like test cases, but they are the minority.)  Or, on the other hand, 
> if some such trick isn't reliable, then its not reliable for the "X 
> asserts ..." triple either, since there's nothing to stop Y from 
> saying "X asserts ...".

Yes, there is. It's the particular constraints applied by the 
bootstrapping
interpretation. Assertions in one graph about another graph simply are 
out
of scope for that interpretation.

Y can say whatever it wants, but that won't result in an agent getting
confused about what the authority of X says about X.

> You can't have it both ways. Either we have some way to check on who 
> is saying what, and if we have that then we can apply it without any 
> special vocabulary;

The *point* of the special vocabulary is to check on who is saying what.

> or we don't, in which case the use of vocabulary to check who is 
> making claims is way too fragile to be used.

I don't think you've fully understood what I've been proposing.

Maybe this email medium is just not working.

>
>> The semantics of the trix: vocabulary will make it quite clear that 
>> agents
>> should not believe/trust any statements made about a graph which occur
>> in another graph. And agents won't be able to effectively employ trust
>> machinery without, I think, additional support for maintaining graph
>> membership, so less capable (i.e. "vanilla") RDF/OWL applications 
>> won't
>> be able to make anything of those statements anyway unless working
>> exclusively on a graph by graph basis.
>
> I think that the only genuinely secure way to do this will be to have 
> the ultimate warrants of assertion to be physically secure, under 
> trusted control. To do that effectively requires external assertion (A 
> can assert B) since it seems to me to be extremely likely that A and B 
> here will have conflicting network requirements.

???

Are you talking about agents or graphs? Who/what is A? Who/what is B?

>
>> Agents who are savvy about the bootstrapping vocabulary may even 
>> choose
>> to filter out such statements when doing graph merging, knowing that
>> they are irrelevant to any graph but that within which they were
>> originally asserted.
>>
>>>
>>>> Now, when it comes to graph merge, the only trick bit I see are 
>>>> occurrences
>>>> of a "wildcard" term such as x:thisGraph, which would probably have 
>>>> to be mapped
>>>> to either a blank node, or if the graph is named, to the explicitly 
>>>> defined
>>>> name of the graph. If the knowledge store keeps track of which 
>>>> statements
>>>> belong to which graphs (e.g. via quads rather than triples) then 
>>>> one can
>>>> preserve all bootstrapping information even in a merged graph.
>>>
>>> Sure, but what about pieces of named graphs? What if I take a named 
>>> graph, toss out half the triples, then merge the rest with a piece 
>>> of another named graph?
>>
>> Then you have an entirely *different* graph, and any statements about
>> the other graphs are irrelevant to the new graph, and thus innocuous.
>
> But is this remotely plausible as a way to deploy information on a 
> Web? Every single set of triples has to have a unique, traceable name? 
> Every OWL reasoner has to generate URIs for every conclusion it 
> generates? Its is one thing to say that one CAN name graphs, but this 
> seems to require that every subset of triples ever contemplated needs 
> to be given a unique name for all time.

Graph naming/signing/authentication and determination of trust is an 
operation
at the boundaries of agent interchange -- not intra-agent reasoning or 
processing
of knowledge.

When knowledge is published, the graph can be qualified for assertion, 
authority,
and signature, and agents considering whether to syndicate and utilize 
that
knowledge can apply the bootstrapping intepretation on the static graph 
and
decide if it's acceptable. If so, for most agents, the original 
identity of
the accepted graph disappears and the agent may very well discard the 
bootstrapping
statements about that original graph. Other agents, which may keep 
track of
original graph membership of statements may do further reasoning on a 
per-graph
basis to filter knowledge for particular applications in various ways, 
or
to allow various users of a given knowledge base to apply personal trust
policies (the knowledge base essentially acting as a collection of 
graphs,
upon which distributed queries are executed).

But for simple, legacy RDF/OWL reasoners, they can just eat any old 
graph
they get, presume it's asserted/valid/trustworthy, and any bootstrapping
statements that might be present will be innocuous and irrelevant, no
different than any statements employing unknown terms.

>
>>
>> It's a simple as that.

It really is ;-)

Patrick

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
Received on Friday, 12 March 2004 04:33:18 UTC