Re: [TF-ENT] RDFS entailment regime proposal from Birte Glimm on 2009-09-28 (public-rdf-dawg@w3.org from July to September 2009)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Mon, 28 Sep 2009 16:54:37 +0100
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <492f2b0b0909280854i667fa473o7ffa60ad0852c772@mail.gmail.com>
[snip]

>> Well, but under RDFS semantics you have to check consistency first
>> anyway since an inconsistent graph entails all tuples. Bad lexical
>> forms are not causing an inconsistency, only when combined with an
>> assertion that the range of the used property/predicate is
>> rdfs:Literal or rdf:XMLLiteral. Thus, if you parse a data set and find
>> a literal that has a bad lexical form, you had better check consistency
>> anyway and after that you know whether your data is legal or not.
>> Also, if a user asks
>> SLEECT ?x WHERE { ?x <ex:b> <ex:c> . }
>> I would expect an error because I wrote SLEECT instead of SELECT and I
>> should be told that the query is not a legal query. Similarly
>> SELECT ?x WHERE { ?x <ex:b> <ex:c> <ex:forthInATriple> . }
>> should give me an error, right?
>
> Yes, it's a syntax error, but I don't see how it is connected.  It can be determined statically from the query string.

Well, it is only connected in that I wanted to establish whether you
think that an illegal, mal-formed query should result in an error or
not. That is clear now, so we disagree about illegal data.

> Strictly, it's not a SPARQL query string, and what a service does with that is outside the spec, because the spec only defines what happens with query strings that match the grammar and says nothing about non-matching strings.  The SPARQL protocol error exists because the restriction is that it is a SPARQL query string.
>
> But in the RDFS entailment case it's the data at issue. For scalability, I'd like a processor that can process the query and get the answers to be able to return them.  As proposed it's an error - it's not now outside the spec; it's covered by the spec and explicitly wrong.  But if a processor can perform BGP matching without needing to touch the whole graph, then I think that should be allowed.  Similarly, if it can start generating answers and then finds a problem, a required error (and no results) means the processor can't stream and has to buffer all results before it sends any, which is a potentially huge cost.
>

Again, there can be illegal graphs due to inconsistencies or due to
just mal-formed RDF. I think you do want a different behavior for
inconsistent graphs. If I have mal-formed RDF, I don't see why any
system should just silently swallow that, see again data such as
<ex:a> <ex:b> <ex:c> <ex:d> .
That just is no RDF graph and I would want my system to tell me that I
wrote mal-formed RDF and I think you do as well. Thus, I assume what
we can discuss is whether inconsistent graphs should be illegal or
not. You propose that you read/load the data and then, when you get a
query, you start finding answers, apply some entailment rules while
you do that (because after all we do want some entailments under an
RDFS entailment regime) and happily keep finding answers and return
them until you come to a point where you apply a rule and detect an
inconsistency. At that point, would you stop or would you simply
continue? What would you tell the user? Would you say anything? Give a
warning that what you said before is still valid, but that the user
should please be aware of the inconsistency?
What could also happen is that you know from some analysis that you
only need to look at a certain part of the graph and that part is fine
and you answer a query by only touching that part. But now another
query comes in that touches another part, and that part actually
contains an inconsistency that you could discover while you try to
find the answers to the query, right? In that case, the answers to
your first query are wrong because an inconsistent graph entails
everything and not just the answers that you returned.
I am against this. Under RDFS, inconsistencies arise only due to
illegal XMLLiterals, so, yes, when you load your data, you have to
parse the XML and not just take it for a string. Usually that XML
should parse fine (after all, users usually do not intend to produce
inconsistencies) and you can do what you suggest. You are then
guaranteed not to have any inconsistencies. In case you find
mal-formed XML, you had better do a consistency check first and
only then answer queries. You might want to give a warning anyway. I
prefer this to having a kind of undefined behavior where you might
later change your mind about answers that you gave to previous
queries. You can do that, but I personally would not call it RDFS
entailment.
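
The load-time check described above can be sketched roughly as follows,
in Python. This is only an illustration under stated assumptions: the
function names and the triple representation (a literal object is a
(lexical form, datatype IRI) tuple) are hypothetical, and XML
well-formedness is used as an approximation of the rdf:XMLLiteral
lexical space. A literal flagged here does not by itself make the
graph inconsistent; it only signals that a full consistency check is
needed before answering queries.

```python
import xml.etree.ElementTree as ET

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def is_well_formed_xml_literal(lexical_form: str) -> bool:
    """Approximate check: an XMLLiteral may be an XML fragment, so wrap it
    in a dummy root element (an assumption of this sketch) and parse it."""
    try:
        ET.fromstring("<wrapper>" + lexical_form + "</wrapper>")
        return True
    except ET.ParseError:
        return False


def find_bad_xml_literals(triples):
    """Return the triples whose object is an rdf:XMLLiteral that does not parse.

    `triples` is an iterable of (s, p, o); a literal object is represented
    as a (lexical_form, datatype_iri) tuple (hypothetical representation).
    """
    bad = []
    for s, p, o in triples:
        if isinstance(o, tuple) and o[1] == RDF_XMLLITERAL:
            if not is_well_formed_xml_literal(o[0]):
                bad.append((s, p, o))
    return bad
```

If find_bad_xml_literals returns an empty list, the usual case, query
answering can proceed without a separate consistency check; otherwise
the consistency check runs first.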


> The entailment doc does not specify what an error is - what did you have in mind?  If it's going to be relatively undefined, then we can just say that if the data is illegal, then all bets are off, i.e. it's not matching for RDFS entailment if you get any answers.
>
Well, but the point still is: Do we tell the user that all bets are
off, and at which point? Or can it happen that we answer some queries
and then suddenly say "Actually, dear user, all bets are off. I just
found an inconsistency."? I had in mind an error (with or without
error number) that tells the user that the queried graph is
inconsistent, that we do not return any answers, but that an
inconsistent graph would entail all statements. If you are nice, you
even tell the user what caused the inconsistency.
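
As a minimal illustration of such an error, here is a sketch in
Python. The class and function names are hypothetical and not taken
from any SPARQL specification or implementation. It also makes the
buffering cost visible: to guarantee "an error and no answers", all
solutions are collected before anything is returned.

```python
class InconsistentGraphError(Exception):
    """Hypothetical error: the queried graph is inconsistent under RDFS."""

    def __init__(self, cause=None):
        message = (
            "queried graph is inconsistent; no answers are returned "
            "(an inconsistent graph would entail all statements)"
        )
        if cause is not None:
            # Optionally report what caused the inconsistency.
            message += "; cause: " + str(cause)
        super().__init__(message)
        self.cause = cause


def answer_all_or_error(solutions):
    """Buffer every solution before returning.

    If evaluating `solutions` raises InconsistentGraphError part-way
    through, the exception propagates and no partial answers escape.
    """
    return list(solutions)
```

A streaming processor would instead hand out solutions as they are
found, which is exactly the case where an inconsistency detected later
invalidates answers already sent.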

Birte

> I'm assuming "error" means like the errors we have in FILTER evaluation i.e. no answers at best or the notion of "error" in other systems where it means return an error code but no answers.  A situation where an error code and answers are returned is harder to design over HTTP and may have problems with streaming (the return code is sent before the body).
>
>        Andy
>
>>
>> I can see your point for simple entailment, but for RDFS entailment I
>> would think that illegal data or query are best treated by an error.
>>
>> Birte
>>
>



-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Monday, 28 September 2009 15:55:13 UTC