Re: [TF-ENT] RDFS entailment regime proposal from Birte Glimm on 2009-09-28 (public-rdf-dawg@w3.org from July to September 2009)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Mon, 28 Sep 2009 22:07:17 +0100
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <492f2b0b0909281407h25fcd82fu76ba70323ac32467@mail.gmail.com>
Unintentionally. That was me clicking reply instead of reply all. I'll
resend it to the list. Thanks for letting me know.
Birte

2009/9/28 Seaborne, Andy <andy.seaborne@hp.com>:
> Was this intended for the list or was it off list intentionally?
>
>        Andy
>
>> -----Original Message-----
>> From: b.glimm@googlemail.com [mailto:b.glimm@googlemail.com] On Behalf Of
>> Birte Glimm
>> Sent: 28 September 2009 19:45
>> To: Seaborne, Andy
>> Subject: Re: [TF-ENT] RDFS entailment regime proposal
>>
>> Andy,
>> scalability is important, but it is not the only driving factor for
>> me. I am still hesitant to have MAY instead of MUST because we then
>> specify a system behavior that tolerates the violation of the RDFS
>> entailment lemma from the RDF spec for the RDFS entailment regime. It
>> can give better performance under an RDFS entailment regime, but
>> interpreting blank nodes as normal names would also give you much
>> better performance in many cases, and nevertheless that is not what is
>> done, nor what should be done.
>> I want to understand the consequences that such a change has and since
>> it can violate the very basic underlying principles, such as the RDFS
>> entailment lemma, I think one should be very careful with such a
>> change.
>>
>> Apart from scalability, a consistent behavior of SPARQL engines under
>> an RDFS entailment regime is also important to me. What is not good
>> from an interoperability point of view is that one system gives you
>> answers A and another gives you answers B or in this case, one system
>> answers the query and another says the data is inconsistent. Which
>> system is correct? Both because the one that gave an answer just
>> didn't see the inconsistency? If you query the same data twice with
>> the same query, can it happen that for the first query you get an
>> answer, then the system answers some other query maybe from another
>> user, which makes it recognize the inconsistency, and then I ask my
>> same query again and then I get an inconsistency message? I would find
>> that not a nice behavior.
>>
>> It is definitely something we should discuss in the telcon if we have
>> the time and if not, I would like to have some more opinions on that
>> and some more explanations of the effects that such a change would
>> have.
>> Birte
>>
>>
>> 2009/9/28 Seaborne, Andy <andy.seaborne@hp.com>:
>> >
>> >
>> >> -----Original Message-----
>> >> From: b.glimm@googlemail.com [mailto:b.glimm@googlemail.com] On Behalf Of
>> >> Birte Glimm
>> >> Sent: 28 September 2009 16:55
>> >> To: Seaborne, Andy
>> >> Cc: SPARQL Working Group
>> >> Subject: Re: [TF-ENT] RDFS entailment regime proposal
>> >>
>> >> [snip]
>> >>
>> >> >> Well, but under RDFS semantics you have to check consistency first
>> >> >> anyway since an inconsistent graph entails all tuples. Bad lexical
>> >> >> forms are not causing an inconsistency, only when combined with an
>> >> >> assertion that the range of the used property/predicate is
>> >> >> rdfs:Literal or rdf:XMLLiteral. Thus, if you parse a data set and find
>> >> >> a literal that has a bad lexical form, you better check consistency
>> >> >> anyway and after that you know whether your data is legal or not.
>> >> >> Also, if a user asks
>> >> >> SLEECT ?x WHERE { ?x <ex:b> <ex:c> . }
>> >> >> I would expect an error because I wrote SLEECT instead of SELECT and I
>> >> >> should be told that the query is not a legal query. Similarly
>> >> >> SELECT ?x WHERE { ?x <ex:b> <ex:c> <ex:forthInATriple> . }
>> >> >> should give me an error, right?
>> >> >
>> >> > Yes, it's a syntax error, but I don't see how it's connected.  It can be
>> >> > determined statically from the query string.
>> >>
>> >> Well, it is only connected in that I wanted to establish whether you
>> >> think that an illegal, mal-formed query should result in an error or
>> >> not. That is clear now, so we disagree about illegal data.
>> >
>> > And this is a general issue, not just RDFS: D-entailment, rules.
>> >
>> > The concern is scalability but I see no mention of this below.
>> >
>> >> > Strictly, it's not a SPARQL query string and what a service does with
>> >> > that is outside the spec because the spec only defines what happens with
>> >> > query strings that match the grammar and says nothing about non-matching
>> >> > strings.  The SPARQL protocol error exists because the restriction is that
>> >> > it is a SPARQL query string.
>> >> >
>> >> > But in the RDFS entailment case it's the data at issue. For scalability,
>> >> > I'd like to see a processor that can process the query and get the answers
>> >> > be able to return them.  As proposed it's an error - it's not now outside
>> >> > the spec; it's covered by the spec and explicitly wrong.  But if a
>> >> > processor can perform BGP matching without needing to touch the whole
>> >> > graph, then I think that should be allowed.  Similarly, if it can start
>> >> > generating answers and then finds a problem, a required error (and no
>> >> > results) means the processor can't stream and has to buffer all results
>> >> > before it sends any, which is a potentially huge cost.
>> >> >
>> >>
>> >> Again, there can be illegal graphs due to inconsistencies or due to
>> >> just mal-formed RDF. I think you do want a different behavior for
>> >> inconsistent graphs. If I have mal-formed RDF, I don't see why any
>> >> system should just silently swallow that, see again data such as
>> >> <ex:a> <ex:b> <ex:c> <ex:d> .
>> >> That just is no RDF graph and I would want my system to tell me that I
>> >> wrote mal-formed RDF and I think you do as well. Thus, we can discuss
>> >> whether inconsistent graphs should be illegal or not I assume. You
>> >> propose, you read/load the data
>> >
>> > No.  Maybe the data is loaded, maybe it's partly loaded and an on-demand
>> > scheme is used.
>> >
>> >> and then, when you get a query, you
>> >> start finding answers, apply some (entailment) rules while you do that
>> >> (because after all we do want some entailments under an RDFS
>> >> entailment regime) and happily keep finding answers and return them
>> >> until you come to a point where you apply a rule and detect an
>> >> inconsistency. At that point you want to stop or you would simply
>> >> continue? What would you tell the user? Would you say anything?
>> >
>> > My point is why should the spec tell me that I have to do things one
>> > particular way.  For small systems, the provider might want to provide an
>> > exception but a system for large scale data may be unwilling to generate an
>> > error unless it is encountered.
>> >
>> > e.g.
>> >
>> > ASK { ?x :p :z }
>> >
>> > Or even
>> >
>> > ASK { :x :p :z }
>> >
>> >> Give a
>> >> warning that actually what you said before is still valid, but the
>> >> user should please be aware of the inconsistency?
>> >> What could also happen is that you know from some analysis that you
>> >> only need to look at a certain part of the graph and that part is fine
>> >> and you answer a query by only touching that part. But now another
>> >> query comes in that touches another part, and that part actually contains
>> >> an inconsistency that you could discover while you try to find the
>> >> answers to the query, right?
>> >> In that case, the answers to your first
>> >> query are wrong because an inconsistent graph entails everything and
>> >> not just the answers that you returned.
>> >
>> > May be wrong, it may not.  See above.
>> >
>> > A query that just requires only part of an entailment regime to be answered
>> > completely should be in scope for optimization.  The requirement to make a
>> > global determination has a scalability implication.
>> >
>> > Do you recognize that scalability is a concern some systems might have?  Or
>> > are you saying that scalability is not a primary issue and should not be
>> > considered a requirement for entailment regime designs?
>> >
>> > (noting the data may also be offered up under different entailment regimes
>> > on different endpoints) (/me avoids mentioning mixed entailment on different
>> > BGPs in the same query)
>> >
>> >> I am against this. Under RDFS, inconsistencies arise only due to
>> >> illegal XMLLiterals, so, yes, when you load your data,
>> >
>> > IF you load the data.
>> >
>> > The processor may not touch the literals.  Maybe it does the entailment by
>> > simple rules during query execution.
>> >
>> >> you have to
>> >> parse the XML and not just treat it as a string. Usually that XML
>> >> should parse fine (after all, users usually do not intend to produce
>> >> inconsistencies) and you can do what you suggest. You are then
>> >> guaranteed not to have any inconsistencies. In case you find
>> >> mal-formed XML, you had better do a consistency check first and
>> >> only then answer queries. You might want to give a warning anyway. I
>> >> prefer this to having a kind of undefined behavior where you might
>> >> later change your mind about answers that you gave to previous
>> >> queries. You can do that, but I personally would not call it RDFS
>> >> entailment.
>> >>
>> >>
>> >> > The entailment doc does not specify what an error is - what had you in
>> >> > mind?  If it's going to be relatively undefined, then we can just say that
>> >> > if the data is illegal, then all bets are off, i.e. it's not matching for
>> >> > RDFS entailment if you get any answers.
>> >> >
>> >> Well, but the point still is: Do we tell the user and at which point,
>> >> that all bets are off? Or can it happen that we answer some queries
>> >> and then suddenly say "Actually, dear user, all bets are off. I just
>> >> found an inconsistency."  I had in mind an error (with or without
>> >> error number) that tells the user that the queried graph is
>> >> inconsistent, that we do not return any answers, but that an
>> >> inconsistent graph would entail all statements. If you are nice, you
>> >> even tell the user what caused the inconsistency.
>> >>
>> >> Birte
>> >
>> > There is no recognition here that scalability, and the related issue of
>> > streaming results are significant.
>> >
>> > Do you accept these are concerns?
>> >
>> > Infinite numbers of statements don't preclude useful answers.  I am
>> > proposing that instead of a design where an error MUST be signalled, which
>> > has scaling issues (streaming, global check of the data), the design is that
>> > it is outside the spec and an error MAY be signalled and MUST be if it
>> > affects the answers.
>> >
>> > This really is a small change and might even be argued to be there because
>> > if it's not a legal graph then it is outside the entailment regime anyway,
>> > isn't it?
>> >
>> > However, the wording is too categorical for me and it expresses an intent of
>> > a particular outcome.  I can see cases where the answers are what is required
>> > but the graph is illegal, where the inconsistency is somewhere that the
>> > engine need not touch.
>> >
>> >        Andy
>> >
>> >>
>> >> > I'm assuming "error" means like the errors we have in FILTER evaluation
>> >> > i.e. no answers at best or the notion of "error" in other systems where it
>> >> > means return an error code but no answers.  A situation where an error
>> >> > code and answers are returned is harder to design over HTTP and may have
>> >> > problems with streaming (the return code is sent before the body).
>> >> >
>> >> >        Andy
>> >> >
>> >> >>
>> >> >> I can see your point for simple entailment, but for RDFS entailment I
>> >> >> would think that illegal data or query are best treated by an error.
>> >> >>
>> >> >> Birte
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Dr. Birte Glimm, Room 306
>> >> >> Computing Laboratory
>> >> >> Parks Road
>> >> >> Oxford
>> >> >> OX1 3QD
>> >> >> United Kingdom
>> >> >> +44 (0)1865 283529
>> >> >
>> >>
>> >
>>
>



-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Monday, 28 September 2009 21:07:57 UTC