- From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
- Date: Mon, 28 Sep 2009 22:07:17 +0100
- To: "Seaborne, Andy" <andy.seaborne@hp.com>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Unintentionally. That was me clicking "reply" instead of "reply all". I'll
resend it to the list. Thanks for letting me know.
Birte

2009/9/28 Seaborne, Andy <andy.seaborne@hp.com>:
> Was this intended for the list or was it off list intentionally?
>
> Andy
>
>> -----Original Message-----
>> From: b.glimm@googlemail.com [mailto:b.glimm@googlemail.com] On Behalf Of Birte Glimm
>> Sent: 28 September 2009 19:45
>> To: Seaborne, Andy
>> Subject: Re: [TF-ENT] RDFS entailment regime proposal
>>
>> Andy,
>> scalability is important, but it is not the only driving factor for
>> me. I am still hesitant to have MAY instead of MUST, because we would then
>> specify a system behavior that tolerates violations of the RDFS
>> entailment lemma from the RDF spec under the RDFS entailment regime. It
>> can give better performance under an RDFS entailment regime, but
>> interpreting blank nodes as normal names would also give you much
>> better performance in many cases, and nevertheless that is not what is,
>> or should be, done.
>> I want to understand the consequences of such a change, and since
>> it can violate very basic underlying principles, such as the RDFS
>> entailment lemma, I think one should be very careful with it.
>>
>> Apart from scalability, consistent behavior of SPARQL engines under
>> an RDFS entailment regime is also important to me. What is not good
>> from an interoperability point of view is that one system gives you
>> answers A and another gives you answers B or, in this case, that one
>> system answers the query and another says the data is inconsistent.
>> Which system is correct? Both, because the one that gave an answer just
>> didn't see the inconsistency? If you query the same data twice with
>> the same query, can it happen that for the first query you get an
>> answer, then the system answers some other query, maybe from another
>> user, which makes it recognize the inconsistency, and then I ask my
>> same query again and get an inconsistency message? I would not find
>> that nice behavior.
>>
>> It is definitely something we should discuss in the telcon if we have
>> the time, and if not, I would like to hear some more opinions on it
>> and some more explanation of the effects that such a change would
>> have.
>> Birte
>>
>>
>> 2009/9/28 Seaborne, Andy <andy.seaborne@hp.com>:
>> >
>> >
>> >> -----Original Message-----
>> >> From: b.glimm@googlemail.com [mailto:b.glimm@googlemail.com] On Behalf Of Birte Glimm
>> >> Sent: 28 September 2009 16:55
>> >> To: Seaborne, Andy
>> >> Cc: SPARQL Working Group
>> >> Subject: Re: [TF-ENT] RDFS entailment regime proposal
>> >>
>> >> [snip]
>> >>
>> >> >> Well, but under RDFS semantics you have to check consistency first
>> >> >> anyway, since an inconsistent graph entails all tuples. Bad lexical
>> >> >> forms do not cause an inconsistency by themselves, only when combined
>> >> >> with an assertion that the range of the property/predicate used is
>> >> >> rdfs:Literal or rdf:XMLLiteral. Thus, if you parse a data set and find
>> >> >> a literal that has a bad lexical form, you had better check consistency
>> >> >> anyway, and after that you know whether your data is legal or not.
>> >> >> Also, if a user asks
>> >> >>   SLEECT ?x WHERE { ?x <ex:b> <ex:c> . }
>> >> >> I would expect an error because I wrote SLEECT instead of SELECT, and I
>> >> >> should be told that the query is not a legal query. Similarly,
>> >> >>   SELECT ?x WHERE { ?x <ex:b> <ex:c> <ex:forthInATriple> . }
>> >> >> should give me an error, right?
>> >> >
>> >> > Yes, it's a syntax error, but I don't see how it is connected. It can
>> >> > be determined statically from the query string.
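To make the disagreement concrete, here is a minimal Python sketch, under heavy simplifying assumptions, of the situation described in the quoted paragraph: an ill-formed XML literal only becomes an inconsistency when the predicate it is used with has range rdfs:Literal or rdf:XMLLiteral. The triple representation, the CURIE strings and all function names are hypothetical. The check tests XML well-formedness only (not canonical form) and ignores rdfs:subPropertyOf/rdfs:subClassOf propagation, so it is not a complete RDFS consistency check. `strict_ask` follows the model-theoretic reading, under which an inconsistent graph entails every triple (Birte's proposal would report an error instead); `local_ask` only looks at what the query touches, in the spirit of Andy's ASK examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified representation: a triple is (s, p, o) with CURIE
# strings; a typed literal object is a (lexical_form, datatype) tuple.
RDFS_RANGE = "rdfs:range"
RDFS_LITERAL = "rdfs:Literal"
RDF_XML_LITERAL = "rdf:XMLLiteral"


def is_well_formed_xml_literal(lexical: str) -> bool:
    """Well-formedness only: wrap the lexical form in a dummy root element so
    that text-only content or several sibling elements still parse.
    Canonical-XML requirements are NOT checked."""
    try:
        ET.fromstring(f"<wrapper>{lexical}</wrapper>")
        return True
    except ET.ParseError:
        return False


def find_xml_clashes(graph):
    """Triples whose object is an ill-formed rdf:XMLLiteral used with a predicate
    declared (directly) to have range rdfs:Literal or rdf:XMLLiteral.
    Sub-property and sub-class reasoning is deliberately omitted."""
    literal_ranged = {
        s for (s, p, o) in graph
        if p == RDFS_RANGE and o in (RDFS_LITERAL, RDF_XML_LITERAL)
    }
    return [
        (s, p, o) for (s, p, o) in graph
        if p in literal_ranged
        and isinstance(o, tuple)
        and o[1] == RDF_XML_LITERAL
        and not is_well_formed_xml_literal(o[0])
    ]


def strict_ask(graph, triple) -> bool:
    """Model-theoretic reading: an inconsistent graph entails every triple.
    A plain membership test stands in for the rest of RDFS entailment."""
    return bool(find_xml_clashes(graph)) or triple in graph


def local_ask(graph, triple) -> bool:
    """'Touch only what the query needs': never scans the graph for clashes."""
    return triple in graph


EXAMPLE_GRAPH = {
    ("ex:comment", RDFS_RANGE, RDFS_LITERAL),
    ("ex:doc", "ex:comment", ("<b>unclosed", RDF_XML_LITERAL)),
    ("ex:x", "ex:p", "ex:z"),
}

if __name__ == "__main__":
    unrelated = ("ex:x", "ex:q", "ex:nothing")
    print(find_xml_clashes(EXAMPLE_GRAPH))
    print(strict_ask(EXAMPLE_GRAPH, unrelated))  # True: everything is entailed
    print(local_ask(EXAMPLE_GRAPH, unrelated))   # False: the clash is never seen
```

Both functions agree on the triple ex:x ex:p ex:z, but only the strict reading answers true for an unrelated triple; dropping the rdfs:range triple removes the clash, and the two readings agree again.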
>> >>
>> >> Well, it is only connected in that I wanted to establish whether you
>> >> think that an illegal, malformed query should result in an error or
>> >> not. That is clear now, so we disagree about illegal data.
>> >
>> > And this is a general issue, not just RDFS: D-entailment, rules.
>> >
>> > The concern is scalability, but I see no mention of this below.
>> >
>> >> > Strictly, it's not a SPARQL query string, and what a service does
>> >> > with that is outside the spec, because the spec only defines what
>> >> > happens with query strings that match the grammar and says nothing
>> >> > about non-matching strings. The SPARQL protocol error exists because
>> >> > the restriction is that it is a SPARQL query string.
>> >> >
>> >> > But in the RDFS entailment case it's the data that is at issue. For
>> >> > scalability, I'd like a processor that can process the query and get
>> >> > the answers to be able to return them. As proposed, it's an error:
>> >> > it's no longer outside the spec; it's covered by the spec and
>> >> > explicitly wrong. But if a processor can perform BGP matching without
>> >> > needing to touch the whole graph, then I think that should be
>> >> > allowed. Similarly, if it can start generating answers and then finds
>> >> > a problem, a required error (and no results) means the processor
>> >> > can't stream and has to buffer all results before it sends any,
>> >> > which is a potentially huge cost.
>> >> >
>> >>
>> >> Again, there can be illegal graphs due to inconsistencies or due to
>> >> just malformed RDF. I think you do want a different behavior for
>> >> inconsistent graphs. If I have malformed RDF, I don't see why any
>> >> system should just silently swallow it; see again data such as
>> >>   <ex:a> <ex:b> <ex:c> <ex:d> .
>> >> That is simply not an RDF graph, and I would want my system to tell me
>> >> that I wrote malformed RDF, and I think you do as well. Thus, I assume,
>> >> what we can discuss is whether inconsistent graphs should be illegal
>> >> or not. You propose that you read/load the data
>> >
>> > No. Maybe the data is loaded, maybe it's partly loaded and an
>> > on-demand scheme is used.
>> >
>> >> and then, when you get a query, you
>> >> start finding answers, apply some (entailment) rules while you do that
>> >> (because after all we do want some entailments under an RDFS
>> >> entailment regime) and happily keep finding answers and returning them
>> >> until you come to a point where you apply a rule and detect an
>> >> inconsistency. At that point, do you want to stop, or would you simply
>> >> continue? What would you tell the user? Would you say anything?
>> >
>> > My point is: why should the spec tell me that I have to do things one
>> > particular way? For small systems, the provider might want to raise an
>> > exception, but a system for large-scale data may be unwilling to
>> > generate an error unless it is encountered.
>> >
>> > e.g.
>> >
>> >    ASK { ?x :p :z }
>> >
>> > Or even
>> >
>> >    ASK { :x :p :z }
>> >
>> >> Give a
>> >> warning that what you said before is actually still valid, but the
>> >> user should please be aware of the inconsistency?
>> >> What could also happen is that you know from some analysis that you
>> >> only need to look at a certain part of the graph, that part is fine,
>> >> and you answer a query by only touching that part. But then another
>> >> query touches another part, and that part actually contains an
>> >> inconsistency that you discover while you try to find the
>> >> answers to that query, right?
>> >> In that case, the answers to your first
>> >> query are wrong, because an inconsistent graph entails everything and
>> >> not just the answers that you returned.
>> >
>> > They may be wrong; they may not. See above.
>> >
>> > A query that requires only part of an entailment regime to be
>> > answered completely should be in scope for optimization. The
>> > requirement to make a global determination has a scalability
>> > implication.
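Andy's streaming argument (together with his remark that the HTTP return code is sent before the body) can be illustrated with a small Python sketch. It assumes a hypothetical engine that applies entailment rules lazily during evaluation and raises when it derives a clash. `evaluate`, `respond_error_must`, `respond_streaming`, the `late-error` marker and the use of 500 as the error status are all illustrative inventions, not part of SPARQL or its protocol. Under a MUST-error rule nothing can be sent until evaluation completes; a streaming server has already committed to a success status by the time the clash appears.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# All names here are hypothetical; 500 is a placeholder, since this sketch
# does not settle which HTTP code an inconsistency error would use.
ERROR_STATUS = 500


class InconsistencyDetected(Exception):
    """Raised when query-time rule application derives a clash."""


def evaluate(rows: Iterable[dict], clash_at: Optional[int]) -> Iterator[dict]:
    """Simulated lazy BGP evaluation with on-demand rules: yields solutions
    one at a time and raises once the rule engine hits the clash at index
    `clash_at` (None means the touched part of the data is clean)."""
    for i, row in enumerate(rows):
        if clash_at is not None and i == clash_at:
            raise InconsistencyDetected(f"clash found after {i} solutions")
        yield row


def respond_error_must(rows, clash_at) -> Tuple[int, List[dict]]:
    """'An error MUST be signalled': the status cannot be chosen until
    evaluation has finished, so every solution is buffered first."""
    buffered: List[dict] = []
    try:
        for row in evaluate(rows, clash_at):
            buffered.append(row)
    except InconsistencyDetected:
        return ERROR_STATUS, []  # error code, no answers
    return 200, buffered


def respond_streaming(rows, clash_at) -> Iterator[Tuple[str, object]]:
    """Streaming: the status line is committed before the first solution,
    so a clash found later can only cut the body short."""
    yield ("status", 200)
    try:
        for row in evaluate(rows, clash_at):
            yield ("solution", row)
    except InconsistencyDetected as exc:
        yield ("late-error", str(exc))  # hypothetical in-band marker


if __name__ == "__main__":
    rows = [{"x": f"ex:s{i}"} for i in range(5)]
    print(respond_error_must(rows, clash_at=3))
    for event in respond_streaming(rows, clash_at=3):
        print(event)
```

The buffered variant guarantees that a detected inconsistency yields an error and no answers, at the cost of holding every solution in memory and delaying the first one; the streaming variant starts answering immediately but cannot retract what it has already sent.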
>> >
>> > Do you recognize that scalability is a concern some systems might
>> > have? Or are you saying that scalability is not a primary issue and
>> > should not be considered a requirement for entailment regime designs?
>> >
>> > (noting the data may also be offered up under different entailment
>> > regimes on different endpoints) (/me avoids mentioning mixed
>> > entailment on different BGPs in the same query)
>> >
>> >> I am against this. Under RDFS, inconsistencies arise only due to
>> >> illegal XMLLiterals, so, yes, when you load your data,
>> >
>> > IF you load the data.
>> >
>> > The processor may not touch the literals. Maybe it does the
>> > entailment by simple rules during query execution.
>> >
>> >> you have to
>> >> parse the XML and not just take it as a string. Usually that XML
>> >> should parse fine (after all, users usually do not intend to produce
>> >> inconsistencies), and you can do what you suggest. You are then
>> >> guaranteed not to have any inconsistencies. In case you find
>> >> malformed XML, you had better do a consistency check first and
>> >> only then answer queries. You might want to give a warning anyway. I
>> >> prefer this to having a kind of undefined behavior where you might
>> >> later change your mind about answers that you gave to previous
>> >> queries. You can do that, but I personally would not call it RDFS
>> >> entailment.
>> >>
>> >>
>> >> > The entailment doc does not specify what an error is; what did you
>> >> > have in mind? If it's going to be relatively undefined, then we can
>> >> > just say that if the data is illegal, then all bets are off, i.e.
>> >> > it's not matching under RDFS entailment if you get any answers.
>> >> >
>> >> Well, but the point still is: do we tell the user, and at which point,
>> >> that all bets are off? Or can it happen that we answer some queries
>> >> and then suddenly say "Actually, dear user, all bets are off. I just
>> >> found an inconsistency"? I had in mind an error (with or without an
>> >> error number) that tells the user that the queried graph is
>> >> inconsistent, that we do not return any answers, but that an
>> >> inconsistent graph would entail all statements. If you are nice, you
>> >> even tell the user what caused the inconsistency.
>> >>
>> >> Birte
>> >
>> > There is no recognition here that scalability, and the related issue
>> > of streaming results, are significant.
>> >
>> > Do you accept these are concerns?
>> >
>> > Infinite numbers of statements don't preclude useful answers. I am
>> > proposing that instead of a design where an error MUST be signalled,
>> > which has scaling issues (streaming, global check of the data), the
>> > design is that it is outside the spec and an error MAY be signalled,
>> > and MUST be if it affects the answers.
>> >
>> > This really is a small change and might even be argued to be there
>> > already, because if it's not a legal graph then it is outside the
>> > entailment regime anyway, isn't it?
>> >
>> > However, the wording is too categorical for me and it expresses an
>> > intent of a particular outcome. I can see cases where the answers are
>> > what is required but the graph is illegal, where the inconsistency is
>> > somewhere that the engine need not touch.
>> >
>> > Andy
>> >
>> >>
>> >> > I'm assuming "error" means something like the errors we have in
>> >> > FILTER evaluation, i.e. no answers at best, or the notion of "error"
>> >> > in other systems, where it means returning an error code but no
>> >> > answers. A situation where an error code and answers are returned is
>> >> > harder to design over HTTP and may have problems with streaming (the
>> >> > return code is sent before the body).
>> >> >
>> >> > Andy
>> >> >
>> >> >>
>> >> >> I can see your point for simple entailment, but for RDFS entailment I
>> >> >> would think that illegal data or queries are best treated as an error.
>> >> >>
>> >> >> Birte
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Dr. Birte Glimm, Room 306
>> >> >> Computing Laboratory
>> >> >> Parks Road
>> >> >> Oxford
>> >> >> OX1 3QD
>> >> >> United Kingdom
>> >> >> +44 (0)1865 283529
Received on Monday, 28 September 2009 21:07:57 UTC