Error handling (including with entailment regimes) from Seaborne, Andy on 2009-10-05 (public-rdf-dawg@w3.org from October to December 2009)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 5 Oct 2009 11:44:44 +0000
To: SPARQL WG <public-rdf-dawg@w3.org>
Message-ID: <B6CF1054FDC8B845BF93A6645D19BEA3693ECF6D84@GVW1118EXC.americas.hpqcorp.net>

While this is in the context of entailment regimes, the principles need to apply everywhere. For example, bad IRIs are "syntax errors" at some level, and many systems currently handle them. Current practice for ill-formed literals is to handle them as they are encountered, including when casting in FILTERs: e.g. bad lexical forms for XSD dateTimes (all too common) or xsd:integers. Sometimes that means at load time, but not always.
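As a rough illustration of "handle them as encountered", here is a minimal Python sketch, assuming solutions are dicts of lexical strings. `cast_integer`, `ExprError` and `filter_keeps` are hypothetical names. It relies on the SPARQL rule that a FILTER whose expression raises an error eliminates the solution; recording a warning is this sketch's own addition.

```python
import re

# Simplified xsd:integer lexical form: optional sign, then digits.
# (Whitespace collapsing from the XSD facet is ignored in this sketch.)
INTEGER_RE = re.compile(r"[+-]?[0-9]+")


class ExprError(Exception):
    """Raised when a cast meets an ill-formed lexical form."""


def cast_integer(lexical: str) -> int:
    # Validation happens here, at the moment the literal is used,
    # not when the data was loaded.
    if not INTEGER_RE.fullmatch(lexical):
        raise ExprError(f"ill-formed xsd:integer: {lexical!r}")
    return int(lexical)


def filter_keeps(expr, solution, warnings) -> bool:
    # A FILTER whose expression raises an error rejects the solution;
    # this sketch also records a warning for the offending literal.
    try:
        return bool(expr(solution))
    except ExprError as exc:
        warnings.append(str(exc))
        return False


solutions = [{"age": "42"}, {"age": "forty"}, {"age": "7"}]
warnings = []
adults = [s for s in solutions
          if filter_keeps(lambda s: cast_integer(s["age"]) >= 18, s, warnings)]
```

Here `"forty"` is only detected because the FILTER casts it; `adults` ends up as `[{"age": "42"}]` with one warning recorded.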

== Context

Entailment regimes apply when solving a basic graph pattern. This means that one query can involve graphs with different entailment regimes applied to them, even when they share the same base data (this happens in practice: one graph is the base data and another is the same data viewed with entailment). The consequence is that the data can't be assumed to have been checked beforehand for consistency with a given entailment regime.

Here are some example queries that pick out important issues for me:

SELECT * { ?x :p ?o }

This may need to consider only a small part of the data.
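To make the point concrete, a minimal sketch assuming a hypothetical in-memory store indexed by predicate: matching `{ ?x :p ?o }` reads only the `:p` triples, so an ill-formed literal attached to another predicate is never examined and this query gives no opportunity to report it.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical toy store of (s, p, o) string triples, indexed by predicate."""

    def __init__(self, triples):
        self.by_pred = defaultdict(list)
        for t in triples:
            self.by_pred[t[1]].append(t)
        self.reads = 0  # counts triples actually examined by queries

    def match_pred(self, p):
        # Solve { ?x p ?o } using only the index entry for p.
        for s, _, o in self.by_pred.get(p, []):
            self.reads += 1
            yield {"x": s, "o": o}


store = TripleStore([
    (":a", ":p", ":b"),
    (":c", ":q", '"not-a-date"^^xsd:dateTime'),  # ill-formed, never read
])
results = list(store.match_pred(":p"))
```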

pattern1 might be GRAPH <g1> { ...} and pattern2 might be GRAPH <g2> { ... }, or more complicated patterns involving different graphs, with the algebra used to combine the results of BGP matching.

SELECT *
{
{pattern1}
UNION
{pattern2}
}

This query can be executed by dispatching each pattern in parallel to some processing element, streaming results back to the caller from either side of the UNION as they become available.
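A minimal Python sketch of that strategy, using threads and a queue to stand in for the processing elements (`parallel_union` and `_Failure` are hypothetical names). It also shows the error-handling consequence: if one branch fails, solutions from the other branch may already have been streamed to the caller.

```python
import queue
import threading

_DONE = object()  # sentinel: one branch has finished


class _Failure:
    """Carries an exception from a worker thread to the consumer."""

    def __init__(self, exc: BaseException):
        self.exc = exc


def parallel_union(*branches):
    """Yield solutions from all branches in arrival order.

    Each branch is a zero-argument callable returning an iterable of
    solutions; it stands in for a remote processing element.
    """
    q = queue.Queue()

    def run(branch):
        try:
            for sol in branch():
                q.put(sol)
        except Exception as exc:  # forward the error instead of losing it
            q.put(_Failure(exc))
        finally:
            q.put(_DONE)

    for b in branches:
        threading.Thread(target=run, args=(b,), daemon=True).start()

    remaining = len(branches)
    while remaining:
        item = q.get()
        if item is _DONE:
            remaining -= 1
        elif isinstance(item, _Failure):
            # Earlier solutions have already been handed to the caller.
            raise item.exc
        else:
            yield item
```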

SELECT * { pattern1 pattern2 }

This is a join of pattern1 and pattern2. It has two interesting properties: pattern1 and pattern2 can be evaluated in either order, and results from one pattern can be used to evaluate the other in a more restrictive fashion.

Suppose pattern1 can be determined to generate at most 10 results, but pattern2 is predicted to generate a large number of solutions, and pattern2 shares variables with pattern1. One possible join strategy is to take each solution from pattern1, one at a time, substitute its values for the shared variables in pattern2, and solve the more restrictive pattern much more efficiently. At an extreme, suppose pattern1 ends up yielding zero solutions: pattern2 need never be evaluated, either as a whole or in a restricted form. When filters are considered,

SELECT * { pattern1 OPTIONAL { FILTER(expr) pattern2 } }

it means the BGP matching parts end up nested inside algebra expressions in common usage.
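The substitution join strategy described above can be sketched in Python, assuming solutions are dicts and that pattern2 can be solved with some variables already fixed (the toy data and function names are hypothetical). If `pattern1` yields nothing, `pattern2` is never called; when it is called, it is only ever asked about the restricted instances.

```python
from typing import Callable, Dict, Iterable, Iterator

Solution = Dict[str, str]


def substitution_join(pattern1: Callable[[], Iterable[Solution]],
                      pattern2: Callable[[Solution], Iterable[Solution]]
                      ) -> Iterator[Solution]:
    """Evaluate pattern2 once per pattern1 solution, with shared vars fixed."""
    for left in pattern1():
        for right in pattern2(left):
            # pattern2 was solved with the shared variables bound to
            # left's values, so the two solutions are compatible.
            yield {**left, **right}


DATA = [(":a", ":knows", ":b"),
        (":b", ":name", '"Bob"'),
        (":c", ":name", '"Cy"')]
calls = []  # records which ?y values pattern2 was asked about


def pattern1() -> Iterable[Solution]:
    # { ?x :knows ?y }
    return [{"x": s, "y": o} for s, p, o in DATA if p == ":knows"]


def pattern2(bound: Solution) -> Iterable[Solution]:
    # { ?y :name ?n } with ?y substituted from pattern1's solution
    calls.append(bound["y"])
    return [{"y": s, "n": o} for s, p, o in DATA
            if p == ":name" and s == bound["y"]]
```

`list(substitution_join(pattern1, pattern2))` gives `[{"x": ":a", "y": ":b", "n": '"Bob"'}]`, with `pattern2` solved once, for `?y = :b` only. Passing `lambda: []` as `pattern1` yields no solutions without calling `pattern2` at all.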

== Error handling

SPARQL already has mechanisms for error reporting via the protocol. The nature of HTTP places some constraints on what can be done - the error or success code must be transmitted before results start. Defining warnings would require additions to the protocol specification so that they are delivered consistently across implementations.

We might choose to separate this aspect of error reporting from error and warning generation by entailment (e.g. accept that warnings might be logged instead of returned in conformant implementations).
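The HTTP constraint can be seen in a small sketch, assuming a toy server that writes a status line and then one line per solution to a list standing in for the wire (`serve` and the logger name are hypothetical). An error raised before anything is sent can still become a 500 response; one raised after streaming has started can only be logged.

```python
import logging
from typing import Iterable, List

log = logging.getLogger("sparql.endpoint")  # hypothetical logger name
_END = object()  # sentinel: the solution stream is exhausted


def serve(solutions: Iterable[dict], wire: List[str]) -> None:
    """Write a status line, then one line per solution, to `wire`."""
    it = iter(solutions)
    try:
        # Pull the first solution before committing to a status code.
        first = next(it, _END)
    except Exception as exc:
        wire.append("HTTP/1.1 500 Internal Server Error")
        wire.append(str(exc))
        return

    wire.append("HTTP/1.1 200 OK")  # from here on the status is fixed
    if first is _END:
        return
    wire.append(repr(first))
    try:
        for sol in it:
            wire.append(repr(sol))
    except Exception as exc:
        # Too late to change the status; the stream simply ends here.
        log.warning("error after results started: %s", exc)
```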

My motivation is a specification of error handling that defines the minimum expectations of any system.

Service description can be used by implementations to communicate when they provide stronger guarantees.

== Proposal

An implementation MAY generate an error or warning and SHOULD generate such an error or warning if, in the course of processing, it determines that the data or query is not compatible with the request.

== Discussion

I don't see how anything stronger can be imposed. The service description option allows systems to state they provide stronger characteristics.

The SHOULD allows for the error handling issues mentioned above: it may be impractical in some circumstances. RFC 2119 says of SHOULD that "the full implications must be understood and carefully weighed before choosing a different course", which I read as allowing that it may not be practical to require a course of action. SHOULD is a reasonably strong statement.

"not compatible" - this phrase is intended to convey that the data does not meet the entailment regime's requirements. The query is included here because of the possibility of substitution evaluation, where whether a query is legal is not a purely syntactic test.
Issue: what if the full pattern is illegal but all uses in the restricted context of an execution are legal?
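The issue can be made concrete with a small sketch under an explicitly hypothetical legality rule (no variable in predicate position), chosen only for illustration and not taken from any actual entailment regime. Checked as a whole, the pattern fails the rule, yet every instance actually executed after substitution passes it.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def legal(pattern: List[Triple]) -> bool:
    # HYPOTHETICAL regime restriction, for illustration only:
    # no variable may appear in predicate position.
    return all(not is_var(p) for _, p, _ in pattern)


def substitute(pattern: List[Triple], binding: Dict[str, str]) -> List[Triple]:
    # Replace each bound variable with its value; leave other terms alone.
    return [tuple(binding.get(t, t) for t in triple) for triple in pattern]


pattern2 = [("?x", "?p", "?o")]
bindings_from_pattern1 = [{"?p": ":name"}, {"?p": ":knows"}]

whole_ok = legal(pattern2)                     # ?p unbound: fails the rule
each_ok = all(legal(substitute(pattern2, b))   # every executed instance
              for b in bindings_from_pattern1) # has ?p bound: passes
```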

Andy

Received on Monday, 5 October 2009 11:45:35 UTC