Re: XML serialization of SPARQL

On Mar 16, 2005, at 4:27 AM, Steve Harris wrote:
[snip]
> I have an alternative suggestion:
>
> 	<sparql>
> 	  CONSTRUCT { ?a foo ?b . ?b bar ?c }
> 	  WHERE { ?b thoip ?c . ?c fump ?d . ?a wonk ?d }
> 	</sparql>

That would not answer to my needs and use cases.

> In seriousness the XML serialisation better be pretty close to the 
> grammar
> or it is a bit too much of a PITA for very little gain.

Well, I don't agree with the latter disjunct. However, we are working on 
a schema for a version that tries to be reasonably close.

>> I can see that this would enable one to use a SOAP SPARQL Protocol
>> binding to sign and/or encrypt. There may be other motivations.
>> What are they?
>
> Validating inputs I think, but the rules for validating SPARQL 
> expression
> are sufficiently complicated that I dont think youre going to be able 
> to
> express them in schema or whatever. e.g. how do you say that variable 
> that
> appear only in optional blocks may only appear in one.

I think it's preferable for the WSDL, for embeddability, for the wire, 
for generation, for interoperability, for query management, and for 
(in various contexts) selling it.

Prior to detailing these, I would like to determine a point of order. I 
think it's a point of order. It's sort of a meta point about your 
argument anyway. I don't think it's nearly sufficient to note that 
schema cannot express *all* of the constraints of the SPARQL grammar to 
make, really, any kind of strong case against the XML serialization. 
First, we have choices. We can simplify the grammar until it *is* 
(largely) schema accessible. Second, most XML formats have constraints 
not expressible in schema (take WSDL, for example), just as most 
languages have constraints not expressible in their grammar (a C 
program that is grammatically legal will not necessarily compile). 
Granted, it's more often the case that those are *semantic* constraints, 
but still.

XML Schema is the only type system (in a programming language sense) 
at the W3C. There's a general imperative to use W3C specs in other W3C 
specs. Hence, my defaulting to W3C schema. Schema is the only built-in 
type system for WSDL (in the abstract layer....you can bind on the wire 
to anything that is a serialization of that type, including current 
Sparql). I find an input type for sparql query of an undifferentiated 
ws:string to be worthless and other Web Services people I talked to 
agreed. It doesn't tell you anything, it cannot be used (usefully) for 
discovery or matchmaking, and generic toolkits can do little to help 
sanity check your input. Essentially, you have to have a client side (or 
firewall, or...) Sparql sanity checker *or* rely on runtime fault 
behavior. That's unacceptable in many contexts.

In general, an XML format, with a good type or not, is more composable 
with other formats than not. For example, I would like to use Sparql 
queries to specify preconditions in OWL-S (and eventually, in a WSDL 
extension). I could embed a string but it's harder, uglier, and most 
importantly, makes that content largely unavailable to my current xml 
tools (data binders, xslt transformation, xquery, etc.). I'd much 
rather have something XML for soap, so my debuggers, firewall filters, 
etc. don't have to be customized (in a novel way) for SPARQL.

Similarly, even without a *heavy* tool investment, I'd much rather 
generate XML than an arbitrary grammar. And I *like* grammars! For many 
shops, it's the only plausible choice absent, again, a rather large 
investment in sparql tools.
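
As a rough sketch of the generation point: with an XML form, a query can be 
assembled with an ordinary XML library rather than by concatenating strings 
in a custom grammar. The element and attribute names below (query, where, 
triple, form, kind) are invented for illustration only; they are not the 
schema under discussion, which has not been published yet.

```python
import xml.etree.ElementTree as ET


def build_query(form, triples):
    """Build a query element in a made-up XML format (not the proposed schema)."""
    query = ET.Element("query", {"form": form})
    where = ET.SubElement(query, "where")
    for s, p, o in triples:
        triple = ET.SubElement(where, "triple")
        for role, term in (("subject", s), ("predicate", p), ("object", o)):
            # Terms starting with "?" are variables; anything else is a plain name.
            kind = "var" if term.startswith("?") else "name"
            node = ET.SubElement(triple, role, {"kind": kind})
            node.text = term.lstrip("?")
    return query


# The WHERE pattern from Steve's example, minus the last triple.
query = build_query("select", [("?b", "thoip", "?c"), ("?c", "fump", "?d")])
xml_text = ET.tostring(query, encoding="unicode")
```

The library takes care of escaping and well-formedness, which is most of 
what a shop without SPARQL tooling would otherwise have to get right by hand.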

One nice thing about having a Schema for it is that certain 
constraints and variants become *much* easier to express. For example, 
let's say I have a legacy RDQL server that I now will describe using 
wsdl. I could add additional constraints to the schema type for sparql 
query that eliminated CONSTRUCT and DESCRIBE, thus, telling clients 
exactly what I do or do not handle. All without the need for 
conformance levels or additional work from the group and it's machine 
readable. Similarly, if I want to support legacy systems or 
applications, it seems much handier to have an xml format as an 
intermediary. Once I have sparql to XML (which Bryan says he'll support 
in his parser) I can write style sheets to go to RDQL, Versa, etc. 
(assuming it types to the schemas for them; and I might have to add 
additional constraints, but at least I'm not starting from zero). 
Similarly, I could adapt my current query internal structures for 
whatever to general XML and then convert that via a generic converter 
back to sparql. It's just easier to reuse such tools across systems and 
languages.
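
To make the legacy server example concrete, here is a minimal sketch of the 
kind of restriction meant. In practice the restriction would live in the 
schema type itself; Python's standard library has no XML Schema validator, 
so this is a hand-written stand-in. The query element and its form attribute 
are assumptions made up for the example, not the proposed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy server: handles SELECT and ASK, not CONSTRUCT or DESCRIBE.
ALLOWED_FORMS = {"select", "ask"}


def accepts(xml_text, allowed=ALLOWED_FORMS):
    """Return True if the (made-up) XML query uses a form this server handles."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not even well-formed XML: rejected before it reaches the server.
        return False
    return root.tag == "query" and root.get("form") in allowed


ok = accepts('<query form="select"><where/></query>')
rejected = accepts('<query form="construct"><where/></query>')
```

A client can run the same check before sending anything, instead of 
discovering the limitation through a runtime fault.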

If I have large collections of queries, or query fragments, I may well 
like to keep them in a database. It would be nicer to be able to store 
them in my handy dandy xml store and use xquery on them. (Servers in a 
large organization might log all queries and want to analyze them for 
tuning or auditing).
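
A small sketch of the query log case, using Python's ElementTree as a 
stand-in for an XML store plus xquery. The log layout and element names are 
invented for illustration; the point is only that generic XML tooling can 
answer questions like "which predicates are queried most often" without a 
SPARQL parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A made-up log of two queries in a made-up XML query format.
LOG = """<log>
  <query form="select"><where>
    <triple><predicate>thoip</predicate></triple>
    <triple><predicate>fump</predicate></triple>
  </where></query>
  <query form="construct"><where>
    <triple><predicate>fump</predicate></triple>
  </where></query>
</log>"""


def predicate_usage(log_xml):
    """Count how often each predicate appears across all logged queries."""
    root = ET.fromstring(log_xml)
    return Counter(p.text for p in root.iter("predicate"))


usage = predicate_usage(LOG)
```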

I believe there are more arguments and more detail for these arguments, 
but the final point is not insignificant: There are many organizations 
for which "lack of angle brackets" is just a non-starter. I was told 
this again recently by a Semweb booster at a large company. Semantic 
Web is often a hard sell. Lack of modern xml practices for semweb 
formats makes it several times harder. That's in essence the argument 
from WSDL...web services at the W3C (and elsewhere largely) are *xml* 
web services. You look like a dork selling something else; 
unprofessional. Plus, you *are* being a bit of a dork, since the 
overhead of the retooling you require makes accepting our somewhat out 
of mainstream technology that much more expensive for many people.

Ok, this was partly to respond to Steve's argument and partly to 
respond to Eric's request for more motivations.

None of this tells *against* a human oriented syntax. Far from it. I 
believe a sane human oriented syntax is valuable for adoption and for 
use. To be cliched, a machine oriented syntax is important too.

Kendall and I have been working on a suitable XML Schema for what we 
hope is a suitable XML format. We'll share it when it's coherent enough 
and debugged enough to be an actual proposal (I hope sometime next 
week). I hope we'll have implementations of SPARQL2XMLformat and 
XMLFormat to SPARQL shortly after that.

One question we wrestled with is how roundtrippable we want this 
format to be, i.e., how much of the surface syntax we want to preserve. For 
example, do you want to mirror the turtle exactly or normalize it to a 
triple oriented form? The latter makes the xml easier but makes it 
impossible to recover the human syntax exactly. I'd be interested if 
that was important to people (it's not for me).
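
The tradeoff can be sketched as follows. The nested tuples are a 
hypothetical in-memory stand-in for a parsed pattern (not any real parser's 
output): an abbreviated Turtle-style form with ";" and the fully expanded 
form flatten to the same triples, so the triple oriented form cannot tell 
you which one the author originally wrote.

```python
def to_triples(pattern):
    """Flatten (subject, [(predicate, [objects])]) groups into plain triples."""
    triples = []
    for subject, predicate_objects in pattern:
        for predicate, objects in predicate_objects:
            for obj in objects:
                triples.append((subject, predicate, obj))
    return triples


# Two surface forms of the same pattern:
#   abbreviated:  ?b thoip ?c ; wonk ?d .
#   expanded:     ?b thoip ?c . ?b wonk ?d .
abbreviated = [("?b", [("thoip", ["?c"]), ("wonk", ["?d"])])]
expanded = [("?b", [("thoip", ["?c"])]), ("?b", [("wonk", ["?d"])])]

# Both normalize to the same list, so the surface form is not recoverable.
same = to_triples(abbreviated) == to_triples(expanded)
```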

In terms of impact on schedule, I'm willing to let it slip wrt the main 
SPARQL rec, as with the protocol document.

Hmm. Ok, that seems like everything I could think of on this topic at 
the moment. Sorry for the brain dump; bit crazy here. :)

Cheers,
Bijan Parsia.

Received on Wednesday, 16 March 2005 12:51:22 UTC