Re: XML serialization of SPARQL

Bijan Parsia wrote:
> On Mar 16, 2005, at 4:27 AM, Steve Harris wrote:
> [snip]
>>I have an alternative suggestion:
>>	<sparql>
>>	  CONSTRUCT { ?a foo ?b . ?b bar ?c }
>>	  WHERE { ?b thoip ?c . ?c fump ?d . ?a wonk ?d }
>>	</sparql>
> That would not answer to my needs and use cases.
>>In seriousness the XML serialisation better be pretty close to the 
>>or it is a bit too much of a PITA for very little gain.
> Well, I don't agree with the latter disjunct. However, we are working on 
> a schema for a version that tries to be reasonably close.
>>>I can see that this would enable one to use a SOAP SPARQL Protocol
>>>binding to sign and/or encrypt. There may be other motivations.
>>>What are they?
>>Validating inputs I think, but the rules for validating SPARQL 
>>are sufficiently complicated that I don't think you're going to be able to 
>>express them in schema or whatever. e.g. how do you say that variables 
>>that appear only in optional blocks may only appear in one.
> I think it's preferable for the WSDL, for embeddabilty, for the wire, 
> for generation, for interoperability, for query management,  and for 
> (in various contexts) selling it.
> Prior to detailing these, I would like to determine a point of order. I 
> think it's a point of order. It's sort of a meta point about your 
> argument anyway. I don't think it's nearly sufficient to note that 
> schema cannot express *all* of the constraints of the SPARQL grammar to 
> make, really, any kind of strong case against the XML serialization. 
> First, we have choices. We can simplify the grammar until it *is* 
> (largely) schema accessible. Second, most XML formats have constraints 
> not expressible in schema (take wsdl, for example), just as most 
> languages have constraints not expressible in their grammar (it's not 
> sufficient to get a legal C program to successfully compile it). 
> Granted, it's more often the case that those are *semantic* constraints, 
> but still.
> XML schema is the only type system (in a programming language sense) 
> at the W3C. There's a general imperative to use W3C specs in other W3C 
> specs. Hence, my defaulting to W3C schema. Schema is the only built-in 
> type system for WSDL (in the abstract can bind on the wire 
> to anything that is a serialization of that type, including current 
> Sparql). I find an input type for sparql query of an undifferentiated 
> ws:string to be worthless and other Web Services people I talked to 
> agreed. It doesn't tell you anything, it cannot be used (usefully) for 
> discovery or matchmaking, generic toolkits can do little to help sanity 
> check your input. Essentially, you have to have a client side (or 
> firewall, or...) Sparql sanity checker *or* rely on runtime fault 
> behavior. That's unacceptable in many contexts.
> In general, an XML format, with a good type or not, is more composable 
> with other formats than not. For example, I would like to use Sparql 
> queries to specify preconditions in OWL-S (and eventually, in a WSDL 
> extension). I could embed a string but it's harder, uglier, and most 
> importantly, makes that content largely unavailable to my current xml 
> tools (data binders, xslt transformation, xquery, etc.). I'd much 
> rather have something XML for soap, so my debuggers, firewall filters, 
> etc. don't have to be customized (in a novel way) for SPARQL.
> Similarly, even without a *heavy* tool investment, I'd much rather 
> generate XML than an arbitrary grammar. And I *like* grammars! For many 
> shops, it's the only plausible choice absent, again, a rather large 
> investment in sparql tools.
> One nice thing about having a Schema for it, is that certain 
> constraints and variants become *much* easier to express. For example, 
> let's say I have a legacy RDQL server that I now will describe using 
> wsdl. I could add additional constraints to the schema type for sparql 
> query that eliminated CONSTRUCT and DESCRIBE, thus, telling clients 
> exactly what I do or do not handle. All without the need for 
> conformance levels or additional work from the group and it's machine 
> readable. Similarly, if I want to support legacy systems or 
> applications, it seems much handier to have an xml format as an 
> intermediary. Once I have sparql to XML (which Bryan says he'll support 
> in his parser) I can write style sheets to go to RDQL, Versa, etc. 
> (assuming it types to the schemas for them; and I might have to add 
> additional constraints, but at least I'm not starting from zero). 
> Similarly, I could adapt my current query internal structures for 
> whatever to general XML and then convert that via a generic converter 
> back to sparql. It's just easier to reuse such tools across systems and 
> languages.
> If I have large collections of queries, or query fragments, I may well 
> like to keep them in a database. It would be nicer to be able to store 
> them in my handy dandy xml store and use xquery on them. (Servers in a 
> large organization might log all queries and want to analyze them for 
> tuning or auditing).
> I believe there are more arguments and more detail for these arguments, 
> but the final point is not insignificant: There are many organizations 
> for which "lack of angle brackets" is just a non-starter. I was told 
> this again recently by a Semweb booster at a large company. Semantic 
> Web is often a hard sell. Lack of modern xml practices for semweb 
> formats makes it several times harder. That's in essence the argument 
> from WSDL...web services at the W3C (and elsewhere largely) are *xml* 
> web services. You look like a dork selling something else; 
> unprofessional. Plus, you *are* being a bit of a dork, since the 
> overhead of the retooling you require makes accepting our somewhat out 
> of mainstream technology that much more expensive for many people.
> Ok, this was partly to respond to Steve's argument and partly to 
> respond to Eric's request for more motivations.
> None of this tells *against* a human oriented syntax. Far from it. I 
> believe a sane human oriented syntax is valuable for adoption and for 
> use. To be cliched, a machine oriented syntax is important too.
> Kendall and I have been working on a suitable XML Schema for what we 
> hope is a suitable XML format. We'll share it when it's coherent enough 
> and debugged enough to be an actual proposal (I hope sometime next 
> week). I hope we'll have implementations of SPARQL2XMLformat and 
> XMLFormat to SPARQL shortly after that.
> One question we wrestled with is how roundtrippable we want this 
> format. I.e., how much of the surface syntax we want to preserve. For 
> example, do you want to mirror the turtle exactly or normalize it to a 
> triple oriented form? The latter makes the xml easier but makes it 
> impossible to recover the human syntax exactly. I'd be interested if 
> that was important to people (it's not for me).
> In terms of impact on schedule, I'm willing to let it slip wrt the main 
> SPARQL rec, as with the protocol document.
> Hmm. Ok, that seems like everything I could think of on this topic at 
> the moment. Sorry for the brain dump; bit crazy here. :)
> Cheers,
> Bijan Parsia.

Thanks for the motivational requirements.

Two minor points:

1/ Tool investment: the argument about reuse of the tool investment is true
but partial if the intent is to handle the query in some way.  It's the "tags
aren't semantics" issue - the tool investment is still going to have to be made
to work above the structure.

In the same way, RDF builds on XML - it's useful to reuse an XML parser (not to
trivialize XML parsing) but that is only a part of the overall work to produce
an RDF framework.

2/ The syntax sugar is defined in terms of its triple output.  It's the triples
that matter.  Capturing the surface syntactic form of the SPARQL is going to be
of little value, and possibly a big hindrance, to tools that are trying to
manage and manipulate SPARQL queries where they are more interested in the
effect of the query/query fragment (that is, not the display-oriented usages).

Writing pretty printers is a bit fiddly at times but it is their job to
reconstruct syntactic sugar, especially after the query has been "optimized".
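
To make point 2 concrete, here is a rough sketch (not taken from any SPARQL
implementation) of how two different surface forms collapse to the same
triples.  The function name `expand`, the whitespace-separated input, and the
restriction to the ";" and "," abbreviations are all assumptions made for
illustration; a real parser handles far more of the grammar.

```python
def expand(pattern):
    """Expand the ';' and ',' abbreviations of a whitespace-tokenized
    triple pattern into a plain list of (subject, predicate, object).

    Illustrative only: no prefixes, literals with spaces, or nesting.
    """
    triples = []
    subject = predicate = None
    expect = "subject"
    for tok in pattern.split():
        if tok == ".":
            # End of a statement: the next term starts a new subject.
            subject = predicate = None
            expect = "subject"
        elif tok == ";":
            # Same subject, new predicate.
            expect = "predicate"
        elif tok == ",":
            # Same subject and predicate, new object.
            expect = "object"
        elif expect == "subject":
            subject, expect = tok, "predicate"
        elif expect == "predicate":
            predicate, expect = tok, "object"
        elif expect == "object":
            triples.append((subject, predicate, tok))
            expect = "separator"
        else:
            raise ValueError("expected '.', ';' or ',' before %r" % tok)
    return triples


sugared = "?b thoip ?c ; fump ?d , ?e ."
plain = "?b thoip ?c . ?b fump ?d . ?b fump ?e ."

# Both surface forms yield the same three triples.
print(expand(sugared) == expand(plain))
```

A tool working at the triple level sees no difference between the two
inputs; recovering the ";" and "," forms afterwards would be the pretty
printer's job.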

And timing points:

A/ An XML syntax would be useful in the situations outlined here - so would an
RDF one for RDF tools - but what happens with the test suite?  Is one syntax the
primary one for defining tests?

B/ As the purpose is for XML tools, I'd like to see input from those
communities during the creation of the XML syntax, including the requirements
stage.


Received on Wednesday, 16 March 2005 16:03:32 UTC