Re: Last Call comment on SPARQL 1.1 Update from Michael Hausenblas on 2011-05-30 (public-rdf-dawg-comments@w3.org from May 2011)

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Mon, 30 May 2011 09:44:09 +0100
To: Ivan Mikhailov <imikhailov@openlinksw.com>
Cc: W3C SPARQL WG comments <public-rdf-dawg-comments@w3.org>
Message-Id: <FA6F0600-02A5-4260-8205-DE6BFC3CE5F9@deri.org>
Ivan,

I assume this is not an official answer, based on the 'with all hats  
off' next to your name.

> The draft lists all severe issues already, namely #26, #18 and #19 (the
> #19 relates to this because cost of errors/attacks scales linearly or
> faster with the scale of the storage and security becomes more and more
> important). I see no reason to lengthen that list.

I am not suggesting adding anything to the list. Let's see where we are
regarding the issues:
  + ISSUE-18 'Concurrency in SPARQL/update'
  + ISSUE-19 'Security issues on SPARQL/Update'
  + ISSUE-26 'Conjunction of operation vs atomicity, transactions'

Hm. Sounds more like ISSUE-18/26 to me, but without knowing the  
history of all discussions it's hard to tell ...

> [...] All these features can not be fit in any common-purpose spec, due to
> prohibitive cost of the "smallest valid implementation". What could be
> in the spec, however, is a common syntax for implementation-specific
> pragmas, like in XQuery, but this idea is rejected in SPARQL 1.0 times.


Let me phrase my proposal a bit more concretely:

[[
To future-proof the SPARQL Update specification, add a non-normative
appendix titled ‘large-scale deployment considerations’ (from a
system-level POV). This appendix should discuss performance and
scalability issues concerning large-scale deployments (hundreds of
nodes, tera-triple scale) and offer implementation advice on how to
handle update language operations, such as DELETE, in this context.
]]

Cheers,
	Michael
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 29 May 2011, at 19:38, Ivan Mikhailov wrote:

> Michael,
>
> The draft lists all severe issues already, namely #26, #18 and #19 (the
> #19 relates to this because cost of errors/attacks scales linearly or
> faster with the scale of the storage and security becomes more and more
> important). I see no reason to lengthen that list.
>
> The body of the document does not contain a word "transaction". Even
> worse, I see no possibility to reach some consensus about it, due to
> variety of implementations. It is not really important for me because
> "my" SPARUL is a preprocessor on full-scale SQL engine; I can implement
> any standardized semantics in hours. So let others choose.
>
> Meanwhile we offer numerous implementation-specific pragmas, some of
> them control transaction log and the like. We also offer graph-level
> security for both SPARQL read-only access and SPARUL read-write. We also
> can make SPARQL queries with side effects such as loading missing
> resources on demand, and there's security for that side effects as well.
> We also offer different non-SPARUL tools for massive data loading,
> because one LOAD at time is definitely not the best way of keeping a
> hundred of CPU cores busy and different single/cluster configurations
> require different loading policies. We also configure parsers, because
> real data are not always perfect and we should selectively recover from
> different sorts of errors.
> All these features can not be fit in any common-purpose spec, due to
> prohibitive cost of the "smallest valid implementation". What could be
> in the spec, however, is a common syntax for implementation-specific
> pragmas, like in XQuery, but this idea is rejected in SPARQL 1.0 times.
>
> Best Regards,
>
> Ivan Mikhailov (with all hats off)
> OpenLink Software
> http://virtuoso.openlinksw.com
>
> On Sun, 2011-05-29 at 17:29 +0100, Michael Hausenblas wrote:
>> All,
>>
>> This is a comment concerning the Last Call Working Draft 'SPARQL 1.1
>> Update' [1]. It is clearly written and, AFAICT sound. However, I have
>> an issue with it - more on the conceptual level. I tried to express my
>> concerns in a blog post [2] and will do my best to summarise in the
>> following.
>>
>> While the proposed update language - without any doubt - is perfectly
>> suitable for 'small to medium'-sized setups, I fear that we will run
>> into troubles in large-scale deployments concerning the costs for
>> updating and deleting huge volumes of triples. Now, I wish I had
>> experimental evidence myself to prove this (and I have to admit I
>> don't have), but I would like the WG to consider either including a
>> section discussing the issue, or setting up a (non-REC Track) document
>> that discusses this (which could be titled 'implementation/usage
>> advice for large-scale deployments' or the like). I do feel strongly
>> about this and would offer to contribute to such a document, if desired.
>>
>> I'd very much appreciate it if WG members would be able to point me to
>> their own experiences in this field (experiments or real-world
>> deployments alike).
>>
>> Cheers,
>> 	Michael (with my DERI AC Rep and RDB2RDF WG co-chair hat off)
>>
>> [1] http://www.w3.org/TR/2011/WD-sparql11-update-20110512/
>> [2] http://webofdata.wordpress.com/2011/05/29/ye-shall-not-delete-data/
>>
>> --
>> Dr. Michael Hausenblas, Research Fellow
>> LiDRC - Linked Data Research Centre
>> DERI - Digital Enterprise Research Institute
>> NUIG - National University of Ireland, Galway
>> Ireland, Europe
>> Tel. +353 91 495730
>> http://linkeddata.deri.ie/
>> http://sw-app.org/about.html
>>
>>
>
>
Received on Monday, 30 May 2011 08:44:39 UTC