RE: On Parameters from Ezzat, Ahmed on 2009-03-26 (public-rdf-dawg@w3.org from January to March 2009)

From: Ezzat, Ahmed <Ahmed.Ezzat@hp.com>
Date: Thu, 26 Mar 2009 14:13:00 +0000
To: Orri Erling <erling@xs4all.nl>, 'Ivan Mikhailov' <imikhailov@openlinksw.com>, "Seaborne, Andy" <andy.seaborne@hp.com>
CC: 'Steve Harris' <steve.harris@garlik.com>, 'SPARQL Working Group' <public-rdf-dawg@w3.org>
Message-ID: <3B7AE9BA67C72B4891EF21842246A21C4132C31074@GVW1097EXB.americas.hpqcorp.net>

Hello Orri,

Look forward to your measurements.

I agree array parameters are a big win.  We did not have them for a while, and the difference with them is significant...
Regards,

Ahmed


From: Orri Erling [mailto:erling@xs4all.nl]
Sent: Thursday, March 26, 2009 01:32
To: Ezzat, Ahmed; 'Ivan Mikhailov'; Seaborne, Andy
Cc: 'Steve Harris'; 'SPARQL Working Group'
Subject: RE: On Parameters

________________________________
From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org] On Behalf Of Ezzat, Ahmed
Sent: Wednesday, March 25, 2009 10:37 PM
To: Orri Erling; 'Ivan Mikhailov'; Seaborne, Andy
Cc: 'Steve Harris'; 'SPARQL Working Group'
Subject: RE: On Parameters



Hello,

I wanted to introduce myself as a new member of this WG, from HP.  My background covers many areas, but SQL is a big part of my activities.  Recently data integration has been attracting me, and hence the semantic web...

Here is my input on this thread:
I definitely support having arrays/vectors (in our database we call them "row sets") as parameters.  In SQL they have proved useful, and almost all databases support them at the CLI level, as Orri mentioned.

Regarding the effectiveness of prepared statements vs. query caching: in our experience, if you have a good query cache, the added value of prepared statements is minimal (not worth it).  However, there are some differences between the two capabilities, at least in our MPP environment:


 1.  With a prepared statement, all tables remain open after query completion, while with a query cache that is typically not true.
 2.  With a prepared statement, downloading query fragments to the appropriate query execution processes is done once, while with a query cache you need to IPC the query fragments every time you reuse the same query plan from the cache.

In our experience, an efficient query cache is good enough...
Regards,

Ahmed

 ...



Ahmed

Yes, query caching delivers essentially the same value as prepared statements.
More precisely, this is so in the SQL world, where parameters and array parameters are an accepted fact of life.  On receiving a query, it is enough to check whether the same text has been seen before.  These caches then have to be invalidated on a schema change or when statistics are recomputed.
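
A minimal sketch of the text-keyed cache described above, for illustration only. The class name `PlanCache`, the `compile_fn` callback and the hit/miss counters are hypothetical and do not correspond to any particular database's API.

```python
class PlanCache:
    """Toy plan cache keyed on the exact query text (hypothetical names)."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # stand-in for parse + optimize
        self._plans = {}
        self.hits = 0
        self.misses = 0

    def get_plan(self, query_text):
        # The only check is whether this exact text has been seen before.
        if query_text in self._plans:
            self.hits += 1
        else:
            self.misses += 1
            self._plans[query_text] = self._compile(query_text)
        return self._plans[query_text]

    def invalidate(self):
        # Called on a schema change or after statistics are recomputed.
        self._plans.clear()


cache = PlanCache(compile_fn=lambda text: ("plan", text))
sql = "SELECT name FROM t WHERE id = ?"
cache.get_plan(sql)   # miss: compiled and stored
cache.get_plan(sql)   # hit: same text, plan reused
cache.invalidate()    # e.g. schema changed
cache.get_plan(sql)   # miss again: recompiled
```

Because the statement text carries `?` markers instead of literal values, the same text recurs for every set of parameter values, which is what makes the plain text lookup sufficient.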

Here this is a bit trickier, because without any notion of parameters in the language, an endpoint would have to read a bunch of pipelined requests, parse them all, find that they differ only in literals, compile if not already in cache, and then execute with the equivalent of array parameters internally.  The internal array parameter part is for clusters only.  Such reuse logic is more expensive than just checking whether the text has been seen before.
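
The reuse logic described above could be sketched roughly as follows. This is an assumption-laden illustration, not how any endpoint is known to implement it: a regular expression stands in for a real SPARQL parser, only double-quoted strings and plain numbers are treated as literals, and the `__P0__` placeholder syntax and function names are made up for the example.

```python
import re

# Crude tokenizer (assumption): IRIs are matched first so that digits inside
# them are left alone; quoted strings and bare numbers count as literals.
_TOKEN = re.compile(
    r'(?P<iri><[^<>\s]*>)'
    r'|(?P<string>"(?:[^"\\]|\\.)*")'
    r'|(?P<number>\b\d+(?:\.\d+)?\b)'
)


def parameterize(query_text):
    """Return (template, params) with each literal replaced by a placeholder."""
    params = []

    def replace(match):
        if match.group("iri") is not None:
            return match.group(0)  # keep IRIs as part of the template
        params.append(match.group(0))
        return "__P%d__" % (len(params) - 1)

    return _TOKEN.sub(replace, query_text), params


def group_by_template(queries):
    """Group pipelined queries that differ only in their literals."""
    batches = {}
    for text in queries:
        template, params = parameterize(text)
        batches.setdefault(template, []).append(params)
    return batches


pipelined = [
    'SELECT ?n WHERE { ?p <http://example.org/id> 1 ; <http://example.org/name> ?n }',
    'SELECT ?n WHERE { ?p <http://example.org/id> 2 ; <http://example.org/name> ?n }',
    'SELECT ?n WHERE { ?p <http://example.org/id> 3 ; <http://example.org/name> ?n }',
]
batches = group_by_template(pipelined)
```

Each key of `batches` is a template that would be compiled once, and its value is the list of parameter rows to execute against it. Every request still has to be scanned in full before its template is known, which is the extra cost relative to a plain text lookup.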

I may do some measurements next week to get numbers.  But it is safe to say that in SQL, for single row operations, using array parameters in the CLI can easily be 50-100x faster than not using them.  For SPARQL, we'll have to see the ratios of network latency, parse, optimize, and execute times with the type of short queries that SPARQL federated joining will create.
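
For readers unfamiliar with array parameters, the two calling styles can be sketched with Python's standard `sqlite3` module. This is only an analogue: `executemany` is the nearest DB-API counterpart to CLI array binding, and SQLite runs in-process with no network round trip, so this sketch shows the shape of the calls and is not expected to reproduce the ratio quoted above. The table `t` and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(i, "name-%d" % i) for i in range(1000)]
insert = "INSERT INTO t (id, name) VALUES (?, ?)"

# Row at a time: one execute call per single-row operation.
for row in rows[:500]:
    conn.execute(insert, row)

# Array-style: one call carrying the whole set of parameter rows.
conn.executemany(insert, rows[500:])
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Against a networked server, the first style pays a client/server round trip per row, while the second can ship all parameter rows in one request when the driver supports it.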


Orri

Received on Thursday, 26 March 2009 14:14:28 UTC