RE: On Parameters from Orri Erling on 2009-03-26 (public-rdf-dawg@w3.org from January to March 2009)

From: Orri Erling <erling@xs4all.nl>
Date: Thu, 26 Mar 2009 09:32:15 +0100
To: "'Ezzat, Ahmed'" <Ahmed.Ezzat@hp.com>, "'Ivan Mikhailov'" <imikhailov@openlinksw.com>, "'Seaborne, Andy'" <andy.seaborne@hp.com>
Cc: "'Steve Harris'" <steve.harris@garlik.com>, "'SPARQL Working Group'" <public-rdf-dawg@w3.org>
Message-Id: <200903260833.n2Q8XIml095331@smtp-vbr5.xs4all.nl>

  _____  

From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
On Behalf Of Ezzat, Ahmed
Sent: Wednesday, March 25, 2009 10:37 PM
To: Orri Erling; 'Ivan Mikhailov'; Seaborne, Andy
Cc: 'Steve Harris'; 'SPARQL Working Group'
Subject: RE: On Parameters

Hello,

I wanted to introduce myself as a new member of this WG, from HP.  My
background covers many areas, but SQL is a big part of my activities.
Recently data integration has been attracting me, and hence the semantic web.

Here is my input on this thread:

I definitely support having an array/vector (called a "row set" in our
database) as a parameter.  In SQL it has proved useful, and almost all
databases support it at the CLI level, as Orri mentioned.

Regarding prepared statements vs. query caching: in our experience, if you
have a good query cache, the value of prepared statements is minimal (not
worth it).  However, there are some differences between the two
capabilities, at least in our MPP environment:

1.	For a prepared statement, all tables remain open after query
completion, while with a query cache that is typically not true.
2.	For a prepared statement, downloading query fragments to the
appropriate query execution processes is done once, while with a query cache
you need to IPC the query fragments every time you reuse the same query plan
from the cache.

In our experience, an efficient query cache is good enough...

Regards,

Ahmed

Ahmed

Yes, query caching delivers essentially the same value as prepared
statements.

More precisely, this is so in the SQL world, where parameters and array
parameters are an accepted fact of life.  When a query arrives, it is
enough to check whether the same text has been seen before.  These caches
then have to be invalidated on schema change or when statistics are recomputed.
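
A minimal sketch of such a text-keyed cache, in Python.  The names PlanCache,
compile_fn and invalidate are hypothetical and do not come from any particular
database; the point is only that the lookup is a single dictionary probe on
the query text, and that the whole cache is dropped on schema or statistics
changes.

```python
class PlanCache:
    """Toy plan cache keyed by the exact query text (illustrative only)."""

    def __init__(self, compile_fn):
        # compile_fn is an assumed callback that turns query text into a plan.
        self._compile = compile_fn
        self._plans = {}
        self.compilations = 0

    def get_plan(self, query_text):
        # The only check needed: has this exact text been seen before?
        plan = self._plans.get(query_text)
        if plan is None:
            plan = self._compile(query_text)
            self.compilations += 1
            self._plans[query_text] = plan
        return plan

    def invalidate(self):
        # Call on schema change or after statistics are recomputed.
        self._plans.clear()


cache = PlanCache(lambda text: ("plan", text))
cache.get_plan("SELECT name FROM t WHERE id = ?")
cache.get_plan("SELECT name FROM t WHERE id = ?")  # served from the cache
```

Because the parameter marker keeps the text identical across calls, the
second lookup does not recompile.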

Here this is a bit trickier, because without any notion of parameters in the
language, an end point would have to read a bunch of pipelined requests,
parse them all, find that they differ only in literals, compile if not
already in cache, and then execute with the equivalent of array parameters
internally.  The internal array parameter part is for clusters only.  Such a
reuse logic is more expensive than just checking if the text has been seen
before.
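
To make the extra work concrete, here is a rough Python sketch of the reuse
logic described above.  It is an assumption-laden simplification: a real end
point would work on the parsed query, whereas this uses a regular expression
that only recognises double-quoted strings and plain numbers, and the
placeholder names (?_p0, ?_p1, ...) and the functions normalize and
batch_by_template are made up for illustration.

```python
import re

# IRIs are matched first so that digits inside them are left alone.
# Literal coverage is deliberately incomplete (no typed or language-tagged
# literals, no single-quoted strings).
_TOKEN = re.compile(
    r'<[^<>\s]*>'                      # IRI reference, kept as-is
    r'|"(?:[^"\\]|\\.)*"'              # double-quoted string literal
    r'|(?<![\w?$:])-?\d+(?:\.\d+)?\b'  # integer or decimal literal
)


def normalize(query_text):
    """Return (template, literals), replacing each literal by a placeholder."""
    literals = []

    def repl(match):
        token = match.group(0)
        if token.startswith("<"):
            return token  # not a literal, leave the IRI in place
        literals.append(token)
        return "?_p%d" % (len(literals) - 1)

    return _TOKEN.sub(repl, query_text), literals


def batch_by_template(queries):
    """Group pipelined queries that differ only in their literals."""
    batches = {}
    for query in queries:
        template, literals = normalize(query)
        batches.setdefault(template, []).append(literals)
    return batches


requests = [
    'SELECT ?n WHERE { ?s <http://example.org/id> 1 . ?s <http://example.org/name> ?n }',
    'SELECT ?n WHERE { ?s <http://example.org/id> 2 . ?s <http://example.org/name> ?n }',
]
batches = batch_by_template(requests)
```

Each key of batches is a template to compile once, and each value is the list
of literal rows that would be run as the internal array parameter.  Every
request has to be scanned in full before the cache can even be consulted,
which is the added cost compared with a plain text lookup.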

I may do some measurements next week to get numbers.  But it is safe to say
that in SQL, for single row operations, using array parameters in the CLI
can easily be 50-100x faster than not.  For SPARQL, we'll have to see the
ratios of network latency, parse, optimize, execute with the type of short
queries that SPARQL federated
joining will create.
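
For readers who have not used array parameters, the following Python sketch
shows the general shape of the idea using the standard sqlite3 module, whose
executemany call binds a whole list of rows to one parameterized statement.
The table and function names are invented for the example.  SQLite runs in
process with no network round trips, so this shows the calling pattern only
and says nothing about the speedup figures mentioned above.

```python
import sqlite3


def load_rows(n):
    """Insert n rows with one parameterized statement and return the row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    rows = [("s%d" % i, "name", "value %d" % i) for i in range(n)]
    # One statement text, many parameter rows: the array-parameter pattern.
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)
    count = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
    conn.close()
    return count


inserted = load_rows(1000)
```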

Orri

Received on Thursday, 26 March 2009 08:34:00 UTC