
RE: On Parameters

From: Orri Erling <erling@xs4all.nl>
Date: Thu, 26 Mar 2009 09:32:15 +0100
Message-Id: <200903260833.n2Q8XIml095331@smtp-vbr5.xs4all.nl>
To: "'Ezzat, Ahmed'" <Ahmed.Ezzat@hp.com>, "'Ivan Mikhailov'" <imikhailov@openlinksw.com>, "'Seaborne, Andy'" <andy.seaborne@hp.com>
Cc: "'Steve Harris'" <steve.harris@garlik.com>, "'SPARQL Working Group'" <public-rdf-dawg@w3.org>

From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
On Behalf Of Ezzat, Ahmed
Sent: Wednesday, March 25, 2009 10:37 PM
To: Orri Erling; 'Ivan Mikhailov'; Seaborne, Andy
Cc: 'Steve Harris'; 'SPARQL Working Group'
Subject: RE: On Parameters






Wanted to introduce myself as a new member of this WG, from HP.  My
background covers many areas, but SQL is a big part of my activities.
Recently data integration has been attracting me, and hence the Semantic Web.


Here is my input on this thread:

I definitely support having an array/vector (in our database we call it a
"row set") as a parameter.  In SQL it has proved useful, and almost all
databases support it at the CLI level, as Orri mentioned.


Regarding prepared statements vs. query caching: in our experience, if you
have a good query cache, the added value of prepared statements is minimal
(not worth it).  However, there are some differences between the two
capabilities, at least in our MPP environment:


1.	With a prepared statement, all tables remain open after query
completion, while with a query cache that is typically not true.
2.	With a prepared statement, downloading query fragments to the
appropriate query execution processes is done once, while with a query cache
you need to IPC the query fragments every time you reuse the same query plan
from the cache.


In our experience, an efficient query cache is good enough...











Yes, query caching delivers essentially the same value as prepared
statements.

More precisely, this is so in the SQL world, where parameters and array
parameters are an accepted fact of life.  When a query arrives, it is
enough to check whether the same text has been seen before.  These caches
then have to be invalidated on schema change or when statistics are
recomputed.
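
As a rough illustration of the text-keyed cache described above, here is a
minimal Python sketch.  The class name PlanCache and the compile_fn callback
are hypothetical and stand in for whatever a real engine does to turn query
text into a plan; this is not any particular product's implementation.

```python
class PlanCache:
    """Toy plan cache keyed on the exact query text."""

    def __init__(self, compile_fn):
        self._compile = compile_fn   # hypothetical: query text -> plan
        self._plans = {}
        self.hits = 0
        self.misses = 0

    def get(self, text):
        """Return the cached plan for this text, compiling it on first sight."""
        if text in self._plans:
            self.hits += 1
        else:
            self.misses += 1
            self._plans[text] = self._compile(text)
        return self._plans[text]

    def invalidate(self):
        """Drop all plans, e.g. on schema change or recomputed statistics."""
        self._plans.clear()
```

The lookup is a single dictionary probe on the text, which is why this only
pays off when the parameter values are kept out of the text.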


Here this is a bit trickier, because without any notion of parameters in the
language, an endpoint would have to read a bunch of pipelined requests,
parse them all, find that they differ only in literals, compile if not
already in cache, and then execute with the equivalent of array parameters
internally.  The internal array parameter part is for clusters only.  Such
reuse logic is more expensive than just checking if the text has been seen
before.
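
To make the "differ only in literals" step concrete, here is a small Python
sketch.  It is an assumption-laden simplification: it uses a regular
expression instead of a real SPARQL parser, recognizes only IRIs,
double-quoted strings and bare numbers, and the names to_template and
group_by_template are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: only IRIs, double-quoted strings and bare numbers are recognized.
_TOKEN = re.compile(
    r'(<[^>\s]*>)'              # IRI: left untouched
    r'|("(?:[^"\\]|\\.)*")'     # string literal
    r'|(\b\d+(?:\.\d+)?\b)'     # numeric literal
)

def to_template(query):
    """Return (template, literals) with each literal replaced by $1, $2, ..."""
    literals = []

    def replace(match):
        if match.group(1):      # IRIs are part of the query shape
            return match.group(0)
        literals.append(match.group(0))
        return f"${len(literals)}"

    return _TOKEN.sub(replace, query), literals

def group_by_template(queries):
    """Group pipelined queries that differ only in their literals."""
    groups = defaultdict(list)
    for query in queries:
        template, literals = to_template(query)
        groups[template].append(literals)
    return dict(groups)
```

Each group could then be compiled once and run with its list of literal rows
as the equivalent of an array parameter.  Note that every request still has
to be scanned in full before the cache can be consulted, which is the extra
cost compared to a plain text lookup.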


I may do some measurements next week to get numbers.  But it is safe to say
that in SQL, for single row operations, using array parameters in the CLI
can easily be 50-100x faster than not.  For SPARQL, we'll have to see the
ratios of network latency, parse, optimize, and execute with the type of
short queries that SPARQL federated joining will create.
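
For readers less familiar with array parameters on the SQL side, the sketch
below shows the general shape using Python's standard sqlite3 module, whose
executemany call takes one statement plus a sequence of parameter rows.  This
is only an analogy to the CLI feature discussed here: SQLite runs in-process,
so it says nothing about the network savings, and the person table is a
made-up example.

```python
import sqlite3

def insert_rows(conn, rows):
    """Run one parameterized INSERT over a whole array of parameter rows."""
    conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

# One statement, three parameter rows, instead of three separate statements.
insert_rows(conn, [(1, "Alice"), (2, "Bob"), (3, "Carol")])
count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```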






Received on Thursday, 26 March 2009 08:34:00 UTC
