Re: Proposed RAND() defn from Steve Harris on 2010-12-01 (public-rdf-dawg@w3.org from October to December 2010)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 1 Dec 2010 12:56:35 +0000
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <3FAAFA28-12D7-46B5-8FAB-F7D3676AB26B@garlik.com>

On 2010-11-29, at 21:49, Andy Seaborne wrote:
> On 29/11/10 12:38, Steve Harris wrote:
>> N.B. I'm not sure that a SQL-style definition of RAND(seed) is really
>> practical to define in a SPARQL context without changing a lot of other
>> things.
>> 
>> Though there's a Solution Sequence, there's nothing that requires the
>> SPARQL engine to execute FILTER expressions in any particular order so
>> far as I can tell. We could either drop this feature (not my
>> preference), or relax the wording — if this is an issue. Relaxing the
>> wording would make it hard to test. Thoughts?
>> 
>> - Steve
>> 
>> ----
>> 
>> RAND
>> 
>> The RAND function returns an xsd:double in the range [0,1), i.e. 0 ≤
>> RAND() < 1. The return value may be generated using some stochastic
>> process, or a pseudorandom sequence.
>> 
>> If RAND() is called with no arguments, then it returns a potentially
>> different random/pseudorandom value for each invocation.
>> 
>> If RAND() is called with a numeric argument, then the argument is used
>> as a seed value, returning a consistent value in [0,1) for each solution
>> in the solution sequence for which it is evaluated. That is, for a
>> given seed, RAND(seed) will return the same value whenever it's invoked
>> for evaluation of the first solution in the solution sequence, a
>> possibly different but consistent value for the second solution, and
>> so on.
> 
> Maybe we can specify RAND(seed) by simply saying that it will generate a pseudorandom sequence, with the suggestion ("SHOULD") that it generate the same sequence on each run as a debugging aid.  This decouples it from solution sequences.

A "SHOULD" is probably a good idea. It's not just a debugging aid, though; it's for repeatability generally.
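
As a rough illustration of what "the same sequence on each run" means, here is a minimal Python sketch. It is not a proposed implementation: the helper name rand_sequence is hypothetical, and Python's random.Random merely stands in for whatever generator an engine might use.

```python
import random


def rand_sequence(seed, n):
    """Return the first n values drawn from a generator seeded with `seed`.

    Hypothetical helper: a private generator is used so that the result
    depends only on the seed, not on any shared global state.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]  # each value lies in [0, 1)


# Two "runs" with the same seed yield the same sequence.
run_a = rand_sequence(1, 5)
run_b = rand_sequence(1, 5)
print(run_a == run_b)
```

Note that this only pins down the sequence of values, not which solution each value ends up attached to.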

> An implementation can be simply a random number generator like srand(N).

I'm not sure whose / which srand(n) you're referring to.

The key thing is that you get the same return value twice if you do something like:
   FILTER(RAND(1) > 0.5 && RAND(1) < 0.6)
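
To make that property concrete, here is a minimal Python sketch under stated assumptions. The helper rand(seed, solution_index) is hypothetical: it derives its value only from the seed and the position of the solution in the sequence, which is one possible way to get the behaviour, not something any engine is known to do.

```python
import random


def rand(seed, solution_index):
    """Hypothetical RAND(seed) for the solution at `solution_index`.

    The value depends only on (seed, solution_index), so repeated calls
    agree no matter how many times or in what order they are made.
    """
    rng = random.Random(f"{seed}:{solution_index}")
    return rng.random()  # in [0, 1)


def filter_passes(solution_index):
    """Mirror of FILTER(RAND(1) > 0.5 && RAND(1) < 0.6) for one solution."""
    return rand(1, solution_index) > 0.5 and rand(1, solution_index) < 0.6


# Both RAND(1) calls see the same value for a given solution, and the
# outcome does not depend on the order in which solutions are evaluated.
forward = [filter_passes(i) for i in range(50)]
backward = [filter_passes(i) for i in reversed(range(50))]
print(forward == list(reversed(backward)))
```

With a generator that simply hands out the next value of a sequence on every call, the two RAND(1) calls in the FILTER would instead see two different values.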

If an implementation can't be consistent with its RAND(?n) results from execution run to execution run, that's maybe OK. I've only used RAND(s) in SQL a handful of times, so I'm not familiar with all the uses.

> Obviously, an implementation that changes execution plan based on external factors (e.g. load, RAM available or something smart like that) could potentially change the number of calls to RAND but that's a bit
> 
> (it already has to worry about RAND not being a strict function

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Wednesday, 1 December 2010 12:57:11 UTC