- From: Steve Harris <steve.harris@garlik.com>
- Date: Thu, 2 Dec 2010 13:51:05 +0000
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
On 2010-12-02, at 11:38, Andy Seaborne wrote:
>>>
>>> Maybe we can specify RAND(seed) by simply saying that it will generate a pseudorandom sequence, with the suggestion ("SHOULD") that it generate the same sequence on each run as a debugging aid. This decouples it from solution sequences.
>>
>> A "SHOULD" is probably a good idea. It's not just a debugging aid though, it's for repeatability generally.
>>
>>> An implementation can be simply a random number generator like srand(N).
>>
>> I'm not sure who's / which srand(n) you're referring to.
>
> This one:
>
> http://www.gnu.org/s/libc/manual/html_node/ISO-Random.html
>
>> The key thing is that you get the same return value twice if you do something like:
>> FILTER(RAND(1) > 0.5 && RAND(1) < 0.6)
>
> For me, that's not necessary. For predictability, all I require is that each call of RAND(seed) returns the same number at the same point in execution across runs.
>
> Maybe I don't understand RAND for SQL well enough but I thought that RAND() returns different numbers in
>
> FILTER(RAND() > 0.5 && RAND() < 0.6)
It does, but not if you provide a seed number. With a seed, you get a new number per row.
> (if you want the same number assign it in some way)
SQL doesn't have per-row assignment, and it's going to be problematic in SPARQL (see below).
> As RAND() returns different numbers, so
>
> FILTER(RAND(1) > 0.5 && RAND(1) < 0.6)
>
> should too, just with the same numbers at the same invocation count on every run.
That doesn't make me comfortable.
The implementation in SQL is something like this [in very naive terms, obviously]:
    srand(row_num + seed);
    return (double)rand() / ((double)RAND_MAX + 1.0);
Otherwise you have issues about execution order, which might not be stable between executions, or even execution phases.
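To make the naive version above concrete, here is a minimal sketch in C. The helper name seeded_rand and its parameters are hypothetical, chosen for illustration only, and are not taken from any real SQL engine. The divisor is parenthesised so that the result stays in [0, 1).

```c
#include <stdlib.h>

/* Hypothetical helper (an assumption for illustration, not a real SQL
 * engine's code): reseed the C library generator from the row number
 * plus the user-supplied seed, then draw one value in [0, 1). */
double seeded_rand(unsigned int seed, unsigned int row_num)
{
    srand(row_num + seed);
    /* Parenthesise the divisor; without the parentheses the expression
     * parses as (rand() / RAND_MAX) + 1.0. */
    return (double)rand() / ((double)RAND_MAX + 1.0);
}
```

Because the generator is reseeded on every call, two calls with the same seed on the same row return the same value regardless of the order in which they are evaluated, which is the property I'm after.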
Also,
    OPTIONAL {
      ?x :a ?y
      FILTER(RAND(1) < 0.5)
    }
    OPTIONAL {
      ?s :b ?z
      FILTER(RAND(1) < 0.5)
    }
is going to have both undesirable and unpredictable behaviour.
    BIND(RAND(1) AS ?r)
    OPTIONAL {
      ?x :a ?y
      FILTER(?r < 0.5)
    }
    ...
Won't work, because of the scoping, right?
You could do something with nested OPTIONALs, but anyone who's familiar with SQL's behaviour is not going to be very happy.
- Steve
--
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Thursday, 2 December 2010 13:51:41 UTC