Re: PROV-ISSUE-311 (clarify-optionals): Clarify optional arguments in DM [prov-dm]

>> 
>> I think there are still ambiguity problems, though:
>> 
>> wasGeneratedBy, wasStartedBy and wasEndedBy have independent optional id, activity, time and attribute arguments, along with a constraint that one of activity, time and attrs must be present.  So how do I parse:
>> 
>> wasGeneratedBy(x,y,attrs)
>> 
>> where both x and y are identifiers?  It could mean
>> 
>> x is a generation id and y is an entity (the generated entity)
>> or
>> x is an entity and y is an activity (that generated x)
>>   
> 
> Looking at the production again:
> 
> 'wasGeneratedBy' '(' ((id0=identifier | '-') ',')? id2=identifier ',' ((id1=identifier) | '-') ',' ( time | '-' ) optionalAttributeValuePairs ')'
> 
> wasGeneratedBy(x,y,attrs)
> 
> does not parse.
> 
> 
> You need to write
> wasGeneratedBy(id2,id1,-)
> wasGeneratedBy(id2,id1,t)
> 
> or
> wasGeneratedBy(id0, id2,id1,-)
> wasGeneratedBy(id0, id2,id1,t)
> 


OK; sorry again; I did not notice that the production had changed.  In this case, it rules out examples such as

wasGeneratedBy(id,x,y)

which we use in examples extensively [1]
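
To make the point concrete, here is a minimal sketch of a recognizer for the quoted wasGeneratedBy production. It is not the actual PROV-N grammar: the IDENT and TIME patterns are simplifying assumptions (an identifier is a word-like token, a time is an ISO-style dateTime), optionalAttributeValuePairs is treated as empty, and the name parse_generation is hypothetical.

```python
import re

# Assumption: identifiers are word-like tokens, optionally containing colons.
IDENT = r"[A-Za-z_][\w:]*"
# Assumption: a time is an ISO-style dateTime such as 2012-04-19T12:27:36.
TIME = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"

# Mirrors the quoted production, with the attribute list left out:
# ((id0 | '-') ',')? id2 ',' (id1 | '-') ',' (time | '-')
GENERATION = re.compile(
    rf"wasGeneratedBy\(\s*"
    rf"(?:(?P<id0>{IDENT}|-)\s*,\s*)?"
    rf"(?P<id2>{IDENT})\s*,\s*"
    rf"(?P<id1>{IDENT}|-)\s*,\s*"
    rf"(?P<time>{TIME}|-)\s*\)"
)


def parse_generation(text):
    """Return the matched groups as a dict, or None if the text does not parse."""
    match = GENERATION.fullmatch(text.strip())
    return match.groupdict() if match else None


print(parse_generation("wasGeneratedBy(e,a,-)"))       # parses, no id
print(parse_generation("wasGeneratedBy(id0, e,a,-)"))  # parses, with id
print(parse_generation("wasGeneratedBy(id,x,y)"))      # None: y is neither a time nor '-'
```

Under these assumptions the three-identifier form is rejected because the last position only accepts a time or '-'.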


> 
>> Still seems ambiguous.  LL parsing will take the first parse, which means that if the id is omitted we always have to say so explicitly:
>> 
>> wasGeneratedBy(-,e,a,attrs)
>> 
>> which seems suboptimal to me, and there are many examples where we don't do this for the short form of generation.  I suppose we could patch this by saying that if attrs is present then the id also has to be.
>> 
>> Association has this problem too, but worse:
>> 
>> wasAssociatedWith(x,y)
>> 
>> could be interpreted as
>> x is an id and y is an activity
>> x is an activity and y is an agent
>> x is an activity and y is an entity
>>   
> 
> Again, with the production:
>    :    'wasAssociatedWith' '('  ((id0=identifier | '-') ',')? a=identifier ',' (ag=identifier | '-') ',' (pl=identifier | '-') optionalAttributeValuePairs ')'
> 
> 
> wasAssociatedWith(x,y) does not parse.
> 
> 
> You need to write, e.g.
> wasAssociatedWith(a,ag,pl)
> wasAssociatedWith(a,-,pl)

>> 
>> So I still advocate having simple, orthogonal rules for optional arguments:
>> 
>> - id is at the beginning and followed by a different symbol (say, semicolon) if present, to make it trivial to see whether there's an id present;
>> - attrs are in brackets if present (which is already fine);
>> - other optional attributes are either all omitted (short form) or all given, with missing ones as '-'
>> 
>> 
>> 
>>   
> 
> So the grammar, as far as I can see, is not ambiguous.

Correct, I was misreading it, apologies again.

I was also assuming the grammar parses all of the examples in the various documents.

This seems not to be the case after all (see [1]), and it is a more significant problem that we should solve first :)



> However, it may not be systematic enough (a comment that Paolo made).
> Also, for instance, it forces us to have t or - for time in generation.

Treating the "positional" optional arguments in the all-or-nothing way (which wasDerivedFrom and others already do) would fix this, while allowing short forms of wasGeneratedBy and friends.  Right now, they are treated in an "all-or-all" way (every positional argument has to be written, with '-' for a missing one) that seems incompatible with many examples.


> 
> 
> If people feel that a syntactic marker for the identifier would facilitate readability, I am fine to support it.
> 
> Maybe : (assuming this doesn't cause problems with the colon in qualified names)
> 
> wasGeneratedBy(id  : e, a,t)
> 

I believe using colon for this would cause lots of problems with qualified names:

Does

wasGeneratedBy(prov:ex:foo, a)

mean

wasGeneratedBy(prov:ex, foo, a)

or 

wasGeneratedBy(prov, ex:foo, a)

???
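
The two readings can be enumerated mechanically. This is only an illustrative sketch: colon_splits is a hypothetical helper, and it assumes the id separator and the qualified-name separator are the same ':' character with nothing else to tell them apart.

```python
def colon_splits(token):
    """List every (id, first_argument) reading obtained by cutting token at one colon."""
    parts = token.split(":")
    return [
        (":".join(parts[:i]), ":".join(parts[i:]))
        for i in range(1, len(parts))
    ]


for ident, first_arg in colon_splits("prov:ex:foo"):
    print(f"wasGeneratedBy({ident} : {first_arg}, a)")
```

Any token with n colons has n candidate readings, so a parser would need an extra rule to pick one.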

I certainly wouldn't insist on ';', but are there compelling arguments for using colon, which is already in the grammar (and widely used for namespaces), instead of some arbitrary symbol that is not already used?  Semicolon was just a suggestion, but perhaps there is too much danger of confusion if the distinction between semicolon and comma isn't clear.

Another possibility could be to prefix the id argument with a special character such as @ in order to make its special role as the optional first-argument identifier clear:

wasGeneratedBy(@id,e,a,-)
wasGeneratedBy(e,a,-)

instead of 
wasGeneratedBy(id,  e,a,-)
wasGeneratedBy(e,a,-)
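
With a marker like this, deciding whether an id is present needs only a look at the first character of the first argument. A rough sketch, assuming the argument list contains no attribute block and no commas inside arguments (split_generation_args is a hypothetical name, not part of any PROV tooling):

```python
def split_generation_args(arg_text):
    """Split 'a,b,c' into (id_or_None, remaining_args) using the '@' marker."""
    args = [arg.strip() for arg in arg_text.split(",")]
    if args and args[0].startswith("@"):
        # The marker makes the id explicit, so no lookahead past this token is needed.
        return args[0][1:], args[1:]
    return None, args


print(split_generation_args("@id,e,a,-"))
print(split_generation_args("e,a,-"))
```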


--James

[1] http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html#prov-n-examples

Received on Thursday, 19 April 2012 12:27:36 UTC