Re: PROV-DICTIONARY internal review for first public working draft (ISSUE-614)

Apologies for my slightly late review which you will find below.

As usual my review ended up massive - so apologies if this reads very
negative - it is more like a red-pen exercise.

In fact I am quite impressed with the specification - it reads very
well, and the semantics is very solid (specially compared to earlier
back-and-forths on collections).  Well done!

See below for detailed review.

On Wed, Jan 16, 2013 at 8:17 AM, Tom De Nies <> wrote:

> The latest editor's draft is here:

> Questions for reviewers
> - Is the notation of Dictionary concepts clear & acceptable for you? (in


> - Are the constraints acceptable, or are they too loose/too strict?
> -- In particular, can the constraint "IF derivedByRemovalFrom(d2, d1,
> {"k1"}) THEN hadDictionaryMember(d1, e1, "k1") " be dropped, or do you
> strongly support it?

> - Is the name PROV-DICTIONARY appropriate for the document?


> - Can this be released as a first public working draft?


> - If not, where are the blocking issues?

>From below: 1, 8b, 9, 11, 12, 15, 16, 17, 19, 20, 29, 34, 39, 41, 46, 47, 48, 49

> - If yes, are there other issues to work on?

See below. These are mainly editorial/presentational.

> In your review please include ISSUE-614


Please find below detailed review:

> 2.1 Dictionary  membership
>     key: a key key_1 that is associated with the specified entity;

1) This should specify that the key is a *Value* as according to
PROV-DM - as it is specified
it could be interpreted as also an identifier - although the PROV-N
example uses a "string value".   (I know it is confusing because DM
for some reason chose to call this Value rather than the more
appropriate term Literal - but here in a dictionary the actual 'value'
is the Entity (identifier), and the key is a (literal) Value!
Therefore this should be clarified with DM hyperlink.)

> Note that dictionary membership implies collection membership, but not vice versa.

2) This should link forward to inference 1

3) Also I don't quite understand this.  So a prov:Dictionary kind of
collection can have members that don't have keys?

entity(d, [prov:type='prov:Dictionary' ])
// implies:
entity(d, [prov:type='prov:Collection ])

hadDictionaryMember(d, e1, "k1")
// implies:
hadMember(d, e1)

// But what if we also see?
hadMember(d, e3)
// are you saying this would NOT imply the below?
hadDictionaryMember(d, e3, ?unknownKey)

If so then I am a bit confused - a prov:Dictionary to be useful should
be a constrained prov:Collection in which every member is associated
with a key. This should be added to the Conceptual Definition of
Dictionary above.

If there is no such implication (of course the key is unknown until
stated otherwise), I am not sure in which cases such a data type could
be useful. It would be like describing an array type of collection,
but where some items are allowed to not have a position.  (which is
quite different from saying they have an unknown position!)

> > // d1 is a dictionary, with unknown content

4) Change to "with (so far) unknown content" ?   Both example 1 and 2
and others - specially those that later actually do state the members.

> 2.2 Dictionary insertion
>  d0 is the set { }
> ...

5) I thought d0 was a Dictionary, not a set.   s/set/dictionary/g here
and in other examples.

It is not simply a set of pairs, because we don't allow multiple
entries for same key.

> attributes: an optional set (attrs) of attribute-value pairs representing additional information about this relation.

6) Could one of the lines in example 3 perhaps include a
dcterms:description or something? Just because it can get confusing
with the new {("key", value)} syntax as opposed to the attribute

derivedByInsertionFrom(d2, d1, {("k3", e3)}, [ dcterms:description =
"An optional attribute" ] )

> 2.2 Dictionary Insertion
> 2.3 Dictionary Removal
> In particular, no assumptions are needed regarding the mutability of a data structure that is subject to updates

7) So from the examples given, it seems that both insertion and
removal are "complete" statements - for instance after

  derivedByRemovalFrom(d3, d2, {"k1", "k3"})

we conclude that "k2" remains as a key in d3.

I very much prefer this semantics, rather than a very earlier draft,
where members could come and go out of the blue and you never really
knew much. I think also this is OK as this is a specific
prov:Dictionary thing rather than a general thing about

Therefore 2.2 and 2.3 should clarify this and change

> key-entity-set: the inserted key-entity pairs (
> key-entity-set: all inserted key-entity pairs (

> key-set: a set of deleted keys (..)
> key-set: a set of all deleted keys (..)

Inference 7 (insertion-removal-membership) makes this explicit (yay!)
- so a forward reference would be good.

3. PROV-N Notation of Dictionary Concepts

8a) This section should start with a paragraph explaining that
PROV-DICTIONARY specifies an extension according to PROV-N Extension
chapter - - but using the
same namespace

8b) The extension grammar should be downloadable and linked to from
introduction of PROV-N notation.

9) However, to be a valid extension, the terms must be given as a
 QUALIFIED_NAME.  With the proposed syntax in this document - the
QNAMEs are in the default name space, which can be customized in
- and therefore easily overlap with other extensions.

It is not explicit in PROV-N (perhaps it should be!) - but a missing
default namespace declaration would make the QNAMEs invalid.

> A qualified name's prefix is optional. If a prefix occurs in a qualified name, the prefix must refer to a namespace declared in a namespace declaration. In the absence of prefix, the qualified name belongs to the default namespace.

(Side note: PROV-N should require that PN_PREFIX is a declared prefix
in the current scope - and that if it is empty, that the default
namespace has been declared)

The prefix prov: is reserved in PROV-N.

I therefore suggest to rename all the Dictionary concepts to qualified
names using prov:* - that is:


Thus we would avoid the embarrassing situation of PROV-Dictionary-N
documents not being valid PROV-N documents - while documents using
other extensions would be.

10) This section paragraph should mention that prov:Dictionary and
prov:EmptyDictionary is declared using regular entity statements (with
example). This could be inserted before 3.1.

3.1 Membership  (PROV-N)

> membershipExpression ::= 'hadDictionaryMember' '(' dIdentifier ',' entity ',' key ')'
> key  key (Non-Terminal)

11) "key" is not a defined Non-Terminal - neither here nor in PROV-N
document or grammar.

"dIdentifier" is similarly a new non-Terminal that must be defined
here. (presumably it is an entity which is a prov:Dictionary type).

Combined with issue #1 this is very confusing, and so I have made this
and equivalent below a blocker.

Use the style of the PROV-N document:

.. here it also introduces optionalAttributeValuePairs, etc.

12) This is not valid grammar according to the EBNF section of PROV-N
as it uses 'single quotes' rather than "double"

Change this and following declaration to style with double quotes:

> membershipExpression ::= "hadDictionaryMember" "(" dIdentifier "," entity "," key ")"

13) For some reason I was not able to copy-paste the ::== line  above
correctly from neither Firefox nor Chrome - the 'single quotes'
disappeared! Why? CSS? Ensure quotes are regular characters.

> Example 6
> In this example, d is a dictionary known to have e0, e1, and e2 as members, and may have other members.
>  entity(e0)
> ..
> hadDictionaryMember(d, e0, "k0")

14) I think this and following examples can be shortened - you should
not explain the PROV Dictionary semantics again in this section and
don't need to repeat the full examples - only the syntax of the
particular statement. So change style to only:

> Example 6
> // e0 is member in d under key "k0"
> hadDictionaryMember(d, e0, "k0")

> 3.2 Insertion
>  derivationByInsertionFromExpression ::= derivedByInsertionFrom ( identifier , dIdentifier , dIdentifier , { keyValuePairs } optional-attribute-values )

15) Non-Terminals not matching table below: identifier,
keyValuePairs, optional-attribute-values
Change to match table.

16) Not defined: dIdentifier, keyEntitySet/keyValuePairs

Remember that for keyValuePairs the first pair is not optional.

17) as the identifiers is optional, the separator should be ";" not
",". This also applies to  derivedByRemovalFrom

18) Hyperlinks for prov-n terminals like 'identifier' are wrong - they
go to non-existing local anchors rather than to PROV-N - like

> 3.3 Removal
> derivationByRemovalFromExpression ::= derivedByRemovalFrom ( identifier , dIdentifier , dIdentifier , { keySet } optional-attribute-values )

19) optional-attribute-values -> optionalAttributeValuePairs

> key-set keySet

20) Undefined terminal keySet

Remember that the first key is not optional.

> The following table summarizes how each constituent of a PROV-DM Membership maps to a non-terminal.
> The following table summarizes how each constituent of a PROV-DM Insertion maps to a non-terminal.
> The following table summarizes how each constituent of a PROV-DM Removal maps to a non-terminal.

21) PROV-DM -> PROV-DICTIONARY (or simply Dictionary)

> 4. Ontological definition of dictionary

Note that I have only checked the Turtle syntax here 'by hand' - they
should be formally checked programmatically.

>     prov:hadDictionaryMember [
>       a prov:KeyValuePair;
>       prov:key   "k1"^^xsd:string
>    prov:value :e1

22) Fix indentation of prov:value  (do NEVER use TAB character in such
examples as it has inconsistent rendering)

23) Missing ; after xsd:string (invalid Turtle syntax)

> A dictionary may be empty, in which case it should be described as an instance of the subclass prov:EmptyDictionary.

24) This makes sense regardless of syntax - can a similar statement be
made in section 2.?

>  PROV-O provides two kinds of involvements


>  prov:qualifiedInsertion is used to  (..)
> prov:qualifiedRemoval is used to specify  (..)

26) Text should relate/hyperlink this to the insertion/removal sections earlier.

> prov:Dictionary

>  back to collections classes

27) hyperlink from headers - does not work. Remove or change to link
to "These terms are used" section.

>    :wasCreatedBy    :bob;

28) Remove this as it is potentially confusing with
prov:wasAttributedTo  (or just change to prov:wAT)

> described with properties
>    prov:derivedByInsertionFromop prov:qualifiedRemovalop prov:qualifiedInsertionop prov:derivedByRemovalFromop

29) Somehow the most important property, prov:hadDictionaryMember
seems to be missing here!

>  prov:EmptyDictionary
> :d1 a prov:Dictionary;
>   prov:derivedByInsertionFrom :d;

30) I find it confusing that this example uses most lines to talk
about a non-empty dictionary.  Simplify to just show the boring:

>  :d  a prov:EmptyDictionary .

> An empty dictionary.

31) Could you add the obvious "i.e. has no members"

>  prov:insertedKeyValuePair [
>         a prov:KeyValuePair;
>         prov:key   "k1"^^xsd:string;
>         prov:value :e1;
>     ]

32) I know we had discussions about this before (when this was in
PROV-O) as one can consider either the value or the keyvaluepair to be
what is inserted. But this is getting quite verbose..
inserterkeyvaluepair keyvaluepair key and value! Puh!

How about simplifying to:

 prov:inserted [
         a prov:KeyValuePair;
         prov:key   "k1"^^xsd:string;
         prov:value :e1;

.. or possibly prov:insertedPair  ?

This is similar to how the insertion is a prov:Insertion and not a

I would keep prov:removedKey as it's important to know it's the *key*
that was removed.

> prov:Removal
> removing one or more key-value pairs

33) Add ", specified by prov:removedKey"

(as you don't specify the removed key/value pairs)

> prov:dictionary
> has domain
>       prov:Insertion
>       prov:Removal

34) This specifies that the domain is the intersection of Insertion
and Removal  - thus any Insertion with a prov:dictionary becomes also
a Removal etc. This is surely wrong.

Formally you should specify this as the domain of the union of
Insertion and Removal - alternatively as a common "abstract"
superclass prov:DictionaryInfluence.


As the union would take the extension out of OWL-RL, and there already
is precedence for other kind of influences, I would suggest making a
new prov:DictionaryInfluence which can be superclass of Insertion and
Removal, and in domain of prov:dictionary.

35)  Where can I find a download link for the OWL of this PROV
Dictionary extension? I expected to find it in te beginning of section
4. I would also expect section 4 to have an explanation of the
namespaces and how importing <> one would
get both PROV-O and extensions like PROV-Dictionary, while
<> would give only PROV-O.  I would then
also expect a separate versionIRI to import if I only wanted
PROV-Dictionary - like <>

> prov:qualifiedInsertion
> If this Dictionary prov:derivedByInsertionFrom another Dictionary :e, then it can qualify how it did perform the Insertion using prov:qualifiedInsertion [ a prov:Insertion; prov:dictionary :e; prov:inserted [a prov:KeyValuePair; prov:key "k1"^^xsd:string; prov:value :foo] ].

36) I know I probably wrote this text - but without any indentation it
is very difficult to read. Could this be simplified to some kind of
mirror of prov:derivedByInsertionFrom? Same for prov:qualifiedRemoval.

37) example for qualifiedInsertion should not detail the members of
our-old-baseball-team-field-positions as that is confusing and makes
example large.

> 5. XML Schema Dictionary
> In this section, we provide the XML Schema to use dictionaries with the [PROV-XML] serialization.

Ironically, you don't (only fragments) - where can I find and download
the XML Schema?

Note that I have not checked the syntax of XSD statements in this
section - if I had the schema file I could have validated it.

38) Modify to "This section details how to describe dictionaries with
the [PROV-XML] serialization".

39) Provide download link for XML schema - also include usage
instructions as per imports etc. (equivalent as for OWL imports above)

40) Rename section to "PROV-XML representation of Dictionary" -
similarly section 4 from "Ontological Definition of Dictionary" to
"PROV-O representation of Dictionary"

> 5.2 Key-Value Pair
> <xs:element name="key" type="xs:String" />

41) The specification earlier specifies key as any value/literal - so
xs:string (lowercase S!) is too specific - it would not allow
integers, for instance. Change to xs:anySimpleType here and in 5.5
> <xs:element name="key" type="xs:String" maxOccurs="unbounded" />

>  members of a dictionary are specified by listing all the key-value pairs inside a prov:DictionaryMembership element

42) Remove "all the" -- according to semantics of  hadDictionaryMember

> 6. Constraints Associated with Dictionary

This section is very solid, good work!

> These inferences and constraints need to be applied to obtain valid provenance when using dictionaries.

43) I don't think they *need to* be applied? Rephrase to "MAY be
applied in order to ensure valid provenance".

I think a similar kind of disclaimer as in (or just linking to
this) could be useful, so that it's clear that these constraints are
not a MUST to use dictionaries.

>  Inference 1 (dictionary-membership-collection-membership)
>  IF hadDictionaryMember(d, e1, "k1") THEN hadMember(d, e1)

44) I would also add the inverse inference dictionary-all-members-have-keys:

> Inference X (dictionary-all-members-have-keys)
> Here k is a (potentially unknown) key
> IF hadMember(d, e1) AND 'prov:Dictionary' ∈ typeOf(d)
> THEN hadDictionaryMember(d, e1, k)

If we don't specify this, opens for a prov:Dictionary to also contain
entities which don't have a key. This could be confusing. For instance
this would be valid:

entity(d0, [prov:type='prov:EmptyDictionary'])
entity(d1, [prov:type='prov:Dictionary'])
entity(d2, [prov:type='prov:Dictionary'])

derivedByInsertionFrom(d1, d0, {("k1", e1)})
// implied insertion-membership
//hadDictionaryMember(d1, e1, "k1")
// implied by dictionary-membership-collection-membership:
//hadMember(d1, e1)

derivedByRemovalFrom(d2, d1, {"k1"})
// Invalid by impossible-removal-membership:
//hadDictionaryMember(d2, e1, "k1")

// however this is still valid ..
hadMember(d2, e1)

// Unless we add:
entity(d2, [prov:type='prov:EmptyDictionary'])
// which implies
//entity(d2, [prov:type='prov:EmptyCollection'])
//which is invalid by constraint membership-empty-collection in PROV-CONSTRAINTS

// However, we might now still add this confusing statement:
hadMember(d1, e2)

// By the PROV-DICTIONARY constraints (including the completeness
constraint impossible-insertion-insertion and
membership-insertion-membership) we would not here expect any other
keys than "k1" in d1 - so we SHOULD be able to conclude that e2==e1 -
however we can't do that unless we introduce my inference
// dictionary-all-members-have-keys implies
//hadDictionaryMember(d1, e2, k)
// and by key-single-entity and membership-insertion-membership and
//e2 == e1

>  IF hadDictionaryMember(d1, e, "k") and derivedByInsertionFrom(d2, d1, {("k1", e1),...,("kn", en)}) and k ∉ {k1,...,kn} THEN hadDictionaryMember(d2, e, "k")

45) Don't remove the quotes: "k" ∉ {"k1",...,"kn"}

>  IF derivedByRemovalFrom(d2, d1, {"k1",...,"kn"}) and derivedByInsertionFrom(d2, d1, {("k1", e1),...,("km",em)})THEN INVALID

46) This implies that "k1", etc. needs to be on both sides (Implying
an intersection/containment of keys). This constraint is true no
matter the key/value pairs, so change to same style as below:

> Here, KV1 and KV2 are sets of key-entity pairs.
>  IF derivedByRemovalFrom(d2, d1, KV1) and derivedByInsertionFrom(d2, d1, KV2) THEN INVALID

> 6.3 Typing

> IF entity(d, [prov:type='prov:Dictionary']) THEN 'prov:Dictionary' ∈ typeOf(d) and 'prov:Collection' ∈ typeOf(d)
> IF entity(d, [prov:type='prov:EmptyDictionary']) THEN 'prov:EmptyDictionary' ∈ typeOf(d) and 'prov:Dictionary' ∈ typeOf(d)

47) The second rule would not cause firing of the first rule, so
expand the second rule. Also include prov:EmptyCollection and entity.

> IF entity(d, [prov:type='prov:EmptyDictionary']) THEN 'prov:EmptyDictionary' ∈ typeOf(d) and 'prov:Dictionary' ∈ typeOf(d) and 'prov:Collection' ∈ typeOf(d) and 'prov:EmptyCollection' ∈ typeOf(d)

48) Both lines must also include:
 'entity' ∈ typeOf(c) PROV-Constraint "typing" would not fire from a typeof prov:Collection.

> IF hadDictionaryMember(d, e, "k") THEN 'prov:Dictionary' ∈ typeOf(d) and 'entity' ∈ typeOf(e)
> IF derivedByInsertionFrom(d2, d1, {("k1", e1)}) THEN 'prov:Dictionary' ∈ typeOf(d1) and 'prov:Dictionary' ∈ typeOf(d2) and 'entity' ∈ typeOf(e1)
> IF derivedByRemovalFrom(d2, d1, {"k1"}) THEN 'prov:Dictionary' ∈ typeOf(d1) and 'prov:Dictionary' ∈ typeOf(d2)

49) Similarly all of these should also include
> AND prov:Collection' ∈ typeOf(c) AND 'entity' ∈ typeOf(c)

==== END OF REVIEW ===

Now relax..! :)

Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Thursday, 24 January 2013 15:06:31 UTC