Re: PROV-DICTIONARY internal review for first public working draft (ISSUE-614) from Stian Soiland-Reyes on 2013-01-24 (public-prov-wg@w3.org from January 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Thu, 24 Jan 2013 15:05:42 +0000
To: Tom De Nies <tom.denies@ugent.be>
Cc: Provenance Working Group <public-prov-wg@w3.org>
Message-ID: <CAPRnXtm4-DrMzt_J-p-u85sPhRbvRSGf1qR1QQ+AtYUtjkPKow@mail.gmail.com>
Apologies for my slightly late review which you will find below.

As usual my review ended up massive - so apologies if this reads very
negative - it is more like a red-pen exercise.



In fact I am quite impressed with the specification - it reads very
well, and the semantics is very solid (specially compared to earlier
back-and-forths on collections).  Well done!


See below for detailed review.

On Wed, Jan 16, 2013 at 8:17 AM, Tom De Nies <tom.denies@ugent.be> wrote:

> The latest editor's draft is here:
> https://dvcs.w3.org/hg/prov/raw-file/default/dictionary/prov-dictionary.html#dictionary-xml-schema

> Questions for reviewers
> - Is the notation of Dictionary concepts clear & acceptable for you? (in
> PROV-N, PROV-O and/or PROV-XML)

Yes

> - Are the constraints acceptable, or are they too loose/too strict?
> -- In particular, can the constraint "IF derivedByRemovalFrom(d2, d1,
> {"k1"}) THEN hadDictionaryMember(d1, e1, "k1") " be dropped, or do you
> strongly support it?

> - Is the name PROV-DICTIONARY appropriate for the document?

Yes

> - Can this be released as a first public working draft?

No

> - If not, where are the blocking issues?

>From below: 1, 8b, 9, 11, 12, 15, 16, 17, 19, 20, 29, 34, 39, 41, 46, 47, 48, 49


> - If yes, are there other issues to work on?

See below. These are mainly editorial/presentational.


> In your review please include ISSUE-614

Done!

Please find below detailed review:

> 2.1 Dictionary  membership
>     key: a key key_1 that is associated with the specified entity;

1) This should specify that the key is a *Value* as according to
PROV-DM http://www.w3.org/TR/prov-dm/#term-value - as it is specified
it could be interpreted as also an identifier - although the PROV-N
example uses a "string value".   (I know it is confusing because DM
for some reason chose to call this Value rather than the more
appropriate term Literal - but here in a dictionary the actual 'value'
is the Entity (identifier), and the key is a (literal) Value!
Therefore this should be clarified with DM hyperlink.)


> Note that dictionary membership implies collection membership, but not vice versa.

2) This should link forward to inference 1
(dictionary-membership-collection-membership)


3) Also I don't quite understand this.  So a prov:Dictionary kind of
collection can have members that don't have keys?

entity(d, [prov:type='prov:Dictionary' ])
// implies:
entity(d, [prov:type='prov:Collection ])

hadDictionaryMember(d, e1, "k1")
// implies:
hadMember(d, e1)

// But what if we also see?
hadMember(d, e3)
// are you saying this would NOT imply the below?
hadDictionaryMember(d, e3, ?unknownKey)

If so then I am a bit confused - a prov:Dictionary to be useful should
be a constrained prov:Collection in which every member is associated
with a key. This should be added to the Conceptual Definition of
Dictionary above.

If there is no such implication (of course the key is unknown until
stated otherwise), I am not sure in which cases such a data type could
be useful. It would be like describing an array type of collection,
but where some items are allowed to not have a position.  (which is
quite different from saying they have an unknown position!)


> > // d1 is a dictionary, with unknown content

4) Change to "with (so far) unknown content" ?   Both example 1 and 2
and others - specially those that later actually do state the members.



> 2.2 Dictionary insertion
>  d0 is the set { }
> ...

5) I thought d0 was a Dictionary, not a set.   s/set/dictionary/g here
and in other examples.

It is not simply a set of pairs, because we don't allow multiple
entries for same key.


> attributes: an optional set (attrs) of attribute-value pairs representing additional information about this relation.

6) Could one of the lines in example 3 perhaps include a
dcterms:description or something? Just because it can get confusing
with the new {("key", value)} syntax as opposed to the attribute
syntax.


derivedByInsertionFrom(d2, d1, {("k3", e3)}, [ dcterms:description =
"An optional attribute" ] )


> 2.2 Dictionary Insertion
> 2.3 Dictionary Removal
> In particular, no assumptions are needed regarding the mutability of a data structure that is subject to updates

7) So from the examples given, it seems that both insertion and
removal are "complete" statements - for instance after

  derivedByRemovalFrom(d3, d2, {"k1", "k3"})

we conclude that "k2" remains as a key in d3.

I very much prefer this semantics, rather than a very earlier draft,
where members could come and go out of the blue and you never really
knew much. I think also this is OK as this is a specific
prov:Dictionary thing rather than a general thing about
prov:Collection's.


Therefore 2.2 and 2.3 should clarify this and change

> key-entity-set: the inserted key-entity pairs (
to
> key-entity-set: all inserted key-entity pairs (

> key-set: a set of deleted keys (..)
to
> key-set: a set of all deleted keys (..)

Inference 7 (insertion-removal-membership) makes this explicit (yay!)
- so a forward reference would be good.




3. PROV-N Notation of Dictionary Concepts

8a) This section should start with a paragraph explaining that
PROV-DICTIONARY specifies an extension according to PROV-N Extension
chapter - http://www.w3.org/TR/prov-n/#extensibility - but using the
same namespace

8b) The extension grammar should be downloadable and linked to from
introduction of PROV-N notation.


9) However, to be a valid extension, the terms must be given as a
 QUALIFIED_NAME.  With the proposed syntax in this document - the
QNAMEs are in the default name space, which can be customized in
PROV-N - http://www.w3.org/TR/prov-n/#expression-NamespaceDeclaration
- and therefore easily overlap with other extensions.

It is not explicit in PROV-N (perhaps it should be!) - but a missing
default namespace declaration would make the QNAMEs invalid.

> A qualified name's prefix is optional. If a prefix occurs in a qualified name, the prefix must refer to a namespace declared in a namespace declaration. In the absence of prefix, the qualified name belongs to the default namespace.

(Side note: PROV-N should require that PN_PREFIX is a declared prefix
in the current scope - and that if it is empty, that the default
namespace has been declared)


The prefix prov: is reserved in PROV-N.

I therefore suggest to rename all the Dictionary concepts to qualified
names using prov:* - that is:

prov:hadDictionaryMember
prov:derivedByInsertionFrom
prov:derivedByRemovalFrom

Thus we would avoid the embarrassing situation of PROV-Dictionary-N
documents not being valid PROV-N documents - while documents using
other extensions would be.


10) This section paragraph should mention that prov:Dictionary and
prov:EmptyDictionary is declared using regular entity statements (with
example). This could be inserted before 3.1.



3.1 Membership  (PROV-N)

> membershipExpression ::= 'hadDictionaryMember' '(' dIdentifier ',' entity ',' key ')'
> key  key (Non-Terminal)

11) "key" is not a defined Non-Terminal - neither here nor in PROV-N
document or grammar.

"dIdentifier" is similarly a new non-Terminal that must be defined
here. (presumably it is an entity which is a prov:Dictionary type).


Combined with issue #1 this is very confusing, and so I have made this
and equivalent below a blocker.


Use the style of the PROV-N document:

http://www.w3.org/TR/prov-n/#expression-Entity

.. here it also introduces optionalAttributeValuePairs, etc.



12) This is not valid grammar according to the EBNF section of PROV-N
as it uses 'single quotes' rather than "double"

http://www.w3.org/TR/prov-n/#grammar-notation

Change this and following declaration to style with double quotes:

> membershipExpression ::= "hadDictionaryMember" "(" dIdentifier "," entity "," key ")"



13) For some reason I was not able to copy-paste the ::== line  above
correctly from neither Firefox nor Chrome - the 'single quotes'
disappeared! Why? CSS? Ensure quotes are regular characters.




> Example 6
>
>
> In this example, d is a dictionary known to have e0, e1, and e2 as members, and may have other members.
>  entity(e0)
> ..
> hadDictionaryMember(d, e0, "k0")


14) I think this and following examples can be shortened - you should
not explain the PROV Dictionary semantics again in this section and
don't need to repeat the full examples - only the syntax of the
particular statement. So change style to only:

> Example 6
> // e0 is member in d under key "k0"
> hadDictionaryMember(d, e0, "k0")


> 3.2 Insertion
>  derivationByInsertionFromExpression ::= derivedByInsertionFrom ( identifier , dIdentifier , dIdentifier , { keyValuePairs } optional-attribute-values )

15) Non-Terminals not matching table below: identifier,
keyValuePairs, optional-attribute-values
Change to match table.

16) Not defined: dIdentifier, keyEntitySet/keyValuePairs

Remember that for keyValuePairs the first pair is not optional.


17) as the identifiers is optional, the separator should be ";" not
",". This also applies to  derivedByRemovalFrom


18) Hyperlinks for prov-n terminals like 'identifier' are wrong - they
go to non-existing local anchors rather than to PROV-N - like
http://www.w3.org/TR/prov-n/#prod-identifier


> 3.3 Removal
> derivationByRemovalFromExpression ::= derivedByRemovalFrom ( identifier , dIdentifier , dIdentifier , { keySet } optional-attribute-values )

19) optional-attribute-values -> optionalAttributeValuePairs


> key-set keySet

20) Undefined terminal keySet

Remember that the first key is not optional.


> The following table summarizes how each constituent of a PROV-DM Membership maps to a non-terminal.
> The following table summarizes how each constituent of a PROV-DM Insertion maps to a non-terminal.
> The following table summarizes how each constituent of a PROV-DM Removal maps to a non-terminal.

21) PROV-DM -> PROV-DICTIONARY (or simply Dictionary)




> 4. Ontological definition of dictionary

Note that I have only checked the Turtle syntax here 'by hand' - they
should be formally checked programmatically.


>     prov:hadDictionaryMember [
>       a prov:KeyValuePair;
>       prov:key   "k1"^^xsd:string
>    prov:value :e1

22) Fix indentation of prov:value  (do NEVER use TAB character in such
examples as it has inconsistent rendering)


23) Missing ; after xsd:string (invalid Turtle syntax)


> A dictionary may be empty, in which case it should be described as an instance of the subclass prov:EmptyDictionary.

24) This makes sense regardless of syntax - can a similar statement be
made in section 2.?


>  PROV-O provides two kinds of involvements

25) PROV-O --> PROV-DICTIONARY

>  prov:qualifiedInsertion is used to  (..)
> prov:qualifiedRemoval is used to specify  (..)

26) Text should relate/hyperlink this to the insertion/removal sections earlier.


> prov:Dictionary

>  back to collections classes

27) hyperlink from headers - does not work. Remove or change to link
to "These terms are used" section.



>    :wasCreatedBy    :bob;

28) Remove this as it is potentially confusing with
prov:wasAttributedTo  (or just change to prov:wAT)


> described with properties
>    prov:derivedByInsertionFromop prov:qualifiedRemovalop prov:qualifiedInsertionop prov:derivedByRemovalFromop

29) Somehow the most important property, prov:hadDictionaryMember
seems to be missing here!


>  prov:EmptyDictionary
>
> :d1 a prov:Dictionary;
>   prov:derivedByInsertionFrom :d;

30) I find it confusing that this example uses most lines to talk
about a non-empty dictionary.  Simplify to just show the boring:

>  :d  a prov:EmptyDictionary .


> An empty dictionary.

31) Could you add the obvious "i.e. has no members"


>  prov:insertedKeyValuePair [
>         a prov:KeyValuePair;
>         prov:key   "k1"^^xsd:string;
>         prov:value :e1;
>     ]


32) I know we had discussions about this before (when this was in
PROV-O) as one can consider either the value or the keyvaluepair to be
what is inserted. But this is getting quite verbose..
inserterkeyvaluepair keyvaluepair key and value! Puh!

How about simplifying to:

 prov:inserted [
         a prov:KeyValuePair;
         prov:key   "k1"^^xsd:string;
         prov:value :e1;
      ]

.. or possibly prov:insertedPair  ?

This is similar to how the insertion is a prov:Insertion and not a
prov:DictionaryInsertion.

I would keep prov:removedKey as it's important to know it's the *key*
that was removed.


> prov:Removal
> removing one or more key-value pairs

33) Add ", specified by prov:removedKey"

(as you don't specify the removed key/value pairs)


> prov:dictionary
> has domain
>       prov:Insertion
>       prov:Removal

34) This specifies that the domain is the intersection of Insertion
and Removal  - thus any Insertion with a prov:dictionary becomes also
a Removal etc. This is surely wrong.


Formally you should specify this as the domain of the union of
Insertion and Removal - alternatively as a common "abstract"
superclass prov:DictionaryInfluence.

See PROV-O http://www.w3.org/TR/prov-o/#owl-profile

As the union would take the extension out of OWL-RL, and there already
is precedence for other kind of influences, I would suggest making a
new prov:DictionaryInfluence which can be superclass of Insertion and
Removal, and in domain of prov:dictionary.




35)  Where can I find a download link for the OWL of this PROV
Dictionary extension? I expected to find it in te beginning of section
4. I would also expect section 4 to have an explanation of the
namespaces and how importing <http://www.w3.org/ns/prov#> one would
get both PROV-O and extensions like PROV-Dictionary, while
<http://www.w3.org/ns/prov-o> would give only PROV-O.  I would then
also expect a separate versionIRI to import if I only wanted
PROV-Dictionary - like <http://www.w3.org/ns/prov-dictionary#>



> prov:qualifiedInsertion
> If this Dictionary prov:derivedByInsertionFrom another Dictionary :e, then it can qualify how it did perform the Insertion using prov:qualifiedInsertion [ a prov:Insertion; prov:dictionary :e; prov:inserted [a prov:KeyValuePair; prov:key "k1"^^xsd:string; prov:value :foo] ].

36) I know I probably wrote this text - but without any indentation it
is very difficult to read. Could this be simplified to some kind of
mirror of prov:derivedByInsertionFrom? Same for prov:qualifiedRemoval.


37) example for qualifiedInsertion should not detail the members of
our-old-baseball-team-field-positions as that is confusing and makes
example large.


> 5. XML Schema Dictionary
> In this section, we provide the XML Schema to use dictionaries with the [PROV-XML] serialization.

Ironically, you don't (only fragments) - where can I find and download
the XML Schema?

Note that I have not checked the syntax of XSD statements in this
section - if I had the schema file I could have validated it.


38) Modify to "This section details how to describe dictionaries with
the [PROV-XML] serialization".


39) Provide download link for XML schema - also include usage
instructions as per imports etc. (equivalent as for OWL imports above)


40) Rename section to "PROV-XML representation of Dictionary" -
similarly section 4 from "Ontological Definition of Dictionary" to
"PROV-O representation of Dictionary"


> 5.2 Key-Value Pair
> <xs:element name="key" type="xs:String" />

41) The specification earlier specifies key as any value/literal - so
xs:string (lowercase S!) is too specific - it would not allow
integers, for instance. Change to xs:anySimpleType here and in 5.5
Removal
> <xs:element name="key" type="xs:String" maxOccurs="unbounded" />



>  members of a dictionary are specified by listing all the key-value pairs inside a prov:DictionaryMembership element

42) Remove "all the" -- according to semantics of  hadDictionaryMember


> 6. Constraints Associated with Dictionary

This section is very solid, good work!


> These inferences and constraints need to be applied to obtain valid provenance when using dictionaries.

43) I don't think they *need to* be applied? Rephrase to "MAY be
applied in order to ensure valid provenance".

I think a similar kind of disclaimer as in
http://www.w3.org/TR/prov-constraints/#purpose (or just linking to
this) could be useful, so that it's clear that these constraints are
not a MUST to use dictionaries.


>  Inference 1 (dictionary-membership-collection-membership)
>  IF hadDictionaryMember(d, e1, "k1") THEN hadMember(d, e1)


44) I would also add the inverse inference dictionary-all-members-have-keys:

> Inference X (dictionary-all-members-have-keys)
>
> Here k is a (potentially unknown) key
>
> IF hadMember(d, e1) AND 'prov:Dictionary' ∈ typeOf(d)
> THEN hadDictionaryMember(d, e1, k)



If we don't specify this, opens for a prov:Dictionary to also contain
entities which don't have a key. This could be confusing. For instance
this would be valid:

entity(d0, [prov:type='prov:EmptyDictionary'])
entity(d1, [prov:type='prov:Dictionary'])
entity(d2, [prov:type='prov:Dictionary'])

derivedByInsertionFrom(d1, d0, {("k1", e1)})
// implied insertion-membership
//hadDictionaryMember(d1, e1, "k1")
// implied by dictionary-membership-collection-membership:
//hadMember(d1, e1)

derivedByRemovalFrom(d2, d1, {"k1"})
// Invalid by impossible-removal-membership:
//hadDictionaryMember(d2, e1, "k1")

// however this is still valid ..
hadMember(d2, e1)

// Unless we add:
entity(d2, [prov:type='prov:EmptyDictionary'])
// which implies
//entity(d2, [prov:type='prov:EmptyCollection'])
//which is invalid by constraint membership-empty-collection in PROV-CONSTRAINTS

// However, we might now still add this confusing statement:
hadMember(d1, e2)

// By the PROV-DICTIONARY constraints (including the completeness
constraint impossible-insertion-insertion and
membership-insertion-membership) we would not here expect any other
keys than "k1" in d1 - so we SHOULD be able to conclude that e2==e1 -
however we can't do that unless we introduce my inference
dictionary-all-members-have-keys:
// dictionary-all-members-have-keys implies
//hadDictionaryMember(d1, e2, k)
// and by key-single-entity and membership-insertion-membership and
membership-empty-collection:
//e2 == e1


>  IF hadDictionaryMember(d1, e, "k") and derivedByInsertionFrom(d2, d1, {("k1", e1),...,("kn", en)}) and k ∉ {k1,...,kn} THEN hadDictionaryMember(d2, e, "k")

45) Don't remove the quotes: "k" ∉ {"k1",...,"kn"}


>  IF derivedByRemovalFrom(d2, d1, {"k1",...,"kn"}) and derivedByInsertionFrom(d2, d1, {("k1", e1),...,("km",em)})THEN INVALID

46) This implies that "k1", etc. needs to be on both sides (Implying
an intersection/containment of keys). This constraint is true no
matter the key/value pairs, so change to same style as below:

> Here, KV1 and KV2 are sets of key-entity pairs.
>  IF derivedByRemovalFrom(d2, d1, KV1) and derivedByInsertionFrom(d2, d1, KV2) THEN INVALID



> 6.3 Typing

> IF entity(d, [prov:type='prov:Dictionary']) THEN 'prov:Dictionary' ∈ typeOf(d) and 'prov:Collection' ∈ typeOf(d)
> IF entity(d, [prov:type='prov:EmptyDictionary']) THEN 'prov:EmptyDictionary' ∈ typeOf(d) and 'prov:Dictionary' ∈ typeOf(d)

47) The second rule would not cause firing of the first rule, so
expand the second rule. Also include prov:EmptyCollection and entity.

> IF entity(d, [prov:type='prov:EmptyDictionary']) THEN 'prov:EmptyDictionary' ∈ typeOf(d) and 'prov:Dictionary' ∈ typeOf(d) and 'prov:Collection' ∈ typeOf(d) and 'prov:EmptyCollection' ∈ typeOf(d)


48) Both lines must also include:
 'entity' ∈ typeOf(c)

..as PROV-Constraint "typing" would not fire from a typeof prov:Collection.



> IF hadDictionaryMember(d, e, "k") THEN 'prov:Dictionary' ∈ typeOf(d) and 'entity' ∈ typeOf(e)
> IF derivedByInsertionFrom(d2, d1, {("k1", e1)}) THEN 'prov:Dictionary' ∈ typeOf(d1) and 'prov:Dictionary' ∈ typeOf(d2) and 'entity' ∈ typeOf(e1)
> IF derivedByRemovalFrom(d2, d1, {"k1"}) THEN 'prov:Dictionary' ∈ typeOf(d1) and 'prov:Dictionary' ∈ typeOf(d2)

49) Similarly all of these should also include
> AND prov:Collection' ∈ typeOf(c) AND 'entity' ∈ typeOf(c)


==== END OF REVIEW ===

Now relax..! :)


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Thursday, 24 January 2013 15:06:31 UTC