Re: PROV-ISSUE-613 (prov-aq-draft-review): Review paq for release as last call working draft [Accessing and Querying Provenance] from Stian Soiland-Reyes on 2013-01-17 (public-prov-wg@w3.org from January 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Thu, 17 Jan 2013 11:35:14 +0000
To: Provenance Working Group <public-prov-wg@w3.org>
Message-ID: <CAPRnXtmsvvbwe2kEc_KueUZfLGO6MXx=qv7Znhp7168bLxt5Xg@mail.gmail.com>

On Thu, Jan 10, 2013 at 2:56 PM, Provenance Working Group Issue
Tracker <sysbot+tracker@w3.org> wrote:
> PROV-ISSUE-613 (prov-aq-draft-review): Review paq for release as last call working draft [Accessing and Querying Provenance]
> https://dvcs.w3.org/hg/prov/raw-file/b3f397c7b15c/paq/prov-aq.html

Here is my partial review of the above document PROV-AQ.

Due to travelling and sick days I have not been able to review sections
4, 5, and 6, nor the appendices.

1) Could we have a more detailed "Changes since last version"
appendix, like in our other documents?

1.1 Concepts

2) Why the term "Target-URI"? As far as I can understand, this is
"Entity-URI". It is only vaguely hinted that this is the identifier
for the prov:Entity I should be looking for.

1.2 Provenance and resources

3) These paragraphs talk about 'revisions' and 'versions'
interchangeably. In terms of provenance this can get a bit confusing.
I would use only the term "revision".

4) "must be persistent and not themselves dependent on context" -->
"must be persistent and must not themselves be dependent on context"

5) "In summary, a provenance description may be not universally
applicable to a resource, but may be expressed with respect to that
resource in a restricted context (e.g. at a particular time). This
restriction is itself just another resource (e.g. the weather forecast
for a give date as opposed to the current weather forecast), with its
own URI for referring to it within a provenance description. " - this
summary is, I'm afraid, more confusing than the previous 3 paragraphs.
Could this be written in lighter language?

1.4 URI types and dereferencing

6) "Service-URI A provenance query service (i.e. a resource of type
prov:ProvenanceQueryService). "
You can't use "i.e." here - we've never heard about
prov:ProvenanceQueryService before. I don't think the type should be
listed here as that is specific to section 4 (and possibly 3.3,
although it is not mentioned there).

7) "Provenance-URI A provenance description in the sense described by
[PROV-DM] (PROV Overview)."
I am uncertain as to what this means. Does this mean a PROV structure
description - as given in PROV-DM, or any odd provenance description?
From the feeling of the rest of the document I understand it is any
kind of provenance description, so I find the reference to PROV-DM odd
here. (I do recognize that we should say strongly that a PROV format
SHOULD be one of the formats - but this table is not the right place
for it.)

2. Accessing provenance descriptions

8) " There is no requirement that a bundle identifier can be
dereferenced to access the corresponding provenance, but where
practical it is RECOMMENDED that matters be arranged so this is
possible. "
- although this is not a formal specification, I don't think we need
to write in 1850's legal English, so I would kindly request the
honourable gentlemen to provide a more directly specified
recommendation than "matters to be arranged".

9) " One possible realization of a bundle is that it is published as
part of an RDF Dataset [RDF-CONCEPTS11] or similar composite structure
containing multiple RDF graphs in a single document. To access such a
bundle would require accessing the RDF Dataset and then extracting the
identified component; this in turn would require knowing a URI or some
other way to retrieve the dataset. This specification does not
describe a specific mechanism for extracting components from a
document containing multiple graphs. "
- this all sounds very speculative and I don't see why this belongs in
here at all. The various PROV serializations already define, to a
larger or smaller extent, how to represent PROV bundles.

3. Locating provenance descriptions

10) "If a provenance description is a resource that can be accessed
using web retrieval, one needs to know its provenance-URI to
dereference. If this is known in advance, there is nothing more to
specify. If a provenance-URI is not known then a mechanism to discover
one must be based on information that is available to the would-be
accessor."

- I don't understand this, and I don't understand why this is in the
document. Could we try to write the document more like a specification
rather than a philosophical "what-if" paper?

11) "provider is an agent that collects or constructs some
information and makes it available. The nature of the information or
the means by which it is made available are not constrained, but the
following discussion focuses on provenance descriptions made available
by HTTP transactions (i.e. where the provenance provider is an HTTP
server), "
-- Just simplify this to the same style as consumer:
"provider is an agent that makes available provenance descriptions"

I don't think we need to mention HTTP at all here, as only one of the
3 mechanisms deals with HTTP.

12) "We consider here mechanisms for a provider to indicate a
provenance-URI or service-URI along with a target-URI. "

This document is not a paper that considers things and reports results
- this is a specification on how to do things. Change to "We here
define"

13) "primary current web protocol and data formats" -> "current
primary web protocol and data formats"

14) " While a provider should avoid giving spurious information, there
are no fixed semantics, particularly when multiple resources are
indicated, and a client should not assume that a specific given
provenance-URI will yield information about a specific given
target-URI. In the general case, a client presented with multiple
provenance-URIs and multiple target-URIs should look at all of the
provenance-URIs for information about any or all of the target-URIs. "
- this paragraph sounds out of place - and it's anyway too early as
we have not yet seen a single way to get to this information. Delete
and keep it only in appendix "Security Considerations".

15) " In the general case, a client presented with multiple
provenance-URIs and multiple target-URIs should look at all of the
provenance-URIs for information about any or all of the target-URIs. "
- this is very low-level detail, and I don't understand it at this
point (I've not seen my first target-URI yet!), so it's simply too
heavy and too early to start with all the exceptions and edge-cases
before I have even read about how to do it in the first place. Move
all such considerations to the end.

16) "does not preclude the possibility that other publishers may " -
not heard about "publisher" before - perhaps "provider"?

17) "Provenance indicated in this way is not guaranteed to be
authoritative. Trust in the linked provenance descriptions must be
determined separately from trust in the original resource. Just as in
the web at large, it is a user's responsibility to determine an
appropriate level of trust in any other linked resource; e.g. based on
the domain that serves it, or an associated digital signature. (See
also section 6. Security considerations.) " - this is just repeated
blurb from half a screen up - although I think this is a slightly
better place to mention it, so I am OK to leave it here as long as the
previous blurb goes.

18) The document talks about URIs - but generally these days
specifications talk about IRIs. Any reason for this (like HTTP Link
headers must be URIs), and could we clarify this in an appendix?

19) "There may be multiple hasQueryService link header fields, and
these may appear in an HTTP response together with hasProvenance link
header fields (though, in simple cases, we anticipate that
hasProvenance and hasQueryService link relations will not be used
together). " - I think both 'may' should be 'MAY' - to correspond with
equivalent section in 3.2.

20) Can the Link: <pre> blocks be broken into several lines? On my
printout it is cut off just after #hasProvenance. I suggest:

Link: <provenance-service-URI>;
  rel="http://www.w3.org/ns/prov#hasQueryService";
  anchor="target-URI"

This should also be valid HTTP (and is used in the 3.1.2 example).
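
As a rough illustration of why a folded header is still one Link header, here is a minimal Python sketch (not part of PROV-AQ) that unfolds continuation lines and pulls out the rel and anchor parameters. The URIs are hypothetical placeholders, and the parser is deliberately simplified: it assumes a single link, quoted parameter values, and no ";" inside any value.

```python
import re


def unfold(raw_header: str) -> str:
    """Join folded header lines; continuation lines start with space or tab."""
    return re.sub(r"\r?\n[ \t]+", " ", raw_header)


def parse_link(value: str) -> dict:
    """Split one Link header value into its target URI and parameters.

    Simplified on purpose: one link only, no ';' inside quoted values.
    """
    target, *params = [part.strip() for part in value.split(";")]
    link = {"uri": target.strip("<>")}
    for param in params:
        if not param:
            continue
        name, _, val = param.partition("=")
        link[name.strip().lower()] = val.strip().strip('"')
    return link


# Hypothetical URIs standing in for provenance-service-URI and target-URI.
folded = (
    "Link: <http://example.org/provenance-service>;\n"
    '  rel="http://www.w3.org/ns/prov#hasQueryService";\n'
    '  anchor="http://example.org/target"'
)

# Separate the header name from its (unfolded) value, then parse the value.
_, _, value = unfold(folded).partition(":")
link = parse_link(value)
print(link)
```

Note that the continuation lines only fold correctly because they begin with whitespace; a line starting in column one would be read as a new header field.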

21) Can we have an example of the two Link headers in use here? I find
it confusing due to the <two> "styles" of URIs.

3.1.2 Content negotiation

22) The example seems to use HTTP 0.9. Could it be updated for HTTP 1.1?
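
For illustration only, a minimal Python sketch of what distinguishes an HTTP/1.1 request from an HTTP/0.9-style one: the protocol version on the request line and a mandatory Host header. The host, path, and Accept media type are hypothetical and are not taken from the PROV-AQ example.

```python
def build_get_request(host: str, path: str, accept: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # version on the request line (absent in HTTP/0.9)
        f"Host: {host}",         # Host is required in HTTP/1.1
        f"Accept: {accept}",     # media type the client asks for
        "",                      # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


# Hypothetical values, chosen only to make the sketch concrete.
request = build_get_request("acme.example.org", "/super-widget", "text/turtle")
print(request)
```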

3.2 Resource represented as HTML

23) Can the two <link> header lines be <b>old in both examples?

24) "The provenance-URI given by the hasProvenance link element" ...
"The target-URI given by the hasAnchor link element "
- I found these confusing, because I could not easily find
"hasProvenance" and "hasAnchor" above - as they are bits of the URI.
If you don't want to repeat the full URIs here, then highlight the two
terms more (super-bold?) in the pre above. This is particularly
confusing for hasAnchor - because in this style you have two <link>
entries while in the HTTP example this was just a single link entry
with an optional parameter.

I don't like the approach here with the anchors disconnected from the
hasProvenance - especially as it is not consistent with the
approach of 3.1. I would have preferred the two approaches to be
equivalent. I now can't construct the Link headers of 3.1 based on the
HTML in 3.2 or the RDF in 3.3. Although I don't particularly like it,
I might recommend changing 3.1 to also have a separate 'hasAnchor'
relation, to make it consistent. (Also it would allow the off-spec
use of hasAnchor without provenance links).

3.2.1 specifying provenance query service

25) " (though, in simple cases, we anticipate that hasProvenance and
hasQueryService link relations would not be used together). " - I
would drop this sentence. I thought hasProvenance was for simple
cases.

26) " (These terms may be used to indicate provenance of arbitrary
other resources too, but discussion of such usage is beyond the scope
of this section.) " - so where is the section where I can read about
this? It sounds important and useful.

27) "The RDF property prov:hasProvenance is a relation between two
resources, where the object of the property is a resource that
presents a provenance description of the subject resource. " - I
would add the term provenance-URI here.

28) " This property corresponds to a hasProvenance link relation used
with an HTTP Link header field, or HTML <link> element (see above). "
and " This corresponds to use of the anchor parameter in an HTTP
provenance Link header field, or a hasAnchor link relation in an HTML
<link> element, which similarly indicate a URI used by the provenance
description to refer to the described document.", "This property
corresponds to a hasQueryService link relation used with an HTTP Link
header field, or HTML <link> element. " - I would totally drop these
sentences - as long as you specify in funny font that it is target-URI
and provenance-URI you are defining, it's OK. Section 3.2 doesn't have
an equivalent statement, and reads quite easily.

29) Example
Add "Turtle syntax [TURTLE]" somewhere near this example.

30) Example
Remove the use of invalid and confusing ":" for continuation - if anything use
# .. RDF data ...

31) Why are the provenance relations long URIs, rather than registered
Link Types? I might have missed something, because earlier we
suggested to register such link types as "provenance".

32) According to http://tools.ietf.org/html/rfc5988#section-4.2

When extension relation types are compared, they MUST be compared as
strings (after converting to URIs if serialised in a different
format, such as a Curie [W3C.CR-curie-20090116]) in a case-
insensitive fashion, character-by-character. Because of this, all-
lowercase URIs SHOULD be used for extension relations.

Should we not have relation URIs that are all lowercase to avoid problems? i.e.

Link: <http://acme.example.org/provenance/super-widget>;
rel="http://www.w3.org/ns/prov#hasprovenance"
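
To make the concern concrete, here is a small Python sketch (an illustration, not something the draft defines) contrasting the case-insensitive comparison that RFC 5988 requires for extension relation types with a plain exact string comparison, which is what a consumer treating the relation as an ordinary URI might do. The function name is hypothetical.

```python
MIXED_CASE = "http://www.w3.org/ns/prov#hasProvenance"
LOWER_CASE = "http://www.w3.org/ns/prov#hasprovenance"


def same_relation_type(a: str, b: str) -> bool:
    """Compare two extension relation types case-insensitively."""
    return a.lower() == b.lower()


# RFC 5988-style comparison treats the two spellings as the same relation.
rfc_match = same_relation_type(MIXED_CASE, LOWER_CASE)

# An exact string comparison treats them as different.
exact_match = MIXED_CASE == LOWER_CASE

print(rfc_match, exact_match)
```

The two comparisons disagree for the mixed-case URI, which is the mismatch an all-lowercase relation URI would avoid.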

33) Section 5 - Link examples don't have appropriate quoting of rel and anchor.

NOTE: I have not reviewed sections 4, 5, 6 or appendices A, B, C due to time constraints.

I might try to finish that tomorrow.

--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Thursday, 17 January 2013 11:36:03 UTC