Re: PROV-ISSUE-613 (prov-aq-draft-review): Review paq for release as last call working draft [Accessing and Querying Provenance] from Stian Soiland-Reyes on 2013-01-17 (public-prov-wg@w3.org from January 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Thu, 17 Jan 2013 11:42:15 +0000
To: Provenance Working Group <public-prov-wg@w3.org>
Message-ID: <CAPRnXtnSowGnvG9-xUnXsxH=Su6WqfmCXEPR0wpVWRSE37eF5g@mail.gmail.com>
My apologies, I forgot to attach the answers to these questions in the review.




On Thu, Jan 10, 2013 at 3:13 PM, Paul Groth <p.t.groth@vu.nl> wrote:
> - Can this be released as a last call working draft?

No.

> - Is the name provenance access and query appropriate for the document?

Yes.

> - If not, where are the blocking issues?

Of my issues below, these are considered blocking:

1, 5, 6, 7, 8, 9, 10, 12, 14, 15, 18, 24, 28, 30, 33, 34

I would like to split 24 and add a new item 34:

I don't like the approach here with the anchors disconnected from the
hasProvenance - especially since it is not consistent with the
approach of 3.1. I would have preferred the two approaches to be
equivalent. I now can't construct the Link headers of 3.1 based on the
HTML in 3.2 or the RDF in 3.3.  Although I don't particularly like it,
I might recommend changing 3.1 to also have a separate 'hasAnchor'
relation, to make it consistent.   (Also it would allow the off-spec
use of hasAnchor without provenance links).



> - If yes, are there other issues to work on?

See below :) Apologies if it seems hostile; I was in a rush.


On Thu, Jan 17, 2013 at 11:35 AM, Stian Soiland-Reyes
<soiland-reyes@cs.manchester.ac.uk> wrote:
> On Thu, Jan 10, 2013 at 2:56 PM, Provenance Working Group Issue
> Tracker <sysbot+tracker@w3.org> wrote:
>> PROV-ISSUE-613 (prov-aq-draft-review): Review paq for release as last call working draft [Accessing and Querying Provenance]
>> https://dvcs.w3.org/hg/prov/raw-file/b3f397c7b15c/paq/prov-aq.html
>
> Here is my partial review of the above document PROV-AQ.
>
> Due to travelling and sick days I have not been able to review sections
> 4, 5 and 6, nor the appendices.
>
>
>
>
>
> 1) Could we have a more detailed "Changes since last version"
> appendix, like in our other documents?
>
>
> 1.1 Concepts
>
> 2) Why the term "Target-URI"? As far as I can understand, this is
> "Entity-URI". It is only vaguely hinted that this is the identifier
> for the prov:Entity I should be looking for.
>
>
> 1.2 Provenance and resources
>
> 3) These paragraphs talk about 'revisions' and 'versions'
> interchangeably. In terms of provenance this can get a bit confusing.
> I would use only the term "revision".
>
> 4) "must be persistent and not themselves dependent on context" -->
> "must be persistent and must not themselves be dependent on context"
>
> 5) "In summary, a provenance description may be not universally
> applicable to a resource, but may be expressed with respect to that
> resource in a restricted context (e.g. at a particular time). This
> restriction is itself just another resource (e.g. the weather forecast
> for a give date as opposed to the current weather forecast), with its
> own URI for referring to it within a provenance description. " - this
> summary is, I'm afraid, more confusing than the previous 3 paragraphs.
> Could this be written in simpler language?
>
>
> 1.4 URI types and dereferencing
>
> 6) "Service-URI         A provenance query service (i.e. a resource of type
> prov:ProvenanceQueryService). "
> You can't use "i.e." here - we've never heard about
> prov:ProvenanceQueryService before. I don't think the type should be
> listed here, as that is specific to section 4 (and possibly 3.3,
> although it is not mentioned there).
>
> 7) "Provenance-URI      A provenance description in the sense described by
> [PROV-DM] (PROV Overview)."
> I am uncertain as to what this means. Does this mean a PROV structure
> description - as given in PROV-DM, or any odd provenance description?
> From the feeling of the rest of the document I understand it is any
> kind of provenance description, so I find the reference to PROV-DM odd
> here.  (I do recognize that we should say strongly that a PROV format
> SHOULD be one of the formats - but this table is not the right place
> for it)
>
> 2. Accessing provenance descriptions
>
> 8) " There is no requirement that a bundle identifier can be
> dereferenced to access the corresponding provenance, but where
> practical it is RECOMMENDED that matters be arranged so this is
> possible. "
>  - although this is not a formal specification, I don't think we need
> to write in 1850s legal English, so I would kindly request the
> honourable gentlemen to provide a more directly specified
> recommendation than "matters be arranged".
>
>
> 9) " One possible realization of a bundle is that it is published as
> part of an RDF Dataset [RDF-CONCEPTS11] or similar composite structure
> containing multiple RDF graphs in a single document. To access such a
> bundle would require accessing the RDF Dataset and then extracting the
> identified component; this in turn would require knowing a URI or some
> other way to retrieve the dataset. This specification does not
> describe a specific mechanism for extracting components from a
> document containing multiple graphs. "
> - this all sounds very speculative and I don't see why it belongs in
> here at all. The various PROV serializations already define, to a
> larger or smaller extent, how to represent PROV bundles.
>
> 3. Locating provenance descriptions
>
> 10) "If a provenance description is a resource that can be accessed
> using web retrieval, one needs to know its provenance-URI to
> dereference. If this is known in advance, there is nothing more to
> specify. If a provenance-URI is not known then a mechanism to discover
> one must be based on information that is available to the would-be
> accessor."
>
> - I don't understand this, and I don't understand why this is in the
> document. Could we try to write the document more like a specification
> rather than a philosophical "what-if" paper?
>
>
> 11) "provider     is an agent that collects or constructs some
> information and makes it available. The nature of the information or
> the means by which it is made available are not constrained, but the
> following discussion focuses on provenance descriptions made available
> by HTTP transactions (i.e. where the provenance provider is an HTTP
> server), "
> -- Just simplify this to the same style as consumer:
> "provider   is an agent that makes available provenance descriptions"
>
> I don't think we need to mention HTTP at all here, as only one of the
> 3 mechanisms deal with HTTP.
>
>
> 12) "We consider here mechanisms for a provider to indicate a
> provenance-URI or service-URI along with a target-URI. "
>
> This document is not a paper that considers things and reports results
> - this is a specification on how to do things. Change to "We here
> define"
>
>
> 13) "primary current web protocol and data formats" -> "current
> primary web protocol and data formats"
>
> 14) " While a provider should avoid giving spurious information, there
> are no fixed semantics, particularly when multiple resources are
> indicated, and a client should not assume that a specific given
> provenance-URI will yield information about a specific given
> target-URI. In the general case, a client presented with multiple
> provenance-URIs and multiple target-URIs should look at all of the
> provenance-URIs for information about any or all of the target-URIs. "
> - this paragraph sounds out of place - and it's too early anyway, as
> we have not yet seen a single way to get to this information. Delete
> and keep it only in appendix "Security Considerations".
>
> 15) " In the general case, a client presented with multiple
> provenance-URIs and multiple target-URIs should look at all of the
> provenance-URIs for information about any or all of the target-URIs. "
> - this is very low-level detail, and I don't understand it at this
> point (I've not seen my first target-URI yet!), so it's simply too
> heavy and too early to start with all the exceptions and edge-cases
> before I have even read about how to do it in the first place. Move
> all such considerations to the end.
>
>
> 16) "does not preclude the possibility that other publishers may "  -
> we have not heard about "publisher" before - perhaps "provider"?
>
> 17) "Provenance indicated in this way is not guaranteed to be
> authoritative. Trust in the linked provenance descriptions must be
> determined separately from trust in the original resource. Just as in
> the web at large, it is a user's responsibility to determine an
> appropriate level of trust in any other linked resource; e.g. based on
> the domain that serves it, or an associated digital signature. (See
> also section 6. Security considerations.) "  - this is just repeated
> blurb from half a screen up - although I think this is a slightly
> better place to mention it, so I am OK to leave it here as long as the
> previous blurb goes.
>
>
> 18)  The document talks about URIs - but generally these days
> specifications talk about IRIs. Any reason for this (like HTTP Link
> headers must be URIs), and could we clarify this in an appendix?
>
> 19) "There may be multiple hasQueryService link header fields, and
> these may appear in an HTTP response together with hasProvenance link
> header fields (though, in simple cases, we anticipate that
> hasProvenance and hasQueryService link relations will not be used
> together). " - I think both 'may' should be 'MAY', to correspond with
> the equivalent section in 3.2.
>
> 20) Can the Link:  <pre> blocks be broken into several lines? On my
> printout it is cut out just after #hasProvenance. I suggest:
>
> Link: <provenance-service-URI>;
>   rel="http://www.w3.org/ns/prov#hasQueryService";
>   anchor="target-URI"
>
> This should also be valid HTTP (and is used in the 3.1.2 example).
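
To back up the claim in item 20, here is a minimal Python sketch, not taken from the draft: RFC 2616 section 2.2 allows a header field value to continue on a line that starts with a space or tab, so unfolding the suggested header gives back the usual single-line form. `unfold` and `parse_link` are hypothetical helpers that assume a single link-value with quoted parameters (no commas or escaped quotes inside them).

```python
import re

# The header suggested in item 20, as sent on the wire with folded lines.
RAW_HEADER = (
    "Link: <provenance-service-URI>;\r\n"
    '  rel="http://www.w3.org/ns/prov#hasQueryService";\r\n'
    '  anchor="target-URI"'
)


def unfold(raw: str) -> str:
    """Replace each CRLF followed by spaces or tabs with a single space."""
    return re.sub(r"\r\n[ \t]+", " ", raw)


def parse_link(value: str) -> dict:
    """Parse one link-value of the form <URI>; name="value"; ...

    Hypothetical helper: only quoted parameters are recognised, and
    comma-separated lists of links are not handled.
    """
    match = re.match(r"\s*<([^>]*)>(.*)", value)
    if match is None:
        raise ValueError("link-value does not start with <URI>")
    link = {"target": match.group(1)}
    for name, quoted in re.findall(r';\s*([A-Za-z-]+)\s*=\s*"([^"]*)"', match.group(2)):
        link[name.lower()] = quoted
    return link


# Split "Link: ..." at the first colon only; the URIs contain colons too.
field_name, _, field_value = unfold(RAW_HEADER).partition(":")
print(field_name, parse_link(field_value))
```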
>
> 21) Can we have an example of the two Link headers in use here? I find
> it confusing due to the <two> "styles" of URIs.
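
For item 21, something like the following is what I have in mind. This Python sketch only formats the two headers; the acme.example.org URIs are made up for illustration and `link_header` is a hypothetical helper, while the relation URIs are the ones used in the draft. It also shows the two styles side by side: the linked URI goes in angle brackets, the target-URI in a quoted anchor parameter.

```python
PROV_NS = "http://www.w3.org/ns/prov#"


def link_header(linked_uri: str, relation: str, target_uri: str) -> str:
    """Format one Link header: linked URI in <...>, rel and anchor quoted."""
    return f'Link: <{linked_uri}>; rel="{PROV_NS}{relation}"; anchor="{target_uri}"'


# Hypothetical target-URI of the resource whose provenance is offered.
TARGET = "http://acme.example.org/super-widget"

headers = [
    # provenance-URI: a provenance description that can be retrieved directly
    link_header("http://acme.example.org/provenance/super-widget", "hasProvenance", TARGET),
    # service-URI: a provenance query service that knows about TARGET
    link_header("http://acme.example.org/provenance-service/", "hasQueryService", TARGET),
]

for header in headers:
    print(header)
```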
>
>
> 3.1.2 Content negotiation
>
> 22) The example seems to use HTTP 0.9. Could it be updated for HTTP 1.1?
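
For item 22, a minimal sketch of what changes in HTTP/1.1: the request line carries the protocol version, a Host header is required (RFC 2616 section 14.23), and content negotiation is driven by the Accept header. The Python below only builds the request text; the path, host and the text/turtle media type are assumptions for illustration, not the draft's example.

```python
def conneg_request(path: str, host: str, accept: str) -> str:
    """Build the text of an HTTP/1.1 GET request asking for a given media type."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line now carries the protocol version
        f"Host: {host}",         # required in every HTTP/1.1 request
        f"Accept: {accept}",     # drives server-side content negotiation
    ]
    # Each line ends with CRLF; an empty line terminates the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


request = conneg_request("/provenance/super-widget", "acme.example.org", "text/turtle")
print(request)
```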
>
>
> 3.2 Resource represented as HTML
>
> 23) Can the two <link> header lines be <b>old in both examples?
>
> 24) "The provenance-URI given by the hasProvenance link element" ...
> "The target-URI given by the hasAnchor link element "
> -  I found these confusing, because I could not easily find
> "hasProvenance" and "hasAnchor" above - as they are bits of the URI.
> If you don't want to repeat the full URIs here, then highlight the two
> terms more (super-bold?) in the pre above. This is particularly
> confusing for hasAnchor - because in this style you have two  <link>
> entries while in the HTTP example this was just a single link entry
> with an optional parameter.
>
> I don't like the approach here with the anchors disconnected from the
> hasProvenance - especially since it is not consistent with the
> approach of 3.1. I would have preferred the two approaches to be
> equivalent. I now can't construct the Link headers of 3.1 based on the
> HTML in 3.2 or the RDF in 3.3.  Although I don't particularly like it,
> I might recommend changing 3.1 to also have a separate 'hasAnchor'
> relation, to make it consistent.   (Also it would allow the off-spec
> use of hasAnchor without provenance links).
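
A sketch of why the 3.1 headers cannot be rebuilt from the 3.2 markup, using Python's standard html.parser module. The page and its URIs are hypothetical, and each <link> is assumed to carry a single rel value. With two hasProvenance and two hasAnchor elements, nothing in the markup says which anchor belongs to which provenance-URI, so a client can only list every combination: four candidate headers where the provider presumably meant two.

```python
from html.parser import HTMLParser

PROV_NS = "http://www.w3.org/ns/prov#"

# Hypothetical page describing two resources, in the style of section 3.2.
PAGE = """<html><head>
<link rel="http://www.w3.org/ns/prov#hasProvenance" href="http://acme.example.org/provenance/a">
<link rel="http://www.w3.org/ns/prov#hasAnchor" href="http://acme.example.org/widget-a">
<link rel="http://www.w3.org/ns/prov#hasProvenance" href="http://acme.example.org/provenance/b">
<link rel="http://www.w3.org/ns/prov#hasAnchor" href="http://acme.example.org/widget-b">
</head><body></body></html>"""


class ProvLinkCollector(HTMLParser):
    """Collect href values of <link> elements whose rel is a PROV term."""

    def __init__(self):
        super().__init__()
        self.links = {}  # PROV term (e.g. "hasAnchor") -> list of hrefs

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = attributes.get("rel") or ""
        href = attributes.get("href")
        if rel.startswith(PROV_NS) and href:
            self.links.setdefault(rel[len(PROV_NS):], []).append(href)


def candidate_link_headers(links: dict) -> list:
    """Rebuild section 3.1 style Link headers; the markup gives no pairing,
    so every hasProvenance/hasAnchor combination is a candidate."""
    anchors = links.get("hasAnchor") or [None]
    headers = []
    for provenance_uri in links.get("hasProvenance", []):
        for anchor in anchors:
            header = f'Link: <{provenance_uri}>; rel="{PROV_NS}hasProvenance"'
            if anchor is not None:
                header += f'; anchor="{anchor}"'
            headers.append(header)
    return headers


collector = ProvLinkCollector()
collector.feed(PAGE)
for header in candidate_link_headers(collector.links):
    print(header)
```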
>
>
> 3.2.1 specifying provenance query service
>
> 25) " (though, in simple cases, we anticipate that hasProvenance and
> hasQueryService link relations would not be used together). "  - I
> would drop this sentence. I thought hasProvenance was for simple
> cases.
>
> 26) " (These terms may be used to indicate provenance of arbitrary
> other resources too, but discussion of such usage is beyond the scope
> of this section.) " - so where is the section where I can read about
> this? It sounds important and useful.
>
> 27) "The RDF property prov:hasProvenance is a relation between two
> resources, where the object of the property is a resource that
> presents a provenance description of the subject resource. "  - I
> would add the term provenance-URI here.
>
> 28) " This property corresponds to a hasProvenance link relation used
> with an HTTP Link header field, or HTML <link> element (see above). "
> and " This corresponds to use of the anchor parameter in an HTTP
> provenance Link header field, or a hasAnchor link relation in an HTML
> <link> element, which similarly indicate a URI used by the provenance
> description to refer to the described document.", "This property
> corresponds to a hasQueryService link relation used with an HTTP Link
> header field, or HTML <link> element. "   - I would totally drop these
> sentences - as long as you specify in funny font that it is target-URI
> and provenance-URI you are defining, it's OK. Section 3.2 doesn't have
> an equivalent statement, and reads quite easily.
>
>
> 29) Example
> Add "Turtle syntax [TURTLE]" somewhere near this example.
>
> 30) Example
> Remove the use of the invalid and confusing ":" for continuation; if anything, use
>    # .. RDF data ...
>
>
> 31) Why are the provenance relations long URIs, rather than registered
> Link Types? I might have missed something, because earlier we
> suggested to register such link types as "provenance".
>
> 32) According to http://tools.ietf.org/html/rfc5988#section-4.2
>
> When extension relation types are compared, they MUST be compared as
>    strings (after converting to URIs if serialised in a different
>    format, such as a Curie [W3C.CR-curie-20090116]) in a case-
>    insensitive fashion, character-by-character.  Because of this, all-
>    lowercase URIs SHOULD be used for extension relations.
>
> Should we not have relation URIs that are all lowercase to avoid problems? I.e.:
>
> Link: <http://acme.example.org/provenance/super-widget>;
>            rel="http://www.w3.org/ns/prov#hasprovenance"
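
A small Python illustration of item 32, with hypothetical helper names: under RFC 5988 section 4.2 the two spellings below are the same link relation, but as RDF IRIs (as in the RDF of 3.3) they are different terms, because IRIs are compared as plain strings. Using str.lower() is adequate here only because these URIs are ASCII.

```python
CAMEL_CASE = "http://www.w3.org/ns/prov#hasProvenance"
LOWER_CASE = "http://www.w3.org/ns/prov#hasprovenance"


def same_link_relation(a: str, b: str) -> bool:
    """RFC 5988 section 4.2: extension relation types compare case-insensitively."""
    return a.lower() == b.lower()


def same_rdf_iri(a: str, b: str) -> bool:
    """RDF compares IRIs as plain strings, so case is significant."""
    return a == b


print(same_link_relation(CAMEL_CASE, LOWER_CASE))  # True
print(same_rdf_iri(CAMEL_CASE, LOWER_CASE))        # False
```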
>
>
> 33) Section 5 - Link examples don't have appropriate quoting of rel and anchor.
>
>
> NOTE: I have not reviewed section 4, 5, 6, A, B, C due to time constraints.
>
> I might try to finish that tomorrow.
>
>
> --
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Thursday, 17 January 2013 11:43:04 UTC