PROV-AQ responses to Stian's review (part 1) from Graham Klyne on 2013-03-11 (public-prov-wg@w3.org from March 2013)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Mon, 11 Mar 2013 09:54:45 +0000
To: W3C provenance WG <public-prov-wg@w3.org>, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Message-ID: <513DA9E5.8010507@zoo.ox.ac.uk>
Stian (http://lists.w3.org/Archives/Public/public-prov-wg/2013Jan/0069.html)

 >>> My responses are prefixed like this.

1) Could we have a more detailed "Changes since last version"
appendix, like in our other documents?
 >>> Added dump of the HG commit log.

1.1 Concepts

2) Why the term "Target-URI"? As far as I can understand, this is
"Entity-URI". It is only vaguely hinted that this is the identifier
for the prov:Entity I should be looking for.
 >>> May also be an activity.  Stuck with target-URI, but updated definition to 
make this clearer.


1.2 Provenance and resources

3) These paragraphs talk about 'revisions' and 'versions'
interchangeably. In terms of provenance this can get a bit confusing.
I would use only the term "revision"
 >>> I didn't see them that way.  It's subtle, but "revision" is used in the 
context of examples that are revisions of a document in an editing process. 
There is one use of "version" (para 3) that is more generic, and I felt might be 
something other than a "revision" (is "Luc in Boston" a revision of Luc?) - here 
"version" seems to me to be more encompassing.
 >>> No change.


4) "must be persistent and not themselves dependent on context" -->
"must be persistent and must not themselves be dependent on context"
 >>> Changed.

5) "In summary, a provenance description may be not universally
applicable to a resource, but may be expressed with respect to that
resource in a restricted context (e.g. at a particular time). This
restriction is itself just another resource (e.g. the weather forecast
for a give date as opposed to the current weather forecast), with its
own URI for referring to it within a provenance description. " - this
summary is I'm afraid more confusing than the previous 3 paragraphs.
Could this be written in a lighter language?
 >>> This discussion has been moved to section 1.2, and re-worked


1.4 URI types and dereferencing

6) "Service-URI     A provenance query service (i.e. a resource of type
prov:ProvenanceQueryService). "
You can't use "i.e." here - we've never heard about
prov:ProvenanceQueryService before.
 >>> I disagree that I *can't* use "i.e." here, even if the forward reference is 
unhelpful.
 >>> Text re-worked, no longer has i.e.


I don't think the type should be
listed here as that is specific to section 4. (and possibly 3.3
although it is not mentioned there).
 >>> Changed.


7) "Provenance-URI  A provenance description in the sense described by
[PROV-DM] (PROV Overview)."
I am uncertain as to what this means. Does this mean a PROV structure
description - as given in PROV-DM, or any odd provenance description?
From the feeling of the rest of the document I understand it is any
kind of provenance description, so I find the reference to PROV-DM odd
here.  (I do recognize that we should say strongly that a PROV format
SHOULD be one of the formats - but this table is not the right place
for it)
 >>> Specific reference provided
 >>> I note PROV-DM uses both "provenance description" and "provenance-record". 
  In response to a previous comment, I've adopted provenance-record throughout - 
but I've included provenance description here as that term is used in the 
referenced description.


2. Accessing provenance descriptions

8) " There is no requirement that a bundle identifier can be
dereferenced to access the corresponding provenance, but where
practical it is RECOMMENDED that matters be arranged so this is
possible. "
  - although this is not a formal specification, I don't think we need
to write in 1850's legal English, so I would kindly request the
honourable gentlemen to provide a more directly specified
recommendation than "matters to be arranged".
 >>> Re-organized and tightened up text.  But I don't know if I've gone far 
enough to address your comment, which I didn't fully understand.


9) " One possible realization of a bundle is that it is published as
part of an RDF Dataset [RDF-CONCEPTS11] or similar composite structure
containing multiple RDF graphs in a single document. To access such a
bundle would require accessing the RDF Dataset and then extracting the
identified component; this in turn would require knowing a URI or some
other way to retrieve the dataset. This specification does not
describe a specific mechanism for extracting components from a
document containing multiple graphs. "
- this sounds all very speculative and I don't see why this belongs in
here at all. The various PROV serializations to a larger or smaller
extent already define how to represent PROV bundles.
 >>> The text has been re-worked, and incorporated into a supporting note, as 
I agree it's not appropriate as part of the specification per se.  I was 
previously asked to add some discussion of this, so I hope you find this is a 
sensible compromise.


3. Locating provenance descriptions

10) "If a provenance description is a resource that can be accessed
using web retrieval, one needs to know its provenance-URI to
dereference. If this is known in advance, there is nothing more to
specify. If a provenance-URI is not known then a mechanism to discover
one must be based on information that is available to the would-be
accessor."

- I don't understand this, and I don't understand why this is in the
document. Could we try to write the document more like a specification
rather than a philosophical "what-if" paper?
 >>> The text has been re-worked.


11) "provider     is an agent that collects or constructs some
information and makes it available. The nature of the information or
the means by which it is made available are not constrained, but the
following discussion focuses on provenance descriptions made available
by HTTP transactions (i.e. where the provenance provider is an HTTP
server), "
-- Just simplify this to the same style as consumer:
"provider   is an agent that makes available provenance descriptions"
 >>> Done (and moved to section 1.1 (Concepts))

I don't think we need to mention HTTP at all here, as only one of the
3 mechanisms deals with HTTP.
 >>> I've lost track of the exact referent of this, but I think I've addressed 
your point.

12) "We consider here mechanisms for a provider to indicate a
provenance-URI or service-URI along with a target-URI. "

This document is not a paper that considers things and reports results
- this is a specification on how to do things. Change to "We here
define"
 >>> Text re-worked.


13) "primary current web protocol and data formats" -> "current
primary web protocol and data formats"
 >>> Done.


14) " While a provider should avoid giving spurious information, there
are no fixed semantics, particularly when multiple resources are
indicated, and a client should not assume that a specific given
provenance-URI will yield information about a specific given
target-URI. In the general case, a client presented with multiple
provenance-URIs and multiple target-URIs should look at all of the
provenance-URIs for information about any or all of the target-URIs. "
- this paragraph sounds out of place - and it's anyway too early as
we have not yet seen a single way to get to this information. Delete
and keep it only in appendix "Security Considerations".
 >>> This is not a security consideration
 >>> I don't see that it's relevant that specific mechanisms come later - this 
part of the discussion is intended to be independent of mechanism used.
 >>> I've moved this discussion to section 1.3, and the producer/consumer 
definitions to section 1.1.


15) " In the general case, a client presented with multiple
provenance-URIs and multiple target-URIs should look at all of the
provenance-URIs for information about any or all of the target-URIs. "
- this is very low-level detail, and I don't understand it at this
point (I've not seen my first target-URI yet!), so it's simply too
heavy and too early to start with all the exceptions and edge-cases
before I have even read about how to do it in the first place. Move
all such considerations to the end.
 >>> I've moved this up to section 1.3, and trimmed the text in that section.
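
A minimal sketch of the "look at all of the provenance-URIs" rule quoted in 14) and 15), assuming a caller-supplied retrieval function. The names provenance_mentioning and fetch are hypothetical, and the substring test merely stands in for parsing the provenance properly:

```python
from typing import Callable, Dict, Iterable, List


def provenance_mentioning(
    target_uris: Iterable[str],
    provenance_uris: Iterable[str],
    fetch: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Map each target-URI to the provenance-URIs whose content mentions it.

    No provenance-URI is assumed to belong to a particular target-URI:
    every provenance resource is checked against every target.
    """
    targets = list(target_uris)
    found: Dict[str, List[str]] = {t: [] for t in targets}
    for prov_uri in provenance_uris:
        text = fetch(prov_uri)  # hypothetical retrieval callable (assumption)
        for target in targets:
            # Naive substring test; a real client would parse the provenance.
            if target in text:
                found[target].append(prov_uri)
    return found
```

Passing fetch in as a parameter keeps the sketch independent of any particular retrieval mechanism, in line with the point that this discussion is not tied to HTTP.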


16) "does not preclude the possibility that other publishers may "  -
not heard about "publisher" before - perhaps "provider"?
 >>> Done


17) "Provenance indicated in this way is not guaranteed to be
authoritative. Trust in the linked provenance descriptions must be
determined separately from trust in the original resource. Just as in
the web at large, it is a user's responsibility to determine an
appropriate level of trust in any other linked resource; e.g. based on
the domain that serves it, or an associated digital signature. (See
also section 6. Security considerations.) "  - this is just repeated
blurb from half a screen up - although I think this is a slightly
better place to mention it, so I am OK to leave it here as long as the
previous blurb goes.
 >>> Moved to section 1.3, and trimmed
 >>> Removed duplicate material in section 2.


18)  The document talks about URIs - but generally these days
specifications talk about IRIs. Any reason for this (like HTTP Link
headers must be URIs), and could we clarify this in an appendix?
 >>> I think this needs wider discussion.  It's not clear to me what term is in 
most current use, though in my mind URI is the more established term (though not 
necessarily the most correct term).  Maybe discussion in an appendix would be 
the right way?
 >>> It's true that the latest RDF concepts and abstract syntax refers to IRIs 
(http://www.w3.org/TR/rdf11-concepts/#section-IRIs), and that's a significant 
element of the usage we're considering.

Maybe for an appendix or NOTE somewhere?  If adopted, I think it should appear 
sooner rather than later:
[[
This document uses the term URI as this is the term used in many of the 
currently ratified specifications that this document builds upon.  In many 
situations, a URI may also be an IRI [[RFC3987]], which is a generalisation of a 
URI allowing a wider range of Unicode characters.  Every absolute URI is an IRI, 
but not every IRI is a URI. When IRIs are used in situations that require a 
URI, they must first be converted according to the mapping defined in section 
3.1 of [RFC3987]. A notable example is retrieval over the HTTP protocol. The 
mapping involves UTF-8 encoding of non-ASCII characters, %-encoding of octets 
not allowed in URIs, and Punycode-encoding of domain names.
]]
(some of this text stolen/adapted from 
http://tools.ietf.org/html/rfc3987#section-3.1)
 >>> Added text to Concepts section.
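
A rough sketch of the IRI-to-URI mapping summarised in the proposed text above, using only the Python standard library. The function name iri_to_uri is made up for illustration, Python's "idna" codec implements the older IDNA 2003 rules, and user-info and IPv6 hosts are not handled, so this approximates the RFC 3987 section 3.1 mapping rather than implementing it fully:

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in path, query and fragment. "%" is included so
# that existing %-escapes are not encoded a second time.
_SAFE = "/?%:@!$&'()*+,;=~-._"


def iri_to_uri(iri: str) -> str:
    """Approximate conversion of an absolute IRI to a URI (illustrative only)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Domain name: Punycode via the standard "idna" codec.
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Other components: UTF-8 encode, then %-encode octets not allowed in URIs.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```

A client would apply something like this just before dereferencing over HTTP, which is the "notable example" mentioned in the proposed text.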


19) "There may be multiple hasQueryService link header fields, and
these may appear in an HTTP response together with hasProvenance link
header fields (though, in simple cases, we anticipate that
hasProvenance and hasQueryService link relations will not be used
together). " - I think both 'may' should be 'MAY' - to correspond with
equivalent section in 3.2.
 >>> Done


20) Can the Link:  <pre> blocks be broken into several lines? On my
printout it is cut out just after #hasProvenance. I suggest:

Link: <provenance-service-URI>;
   rel="http://www.w3.org/ns/prov#hasQueryService";
   anchor="target-URI"

This should also be valid HTTP (and is used in the 3.1.2 example).
 >>> Done
 >>> (though I believe it's not valid in the forthcoming release of HTTP, but 
it's still a reasonable thing to do for readability).
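
For a consumer that does meet the multi-line form, a minimal sketch of joining the continuation lines back into one header line. The helper name unfold_header is hypothetical, and the sketch assumes continuation lines begin with a space or tab, as in the suggestion above:

```python
import re


def unfold_header(raw: str) -> str:
    """Join a folded header into a single line.

    A line break followed by leading whitespace is treated as a
    continuation and replaced by one space.
    """
    return re.sub(r"\r?\n[ \t]+", " ", raw.strip())
```

Since the folded form is used here for readability on the page, a sender could equally emit the single-line form on the wire and keep the line breaks for documentation only.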

21) Can we have an example of the two Link headers in use here? I find
it confusing due to the <two> "styles" of URIs.

 >>> Added example sequences at the end of sections 3.1 and 3.1.1


3.1.2 Content negotiation

22) The example seems to use HTTP 0.9. Could it be updated for HTTP 1.1?
 >>> Done (throughout document)


3.2 Resource represented as HTML

23) Can the two <link> header lines be <b>old in both examples?
 >>> Done


24) "The provenance-URI given by the hasProvenance link element" ...
"The target-URI given by the hasAnchor link element "
-  I found these confusing, because I could not easily find
"hasProvenance" and "hasAnchor" above - as they are bits of the URI.
If you don't want to repeat the full URIs here, then highlight the two
terms more (super-bold?) in the pre above. This is particularly
confusing for hasAnchor - because in this style you have two  <link>
entries while in the HTTP example this was just a single link entry
with an optional parameter.
 >>> I've reorganised the text to try and make this clearer, also including '#' 
for each link type mentioned.


I don't like the approach here with the anchors disconnected from the
hasProvenance - specially not as it is not consistent with the
approach of 3.1. I would have preferred the two approaches to be
equivalent. I now can't construct the Link headers of 3.1 based on the
HTML in 3.2 or the RDF in 3.3.  Although I don't particularly like it,
I might recommend changing 3.1 to also have a separate 'hasAnchor'
relation, to make it consistent.   (Also it would allow the off-spec
use of hasAnchor without provenance links).

 >>> (sect 3.2?  check)
 >>> I don't particularly like it either.  But we're constrained by use of 
existing features.  We've been over this in previous iterations, and this is 
what we settled on - the inconsistency was deemed preferable to gratuitous 
reinvention.  In practice I think it will be less of an issue than may at first 
appear, as I don't see having multiple provenance links *and* anchors as being a 
common requirement.
 >>> Issue raised: http://www.w3.org/2011/prov/track/issues/628
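
To illustrate the HTML style discussed in 24), where provenance and anchor are two separate <link> elements, here is a small sketch that collects both from a page using the standard-library HTML parser. The class name ProvLinkParser is invented for the example, and the relation URIs are the #hasProvenance and #hasAnchor forms quoted in this thread (item 32 later renames them), so treat them as assumptions:

```python
from html.parser import HTMLParser

PROV_NS = "http://www.w3.org/ns/prov#"


class ProvLinkParser(HTMLParser):
    """Collect provenance-URIs and target-URIs from <link> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.provenance = []  # href values of hasProvenance links
        self.anchors = []     # href values of hasAnchor links

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if href is None:
            return
        # Exact, case-sensitive match on the relation URI (a simplification).
        if rel == PROV_NS + "hasProvenance":
            self.provenance.append(href)
        elif rel == PROV_NS + "hasAnchor":
            self.anchors.append(href)
```

Note that the two lists are collected independently: nothing in the markup ties a given anchor to a given provenance link, which is exactly the inconsistency with the HTTP Link header style raised above.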


3.2.1 specifying provenance query service

25) " (though, in simple cases, we anticipate that hasProvenance and
hasQueryService link relations would not be used together). "  - I
would drop this sentence. I thought hasProvenance was for simple
cases.
 >>> "hasProvenance was for simple cases" -- not necessarily
 >>> We had previously been asked for clarification of this point, so I don't 
see dropping it as an easy option. But it might be rephrased.
 >>> Re-phrased to avoid "simple cases"


26) " (These terms may be used to indicate provenance of arbitrary
other resources too, but discussion of such usage is beyond the scope
of this section.) " - so where is the section where I can read about
this? It sounds important and useful.
 >>> s/section/document/
 >>> Should we actually try to say more about this?  I'm not sure - it seems 
like dwelling on an exceptional case.  In any case, given the descriptions in 
appendix B, and knowledge of RDF, I'd have thought such use was obvious.
 >>> I've reworded this slightly, and moved it to a separate Note paragraph 
where it's hopefully less of a distraction.

27) "The RDF property prov:hasProvenance is a relation between two
resources, where the object of the property is a resource that
presents a provenance description of the subject resource. "  - I
would add the term provenance-URI here.
 >>> My error: should be "the object of the property is a provenance-URI that 
denotes a resource ..." (I thought I had it technically correct, but the object 
is the RDF graph node, not what it denotes).
 >>> Revised


28) " This property corresponds to a hasProvenance link relation used
with an HTTP Link header field, or HTML <link> element (see above). "
and " This corresponds to use of the anchor parameter in an HTTP
provenance Link header field, or a hasAnchor link relation in an HTML
<link> element, which similarly indicate a URI used by the provenance
description to refer to the described document.", "This property
corresponds to a hasQueryService link relation used with an HTTP Link
header field, or HTML <link> element. "   - I would totally drop these
sentences - as long as you specify in funny font that it is target-URI
and provenance-URI you are defining, it's OK.  Section 3.2 doesn't have
an equivalent statement, and reads quite easily.
 >>> Done.


29) Example  [sect 3.3, I assume]
Add "Turtle syntax [TURTLE]" somewhere near this example.
 >>> Done.


30) Example
Remove the use of invalid and confusing ":"  for continuation - if anything use
    # .. RDF data ...
 >>> Done.


31) Why are the provenance relations long URIs, rather than registered
Link Types? I might have missed something, because earlier we
suggested to register such link types as "provenance".
 >>> Because we (the group) discussed this, and decided not to register the link 
types, because we felt it would be more consistent to use URIs throughout.
 >>> No change


32) According to http://tools.ietf.org/html/rfc5988#section-4.2

When extension relation types are compared, they MUST be compared as
    strings (after converting to URIs if serialised in a different
    format, such as a Curie [W3C.CR-curie-20090116]) in a case-
    insensitive fashion, character-by-character.  Because of this, all-
    lowercase URIs SHOULD be used for extension relations.

Should we not have relation URIs that are all lowercase to avoid problems?  ie.

Link: <http://acme.example.org/provenance/super-widget>;
            rel="http://www.w3.org/ns/prov#hasprovenance"

 >>> Hmmm... Good catch, I missed that.
 >>> Per discussion, properties changed to 
"http://www.w3.org/ns/prov#has_provenance", etc.
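
A short sketch of the comparison rule quoted from RFC 5988 section 4.2, to show why mixed-case relation URIs are risky. The function name same_relation is hypothetical, and str.lower() is assumed to be adequate because relation URIs are ASCII:

```python
def same_relation(a: str, b: str) -> bool:
    """Compare two extension relation types case-insensitively,
    character by character, as RFC 5988 section 4.2 requires."""
    return a.lower() == b.lower()
```

Under this rule "…#hasProvenance" and "…#hasprovenance" are the same relation type, so a consumer that compares case-sensitively would disagree with one that follows the RFC. An all-lowercase URI such as "…#has_provenance" avoids the ambiguity.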


33) Section 5 - Link examples don't have appropriate quoting of rel and anchor.

 >>> Checking...

[[
   Link           = "Link" ":" #link-value
   link-value     = "<" URI-Reference ">" *( ";" link-param )
   link-param     = ( ( "rel" "=" relation-types )
                  | ( "anchor" "=" <"> URI-Reference <"> )
                  | ( "rev" "=" relation-types )
                  | ( "hreflang" "=" Language-Tag )
                  | ( "media" "=" ( MediaDesc | ( <"> MediaDesc <"> ) ) )
                  | ( "title" "=" quoted-string )
                  | ( "title*" "=" ext-value )
                  | ( "type" "=" ( media-type | quoted-mt ) )
                  | ( link-extension ) )
   link-extension = ( parmname [ "=" ( ptoken | quoted-string ) ] )
                  | ( ext-name-star "=" ext-value )
   ext-name-star  = parmname "*" ; reserved for RFC2231-profiled
                                 ; extensions.  Whitespace NOT
                                 ; allowed in between.
   ptoken         = 1*ptokenchar
   ptokenchar     = "!" | "#" | "$" | "%" | "&" | "'" | "("
                  | ")" | "*" | "+" | "-" | "." | "/" | DIGIT
                  | ":" | "<" | "=" | ">" | "?" | "@" | ALPHA
                  | "[" | "]" | "^" | "_" | "`" | "{" | "|"
                  | "}" | "~"
   media-type     = type-name "/" subtype-name
   quoted-mt      = <"> media-type <">
   relation-types = relation-type
                  | <"> relation-type *( 1*SP relation-type ) <">
   relation-type  = reg-rel-type | ext-rel-type
   reg-rel-type   = LOALPHA *( LOALPHA | DIGIT | "." | "-" )
   ext-rel-type   = URI
]]
-- http://www.ietf.org/rfc/rfc5988.txt

 >>> Unquoted URI (without spaces) is OK for relation type as ext-rel-type; 
quoting is optional
 >>> But anchor *does* need to be quoted - I missed that.  Good catch.
 >>> Added quotes to anchor parameters
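
A minimal sketch of producing a Link header value in line with the grammar above: the anchor is always double-quoted, and the relation type is quoted as well, which the grammar permits. The function name format_link is made up for illustration, and only a single relation type per link is handled:

```python
from typing import Optional


def format_link(uri: str, rel: str, anchor: Optional[str] = None) -> str:
    """Build a Link header value with quoted rel and anchor parameters."""
    for value in (uri, rel, anchor or ""):
        # Reject values that would break the quoting or the <...> delimiters.
        if '"' in value or ">" in value or any(c.isspace() for c in value):
            raise ValueError(f"not usable as a single URI: {value!r}")
    params = [f'rel="{rel}"']
    if anchor is not None:
        params.append(f'anchor="{anchor}"')
    return f"<{uri}>; " + "; ".join(params)
```

Quoting rel unconditionally is a design choice rather than a requirement: an unquoted URI is acceptable there, but treating both parameters the same way leaves less room for the mistake noted above.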
Received on Monday, 11 March 2013 10:01:49 UTC