- From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- Date: Mon, 11 Mar 2013 09:54:45 +0000
- To: W3C provenance WG <public-prov-wg@w3.org>, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Stian (http://lists.w3.org/Archives/Public/public-prov-wg/2013Jan/0069.html)

>>> My responses are prefixed like this.

1) Could we have a more detailed "Changes since last version" appendix, like in our other documents?
>>> Added dump of the HG commit log.

1.1 Concepts

2) Why the term "Target-URI"? As far as I can understand, this is "Entity-URI". It is only vaguely hinted that this is the identifier for the prov:Entity I should be looking for.
>>> May also be an activity. Stuck with target-URI, but updated definition to make this clearer.

1.2 Provenance and resources

3) These paragraphs talk about 'revisions' and 'versions' interchangeably. In terms of provenance this can get a bit confusing. I would use only the term "revision"
>>> I didn't see them that way. It's subtle, but "revision" is used in the context of examples that are revisions of a document in an editing process. There is one use of "version" (para 3) that is more generic, and I felt might be something other than a "revision" (is "Luc in Boston" a revision of Luc?) - here "version" seems to me to be more encompassing.
>>> No change.

4) "must be persistent and not themselves dependent on context" --> "must be persistent and must not themselves be dependent on context"
>>> Changed.

5) "In summary, a provenance description may be not universally applicable to a resource, but may be expressed with respect to that resource in a restricted context (e.g. at a particular time). This restriction is itself just another resource (e.g. the weather forecast for a give date as opposed to the current weather forecast), with its own URI for referring to it within a provenance description. " - this summary is I'm afraid more confusing than the previous 3 paragraphs. Could this be written in a lighter language?
>>> This discussion has been moved to section 1.2, and re-worked

1.4 URI types and dereferencing

6) "Service-URI A provenance query service (i.e. a resource of type prov:ProvenanceQueryService)." You can't use "i.e." here - we've never heard about prov:ProvenanceQueryService before.
>>> I disagree that I *can't* use "i.e." here, even if the forward reference is unhelpful.
>>> Text re-worked, no longer has i.e.

I don't think the type should be listed here as that is specific to section 4. (and possibly 3.3 although it is not mentioned there).
<<< Changed.

7) "Provenance-URI A provenance description in the sense described by [PROV-DM] (PROV Overview)." I am uncertain as to what this means. Does this mean a PROV structure description - as given in PROV-DM, or any odd provenance description? From the feeling of the rest of the document I understand it is any kind of provenance description, so I find the reference to PROV-DM odd here. (I do recognize that we should say strongly that a PROV format SHOULD be one of the formats - but this table is not the right place for it)
>>> Specific reference provided
>>> I note PROV-DM uses both "provenance description" and "provenance-record". In response to a previous comment, I've adopted provenance-record throughout - but I've included provenance description here as that term is used in the referenced description.

2. Accessing provenance descriptions

8) " There is no requirement that a bundle identifier can be dereferenced to access the corresponding provenance, but where practical it is RECOMMENDED that matters be arranged so this is possible. " - although this is not a formal specification, I don't think we need to write in 1850's legal English, so I would kindly request the honourable gentlemen to provide a more directly specified recommendation than "matters to be arranged".
>>> Re-organized and tightened up text. But I don't know if I've gone far enough to address your comment, which I didn't fully understand.

9) " One possible realization of a bundle is that it is published as part of an RDF Dataset [RDF-CONCEPTS11] or similar composite structure containing multiple RDF graphs in a single document.
To access such a bundle would require accessing the RDF Dataset and then extracting the identified component; this in turn would require knowing a URI or some other way to retrieve the dataset. This specification does not describe a specific mechanism for extracting components from a document containing multiple graphs. " - this sounds all very speculative and I don't see why this belongs in here at all. The various PROV serializations to a larger or smaller extent already define how to represent PROV bundles.
>>> The text has been re-worked, and incorporated into a supporting note, as I agree it's not appropriate as part of the specification per se. I was previously asked to add some discussion of this, so I hope you find this is a sensible compromise.

3. Locating provenance descriptions

10) "If a provenance description is a resource that can be accessed using web retrieval, one needs to know its provenance-URI to dereference. If this is known in advance, there is nothing more to specify. If a provenance-URI is not known then a mechanism to discover one must be based on information that is available to the would-be accessor." - I don't understand this, and I don't understand why this is in the document. Could we try to write the document more like a specification rather than a philosophical "what-if" paper?
>>> The text has been re-worked.

11) "provider is an agent that collects or constructs some information and makes it available. The nature of the information or the means by which it is made available are not constrained, but the following discussion focuses on provenance descriptions made available by HTTP transactions (i.e. where the provenance provider is an HTTP server), " -- Just simplify this to the same style as consumer: "provider is an agent that makes available provenance descriptions"
>>> Done (and moved to section 1.1 (Concepts))

I don't think we need to mention HTTP at all here, as only one of the 3 mechanisms deals with HTTP.
>>> I've lost track of the exact referent of this, but I think I've addressed your point.

12) "We consider here mechanisms for a provider to indicate a provenance-URI or service-URI along with a target-URI. " This document is not a paper that considers things and reports results - this is a specification on how to do things. Change to "We here define"
>>> Text re-worked.

13) "primary current web protocol and data formats" -> "current primary web protocol and data formats"
>>> Done.

14) " While a provider should avoid giving spurious information, there are no fixed semantics, particularly when multiple resources are indicated, and a client should not assume that a specific given provenance-URI will yield information about a specific given target-URI. In the general case, a client presented with multiple provenance-URIs and multiple target-URIs should look at all of the provenance-URIs for information about any or all of the target-URIs. " - this paragraph sounds out of place - and it's anyway too early as we have not yet seen a single way to get to this information. Delete and keep it only in appendix "Security Considerations".
>>> This is not a security consideration
>>> I don't see that it's relevant that specific mechanisms come later - this part of the discussion is intended to be independent of mechanism used.
>>> I've moved this discussion to section 1.3, and the producer/consumer definitions to section 1.1.

15) " In the general case, a client presented with multiple provenance-URIs and multiple target-URIs should look at all of the provenance-URIs for information about any or all of the target-URIs. " - this is very low-level detail, and I don't understand it at this point (I've not seen my first target-URI yet!), so it's simply too heavy and too early to start with all the exceptions and edge-cases before I have even read about how to do it in the first place. Move all such considerations to the end.
>>> I've moved this up to section 1.3, and trimmed the text in that section.

16) "does not preclude the possibility that other publishers may " - not heard about "publisher" before - perhaps "provider"?
>>> Done

17) "Provenance indicated in this way is not guaranteed to be authoritative. Trust in the linked provenance descriptions must be determined separately from trust in the original resource. Just as in the web at large, it is a user's responsibility to determine an appropriate level of trust in any other linked resource; e.g. based on the domain that serves it, or an associated digital signature. (See also section 6. Security considerations.) " - this is just repeated blurb from half a screen up - although I think this is a slightly better place to mention it, so I am OK to leave it here as long as the previous blurb goes.
>>> Moved to section 1.3, and trimmed
>>> Removed duplicate material in section 2.

18) The document talks about URIs - but generally these days specifications talk about IRIs. Any reason for this (like HTTP Link headers must be URIs), and could we clarify this in an appendix?
>>> I think this needs wider discussion. It's not clear to me what term is in most current use, though in my mind URI is the more established term (though not necessarily the most correct term). Maybe discussion in an appendix would be the right way?
>>> It's true that the latest RDF concepts and abstract syntax refers to IRIs (http://www.w3.org/TR/rdf11-concepts/#section-IRIs), and that's a significant element of the usage we're considering.

Maybe for an appendix or NOTE somewhere? If adopted, I think it should appear sooner rather than later:

[[
This document uses the term URI as this is the term used in many of the currently ratified specifications that this document builds upon. In many situations, a URI may also be an IRI [[RFC3987]], which is a generalisation of a URI allowing a wider range of Unicode characters.
Every absolute URI is an IRI, but not every IRI is a URI. When IRIs are used in situations that require a URI, they must first be converted according to the mapping defined in section 3.1 of [RFC3987]. A notable example is retrieval over the HTTP protocol. The mapping involves UTF-8 encoding of non-ASCII characters, %-encoding of octets not allowed in URIs, and Punycode-encoding of domain names.
]]

(some of this text stolen/adapted from http://tools.ietf.org/html/rfc3987#section-3.1)
>>> Added text to Concepts section.

19) "There may be multiple hasQueryService link header fields, and these may appear in an HTTP response together with hasProvenance link header fields (though, in simple cases, we anticipate that hasProvenance and hasQueryService link relations will not be used together). " - I think both 'may' should be 'MAY' - to correspond with equivalent section in 3.2.
>>> Done

20) Can the Link: <pre> blocks be broken into several lines? On my printout it is cut out just after #hasProvenance. I suggest:

   Link: <provenance-service-URI>;
         rel="http://www.w3.org/ns/prov#hasQueryService";
         anchor="target-URI"

This should also be valid HTTP (and is used in the 3.1.2 example).
>>> Done
>>> (though I believe it's not valid in the forthcoming release of HTTP, but it's still a reasonable thing to do for readability).

21) Can we have an example of the two Link headers in use here? I find it confusing due to the <two> "styles" of URIs.
>>> Added example sequences at the end of sections 3.1 and 3.1.1

3.1.2 Content negotiation

22) The example seems to use HTTP 0.9. Could it be updated for HTTP 1.1?
>>> Done (throughout document)

3.2 Resource represented as HTML

23) Can the two <link> header lines be <b>old in both examples?
>>> Done

24) "The provenance-URI given by the hasProvenance link element" ... "The target-URI given by the hasAnchor link element " - I found these confusing, because I could not easily find "hasProvenance" and "hasAnchor" above - as they are bits of the URI. If you don't want to repeat the full URIs here, then highlight the two terms more (super-bold?) in the pre above. This is particularly confusing for hasAnchor - because in this style you have two <link> entries while in the HTTP example this was just a single link entry with an optional parameter.
>>> I've reorganised the text to try and make this clearer, also including '#' for each link type mentioned.

I don't like the approach here with the anchors disconnected from the hasProvenance - especially not as it is not consistent with the approach of 3.1. I would have preferred the two approaches to be equivalent. I now can't construct the Link headers of 3.1 based on the HTML in 3.2 or the RDF in 3.3. Although I don't particularly like it, I might recommend changing 3.1 to also have a separate 'hasAnchor' relation, to make it consistent. (Also it would allow the off-spec use of hasAnchor without provenance links).
>>> (sect 3.2? check)
>>> I don't particularly like it either. But we're constrained by use of existing features. We've been over this in previous iterations, and this is what we settled on - the inconsistency was deemed preferable to gratuitous reinvention. In practice I think it will be less of an issue than may at first appear, as I don't see having multiple provenance links *and* anchors as being a common requirement.
>>> Issue raised: http://www.w3.org/2011/prov/track/issues/628

3.2.1 specifying provenance query service

25) " (though, in simple cases, we anticipate that hasProvenance and hasQueryService link relations would not be used together). " - I would drop this sentence. I thought hasProvenance was for simple cases.
>>> "hasProvenance was for simple cases" -- not necessarily
>>> We had previously been asked for clarification of this point, so I don't see dropping it as an easy option. But it might be rephrased.
>>> Re-phrased to avoid "simple cases"

26) " (These terms may be used to indicate provenance of arbitrary other resources too, but discussion of such usage is beyond the scope of this section.) " - so where is the section where I can read about this? It sounds important and useful.
>>> s/section/document/
>>> Should we actually try to say more about this? I'm not sure - it seems like dwelling on an exceptional case. In any case, given the descriptions in appendix B, and knowledge of RDF, I'd have thought such use was obvious.
>>> I've reworded this slightly, and moved it to a separate Note paragraph where it's hopefully less of a distraction.

27) "The RDF property prov:hasProvenance is a relation between two resources, where the object of the property is a resource that presents a provenance description of the subject resource. " - I would add the term provenance-URI here.
>>> My error: should be "the object of the property is a provenance-URI that denotes a resource ..." (I thought I had it technically correct, but the object is the RDF graph node, not what it denotes).
>>> Revised

28) " This property corresponds to a hasProvenance link relation used with an HTTP Link header field, or HTML <link> element (see above). " and " This corresponds to use of the anchor parameter in an HTTP provenance Link header field, or a hasAnchor link relation in an HTML <link> element, which similarly indicate a URI used by the provenance description to refer to the described document.", "This property corresponds to a hasQueryService link relation used with an HTTP Link header field, or HTML <link> element. " - I would totally drop these sentences - as long as you specify in funny font that it is target-URI and provenance-URI you are defining, it's OK.
Section 3.2 doesn't have an equivalent statement, and reads quite easily.
>>> Done.

29) Example [sect 3.3, I assume] Add "Turtle syntax [TURTLE]" somewhere near this example.
>>> Done.

30) Example Remove the use of invalid and confusing ":" for continuation - if anything use # .. RDF data ...
>>> Done.

31) Why are the provenance relations long URIs, rather than registered Link Types? I might have missed something, because earlier we suggested to register such link types as "provenance".
>>> Because we (the group) discussed this, and decided not to register the link types, because we felt it would be more consistent to use URIs throughout.
>>> No change

32) According to http://tools.ietf.org/html/rfc5988#section-4.2

   When extension relation types are compared, they MUST be compared as strings (after converting to URIs if serialised in a different format, such as a Curie [W3C.CR-curie-20090116]) in a case-insensitive fashion, character-by-character. Because of this, all-lowercase URIs SHOULD be used for extension relations.

Should we not have relation URIs that are all lowercase to avoid problems? ie.

   Link: <http://acme.example.org/provenance/super-widget>;
         rel="http://www.w3.org/ns/prov#hasprovenance"

>>> Hmmm... Good catch, I missed that.
>>> Per discussion, properties changed to "http://www.w3.org/ns/prov#has_provenance", etc.

33) Section 5 - Link examples don't have appropriate quoting of rel and anchor.
>>> Checking...
[[
  Link           = "Link" ":" #link-value
  link-value     = "<" URI-Reference ">" *( ";" link-param )
  link-param     = ( ( "rel" "=" relation-types )
                 | ( "anchor" "=" <"> URI-Reference <"> )
                 | ( "rev" "=" relation-types )
                 | ( "hreflang" "=" Language-Tag )
                 | ( "media" "=" ( MediaDesc | ( <"> MediaDesc <"> ) ) )
                 | ( "title" "=" quoted-string )
                 | ( "title*" "=" ext-value )
                 | ( "type" "=" ( media-type | quoted-mt ) )
                 | ( link-extension ) )
  link-extension = ( parmname [ "=" ( ptoken | quoted-string ) ] )
                 | ( ext-name-star "=" ext-value )
  ext-name-star  = parmname "*" ; reserved for RFC2231-profiled
                                ; extensions. Whitespace NOT
                                ; allowed in between.
  ptoken         = 1*ptokenchar
  ptokenchar     = "!" | "#" | "$" | "%" | "&" | "'" | "("
                 | ")" | "*" | "+" | "-" | "." | "/" | DIGIT
                 | ":" | "<" | "=" | ">" | "?" | "@" | ALPHA
                 | "[" | "]" | "^" | "_" | "`" | "{" | "|"
                 | "}" | "~"
  media-type     = type-name "/" subtype-name
  quoted-mt      = <"> media-type <">
  relation-types = relation-type
                 | <"> relation-type *( 1*SP relation-type ) <">
  relation-type  = reg-rel-type | ext-rel-type
  reg-rel-type   = LOALPHA *( LOALPHA | DIGIT | "." | "-" )
  ext-rel-type   = URI
]]
-- http://www.ietf.org/rfc/rfc5988.txt

>>> Unquoted URI (without spaces) is OK for relation type as ext-rel-type; quoting is optional
>>> But anchor *does* need to be quoted - I missed that. Good catch.
>>> Added quotes to anchor parameters
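
To illustrate the quoting outcome above, here is a minimal Python sketch of emitting a Link header field that follows the RFC 5988 grammar: the anchor parameter is always quoted, and the relation type is quoted as well (optional for a single ext-rel-type, but harmless). The function name format_link_header and the target URI http://acme.example.org/super-widget are hypothetical, chosen only for illustration; the relation URI is the one given in the reply to item 32. This is a sketch under those assumptions, not text from the specification.

```python
def format_link_header(provenance_uri, rel, anchor=None):
    """Build a Link header field with quoted rel and anchor parameters.

    Hypothetical helper: it only rejects characters that would break the
    quoting; it does not fully validate the arguments as URIs.
    """
    for value in (provenance_uri, rel, anchor):
        if value is not None and any(c in value for c in '"<> \t\r\n'):
            raise ValueError("unsafe character in URI: %r" % value)

    # rel may legally be unquoted when it is a single URI; quoting is always allowed.
    params = ['rel="%s"' % rel]

    # anchor must be quoted according to the link-param production.
    if anchor is not None:
        params.append('anchor="%s"' % anchor)

    return "Link: <%s>; %s" % (provenance_uri, "; ".join(params))


if __name__ == "__main__":
    print(format_link_header(
        "http://acme.example.org/provenance/super-widget",
        "http://www.w3.org/ns/prov#has_provenance",
        anchor="http://acme.example.org/super-widget",  # hypothetical target-URI
    ))
```

The header is produced on a single line; splitting it across lines for readability, as in item 20, is a presentation choice for the document rather than something this sketch attempts.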
Received on Monday, 11 March 2013 10:01:49 UTC