- From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- Date: Mon, 11 Mar 2013 09:55:41 +0000
- To: W3C provenance WG <public-prov-wg@w3.org>, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Stian part 2 (http://lists.w3.org/Archives/Public/public-prov-wg/2013Jan/0121.html)

>>> My responses are prefixed like this.

Summary:
========

PROV-AQ is a very interesting document, because it describes how to connect provenance to the world, or more specifically to resources on the Internet. For my own domain of scientific workflow preservation, there is a particular need for this kind of standardization as currently there is no recognized mechanism for a service to provide provenance data in any form.

The core concepts of PROV-AQ are very easy to understand, simple to use and clearly scoped. The document is however at times heavy to read, as edge cases are often explored in detail before introducing the main concepts and how a functionality is to be used.

The terminology is a bit odd compared to the rest of the PROV documents, I particularly wonder why the authors are using the term target-URI rather than entity-URI; however I understand this is careful treading as in this particular document there is necessarily a lot of talk about *resources*.

>>> I think the terminology is now more closely aligned to other PROV specs. I think the remaining differences are due to different intent, and hopefully these have been clarified.

It is unclear as to whether PROV-AQ can and should be used for finding non-PROV provenance descriptions, such as alternative models (OPM, DCTerms), application-specific resources (logfiles, commit logs), and human-readable documents (HTML, Word).

My view: "PROV-AQ MAY be used for such purposes, but that PROV-AQ provenance descriptions SHOULD be available as PROV. PROV SHOULD be represented as PROV-O RDF, and MAY be represented in other W3C specified PROV serializations.".
>>> See issue http://www.w3.org/2011/prov/track/issues/428
>>> I mostly agree with Stian's position here
>>> I'm not sure if we actually need to say anything about non-PROV formats
>>> The introduction now explicitly states:
[[
Most mechanisms described in this note are independent of the provenance format used, and may be used to access provenance in any available format. For interoperable provenance publication, use of PROV-O represented in a standardized RDF format is recommended. Where alternative formats are available, selection may be made by content negotiation.
]]

I find that the section about pingback service is out of scope for a PROV-AQ service, and therefore below (point 56) suggest an alternative approach where the pingback service simply receives a link that a provenance service may later return or include in its store. I don't distinguish between 'forward' and 'backward' provenance, so for me "has provenance" means I will find some provenance data where this entity ("target-URI") is present - but the WG might have a different view and could want to distinguish between the two directions, as popular resources could accumulate a lot of forward traces.

>>> Comments at point 56. These changes have been substantially adopted.

Detailed review - numbering continues from previous email:
=======

4. Provenance query service

35) "the naming authority associated with the target-URI is not the same as the service offering provenance descriptions" - why is this a problem?

"multiple services have provenance descriptions about the same resource" - why is this a problem?

Neither of these seems like a problem from the previous bits of this specification. Section 3 specifically allows multiple provenance-uris and doesn't require these to be hosted at the same "naming authority".
I think what you are trying to say in these two is something like:

* "third-party providers of provenance descriptions who can't use the mechanisms of Section 3 because the target-URI is outside their control"

>>> Yes, I've revised the text to reflect this intent, using substantially this wording.

36) "the service associated with the target-URI is not accessible for adding additional information when handling retrieval requests"

I don't know what this means. Which service? Adding on retrieval? Not accessible?

>>> Covered by revised wording above.

37) "query services may provide additional control over what provenance is returned"

perhaps change "control" to "filters" - make it sound like a good thing when there is too much provenance!

>>> Changed.

38) I suggest to add consideration: "query services may support more complex queries such as "which entities were derived from entities attributed to agent X""

>>> I've reworked the motivation taking this on board.

39) "such usage is not described here" -> ".. not described here"

>>> OBE - text no longer exists

40) "use the information obtained to query for required provenance." ... add "according to the specified query mechanism"

>>> Agree
>>> Updated with similar

41) "Dereferencing a provenance query service URI" --> "... service-URI"

>>> OK (sect 4.1)
>>> Done.

42) "this specification does not preclude the use of non-RDF formats"

JSON-LD <http://json-ld.org/spec/latest/json-ld-syntax/> is growing in popularity, should we perhaps propose a JSON-LD context? I think it would be quite straightforward, and I actually managed to do it in about 15 minutes (including learning the syntax).
>>> I regard this option as being covered by "RDF (in any of its common serializations...)"
>>> I'm quite open to use of JSON-LD, but I feel it may be too early to push this
>>> The current text sticks with "The service description presented here may be supplied as RDF (in any of its common serializations as determined by HTTP content negotiation),"
>>> See also: http://www.w3.org/2011/prov/track/issues/622

[...example elided...]

43) As shown in the complete example in 4.1.3, the ProvenanceQueryService is not connected to the DirectQueryService or sd:Service. Given that services don't have a general name, it would be difficult for implementers to know if a node in the graph is a service or just happens to be further/additional data (for instance details about the publisher of the service). It also means I can't mention a service at all without implying that I am somehow providing it as part of my service description.

I therefore suggest that the ProvenanceQueryService should link to the services using a term like prov:describesService - see modified example:

>>> My intention was that they could be located by type, but I would be happy to include a prov:describesService relation

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    <> a prov:ProvenanceQueryService ;
        prov:describesService <#direct>, <#sparql> ;
        dcterms:publisher <#us> .

    <#us> a foaf:Organization ;
        foaf:name "and not a service!" .

    <#direct> a prov:DirectQueryService ;
        prov:provenanceUriTemplate "?target={+uri}" .

    <#sparql> a sd:Service ;
        sd:endpoint </sparql/> ;
        sd:supportedLanguage sd:SPARQL11Query .
The added advantage of this is that you can do the bnode shorthand when you don't quite know or care what to call your service entries:

    <> a prov:ProvenanceQueryService ;
        prov:describesService
            [ a prov:DirectQueryService ;
              prov:provenanceUriTemplate "?target={+uri}" ],
            [ a sd:Service ;
              sd:endpoint </sparql/> ;
              sd:supportedLanguage sd:SPARQL11Query ] .

>>> I like that - it has the added advantage of making the relationship between a service description document and an individual service description more explicit.
>>> Done

44) I suggest renaming the verbose prov:ProvenanceQueryService to prov:ServiceDescription. We don't need to say Provenance because of the namespace. It's also not a service itself, just descriptions. This avoids confusion whether the DirectQueryService is a ProvenanceQueryService. Combined with the prov:describesService from above, the distinction should be clear.

>>> Done

45) This protocol typically combines the target-URI with the service-URI to formulate an HTTP GET request, according to the following convention:

Typically..? Is this not meant to *define* the protocol? Remove "typically".

>>> Partly due to other changes, this has been reworked to require the URI to be defined by the URI template in the service description.

46) "provenance description for the resource-URI" - while I like "resource-URI" over "target-URI" (and perhaps entity-URI even more) - I think this is a typo. --> target-URI

>>> Changed to target-uri. (We considered using entity-uri throughout, but this would not have covered activities.)

47) "Any server that implements this protocol and receives a request URI in this form SHOULD return a provenance description for the resource-URI embedded in the query component, where that URI is the result of percent-decoding the value associated with the provenance-resource key" - a bit heavy and cryptic sentence. What is "the value associated with the provenance-resource key"?

>>> Sect 4.2
>>> Re-worked.
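To make the encoding question in points 47 and 48 concrete, here is a minimal Python sketch of one possible reading, not something taken from the PROV-AQ text: the client percent-encodes every reserved character in the target-URI (including '%', so an existing %20 becomes %2520), and the server recovers the target-URI with a single percent-decoding pass. The service URI, the "target" query key (borrowed from the "?target={+uri}" template above) and the function names are assumptions made for illustration only.

```python
from urllib.parse import parse_qs, quote, urlsplit


def build_query_uri(service_uri: str, target_uri: str) -> str:
    """Client side: embed target_uri as the value of the 'target' key.

    safe="" makes quote() encode every reserved character, so '#', '&'
    and '%' in the target-URI cannot be mistaken for parts of the
    request URI itself.
    """
    return f"{service_uri}?target={quote(target_uri, safe='')}"


def extract_target(request_uri: str) -> str:
    """Server side: percent-decode the 'target' value exactly once."""
    query = urlsplit(request_uri).query
    return parse_qs(query)["target"][0]


# Hypothetical service and target URIs, chosen to exercise '%', '&' and '#'.
target = "http://example.com/with%20spaces?a=1&b=2#frag"
request_uri = build_query_uri("http://example.org/provenance", target)
print(request_uri)

# The round trip returns the target-URI unchanged, %20 included.
assert extract_target(request_uri) == target
```

Under this reading the two sides agree on a single rule (encode everything once, decode once), which avoids having to list individual characters such as '#' and '&'. Whether the specification should require this or leave it to the URI template is exactly the open question in the points above.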
48) "If the supplied resource-URI includes a fragment identifier, the '#' MUST be %-encoded as %23 when constructing the provenance-URI value; similarly, any '&' character in the resource-URI must be %-encoded as %26 [[RFC3986]]." - I am a bit uncertain about this - are you implying that only those characters need to be escaped? What about "%"? It should be clearly specified if a URL like http://example.com/with%20spaces should be sent along as-is with %20, or double-encoded as %2520.

I agree that it's very important to highlight that # and & must be %-encoded as they would otherwise fall out - but it should also here clearly indicate the regular encoding. As this is getting a bit long - perhaps split into a second paragraph which is only about encoding. (I.e. first paragraph says what is to be returned, etc, second paragraph just details about the URI encoding)

>>> This has been substantially re-worked. Some of the discussion has been moved to a supporting note.

49) "If the provenance described by the request does not exist in the server, a 404 Not Found response code SHOULD be returned."

This section does not define other error conditions, like what the server should do if access is restricted. Obviously the regular HTTP status codes apply, but it might be worth pointing out that the server is not required to make such responses public - so it might for instance require authentication with 401, or 'hide' the existence of a response with 404. "This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable."

>>> I've re-worked the text

Probably this is out of scope - but I was thinking that it could be useful if the server could return 403 Forbidden, for instance because it refuses to give provenance details for resources that are not 'his' (not under example.com for instance). It could return a text/uri-list of base URIs which the server will support.
(this is a slight abuse of text/uri-list because there might be no resource with that particular URI - more appropriate would be a list of URI templates, but there is no media type for that).

>>> I agree it's out of scope.

50) "does not exist in the server" --> change to "is unknown to the server" - as there is no requirement that the provenance resource is on the same server. (and neither should there be!)

>>> Done as part of above.

51) "should be capable of returning RDF using the vocabulary defined by [PROV-O], in any standard RDF serialization (e.g. RDF/XML), or any other standard serialization of the Provenance Model specification [PROV-DM]." - both "any" change to "a" - only one of them is needed, not all - which 'any' might imply!

>>> sect 4.2
>>> Re-worked

52) "other standard serialization (..) PROV-DM" - Is this something we've defined somewhere? How would you know if say PROV JSON is a standard serialization?

>>> You'd know because it's defined in a standard specification :^)
>>> The intent is to leave the way open to future standards.
>>> The text is re-worked, and now more open to any format through content negotiation.
>>> See also: http://www.w3.org/2011/prov/track/issues/428

53) "A provenance query service SHOULD be capable of returning RDF ... , or any other standard serialization of the Provenance Model specification" - it is unclear if second part is covered by the SHOULD or not. I can see 4 interpretations:

a) Service SHOULD return PROV-O RDF, and MAY return other PROV serializations
b) Service SHOULD return ( either PROV-O RDF or other PROV serialization )
c) Service SHOULD return at least one of ( PROV-O RDF, other PROV serialization) (i.e. simply "one of the PROV serializations")
d) Service SHOULD return PROV-O RDF. Other PROV serializations could be used. (no MAY/SHOULD).
I would recommend a) above - as then the clients would have some reasonable expectation about what is generally supported, rather than having to build in support for PROVXML, PROV-N, etc. just because they are all covered by the same SHOULD of b).

>>> Agree -- see comments above at introduction to review.
>>> SHOULD dropped in re-work
>>> See also: http://www.w3.org/2011/prov/track/issues/428

54) "Previously, section 3. Locating provenance descriptions has described use of HTTP Link: header fields and HTML <link> elements to indicate provenance query services. Beyond that, this specification does not define any specific mechanism for discovering query services." - this forgot about section 3.3 Resource represented as RDF.

>>> Section 4.3
>>> "RDF statements" added

5. Forward provenance

    S: Link: <http://acme.example.org/pingback/super-widget>; rel=http://www.w3.org/ns/prov#provPingback

55) I would rename this to just "pingback" - why double "prov"?

    rel=http://www.w3.org/ns/prov#pingback

>>> Done

A consumer of the resource, or some other system, may perform an HTTP POST operation to the pingback URI where the POST request body contains provenance in one of the recognized provenance description formats. For interoperability, a ping-back receiving service should be able to accept at least PROV-O provenance presented as RDF/XML or Turtle.

56) I think this kind of "provenance posting" (and hence intended provenance-URI creation) sounds out of scope for a pingback service and probably also for this whole document. There are many existing protocols on how to manage and create resources, such as AtomPub, WebDav (uggh..), SFTP, etc. I don't think we need to go into that area to define yet another way on how to create HTTP resources.

I would not expect to have to post my actual provenance to the service, which implies that the service then should keep this and present it willy-nilly to others as its own.
This document also does not say much about what the server is or is not expected to do with this, or how it can refuse provenance which it does not like or permit.

I would rather think that a pingback service should work like pingbacks in blogs, where the pingback simply gives the blog a URI of a third-party site which talks about a given blog post at the pingback host.

[Details from original message elided]

>>> This proposal has been adopted and discussed with Stian. I think it does indeed sit better with the goals of PROV-AQ.

6. Security considerations

When retrieving a provenance URI from a document, steps should be taken to ensure the document itself is an accurate copy of the original whose author is being trusted (e.g. signature checking, or use of a trusted secure web service).

57) What is "document" above? Should this refer to section 3.2?

>>> Yes - cross-ref section 3.2, 3.3
>>> Discussion moved to 1.3, and cross-ref added.

58) A paragraph should be added about cross-site request forgery and distributed denial attacks, similar to my blurb above:

When clients and servers are retrieving submitted URIs such as provenance descriptions and following or registering links; reasonable care should be taken to prevent malicious use such as distributed denial of service attacks (DDoS), cross-site request forgery (CSRF), spamming and hosting of inappropriate materials. Reasonable preventions might include same-origin policy, HTTP authorization, SSL, rate-limiting, spam filters, moderation queues, user acknowledgements and validation. It is out of scope for this document to specify how such mechanisms work and should be applied.
>>> I'm not sure how CSRF applies here: my understanding is that that's a browser issue, not a general application issue
>>> I've added this, but have an outstanding query about CSRF

Provenance descriptions may provide a route for leakage of privacy-related information

59) We should also add something obvious like:

Accessing provenance services might reveal to the service and third-parties information which is considered private, including which resources a client has taken interest in. For instance, a browser extension which collects all provenance data for a resource which is being saved to the local disk, could be revealing user interest in a sensitive resource to a third-party site listed by prov:hasProvenance or prov:hasQueryService relation. A detailed query submitted to a third-party provenance query service might be revealing personal information such as social security numbers.

>>> Worked in

B. Names added to prov: namespace

60) Broken definition links: DirectQueryService, provenanceURITemplate

>>> Fixed.

61) Where can I download the OWL for the additional relations?

>>> Placeholders pointing into mercurial added, with TODO to fix.

62) After table, add a note like "In addition, PROV-AQ reuses these terms from the SPARQL service description vocabulary: sd:AA sd: BB"

>>> Actually, I don't think PROV-AQ is re-using those terms so much as providing a framework within which they, and others, MAY be applicable.
>>> The intent of this summary was to provide a summary of terms over and above other PROV-x specs that are in the prov: namespace.
>>> No change

It is tempting to think of prov:DirectQueryService as a particular kind of prov:ProvenanceQueryService (..)

63) This section can be deleted if you follow my previous suggestion to rename the latter to prov:ServiceDescription and add prov:describesService relation. (See 43/44 above)

>>> Yes, the explicit relation makes that clearer
>>> Done - deleted

C. References

I have NOT checked the validity or correctness of most of these links.

Should not SPARQL-SD and URI-template be given as normative references, as this specification depends on them?

>>> Ivan confirms we cannot have normative references.
Received on Monday, 11 March 2013 10:01:48 UTC