- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Mon, 11 Mar 2013 13:43:59 +0000
- To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- Cc: W3C provenance WG <public-prov-wg@w3.org>
Thanks for the extensive reworking. I am very happy with the responses to my review(s).

On Mon, Mar 11, 2013 at 9:55 AM, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:
> Stian part 2
> (http://lists.w3.org/Archives/Public/public-prov-wg/2013Jan/0121.html)
>
>>>> My responses are prefixed like this.
>
> Summary:
> ========
>
> PROV-AQ is a very interesting document, because it describes how to
> connect provenance to the world, or more specifically to resources on
> the Internet. For my own domain of scientific workflow preservation,
> there is a particular need for this kind of standardization as
> currently there is no recognized mechanism for a service to provide
> provenance data in any form.
>
> The core concepts of PROV-AQ are very easy to understand, simple to
> use and clearly scoped. The document is however at times heavy to
> read, as edge cases are often explored in detail before introducing
> the main concepts and how a functionality is to be used.
>
> The terminology is a bit odd compared to the rest of the PROV
> documents. I particularly wonder why the authors are using the term
> target-URI rather than entity-URI; however I understand this is
> careful treading as in this particular document there is necessarily
> a lot of talk about *resources*.
>
>>>> I think the terminology is now more closely aligned to other PROV specs.
>>>> I think the remaining differences are due to different intent, and hopefully
>>>> these have been clarified.
>
>
> It is unclear as to whether PROV-AQ can and should be used for finding
> non-PROV provenance descriptions, such as alternative models (OPM,
> DCTerms), application-specific resources (logfiles, commit logs), and
> human-readable documents (HTML, Word). My view: "PROV-AQ MAY be used
> for such purposes, but PROV-AQ provenance descriptions SHOULD be
> available as PROV. PROV SHOULD be represented as PROV-O RDF, and MAY
> be represented in other W3C specified PROV serializations."
>>>> See issue http://www.w3.org/2011/prov/track/issues/428
>>>> I mostly agree with Stian's position here
>>>> I'm not sure if we actually need to say anything about non-PROV formats
>>>> The introduction now explicitly states:
> [[
> Most mechanisms described in this note are independent of the provenance
> format used, and may be used to access provenance in any available format.
> For interoperable provenance publication, use of PROV-O represented in a
> standardized RDF format is recommended. Where alternative formats are
> available, selection may be made by content negotiation.
> ]]
>
>
> I find that the section about pingback service is out of scope for a
> PROV-AQ service, and therefore below (point 56) suggest an alternative
> approach where the pingback service simply receives a link that a
> provenance service may later return or include in its store. I don't
> distinguish between 'forward' and 'backward' provenance, so for me
> "has provenance" means I will find some provenance data where this
> entity ("target-URI") is present - but the WG might have a different
> view and could want to distinguish between the two directions, as
> popular resources could accumulate a lot of forward traces.
>>>> Comments at point 56. These changes have been substantially adopted.
>
>
> Detailed review - numbering continues from previous email:
> =======
>
>
> 4. Provenance query service
>
> 35) "the naming authority associated with the target-URI is not the
> same as the service offering provenance descriptions" - why is this a
> problem?
>
> "multiple services have provenance descriptions about the same
> resource" - why is this a problem?
>
> Neither of these seems like a problem from the previous bits of this
> specification. Section 3 specifically allows multiple provenance-URIs
> and doesn't require these to be hosted at the same "naming authority".
>
> I think what you are trying to say in these two is something like:
>
> * "third-party providers of provenance descriptions who can't use the
> mechanisms of Section 3 because the target-URI is outside their
> control"
>>>> Yes, I've revised the text to reflect this intent, using substantially
>>>> this wording.
>
> 36) "the service associated with the target-URI is not accessible for
> adding additional information when handling retrieval requests"
> I don't know what this means. Which service? Adding on retrieval? Not
> accessible?
>>>> Covered by revised wording above.
>
>
> 37) "query services may provide additional control over what
> provenance is returned"
> perhaps change "control" to "filters" - make it sound like a good
> thing when there is too much provenance!
>>>> Changed.
>
> 38) I suggest adding a consideration:
> "query services may support more complex queries such as "which
> entities were derived from entities attributed to agent X""
>>>> I've reworked the motivation taking this on board.
>
>
> 39) "such usage is not described here" -> ".. not described here"
>>>> OBE - text no longer exists
>
>
> 40) "use the information obtained to query for required provenance."
> ... add "according to the specified query mechanism"
>>>> Agree
>>>> Updated with similar
>
>
> 41) "Dereferencing a provenance query service URI" --> "... service-URI"
>>>> OK (sect 4.1)
>>>> Done.
>
>
>
> 42) "this specification does not preclude the use of non-RDF formats"
> JSON-LD <http://json-ld.org/spec/latest/json-ld-syntax/> is growing in
> popularity, should we perhaps propose a JSON-LD context? I think it
> would be quite straightforward, and I actually managed to do it in
> about 15 minutes (including learning the syntax).
>>>> I regard this option as being covered by "RDF (in any of its common
>>>> serializations...)"
>>>> I'm quite open to use of JSON-LD, but I feel it may be too early to push
>>>> this
>>>> The current text sticks with "The service description presented here may
>>>> be supplied as RDF (in any of its common serializations as determined by
>>>> HTTP content negotiation),"
>>>> See also: http://www.w3.org/2011/prov/track/issues/622
>
> [...example elided...]
>
>
> 43) As shown in the complete example in 4.1.3, the
> ProvenanceQueryService is not connected to the DirectQueryService or
> sd:Service. Given that services don't have a general name, it would be
> difficult for implementers to know if a node in the graph is a service
> or just happens to be further/additional data (for instance details
> about the publisher of the service). It also means I can't mention a
> service at all without implying that I am somehow providing it as part
> of my service description.
>
> I therefore suggest that the ProvenanceQueryService should link to the
> services using a term like prov:describesService - see modified
> example:
>>>> My intention was that they could be located by type, but I would be
>>>> happy to include a prov:describesService relation
>
> @prefix prov: <http://www.w3.org/ns/prov#> .
> @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
> @prefix dcterms: <http://purl.org/dc/terms/> .
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> <> a prov:ProvenanceQueryService ;
>     prov:describesService <#direct>, <#sparql> ;
>     dcterms:publisher <#us> .
>
> <#us> a foaf:Organization ;
>     foaf:name "and not a service!" .
>
> <#direct> a prov:DirectQueryService ;
>     prov:provenanceUriTemplate "?target={+uri}"
>     .
> <#sparql> a sd:Service ;
>     sd:endpoint </sparql/> ;
>     sd:supportedLanguage sd:SPARQL11Query .
>
>
> The added advantage of this is that you can do the bnode shorthand
> when you don't quite know or care what to call your service
> entries:
>
> <> a prov:ProvenanceQueryService ;
>     prov:describesService
>         [ a prov:DirectQueryService ;
>           prov:provenanceUriTemplate "?target={+uri}" ],
>         [ a sd:Service ;
>           sd:endpoint "?target={+uri}" ;
>           sd:supportedLanguage sd:SPARQL11Query
>         ] .
>
>>>> I like that - it has the added advantage of making the relationship
>>>> between a service description document and an individual service description
>>>> more explicit.
>>>> Done
>
>
> 44) I suggest renaming the verbose prov:ProvenanceQueryService to
> prov:ServiceDescription. We don't need to say Provenance because of
> the namespace. It's also not a service itself, just descriptions. This
> avoids confusion whether the DirectQueryService is a
> ProvenanceQueryService. Combined with the prov:describesService from
> above, the distinction should be clear.
>>>> Done
>
> 45) This protocol typically combines the target-URI with the
> service-URI to formulate an HTTP GET request, according to the
> following convention:
>
> Typically..? Is this not meant to *define* the protocol? Remove "typically".
>
>>>> Partly due to other changes, this has been reworked to require the URI
>>>> to be defined by the URI template in the service description.
>
>
> 46) "provenance description for the resource-URI"
> - while I like "resource-URI" over "target-URI" (and perhaps
> entity-URI even more) - I think this is a typo. --> target-URI
>>>> Changed to target-uri. (We considered using entity-uri throughout, but
>>>> this would not have covered activities.)
>
>
> 47) "Any server that implements this protocol and receives a request
> URI in this form SHOULD return a provenance description for the
> resource-URI embedded in the query component, where that URI is the
> result of percent-decoding the value associated with the
> provenance-resource key" - a rather heavy and cryptic sentence.
> What is
> "the value associated with the provenance-resource key"?
>>>> Sect 4.2
>>>> Re-worked.
>
> 48) "If the supplied resource-URI includes a fragment identifier, the
> '#' MUST be %-encoded as %23 when constructing the provenance-URI
> value; similarly, any '&' character in the resource-URI must be
> %-encoded as %26 [[RFC3986]]." - I am a bit uncertain about this -
> are you implying that only those characters need to be escaped? What
> about "%"? It should be clearly specified if a URL like
> http://example.com/with%20spaces should be sent along as-is with %20,
> or double-encoded as %2520. I agree that it's very important to
> highlight that # and & must be %-encoded as they would otherwise fall
> out - but it should also here clearly indicate the regular encoding.
> As this is getting a bit long - perhaps split into a second paragraph
> which is only about encoding. (I.e. first paragraph says what is to be
> returned, etc., second paragraph just details about the URI encoding)
>>>> This has been substantially re-worked. Some of the discussion has been
>>>> moved to a supporting note.
>
> 49) "If the provenance described by the request does not exist in the
> server, a 404 Not Found response code SHOULD be returned."
>
> This section does not define other error conditions, like what the
> server should do if access is restricted. Obviously the regular HTTP
> status codes apply, but it might be worth pointing out that the server
> is not required to make such responses public - so it might for
> instance require authentication with 401, or 'hide' the existence of a
> response with 404. "This status code is commonly used when the server
> does not wish to reveal exactly why the request has been refused, or
> when no other response is applicable."
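
To make the encoding question in point 48 concrete, here is a minimal Python sketch of one possible reading: the target-URI is percent-encoded in full before being placed in the query component, so '#' becomes %23, '&' becomes %26, and an existing %20 is double-encoded as %2520. The function names, the example service URI and the choice of "encode everything outside the unreserved set" are assumptions made for illustration; the "target" parameter name is borrowed from the "?target={+uri}" template quoted earlier in this thread, and none of this is taken from the reworked PROV-AQ text.

```python
from urllib.parse import quote, unquote


def encode_target(target_uri: str) -> str:
    """Percent-encode a target-URI for use as one query parameter value.

    Assumption: every character outside the RFC 3986 unreserved set is
    encoded, including '#', '&' and '%' itself.
    """
    return quote(target_uri, safe="")


def build_query_uri(service_uri: str, target_uri: str) -> str:
    """Append the encoded target-URI to a (hypothetical) service-URI."""
    return f"{service_uri}?target={encode_target(target_uri)}"


if __name__ == "__main__":
    target = "http://example.com/with%20spaces#frag&x=1"
    encoded = encode_target(target)
    print(encoded)
    # The server side would recover the original by percent-decoding once.
    print(unquote(encoded) == target)
    print(build_query_uri("http://prov.example.org/service", target))
```

With this reading a single percent-decoding step on the server recovers the target-URI exactly, which is why the existing %20 has to become %2520 on the way out.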
>
>>>> I've re-worked the text
>
>
> Probably this is out of scope - but I was thinking that it could be
> useful if the server could return 403 Forbidden, for instance because
> it refuses to give provenance details for resources that are not 'his'
> (not under example.com for instance). It could return a text/uri-list
> of base URIs which the server will support.
> (this is a slight abuse of text/uri-list because there might be no
> resource with that particular URI - more appropriate would be a list
> of URI templates, but there is no media type for that).
>>>> I agree it's out of scope.
>
>
> 50) "does not exist in the server" --> change to "is unknown to the
> server" - as there is no requirement that the provenance resource is
> on the same server. (and neither should there be!)
>>>> Done as part of above.
>
>
> 51) "should be capable of returning RDF using the vocabulary defined
> by [PROV-O], in any standard RDF serialization (e.g. RDF/XML), or any
> other standard serialization of the Provenance Model specification
> [PROV-DM]." - change both "any" to "a" - only one of them is needed,
> not all - which 'any' might imply!
>>>> sect 4.2
>>>> Re-worked
>
>
> 52) "other standard serialization (..) PROV-DM" - Is this something
> we've defined somewhere? How would you know if say PROV JSON is a
> standard serialization?
>>>> You'd know because it's defined in a standard specification :^)
>>>> The intent is to leave the way open to future standards.
>>>> The text is re-worked, and now more open to any format through content
>>>> negotiation.
>>>> See also: http://www.w3.org/2011/prov/track/issues/428
>
>
> 53) "A provenance query service SHOULD be capable of returning RDF
> ... , or any other standard serialization of the Provenance Model
> specification"
> - it is unclear if second part is covered by the SHOULD or not.
> I can see 4 interpretations:
>
> a) Service SHOULD return PROV-O RDF, and MAY return other PROV
> serializations
>
> b) Service SHOULD return ( either PROV-O RDF or other PROV serialization )
>
> c) Service SHOULD return at least one of ( PROV-O RDF, other PROV
> serialization) (i.e. simply "one of the PROV serializations")
>
> d) Service SHOULD return PROV-O RDF. Other PROV serializations could
> be used. (no MAY/SHOULD).
>
> I would recommend a) above - as then the clients would have some
> reasonable expectation about what is generally supported, rather than
> having to build in support for PROVXML, PROV-N, etc. just because they
> are all covered by the same SHOULD of b).
>>>> Agree -- see comments above at introduction to review.
>>>> SHOULD dropped in re-work
>>>> See also: http://www.w3.org/2011/prov/track/issues/428
>
>
> 54) "Previously, section 3. Locating provenance descriptions has
> described use of HTTP Link: header fields and HTML <link> elements to
> indicate provenance query services. Beyond that, this specification
> does not define any specific mechanism for discovering query services.
> " - this forgot about section 3.3 Resource represented as RDF.
>>>> Section 4.3
>>>> "RDF statements" added
>
> 5. Forward provenance
>
> S: Link: <http://acme.example.org/pingback/super-widget>;
> rel=http://www.w3.org/ns/prov#provPingback
>
>
> 55) I would rename this to just "pingback" - why double "prov"?
>
> rel=http://www.w3.org/ns/prov#pingback
>>>> Done
>
>
> A consumer of the resource, or some other system, may perform an HTTP POST
> operation to the pingback URI where the POST request body contains
> provenance in one of the recognized provenance description formats. For
> interoperability, a ping-back receiving service should be able to accept at
> least PROV-O provenance presented as RDF/XML or Turtle.
>
> 56) I think this kind of "provenance posting" (and hence intended
> provenance-URI creation) sounds out of scope for a pingback service
> and probably also for this whole document. There are many existing
> protocols on how to manage and create resources, such as AtomPub,
> WebDav (uggh..), SFTP, etc. I don't think we need to go into that area
> to define yet another way on how to create HTTP resources.
>
> I would not expect to have to post my actual provenance to the
> service, which implies that the service then should keep this and
> present it willy-nilly to others as its own. This document also does
> not say much about what the server is or is not expected to do with this,
> or how it can refuse provenance which it does not like or permit.
>
> I would rather think that a pingback service should work like
> pingbacks in blogs, where the pingback simply gives the blog a URI of
> a third-party site which talks about a given blog post at the pingback
> host.
>
> [Details from original message elided]
>
>>>> This proposal has been adopted and discussed with Stian. I think it
>>>> does indeed sit better with the goals of PROV-AQ.
>
>
> 6. Security considerations
>
> When retrieving a provenance URI from a document, steps should be taken to
> ensure the document itself is an accurate copy of the original whose author
> is being trusted (e.g. signature checking, or use of a trusted secure web
> service).
> 57) What is "document" above? Should this refer to section 3.2?
>>>> Yes - cross-ref section 3.2, 3.3
>>>> Discussion moved to 1.3, and cross-ref added.
>
> 58) A paragraph should be added about cross-site request forgery and
> distributed denial of service attacks, similar to my blurb above:
>
> When clients and servers are retrieving submitted URIs such as
> provenance descriptions and following or registering links, reasonable
> care should be taken to prevent malicious use such as distributed
> denial of service attacks (DDoS), cross-site request forgery (CSRF),
> spamming and hosting of inappropriate materials. Reasonable
> preventions might include same-origin policy, HTTP authorization, SSL,
> rate-limiting, spam filters, moderation queues, user acknowledgements
> and validation. It is out of scope for this document to specify how
> such mechanisms work and should be applied.
>
>>>> I'm not sure how CSRF applies here: my understanding is that that's a
>>>> browser issue, not a general application issue
>>>> I've added this, but have an outstanding query about CSRF
>
> Provenance descriptions may provide a route for leakage of privacy-related
> information
>
>
> 59) We should also add something obvious like:
>
> Accessing provenance services might reveal to the service and
> third-parties information which is considered private, including which
> resources a client has taken interest in. For instance, a browser
> extension which collects all provenance data for a resource which is
> being saved to the local disk, could be revealing user interest in a
> sensitive resource to a third-party site listed by prov:hasProvenance
> or prov:hasQueryService relation. A detailed query submitted to a
> third-party provenance query service might be revealing personal
> information such as social security numbers.
>>>> Worked in
>
> B. Names added to prov: namespace
>
>
> 60) Broken definition links: DirectQueryService, provenanceURITemplate
>>>> Fixed.
>
>
> 61) Where can I download the OWL for the additional relations?
>>>> Placeholders pointing into mercurial added, with TODO to fix.
>
>
> 62) After table, add a note like "In addition, PROV-AQ reuses these
> terms from the SPARQL service description vocabulary: sd:AA sd: BB"
>>>> Actually, I don't think PROV-AQ is re-using those terms so much as
>>>> providing a framework within which they, and others, MAY be applicable.
>>>> The intent of this summary was to provide a summary of terms over and
>>>> above other PROV-x specs that are in the prov: namespace.
>>>> No change
>
>
> It is tempting to think of prov:DirectQueryService as a particular kind
> of prov:ProvenanceQueryService (..)
>
>
>
> 63) This section can be deleted if you follow my previous suggestion
> to rename the latter to prov:ServiceDescription and add
> prov:describesService relation. (See 43/44 above)
>>>> Yes, the explicit relation makes that clearer
>>>> Done - deleted
>
>
>
> C. References
> I have NOT checked the validity or correctness of most of these links.
>
> Should not SPARQL-SD and URI-template be given as normative
> references, as this specification depends on them?
>
>>>> Ivan confirms we cannot have normative references.
>
>
>

--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Monday, 11 March 2013 13:44:49 UTC