Re: PROV-AQ responses to Stian's review (part 2) from Stian Soiland-Reyes on 2013-03-11 (public-prov-wg@w3.org from March 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Mon, 11 Mar 2013 13:43:59 +0000
To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Cc: W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <CAPRnXtmBg4HAVC3fbwK7DvN79Ub06OB0VZbcXG9fsHSqgymvAA@mail.gmail.com>
Thanks for the extensive reworking.

I am very happy with the responses to my review(s).

On Mon, Mar 11, 2013 at 9:55 AM, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:
> Stian part 2
> (http://lists.w3.org/Archives/Public/public-prov-wg/2013Jan/0121.html)
>
>>>> My responses are prefixed like this.
>
> Summary:
> ========
>
> PROV-AQ is a very interesting document, because it describes how to
> connect provenance to the world, or more specifically to resources on
> the Internet. For my own domain of scientific workflow preservation,
> there is a particular need for this kind of standardization as
> currently there is no recognized mechanism for a service to provide
> provenance data in any form.
>
> The core concepts of PROV-AQ are very easy to understand, simple to
> use and clearly scoped. The document is however at times heavy to
> read, as edge cases are often explored in detail before introducing
> the main concepts and how a functionality is to be used.
>
> The terminology is a bit odd compared to the rest of the PROV
> documents, I particularly wonder why the authors are using the term
> target-URI rather than entity-URI; however I understand this is
> careful treading, as in this particular document there is necessarily
> a lot of talk about *resources*.
>
>>>> I think the terminology is now more closely aligned to other PROV specs.
>>>> I think the remaining differences are due to different intent, and hopefully
>>>> these have been clarified.
>
>
> It is unclear as to whether PROV-AQ can and should be used for finding
> non-PROV provenance descriptions, such as alternative models (OPM,
> DCTerms), application-specific resources (logfiles, commit logs), and
> human-readable documents (HTML, Word). My view: "PROV-AQ MAY be used
> for such purposes, but that PROV-AQ provenance descriptions SHOULD be
> available as PROV. PROV SHOULD be represented as PROV-O RDF, and MAY
> be represented in other W3C specified PROV serializations.".
>>>> See issue http://www.w3.org/2011/prov/track/issues/428
>>>> I mostly agree with Stian's position here
>>>> I'm not sure if we actually need to say anything about non-PROV formats
>>>> The introduction now explicitly states:
> [[
> Most mechanisms described in this note are independent of the provenance
> format used, and may be used to access provenance in any available format.
> For interoperable provenance publication, use of PROV-O represented in a
> standardized RDF format is recommended. Where alternative formats are
> available, selection may be made by content negotiation.
> ]]
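
The content negotiation mentioned in the quoted introduction could look roughly like this on the server side. This is a minimal sketch, not text from PROV-AQ: the names `parse_accept`, `choose_format` and `AVAILABLE` are hypothetical, and partial wildcards such as `text/*` are ignored for brevity.

```python
# Formats this hypothetical server can produce, in its own order of preference.
AVAILABLE = ["text/turtle", "application/rdf+xml", "text/html"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_format(accept_header, available=AVAILABLE):
    """Pick the first acceptable media type the server can serve, else None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
    return None


chosen = choose_format("application/rdf+xml;q=0.8, text/turtle")
```

A server that finds no acceptable format (a `None` result here) would normally answer with 406 Not Acceptable or fall back to a default representation; which of the two is a deployment choice.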
>
>
> I find that the section about pingback service is out of scope for a
> PROV-AQ service, and therefore below (point 56) suggest an alternative
> approach where the pingback service simply receives a link that a
> provenance service may later return or include in its store. I don't
> distinguish between 'forward' and 'backward' provenance, so for me
> "has provenance" means I will find some provenance data where this
> entity ("target-URI") is present - but the WG might have a different
> view and could want to distinguish between the two directions, as
> popular resources could accumulate a lot of forward traces.
>>>> Comments at point 56.  These changes have been substantially adopted.
>
>
> Detailed review - numbering continues from previous email:
> =======
>
>
> 4. Provenance query service
>
> 35) "the naming authority associated with the target-URI is not the
> same as the service offering provenance descriptions" - why is this a
> problem?
>
> "multiple services have provenance descriptions about the same
> resource" - why is this a problem?
>
> Neither of these seems like a problem given the previous parts of this
> specification. Section 3 specifically allows multiple provenance-uris
> and doesn't require these to be hosted at the same "naming authority".
>
> I think what you are trying to say in these two is something like:
>
> * "third-party providers of provenance descriptions who can't use the
> mechanisms of Section 3 because the target-URI is outside their
> control"
>>>> Yes, I've revised the text to reflect this intent, using substantially
>>>> this wording.
>
> 36) "the service associated with the target-URI is not accessible for
> adding additional information when handling retrieval requests"
> I don't know what this means.  Which service? Adding on retrieval? Not
> accessible?
>>>> Covered by revised wording above.
>
>
> 37) "query services may provide additional control over what
> provenance is returned"
> perhaps change "control" to "filters" - make it sound like a good
> thing when there is too much provenance!
>>>> Changed.
>
> 38) I suggest to add consideration:
> "query services may support more complex queries such as "which
> entities were derived from entities attributed to agent X""
>>>> I've reworked the motivation taking this on board.
>
>
> 39) "such usage is not described here" -> ".. not described here"
>>>> OBE - text no longer exists
>
>
> 40) "use the information obtained to query for required provenance."
> ...  add "according to the specified query mechanism"
>>>> Agree
>>>> Updated with similar
>
>
> 41) "Dereferencing a provenance query service URI" --> "... service-URI"
>>>> OK (sect 4.1)
>>>> Done.
>
>
>
> 42) "this specification does not preclude the use of non-RDF formats"
> JSON-LD <http://json-ld.org/spec/latest/json-ld-syntax/> is growing in
> popularity, should we perhaps propose a JSON-LD context? I think it
> would be quite straightforward, and I actually managed to do it in
> about 15 minutes (including learning the syntax).
>>>> I regard this option as being covered by "RDF (in any of its common
>>>> serializations...)"
>>>> I'm quite open to use of JSON-LD, but I feel it may be too early to push
>>>> this
>>>> The current text sticks with "The service description presented here may
>>>> be supplied as RDF (in any of its common serializations as determined by
>>>> HTTP content negotiation),"
>>>> See also: http://www.w3.org/2011/prov/track/issues/622
>
> [...example elided...]
>
>
> 43) As shown in the complete example in 4.1.3, the
> ProvenanceQueryService is not connected to the DirectQueryService or
> sd:Service. Given that services don't have a general name, it would be
> difficult for implementers to know if a node in the graph is a service
> or just happens to be further/additional data (for instance details
> about the publisher of the service). It also means I can't mention at
> all a service, without implying that I am somehow providing it as part
> of my service description.
>
> I therefore suggest that the ProvenanceQueryService should link to the
> services using a term like prov:describesService - see modified
> example:
>>>> My intention was that they could be located by type, but I would be
>>>> happy to include a prov:describesService relation
>
> @prefix prov: <http://www.w3.org/ns/prov#> .
> @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
> @prefix dcterms: <http://purl.org/dc/terms/> .
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> <> a prov:ProvenanceQueryService ;
>     prov:describesService <#direct>, <#sparql> ;
>     dcterms:publisher <#us> .
>
> <#us> a foaf:Organization ;
>    foaf:name "and not a service!" .
>
> <#direct> a prov:DirectQueryService ;
>   prov:provenanceUriTemplate "?target={+uri}"
>   .
> <#sparql> a sd:Service ;
>     sd:endpoint </sparql/> ;
>     sd:supportedLanguage sd:SPARQL11Query .
>
>
> The added advantage of this is that you can do the bnode shorthand
> when you don't quite know or care what to call your service
> entries:
>
> <> a prov:ProvenanceQueryService ;
>     prov:describesService
>       [ a prov:DirectQueryService ;
>         prov:provenanceUriTemplate "?target={+uri}" ],
>       [ a sd:Service ;
>         sd:endpoint </sparql/> ;
>         sd:supportedLanguage sd:SPARQL11Query
>       ] .
>
>>>> I like that - it has the added advantage of making the relationship
>>>> between a service description document and an individual service description
>>>> more explicit.
>>>> Done
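
To illustrate why the explicit relation helps a consumer, here is a minimal sketch that follows prov:describesService instead of guessing from node types. The triples are hand-written to mirror the modified example above; the helper names `described_services` and `types_of` are hypothetical, and relative IRIs are left unresolved to keep the sketch short.

```python
PROV = "http://www.w3.org/ns/prov#"
SD = "http://www.w3.org/ns/sparql-service-description#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# (subject, predicate, object) triples; "" stands for the document itself (<>).
TRIPLES = [
    ("", RDF_TYPE, PROV + "ProvenanceQueryService"),
    ("", PROV + "describesService", "#direct"),
    ("", PROV + "describesService", "#sparql"),
    ("", "http://purl.org/dc/terms/publisher", "#us"),
    ("#us", RDF_TYPE, "http://xmlns.com/foaf/0.1/Organization"),
    ("#direct", RDF_TYPE, PROV + "DirectQueryService"),
    ("#direct", PROV + "provenanceUriTemplate", "?target={+uri}"),
    ("#sparql", RDF_TYPE, SD + "Service"),
    ("#sparql", SD + "endpoint", "/sparql/"),
]


def described_services(triples, description=""):
    """Return the nodes the description explicitly links to as services."""
    return [o for s, p, o in triples
            if s == description and p == PROV + "describesService"]


def types_of(triples, node):
    """Return the set of rdf:type values asserted for a node."""
    return {o for s, p, o in triples if s == node and p == RDF_TYPE}


services = described_services(TRIPLES)
```

The publisher node `#us` never shows up in `services`, even though it sits in the same graph, which is the ambiguity the relation is meant to remove.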
>
>
> 44) I suggest renaming the verbose prov:ProvenanceQueryService to
> prov:ServiceDescription. We don't need to say Provenance because of
> the namespace. It's also not a service itself, just descriptions. This
> avoids confusion whether the DirectQueryService is a
> ProvenanceQueryService. Combined with the prov:describesService from
> above, the distinction should be clear.
>>>> Done
>
> 45) This protocol typically combines the target-URI with the
> service-URI to formulate an HTTP GET request, according to the
> following convention:
>
> Typically..? Is this not meant to *define* the protocol? Remove "typically".
>
>>>> Partly due to other changes, this has been reworked to require the URI
>>>> to be defined by the URI template in the service description.
>
>
> 46) "provenance description for the resource-URI"
>  - while I like "resource-URI" over "target-URI" (and perhaps
> entity-URI even more) - I think this is a typo.  --> target-URI
>>>> Changed to target-uri.  (We considered using entity-uri throughout, but
>>>> this would not have covered activities.)
>
>
> 47) "Any server that implements this protocol and receives a request
> URI in this form SHOULD return a provenance description for the
> resource-URI embedded in the query component, where that URI is the
> result of percent-decoding the value associated with the
> provenance-resource key" - a bit heavy and cryptic sentence. What is
> "the value associated with the provenance-resource key"?
>>>> Sect 4.2
>>>> Re-worked.
>
> 48) "If the supplied resource-URI includes a fragment identifier, the
> '#' MUST be %-encoded as %23 when constructing the provenance-URI
> value; similarly, any '&' character in the resource-URI must be
> %-encoded as %26 [[RFC3986]]."  - I am a bit uncertain about this -
> are you implying that only those characters need to be escaped? What
> about "%"? It should be clearly specified if a URL like
> http://example.com/with%20spaces should be sent along as-is with %20,
> or double-encoded as %2520.  I agree that it's very important to
> highlight that # and & must be %-encoded as they would otherwise fall
> out - but it should also here clearly indicate the regular encoding.
> As this is getting a bit long - perhaps split into a second paragraph
> which is only about encoding. (Ie. first paragraph says what is to be
> returned, etc, second paragraph just details about the URI encoding)
>>>> This has been substantially re-worked.  Some of the discussion has been
>>>> moved to a supporting note.
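
The encoding question in point 48 can be made concrete with a small sketch. It shows one possible convention, not necessarily what the re-worked PROV-AQ text specifies: the whole target-URI is percent-encoded as a query value, so "#" becomes %23, "&" becomes %26, and an existing "%20" is double-encoded as "%2520". The service URI, the `target` parameter name and the function name are assumptions made for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical service URI; a real one would come from the service description.
SERVICE_URI = "http://example.org/provenance/service"


def provenance_request_uri(target_uri, service_uri=SERVICE_URI):
    """Embed target_uri as a fully percent-encoded query value."""
    # safe="" makes quote() escape every reserved character, including
    # '#', '&', '/' and '%'; only unreserved characters pass through.
    return service_uri + "?target=" + quote(target_uri, safe="")


target = "http://example.com/with%20spaces?a=1&b=2#frag"
request_uri = provenance_request_uri(target)

# What the server would see as the raw value, and what it decodes to.
encoded_value = request_uri.split("?target=", 1)[1]
decoded_value = unquote(encoded_value)
```

Under this convention a single percent-decoding step on the server recovers the target-URI exactly, including its original "%20".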
>
> 49) "If the provenance described by the request does not exist in the
> server, a 404 Not Found response code SHOULD be returned."
>
> This section does not define other error conditions, like what the
> server should do if access is restricted. Obviously the regular HTTP
> status codes apply, but it might be worth pointing out that the server
> is not required to make such responses public - so it might for
> instance require authentication with 401, or 'hide' the existence of a
> response with 404. "This status code is commonly used when the server
> does not wish to reveal exactly why the request has been refused, or
> when no other response is applicable."
>
>>>> I've re-worked the text
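
From the client side, the point about 401 and 404 can be sketched as follows. This is an illustrative assumption about how a client might read the ordinary HTTP status codes, not behaviour defined by PROV-AQ; the function name and the returned strings are hypothetical.

```python
from http import HTTPStatus


def interpret_provenance_response(status):
    """Map an HTTP status code to what a provenance client can conclude."""
    if status == HTTPStatus.OK:
        return "provenance returned"
    if status == HTTPStatus.UNAUTHORIZED:
        # The server may hold provenance but wants credentials first.
        return "retry with credentials"
    if status == HTTPStatus.NOT_FOUND:
        # Either the provenance is unknown to the server, or the server
        # chooses not to reveal it; the client cannot tell the two apart.
        return "no provenance available to this client"
    return "unexpected status, treat as failure"


outcome = interpret_provenance_response(404)
```

The 404 branch deliberately avoids concluding that no provenance exists anywhere, since another service may still describe the same resource.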
>
>
> Probably this is out of scope - but I was thinking that it could be
> useful if the server could return 403 Forbidden, for instance because
> it refuses to give provenance details for resources that are not 'his'
> (not under example.com for instance). It could return a text/uri-list
> of base URIs which the server will support.
> (this is a slight abuse of text/uri-list because there might be no
> resource with that particular URI - more appropriate would be a list
> of URI templates, but there is no media type for that).
>>>> I agree it's out of scope.
>
>
> 50) "does not exist in the server"  --> change to "is unknown to the
> server" - as there is no requirement that the provenance resource is
> on the same server. (and neither should there be!)
>>>> Done as part of above.
>
>
> 51) "should be capable of returning RDF using the vocabulary defined
> by [PROV-O], in any standard RDF serialization (e.g. RDF/XML), or any
> other standard serialization of the Provenance Model specification
> [PROV-DM]."  - both "any" change to "a" - only one of them is needed,
> not all - which 'any' might imply!
>>>> sect 4.2
>>>> Re-worked
>
>
> 52) "other standard serialization (..) PROV-DM"  - Is this something
> we've defined somewhere? How would you know if say PROV JSON is a
> standard serialization?
>>>> You'd know because it's defined in a standard specification :^)
>>>> The intent is to leave the way open to future standards.
>>>> The text is re-worked, and is now more open to any format through content
>>>> negotiation.
>>>> See also: http://www.w3.org/2011/prov/track/issues/428
>
>
> 53) "A provenance query service SHOULD  be capable of returning RDF
> ... , or any other standard serialization of the Provenance Model
> specification"
> - it is unclear if second part is covered by the SHOULD or not.   I
> can see 4 interpretations:
>
> a) Service SHOULD return PROV-O RDF, and MAY return other PROV
> serializations
>
> b) Service SHOULD return ( either PROV-O RDF or other PROV serialization )
>
> c) Service SHOULD return at least one of ( PROV-O RDF, other PROV
> serialization)  (ie.  simply "one of the PROV serializations")
>
> d) Service SHOULD return PROV-O RDF.   Other PROV serializations could
> be used. (no MAY/SHOULD).
>
> I would recommend a) above - as then the clients would have some
> reasonable expectation about what is generally supported, rather than
> having to build in support for PROVXML, PROV-N, etc. just because they
> are all covered by the same SHOULD of b).
>>>> Agree -- see comments above at introduction to review.
>>>> SHOULD dropped in re-work
>>>> See also: http://www.w3.org/2011/prov/track/issues/428
>
>
> 54) "Previously, section 3. Locating provenance descriptions has
> described use of HTTP Link: header fields and HTML <link> elements to
> indicate provenance query services. Beyond that, this specification
> does not define any specific mechanism for discovering query services.
> "  - this forgot about section 3.3 Resource represented as RDF.
>>>> Section 4.3
>>>> "RDF statements" added
>
> 5. Forward provenance
>
>   S: Link: <http://acme.example.org/pingback/super-widget>;
>           rel=http://www.w3.org/ns/prov#provPingback
>
>
> 55) I would rename this to just "pingback" why double "prov"?
>
>           rel=http://www.w3.org/ns/prov#pingback
>>>> Done
>
>
>  A consumer of the resource, or some other system, may perform an HTTP POST
> operation to the pingback URI where the POST request body contains
> provenance in one of the recognized provenance description formats. For
> interoperability, a ping-back receiving service should be able to accept at
> least PROV-O provenance presented as RDF/XML or Turtle.
>
> 56) I think this kind of "provenance posting" (and hence intended
> provenance-URI creation) sounds out of scope for a pingback service
> and probably also for this whole document. There are many existing
> protocols on how to manage and create resources, such as AtomPub,
> WebDav (uggh..), SFTP, etc. I don't think we need to go into that area
> to define yet another way on how to create HTTP resources.
>
> I would not expect to have to post my actual provenance to the
> service, which implies that the service then should keep this and
> present it willy-nilly to others as its own.  This document also does
> not say much about what the server is expected or not to do with this,
> or how it can refuse provenance which it does not like or permit.
>
> I would rather think that a pingback service should work like
> pingbacks in blogs, where the pingback simply gives the blog a URI of
> a third-party site which talks about a given blog post at the pingback
> host.
>
> [Details from original message elided]
>
>>>> This proposal has been adopted and discussed with Stian.  I think it
>>>> does indeed sit better with the goals of PROV-AQ.
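
A link-only pingback of the kind proposed in point 56 might be built as below. The details of the proposal are elided in this message, so this is a sketch under stated assumptions: the body is assumed to be a text/uri-list of provenance URIs, the provenance URI and the function name are hypothetical, and the request is only constructed, never sent.

```python
from urllib.request import Request


def build_pingback_request(pingback_uri, provenance_uris):
    """Build (without sending) a POST that tells the pingback service where
    provenance about its resource can be found."""
    # text/uri-list puts one URI per line, terminated by CRLF.
    body = "".join(uri + "\r\n" for uri in provenance_uris).encode("utf-8")
    return Request(
        pingback_uri,
        data=body,
        headers={"Content-Type": "text/uri-list"},  # assumed media type
        method="POST",
    )


request = build_pingback_request(
    "http://acme.example.org/pingback/super-widget",  # pingback URI from the example
    ["http://example.com/provenance/widget-usage"],   # hypothetical provenance URI
)
```

Because only links are posted, the receiving service does not have to store or republish third-party provenance; it can decide later whether to dereference or list the submitted URIs.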
>
>
> 6. Security considerations
>
>  When retrieving a provenance URI from a document, steps should be taken to
> ensure the document itself is an accurate copy of the original whose author
> is being trusted (e.g. signature checking, or use of a trusted secure web
> service).
> 57) What is "document" above? Should this refer to section 3.2?
>>>> Yes - cross-ref section 3.2, 3.3
>>>> Discussion moved to 1.3, and cross-ref added.
>
> 58) A paragraph should be added about cross-site request forgery and
> distributed denial of service attacks, similar to my blurb above:
>
> When clients and servers are retrieving submitted URIs such as
> provenance descriptions and following or registering links, reasonable
> care should be taken to prevent malicious use such as distributed
> denial of service attacks (DDoS), cross-site request forgery (CSRF),
> spamming and hosting of inappropriate materials. Reasonable
> preventions might include same-origin policy, HTTP authorization, SSL,
> rate-limiting, spam filters, moderation queues, user acknowledgements
> and validation. It is out of scope for this document to specify how
> such mechanisms work and should be applied.
>
>>>> I'm not sure how CSRF applies here:  my understanding is that that's a
>>>> browser issue, not a general application issue
>>>> I've added this, but have an outstanding query about CSRF
>
> Provenance descriptions may provide a route for leakage of privacy-related
> information
>
>
> 59) We should also add something obvious like:
>
> Accessing provenance services might reveal to the service and
> third-parties information which is considered private, including which
> resources a client has taken interest in. For instance, a browser
> extension which collects all provenance data for a resource which is
> being saved to the local disk, could be revealing user interest in a
> sensitive resource to a third-party site listed by prov:hasProvenance
> or prov:hasQueryService relation. A detailed query submitted to a
> third-party provenance query service might be revealing personal
> information such as social security numbers.
>>>> Worked in
>
> B. Names added to prov: namespace
>
>
> 60) Broken definition links: DirectQueryService, provenanceURITemplate
>>>> Fixed.
>
>
> 61) Where can I download the OWL for the additional relations?
>>>> Placeholders pointing into mercurial added, with TODO to fix.
>
>
> 62) After table, add a note like "In addition, PROV-AQ reuses these
> terms from the SPARQL service description vocabulary: sd:AA sd: BB"
>>>> Actually, I don't think PROV-AQ is re-using those terms so much as
>>>> providing a framework within which they, and others, MAY be applicable.
>>>> The intent of this summary was to provide a summary of terms over and
>>>> above other PROV-x specs that are in the prov namespace.
>>>> No change
>
>
> It is tempting to think of prov:DirectQueryService as a particular kind
> of prov:ProvenanceQueryService (..)
>
>
>
> 63) This section can be deleted if you follow my previous suggestion
> to rename the latter to prov:ServiceDescription and add
> prov:describesService relation. (See 43/44 above)
>>>> Yes, the explicit relation makes that clearer
>>>> Done - deleted
>
>
>
> C. References
> I have NOT checked the validity or correctness of most of these links.
>
> Should not SPARQL-SD and URI-template be given as normative
> references, as this specification depends on them?
>
>>>> Ivan confirms we cannot have normative references.
>
>
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Monday, 11 March 2013 13:44:49 UTC