Re: Review of PROV-AQ

I've gone through your response, and agree with your reasoning. I am fine
with the new state of the document.

On Fri, Apr 5, 2013 at 5:59 PM, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:

> On 03/04/2013 17:14, Stian Soiland-Reyes wrote:
>
>> Below is my review of
>> https://dvcs.w3.org/hg/prov/raw-file/fa9bac23203a/paq/prov-aq.html
>> (as of 2013-04-03)
>>
>> I know I promised a non-evil review.. which I believe this is - it is
>> unfortunately though still a bit long as now with a fresh eye I've
>> identified some editorial issues. None of these are considered
>> blocking.
>>
> Stian, many thanks for these.  There are many good comments and catches,
> most of which I've incorporated.  It would be great if you have a chance to
> eyeball the changes to see if they match your intent.
>
> More detailed responses below.
>
>
>
>>
>> 1)
>>
>>> Status of This Document
>>> This is the third public working.
>>>
>> This would have to be updated to fourth or Final Note or whatever it is
>> called.
>>
> I've removed that for now.  I think the correct wording can be sorted in
> the staging process when we have formal WG agreement to publish.
>
>
>  This section should add something like:
>>
>>  This document is intended to be published as a W3C Note, not as a formal
>>> W3C Specification. For clarity, the document does however use the key words
>>> MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY and OPTIONAL
>>> as described in [RFC2119].
>>>
>> I've added something along those lines at the end of the introduction.
>
>
>  2)
>>
>>> A provenance record consumer will need to isolate information about the
>>> specific entity or entities of interest. These may be constrained resources
>>> identified by separate target-URIs than the original resource, in which
>>> case it will need to know about the target-URIs used. The mechanisms
>>> defined later allow a provider to expose such URIs.
>>>
>> Confusing. Rewrite to something like:
>>
>> A provenance record consumer will need to isolate information about
>> the specific entity or entities of interest. These may be constrained
>> resources identified by target-URIs that differ from the resource URI,
>> in which case the consumer needs to discover those target-URIs. The
>> mechanisms defined later allow a provider to expose such URIs.
>>
> Thanks, that's better.
>
>
>>
>> 3)
>>
>>  Any resource that is described by some provenance - typically an entity
>>> (in the sense of [PROV-DM], but may be an activity).
>>>
>> --> typically an entity (in the sense of [PROV-DM]), but may be of
>> another type (such as [PROV-DM] activity).
>>
>>  Change applied.  But note that Luc has suggested removing this entry.
>
>
>
>> 4)
>>
>> The mechanisms used with HTTP and HTML/RDF are slightly inconsistent
>> in their approach to specifying target-URI values. In HTTP Link:
>> headers, an optional anchor= parameter may be supplied for each such
>> header. In HTML and RDF, separate #has_anchor relations are defined. I
>>
>>
>> can we move this note down to 3.2? I've not seen any anchors yet!
>>
> Yes, it's done.  (Also added cross-ref from 3.3)
>
>
>  5)
>>
>>> This specification does not define
>>>
>> --> This note
>>
> OK.
>
>
>> 6)
>>
>>> The presence of a has_provenance link in an HTTP response does not
>>> preclude the possibility that other providers may offer provenance records
>>> about the same resource.
>>>
>> --> that other providers may also offer ...
>>
> OK ("... also may offer")
>
>
>> 7)
>>
>>> An example request including provenance headers in its response (..)
>>>
>> --> An example HTTP response including provenance headers (..)
>>
> OK.
>
>
>> 8)
>>
>>> There may be multiple has_query_service link header fields
>>>
>> "may" -> "MAY"
>>
> OK.
>
>
>
>> 9)
>>    <html xmlns="http://www.w3.org/1999/xhtml">
>>       <head>
>>          <link rel="http://www.w3.org/ns/prov#has_provenance" href="provenance-URI">
>>          <link rel="http://www.w3.org/ns/prov#has_anchor" href="target-URI">
>>
>> If this is meant to be XHTML, then the <link> should be terminated as:
>>          <link rel="http://www.w3.org/ns/prov#has_provenance" href="provenance-URI" />
>>          <link rel="http://www.w3.org/ns/prov#has_anchor" href="target-URI" />
>>
>> Due to the advent of HTML5, I would however simply remove the xmlns
>> declaration.
>>
> OK.
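
For illustration, a minimal sketch of how a consumer might pull these
link relations out of an HTML head, whether or not the <link> elements
are self-closed. The class and function names are hypothetical, and only
the Python standard library is assumed; this is not part of PROV-AQ.

```python
from html.parser import HTMLParser

PROV_NS = "http://www.w3.org/ns/prov#"


class ProvLinkParser(HTMLParser):
    """Collect href values of <link> elements whose rel is in the PROV namespace."""

    def __init__(self):
        super().__init__()
        self.links = {}  # relation local name -> list of href values

    def handle_starttag(self, tag, attrs):
        # Also called for self-closed <link ... /> via handle_startendtag.
        if tag != "link":
            return
        attr = dict(attrs)
        rel = attr.get("rel") or ""
        href = attr.get("href")
        if rel.startswith(PROV_NS) and href is not None:
            self.links.setdefault(rel[len(PROV_NS):], []).append(href)


def prov_links(html_text):
    """Return a dict mapping PROV relation names to the hrefs found."""
    parser = ProvLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Placeholder URIs as used in the draft; one link is self-closed, one is not.
SAMPLE = """<html>
  <head>
    <link rel="http://www.w3.org/ns/prov#has_provenance" href="provenance-URI">
    <link rel="http://www.w3.org/ns/prov#has_anchor" href="target-URI" />
  </head>
</html>"""

print(prov_links(SAMPLE))
```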
>
>
>> 10) HTML5 says that:
>>
>>  If the rel attribute is absent, has no keywords, or if none of the
>>> keywords used are allowed according to the definitions in this
>>> specification, then the element does not create any links.
>>> Registration of relation types in HTTP Link: header fields is distinct
>>> from HTML link types, and thus their semantics can be different from
>>> same-named HTML types.
>>> http://www.w3.org/TR/html5/document-metadata.html#the-link-element
>>>
>> and:
>>
>>  Extensions to the predefined set of link types may be registered in the
>>> microformats wiki existing-rel-values page.
>>> http://www.w3.org/TR/html5/links.html#other-link-types
>>>
>> However none of the suggested rel's like
>> http://www.w3.org/ns/prov#has_provenance have been registered at
>> http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions
>>
>> My suggestion is to add our relations here, with reference to the
>> current working draft.
>>
> Where "here" is the microformats wiki?   I'll create an action to do this.
>
> (Seems kinda dumb to me requiring registration of URI link relation types
> on a wiki, but that's HTML5 for you...)
>
>
>
>>
>> 11)  3.2 Resource represented as HTML
>>
>> Can these link types also be used with <a> and <area>? I assume a
>> similar disclaimer note like in RDF statements with other subjects
>> would be appropriate. For instance:
>>
>> The link relations
>> <code>http://www.w3.org/ns/prov#has_provenance</code>,
>> <code>http://www.w3.org/ns/prov#has_anchor</code> and
>> <code>http://www.w3.org/ns/prov#has_query_service</code> may also be
>> used on <a> and <area> links within the HTML body, but discussion of
>> such use is beyond the scope of this document.
>>
> I'm less inclined to do this.  With RDF, it's a fairly obvious thing to
> want to do, so it seems worth noting that such use is OK.  With HTML5, such
> use seems quite marginal so I wonder if making mention of it may be more
> confusing than helpful.
>
> I'm not set against this, I just am not seeing the point.  Persuade me
> otherwise?
>
>
>
>> 12) 3.1 / 3.2 / 3.3 does not mention pingback
>>
>> If we keep section 5 on Provenance pingback (which I vote for), then
>> it should be included under 3.1/3.2/3.3, just like the provenance
>> query service.
>>
> There seem to be two issues here.
>
> (a) with respect to section 3.1 (HTTP), what can we say here that doesn't
> duplicate material in section 5?
>
> (b) with respect to 3.2 and 3.3, it's not clear to me that we should be
> encouraging publication of pingback URIs in documents.
>
> I also think that keeping the pingback separate keeps the other material
> less complicated.
>
> What are the specific advantages or benefits you see compared with the
> present organization?
>
>
>
>> 13)
>>
>>  It allows for accessing provenance about a specified target-URI. The
>>> query URI to use is described by a URI Template [URI-template] (level 2 or
>>> above) in which the variable uri stands for the target-URI; e.g.
>>>
>>> @prefix prov: <http://www.w3c.org/ns/prov#>
>>> <direct-query-description> a prov:DirectQueryService ;
>>>   prov:provenanceUriTemplate "query-URI?target={+uri}" .
>>>
>> This is an example, but it is not styled as one, and does not use real
>> URIs.
>>
>> I don't think we should suggest a "base URI" for the query service, as
>> there is no requirement in URI templates to use the ?query mechanism -
>> and neither do we want to mandate ?target=.
>>
>>
>> Change it to something like:
>>
>>  It allows for accessing provenance about a specified target-URI. The
>>> query URI to use is described by a URI Template [URI-template] (level 2 or
>>> above) in which the variable uri stands for the target-URI. The URI
>>> template is specified as:
>>> <direct-query-description> a prov:DirectQueryService ;
>>>   prov:provenanceUriTemplate "uri-template" .
>>>
>>> where direct-query-description is any distinct RDF subject node (i.e. a
>>> blank node or a URI) and
>>> uri-template is an URI template [RFC3986].
>>>
>>
> Yes, that's better, thanks.
>
>
>  (with italics on direct-query-description and uri-template. )
>>
>>
>> Then before "MAY recognize additional parameters", insert a new example:
>>
>> _:direct a prov:DirectQueryService ;
>>    prov:provenanceUriTemplate
>>      "http://www.example.com/provenance/service?target={+uri}" .
>>
> Done.
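
As a rough illustration of how a client could turn such a template into
a query URI, here is a sketch that handles only the two expression forms
discussed in this thread, {uri} (simple expansion) and {+uri} (reserved
expansion). The function name and the target-URI are hypothetical, and
the code is a simplified reading of the URI Template rules rather than a
full implementation.

```python
import re
from urllib.parse import quote

UNRESERVED = "-._~"               # letters and digits are never escaped by quote()
RESERVED = ":/?#[]@!$&'()*+,;="   # additionally left alone by {+var}

_VAR = re.compile(r"\{(\+?)([A-Za-z0-9_]+)\}")
_PCT = re.compile(r"(%[0-9A-Fa-f]{2})")


def expand(template, variables):
    """Expand {var} and {+var} expressions; other operators are not handled."""

    def substitute(match):
        operator, name = match.groups()
        value = variables.get(name)
        if value is None:
            return ""  # undefined variables expand to nothing
        if operator == "+":
            # Reserved expansion: keep reserved characters and existing
            # percent-encoded triplets, escape everything else.
            parts = _PCT.split(value)
            return "".join(
                part if _PCT.fullmatch(part) else quote(part, safe=UNRESERVED + RESERVED)
                for part in parts
            )
        # Simple expansion: escape everything except unreserved characters.
        return quote(value, safe=UNRESERVED)

    return _VAR.sub(substitute, template)


target = {"uri": "http://example.org/resource"}  # hypothetical target-URI
print(expand("http://www.example.com/provenance/service?target={+uri}", target))
print(expand("http://www.example.com/provenance/service?target={uri}", target))
```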
>
>
>  14) Example 7
>> sparql-query-description a sd:Service ;
>>      sd:endpoint <query-URI/sparql/> ;
>>      sd:supportedLanguage sd:SPARQL11Query .
>>
>> Here query-URI is in italics - I am confused if this is an example or
>> a pattern. Am I required to use /sparql postfix? Also
>> "sparql-query-description" is not valid Turtle.
>>
>> I would recommend instead to make it match <sparql-query-description>
>> from earlier, and to drop /sparql:
>>
>> <sparql-query-description> a sd:Service ;
>>      sd:endpoint <query-URI> ;
>>      sd:supportedLanguage sd:SPARQL11Query .
>>
> I've gone the other way and made it a pure example.  In this case, the
> generic description is in the [SPARQL-SD] specification.
>
>
>
>> 15) Reference for SPARQL-SD says "Work in progress" - but in fact it
>> has just become "W3C Recommendation 21 March 2013"
>>
> Updated.
>
>
>
>
>> 16)
>>
>>  The SPARQL service description may be detailed or sparse, provided that
>>> it includes at a minimum the following:
>>>
>>> sparql-query-description a sd:Service ;
>>>      sd:endpoint <(SPARQL service endpoint URI reference)> .
>>>
>>
>> This is another syntax for insert-your-URI-here that I have not
>> encountered before in the document.  I would move this definition to
>> ABOVE Example 7 (which is playing double role at the moment as both
>> example and definition), and change it to:
>>
>> <sparql-query-description> a sd:Service ;
>>      sd:endpoint <query-URI> .
>>
>>
>> And then change Example 7 to be an actual example:
>>
>>
>>
>> <http://example.com/prov/service> a prov:ServiceDescription;
>>      prov:describesService _:sparql .
>>
>> _:sparql a sd:Service ;
>>      sd:endpoint <http://example.com/prov/sparql> ;
>>      sd:supportedLanguage sd:SPARQL11Query .
>>
> Done.
>
>
>
>>
>> 17) example 8
>>
>>> @prefix prov:    <http://www.w3c.org/ns/prov#>
>>> @prefix dcterms: <http://purl.org/dc/terms/>
>>> @prefix foaf:    <http://xmlns.com/foaf/0.1/>
>>> @prefix sd:      <http://www.w3.org/ns/sparql-service-description#>
>>>
>> A terminating . missing for each of these lines.
>>
> Fixed, thanks.
>
> (I also rechecked with W3C validator)
>
>
>
>>
>> 18)
>>      sd:resultFormat <http://www.w3.org/ns/formats/RDF_XML> ,
>>                      <http://www.w3.org/ns/formats/Turtle> ,
>>                      <http://www.w3.org/ns/formats/SPARQL_Results_XML> ,
>>                      <http://www.w3.org/ns/formats/SPARQL_Results_JSON> ,
>>                      <http://www.w3.org/ns/formats/SPARQL_Results_CSV> ,
>>                      <http://www.w3.org/ns/formats/SPARQL_Results_TSV>
>>
>> Would this not read better with a prefix for formats?
>>
> I'm ambivalent about this.  If I were coding it for myself I might do as
> you suggest.
>
> In a general document like this, it's another thing that the reader has to
> cross-reference.
>
>
>> @prefix format: <http://www.w3.org/ns/formats/> .
>>
>> # (..)
>>
>>      sd:resultFormat format:RDF_XML,
>>                      format:Turtle,
>>                      format:SPARQL_Results_XML,
>>                      format:SPARQL_Results_JSON,
>>                      format:SPARQL_Results_CSV,
>>                      format:SPARQL_Results_TSV .
>>
>>
>>
>> 19) 4.2 Direct HTTP query service invocation
>>
>> This section comes a bit abrupt.. did we not just finish reading about
>> this?
>>
>> I think 4.1.1 should be made smaller, and the main content moved down
>> to 4.2 with a forward reference. Some duplicate content may then have
>> to be removed. Now there is not a single reference forward to 4.2.
>>
>> Similarly, 4.2 should start with something like:
>>
>>  This section explains the mechanism of the prov:DirectQueryService
>>> introduced in section 4.1.1.
>>>
>> There are (at least) two points here.
>
> 1. You suggest moving the material about the service description forward
> into the section about service invocation.  I'm reluctant to do this -
> these are separate (though related) topics, but I accept the lack of a
> forward reference.
>
> 2. Section 4.2 starts abruptly - I accept this point.
>
> What I've done:
> - added forward reference from 4.1.1. to 4.2
> - moved the invocation example with {&steps} from 4.1.1 to 4.2
> - added an introductory paragraph to section 4.2 to explain what is
> coming, and attempted to make it clear that it related to the service
> description in 4.1.1
>
>
>
>
>>
>> 20)
>>
>>  Any server that implements this protocol and receives a request URI in
>>> this form
>>>
>> What does "in this form" mean? That I have to use ?target={uri}?
>> Please generalize or specify. (I would hope for the first).
>>
> I think this addresses your request:
>
> Any server that implements this protocol and receives a request URI in a
> form corresponding to its published URI template /SHOULD/ return a
> provenance record for the embedded target-URI. [...]
>
>
>
>>
>> 21)  the request URI corresponding to {var}
>>
>> Where does {var} come from? Change to {uri}? That is the only
>> parameter we define, right?
>>
> That's a bug.  Fixed. Thanks.
>
>
>  22)
>>
>> In the Note about {+uri}, I would suggest deleting:
>>
>>    To prevent this, '#' and '&' characters in the target-URI may be
>>> replaced with %23 and %26 respectively, before performing the URI template
>>> expansion.
>>>
>> As that sounds like overriding the URI template mechanism with a
>> custom escaping. We should better discourage this at all - the note is
>> just meant to explain why {+uri} is a bad idea. Most people - like me
>> - don't even know what {+uri} means and might instead interpret this
>> as "ALWAYS replace blablabla".
>>
> Done.
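
To make the concern about {+uri} concrete, the following sketch shows
what a server would see in the target parameter when the target-URI
itself contains '&' and '#'. The target-URI and the helper name are
hypothetical; the two query URIs are built directly with
urllib.parse.quote to mimic what {+uri} and {uri} would produce for the
example template.

```python
from urllib.parse import parse_qs, quote, urlsplit

SERVICE = "http://www.example.com/provenance/service?target="
TARGET = "http://example.org/doc?a=1&b=2#section"  # hypothetical target-URI


def received_target(query_uri):
    """Return the target parameter as a server parsing the request would see it."""
    parts = urlsplit(query_uri)
    return parse_qs(parts.query).get("target", [None])[0]


# Reserved expansion ({+uri}) leaves '&' and '#' untouched.
reserved = SERVICE + quote(TARGET, safe="-._~:/?#[]@!$&'()*+,;=")
# Simple expansion ({uri}) percent-encodes them.
simple = SERVICE + quote(TARGET, safe="-._~")

print(received_target(reserved))  # truncated at the first '&'; '#section' became a fragment
print(received_target(simple))    # the full target-URI survives
```

With the reserved form, everything after the first '&' is read as a
separate query parameter and everything after '#' as a fragment of the
query URI, so the service never sees the complete target-URI.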
>
>
>> 23) Provenance pingback
>>
>> As you already know, I'm strongly in favour of keeping this section.
>>
> :)
>
>
>> 24)
>>
>>> These questions can be opened up to consider provenance information
>>> created by unrelated third parties, like:
>>>
>>> what new resources are based on this resource?
>>> what has this resource been used for?
>>> who has used it?
>>> what other resources are derived from the same sources as this resource?
>>> etc.
>>>
>>
>> remove "etc"
>>
> Done.
>
>
>>
>> 25)
>>
>>  To facilitate such cooperation, a resource publisher may receive
>>> "ping-back"s.
>>>
>> To explain terminology and set the context (it seemed from LPD that
>> some got confused), change to:
>>
>> "may receive provenance "ping-backs". The mechanism described here is
>> inspired by
>> <a href="http://www.hixie.ch/specs/pingback/pingback">blog
>> pingbacks</a>, but avoids the need for XML-RPC
>> and is specific for provenance records.
>>
> Done.
>
>
>> 26)
>>
>>  using a pingback link relation instead of has_provenance.
>>>
>> Change to
>>
>>  using a prov:pingback link relation instead of prov:has_provenance.
>>>
>> Done.
>
>  Also see previous comment about introducing #pingback in the section 3.
>>
> See earlier response.  Do we need to discuss?
>
>  27)
>>
>>> For example, consider a resource that is published by acme.example.com,
>>> and is subsequently used by wile-e.example.org in the construction of
>>> some new entity; we might see an exchange along the following lines.
>>>
>> I have previously commented that these hostnames and example URIs are
>> confusing - especially wile-e does not read well for anyone who did not
>> use to watch The Road Runner in English.
>>
> Oh, sorry, I must have misunderstood previously.  I thought I had
> responded to those concerns.
>
> I'm wary about changing the examples completely at this stage, because of
> the likelihood of  messing up the correspondences.  How much of your
> concern is addressed by s/wile-e/coyote/ (which I've done)?
>
>
>  28)
>>    S: Link: <http://acme.example.org/super-widget/provenance>;
>>             rel=http://www.w3.org/ns/prov#has_provenance
>>
>> Although it is technically correct that rel= doesn't need quotes - it is
>> confusing to introduce the no-quote version in section 5. For
>> consistency with the earlier sections, use quotes here (and below).
>>
> OK.
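
Since both the quoted and the unquoted form can turn up in practice, a
consumer would probably want to accept either. Below is a small sketch
of a parser for a single Link header value that treats the two forms the
same. The function name is hypothetical, and the regular expressions are
a simplification: they do not handle several comma-separated links in
one header or escaped quotes inside parameter values.

```python
import re

_LINK = re.compile(r"\s*<([^>]*)>(.*)", re.S)
# A parameter value is either a quoted string or a bare token.
_PARAM = re.compile(r';\s*([A-Za-z0-9*_-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link(value):
    """Parse one Link header value into (target URI, dict of parameters)."""
    match = _LINK.match(value)
    if match is None:
        raise ValueError("not a link-value: %r" % value)
    target, rest = match.groups()
    params = {}
    for name, quoted, token in _PARAM.findall(rest):
        # Keep the first occurrence of each parameter name.
        params.setdefault(name.lower(), quoted or token)
    return target, params


quoted = ('<http://acme.example.org/super-widget/provenance>; '
          'rel="http://www.w3.org/ns/prov#has_provenance"')
unquoted = ('<http://acme.example.org/super-widget/provenance>; '
            'rel=http://www.w3.org/ns/prov#has_provenance')

print(parse_link(quoted))
print(parse_link(quoted) == parse_link(unquoted))
```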
>
>
>> 29) a client MAY post a pingback request
>>
>> MAY -> may
>>
> Done.
>
>
>> 30) Example 10
>>
>>
>>  C: POST http://acme.example.org/super-widget/pingback HTTP/1.1
>>> ..
>>> S: 204 No Content
>>> S: Link: <http://acme.example.org/super-widget/provenance>;
>>>          rel=http://www.w3.org/ns/prov#has_provenance;
>>>          anchor="http://acme.example.org/super-widget"
>>>
>> Remove the Link here. It is confusing as #has_provenance beyond GET
>> and HEAD were not defined, we said in 3.1.
>>
> Good catch!  Done.
>
>  31)
>>
>>> The client may similarly include has_provenance links to specify
>>> provenance records with a different anchor.
>>> The provenance-URIs of those headers SHOULD  also be included in the
>>> content if the POSTed Content-type is text/uri-list.
>>>
>> Agree with the TODO - Drop this (but not the previous sentence) -
>> that's confusing when the Link: headers have a different anchor.
>>
>> Instead:
>>
>>
>>  The client may similarly include has_provenance links to specify
>>> provenance records when they have a different anchor, in which case those
>>> provenance-URIs SHOULD NOT be included in POSTed text/uri-list content.
>>>
>> Why the switch from SHOULD to SHOULD NOT?   I'd prefer to remain silent
> on this point.  I'll just drop it for now, but am open to persuasion.
>
> The previous sentence has already been re-phrased:
> [[
> The pingback client /MAY/ include extra |has_provenance| links to indicate
> provenance records related to a different resources, specified with
> correspondingly different anchor URIs.
> ]]
>
>
>>
>>  In the examples above, the pingback service responds with an empty
>>> response body, and links to provenance for the original resource. (Note
>>> that the Link: header returned contains an explicit anchor parameter with
>>> the URI of the original resource; without this, the link would relate the
>>> indicated URI to the pingback URI
>>> http://acme.example.org/super-widget/pingback rather
>>> than the original resource.)
>>>
>>
>> Change to:
>>
>>  In the examples above, the pingback service responds positively with 204
>>> No Content and an empty response body. HTTP statuses like 200 OK, 201
>>> Created, 202 Accepted, and 303 See Other might also be appropriate positive
>>> responses depending on the domain and application.
>>>
>> The original paragraph has been removed, but I've added your text to
> another paragraph, so we have:
>
> [[
> There is no required information in the server response to a pingback POST
> request. In the examples here, the pingback service responds positively
> with |204 No Content| and an empty response body. Other HTTP status values
> like |200 OK|, |201 Created|, |202 Accepted|, and |303 See Other| might
> also be appropriate positive responses depending on the domain and
> application.
>
> ]]
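
A minimal sketch of the client side of this exchange, assuming the
pingback body is sent as text/uri-list as discussed under item 31. The
function names and the coyote.example.org provenance URI are
hypothetical, and treating exactly the five status codes listed above as
positive is an assumption of this sketch, not something the document
requires. The request is only constructed here, not sent.

```python
from urllib.request import Request

# Status codes named in the paragraph above as plausible positive responses.
POSITIVE = {200, 201, 202, 204, 303}


def build_pingback(pingback_uri, provenance_uris):
    """Build (but do not send) a pingback POST carrying a text/uri-list body."""
    # text/uri-list: one URI per line, each line terminated by CRLF.
    body = "".join(uri + "\r\n" for uri in provenance_uris).encode("utf-8")
    return Request(
        pingback_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "text/uri-list"},
    )


def is_positive(status):
    """Return True if the status code is one this sketch treats as success."""
    return status in POSITIVE


request = build_pingback(
    "http://acme.example.org/super-widget/pingback",
    ["http://coyote.example.org/contraption/provenance"],  # hypothetical URI
)
print(request.get_method(), request.full_url)
print(request.data)
```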
>
>
>> 32)
>>
>>  This leaves open a possibility that the pingback resource may have the
>>> same URI as the original resource, provided that the original does not
>>> respond to POST in some different way.
>>>
>> I think we should remove this, as some would read it as a suggestion
>> - but it would be a bit odd for a POST on any resource to be specific
>> for receiving *provenance* pingbacks.
>>
> OK, Done.
>
>
>>
>> 33)
>>
>>  Provenance may present a route for leakage of privacy
>>>
>>
>> I would add a paragraph below:
>>
>> The <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html">HTTP
>> security considerations [RFC2616] generally apply for all of the
>> resources and services located through the mechanism in this document.
>> Implementations MAY choose to use standard HTTP authorization
>> mechanisms to restrict access to resources for instance using 401
>> Unauthorized, 403 Forbidden or 404 Not Found.
>>
> I've used that material, but split it up slightly
>
>
>> 34) Is CSRF a real threat here? How?
>>
>> Not CSRF within the PROV-AQ services, but it could be facilitating CSRF.
>>
>> Imagine there is a browser with a plugin that understands PROV-AQ links.
>>
>> A malicious server could post a link like this on an innocent looking
>> page about kittens:
>>
>> Link: <https://facebook.com/delete-my-precious-kitten-images>;
>> rel="http://www.w3.org/ns/prov#pingback"
>>
>>
>> The client might then be encouraged to share this picture on Twitter;
>> the clever browser plugin faithfully POSTs to the #pingback to
>> register the derived tweet - but sadly in this case Facebook (which of
>> course you are always logged in to) thought this was a POST by
>> clicking on the button to delete all the kitten images.
>>
> That's a good observation.  I'll try and capture something of it.
>
>
>>
>> 35) https://dvcs.w3.org/hg/prov/raw-file/default/paq/prov-aq.ttl
>>
>> some of the labels are in the wrong tense, like "hadAnchor" - and this
>> also needs to be updated to use the _underscore_style.
>>
>> What is :aq? This file contains various annotation properties that are
>> not relevant to prov-aq.
>>
>> Many terms from table B are missing, for instance:
>> prov:ServiceDescription prov:DirectQueryService
>>
>>  There's a separate issue recorded to bring the .ttl file up to date.
>  For now, I'm focusing on the document.
>
> #g
> --
>
>


-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Thursday, 11 April 2013 15:14:10 UTC