Re: Review of PROV-AQ from Graham Klyne on 2013-04-05 (public-prov-wg@w3.org from April 2013)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Fri, 05 Apr 2013 17:59:34 +0100
To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
CC: W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <515F02F6.7010509@zoo.ox.ac.uk>
On 03/04/2013 17:14, Stian Soiland-Reyes wrote:
> Below is my review of
> https://dvcs.w3.org/hg/prov/raw-file/fa9bac23203a/paq/prov-aq.html
> (as of 2013-04-03)
>
> I know I promised a non-evil review.. which I believe this is - it is
> unfortunately though still a bit long as now with a fresh eye I've
> identified some editorial issues. None of these are considered
> blocking.
Stian, many thanks for these.  There are many good comments and catches, most of 
which I've incorporated.  It would be great if you have a chance to eyeball the 
changes to see if they match your intent.

More detailed responses below.

>
>
> 1)
>> Status of This Document
>> This is the third public working.
> This would have to be updated to fourth or Final Note or whatever it is called.
I've removed that for now.  I think the correct wording can be sorted in the 
staging process when we have formal WG agreement to publish.

> This section should add something like:
>
>> This document is intended to be published as a W3C Note, not as a formal W3C Specification. For clarity, the document does however use the key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY and OPTIONAL as described in [RFC2119].
I've added something along those lines at the end of the introduction.

> 2)
>> A provenance record consumer will need to isolate information about the specific entity or entities of interest. These may be constrained resources identified by separate target-URIs than the original resource, in which case it will need to know about the target-URIs used. The mechanisms defined later allow a provider to expose such URIs.
> Confusing. Rewrite to something like:
>
> A provenance record consumer will need to isolate information about
> the specific entity or entities of interest. These may be constrained
> resources identified by target-URIs that differ from the resource URI,
> in which case the consumer needs to discover those target-URIs. The
> mechanisms defined later allow a provider to expose such URIs.
Thanks, that's better.
>
>
> 3)
>
>> Any resource that is described by some provenance - typically an entity (in the sense of [PROV-DM], but may be an activity).
> --> typically an entity (in the sense of [PROV-DM]), but may be of
> another type (such as [PROV-DM] activity).
>
Change applied.  But note that Luc has suggested removing this entry.

>
> 4)
>
> The mechanisms used with HTTP and HTML/RDF are slightly inconsistent
> in their approach to specifying target-URI values. In HTTP Link:
> headers, an optional anchor= parameter may be supplied for each such
> header. In HTML and RDF, separate #has_anchor relations are defined. I
>
>
> can we move this note down to 3.2? I've not seen any anchors yet!
Yes, it's done.  (Also added cross-ref from 3.3)

> 5)
>> This specification does not define
> --> This note
OK.
>
> 6)
>> The presence of a has_provenance link in an HTTP response does not preclude the possibility that other providers may offer provenance records about the same resource.
> --> that other providers may also offer ...
OK ("... also may offer")
>
> 7)
>> An example request including provenance headers in its response (..)
> --> An example HTTP response including provenance headers (..)
OK.
>
> 8)
>> There may be multiple has_query_service link header fields
> "may" -> "MAY"
OK.

>
> 9)
>    <html xmlns="http://www.w3.org/1999/xhtml">
>       <head>
>          <link rel="http://www.w3.org/ns/prov#has_provenance"
> href="provenance-URI">
>          <link rel="http://www.w3.org/ns/prov#has_anchor" href="target-URI">
>
> If this is meant to be XHTML, then the <link> should be terminated as:
>          <link rel="http://www.w3.org/ns/prov#has_provenance"
> href="provenance-URI" />
>          <link rel="http://www.w3.org/ns/prov#has_anchor" href="target-URI" />
>
> Due to the advent of HTML5, I would however simply remove the xmlns declaration.
OK.
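
A minimal sketch of how a consumer might pick these relations out of an HTML document, assuming Python's standard html.parser. The names ProvLinkParser and find_prov_links are hypothetical and not part of PROV-AQ. It accepts both the unterminated HTML5 form and the self-closed XHTML form of <link>:

```python
from html.parser import HTMLParser

PROV_NS = "http://www.w3.org/ns/prov#"


class ProvLinkParser(HTMLParser):
    """Collect href values of <link> elements that use a prov: relation."""

    def __init__(self):
        super().__init__()
        self.links = {}  # relation URI -> list of href values

    def handle_starttag(self, tag, attrs):
        # Also called for self-closed tags such as <link ... />.
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel may hold several space-separated relation types.
        for rel in (attr.get("rel") or "").split():
            if href and rel.startswith(PROV_NS):
                self.links.setdefault(rel, []).append(href)


def find_prov_links(html_text):
    """Return a dict mapping prov: relation URIs to their href values."""
    parser = ProvLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

The href values are returned as written in the document; resolving them against a base URI is left out of this sketch.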
>
> 10) HTML5 says that:
>
>> If the rel attribute is absent, has no keywords, or if none of the keywords used are allowed according to the definitions in this specification, then the element does not create any links.
>> Registration of relation types in HTTP Link: header fields is distinct from HTML link types, and thus their semantics can be different from same-named HTML types.
>> http://www.w3.org/TR/html5/document-metadata.html#the-link-element
> and:
>
>> Extensions to the predefined set of link types may be registered in the microformats wiki existing-rel-values page.
>> http://www.w3.org/TR/html5/links.html#other-link-types
> However none of the suggested rel's like
> http://www.w3.org/ns/prov#has_provenance have been registered at
> http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions
>
> My suggestion is to add our relations here, with reference to the
> current working draft.
Where "here" is the microformats wiki?   I'll create an action to do this.

(Seems kinda dumb to me requiring registration of URI link relation types on a 
wiki, but that's HTML5 for you...)

>
>
> 11)  3.2 Resource represented as HTML
>
> Can these link types also be used with <a> and <area>? I assume a
> similar disclaimer note like in RDF statements with other subjects
> would be appropriate. For instance:
>
> The link relations
> <code>http://www.w3.org/ns/prov#has_provenance</code>,
> <code>http://www.w3.org/ns/prov#has_anchor</code> and
> <code>http://www.w3.org/ns/prov#has_query_service</code> may be also
> used on <a> and <area> links within the HTML body, but discussion of
> such use is beyond the scope of this document.
I'm less inclined to do this.  With RDF, it's a fairly obvious thing to want to 
do, so it seems worth noting that such use is OK.  With HTML5, such use seems 
quite marginal so I wonder if making mention of it may be more confusing than 
helpful.

I'm not set against this, I just am not seeing the point.  Persuade me otherwise?

>
> 12) 3.1 / 3.2 / 3.3 does not mention pingback
>
> If we keep section 5 on Provenance pingback (which I vote for), then
> it should be included under 3.1/3.2/3.3, just like the provenance
> query service.
There seem to be two issues here.

(a) with respect to section 3.1 (HTTP), what can we say here that doesn't 
duplicate material in section 5?

(b) with respect to 3.2 and 3.3, it's not clear to me that we should be 
encouraging publication of pingback URIs in documents.

I also think that keeping the pingback separate keeps the other material less 
complicated.

What are the specific advantages or benefits you see compared with the present 
organization?

>
> 13)
>
>> It allows for accessing provenance about a specified target-URI. The query URI to use is described by a URI Template [URI-template] (level 2 or above) in which the variable uri stands for the target-URI; e.g.
>>
>> @prefix prov: <http://www.w3c.org/ns/prov#>
>> <direct-query-description> a prov:DirectQueryService ;
>>   prov:provenanceUriTemplate "query-URI?target={+uri}" .
> This is an example, but it is not styled as one, and does not use real URIs.
>
> I don't think we should suggest a "base URI" for the query service, as
> there is no requirement in URI templates to use the ?query mechanism -
> and neither do we want to mandate ?target=.
>
>
> Change it to something like:
>
>> It allows for accessing provenance about a specified target-URI. The query URI to use is described by a URI Template [URI-template] (level 2 or above) in which the variable uri stands for the target-URI. The URI template is specified as:
>> <direct-query-description> a prov:DirectQueryService ;
>>   prov:provenanceUriTemplate "uri-template" .
>>
>> where direct-query-description is any distinct RDF subject node (i.e. a blank node or a URI) and
>> uri-template is a URI template [RFC3986].

Yes, that's better, thanks.

> (with italics on direct-query-description and uri-template. )
>
>
> Then before "MAY recognize additional parameters", insert a new example:
>
> _:direct a prov:DirectQueryService ;
>    prov:provenanceUriTemplate
>      "http://www.example.com/provenance/service?target={+uri}" .
Done.

> 14) Example 7
> sparql-query-description a sd:Service ;
>      sd:endpoint <query-URI/sparql/> ;
>      sd:supportedLanguage sd:SPARQL11Query .
>
> Here query-URI is in italics - I am confused if this is an example or
> a pattern. Am I required to use /sparql postfix? Also
> "sparql-query-description" is not valid Turtle.
>
> I would recommend instead to make it match <sparql-query-description>
> from earlier, and to drop /sparql:
>
> <sparql-query-description> a sd:Service ;
>      sd:endpoint <query-URI> ;
>      sd:supportedLanguage sd:SPARQL11Query .
I've gone the other way and made it a pure example.  In this case, the generic 
description is in the [SPARQL-SD] specification.

>
> 15) Reference for SPARQL-SD says "Work in progress" - but in fact it
> has just become "W3C Recommendation 21 March 2013"
Updated.


>
> 16)
>
>> The SPARQL service description may be detailed or sparse, provided that it includes at a minimum the following:
>>
>> sparql-query-description a sd:Service ;
>>      sd:endpoint <(SPARQL service endpoint URI reference)> .
>
> This is another syntax for insert-your-URI-here that I have not
> encountered before in the document.  I would move this definition to
> ABOVE Example 7 (which is playing double role at the moment as both
> example and definition), and change it to:
>
> <sparql-query-description> a sd:Service ;
>      sd:endpoint <query-URI> .
>
>
> And then change Example 7 to be an actual example:
>
>
>
> <http://example.com/prov/service> a prov:ServiceDescription;
>      prov:describesService _:sparql .
>
> _:sparql a sd:Service ;
>      sd:endpoint <http://example.com/prov/sparql> ;
>      sd:supportedLanguage sd:SPARQL11Query .
Done.

>
>
> 17) example 8
>> @prefix prov:    <http://www.w3c.org/ns/prov#>
>> @prefix dcterms: <http://purl.org/dc/terms/>
>> @prefix foaf:    <http://xmlns.com/foaf/0.1/>
>> @prefix sd:      <http://www.w3.org/ns/sparql-service-description#>
> A terminating . missing for each of these lines.
Fixed, thanks.

(I also rechecked with W3C validator)

>
>
> 18)
>      sd:resultFormat <http://www.w3.org/ns/formats/RDF_XML> ,
>                      <http://www.w3.org/ns/formats/Turtle> ,
>                      <http://www.w3.org/ns/formats/SPARQL_Results_XML> ,
>                      <http://www.w3.org/ns/formats/SPARQL_Results_JSON> ,
>                      <http://www.w3.org/ns/formats/SPARQL_Results_CSV> ,
>                      <http://www.w3.org/ns/formats/SPARQL_Results_TSV>
>
> Would this not read better with a prefix for formats?
I'm ambivalent about this.  If I were coding it for myself I might do as you 
suggest.

In a general document like this, it's another thing that the reader has to 
cross-reference.
>
> @prefix format: <http://www.w3.org/ns/formats/> .
>
> # (..)
>
>      sd:resultFormat format:RDF_XML,
>                      format:Turtle,
>                      format:SPARQL_Results_XML,
>                      format:SPARQL_Results_JSON,
>                      format:SPARQL_Results_CSV,
>                      format:SPARQL_Results_TSV .
>
>
>
> 19) 4.2 Direct HTTP query service invocation
>
> This section comes a bit abrupt.. did we not just finish reading about this?
>
> I think 4.1.1 should be made smaller, and the main content moved down
> to 4.2 with a forward reference. Some duplicate content may then have
> to be removed. Now there is not a single reference forward to 4.2.
>
> Similarly, 4.2 should start with something like:
>
>> This section explains the mechanism of the prov:DirectQueryService introduced in section 4.1.1.
There are (at least) two points here.

1. You suggest moving the material about the service description forward into 
the section about service invocation.  I'm reluctant to do this - these are 
separate (though related) topics, but I accept the lack of a forward reference.

2. Section 4.2 starts abruptly - I accept this point.

What I've done:
- added forward reference from 4.1.1. to 4.2
- moved the invocation example with {&steps} from 4.1.1 to 4.2
- added an introductory paragraph to section 4.2 to explain what is coming, and 
attempted to make it clear that it related to the service description in 4.1.1


>
>
> 20)
>
>> Any server that implements this protocol and receives a request URI in this form
> What does "in this form" mean? That I have to use ?target={uri}?
> Please generalize or specify. (I would hope for the first).
I think this addresses your request:

Any server that implements this protocol and receives a request URI in a form 
corresponding to its published URI template /SHOULD/ return a provenance record 
for the embedded target-URI. [...]
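
As a rough illustration of the server side, here is a sketch that recovers the target-URI from a request URI. It assumes a service whose published template happens to be "http://www.example.com/provenance/service?target={uri}"; the parameter name target and the function name extract_target_uri are assumptions for this example only, since the template is free to take another form:

```python
from urllib.parse import urlsplit, parse_qs


def extract_target_uri(request_uri, param="target"):
    """Return the decoded target-URI embedded in request_uri, or None.

    Assumes the service's URI template places the target-URI in a
    query parameter (named by `param`); other templates need other logic.
    """
    query = urlsplit(request_uri).query
    values = parse_qs(query).get(param, [])
    return values[0] if values else None
```

A server would then look up a provenance record for the returned target-URI, or respond with an error when the result is None.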

>
>
> 21)  the request URI corresponding to {var}
>
> Where does {var} come from? Change to {uri}? That is the only
> parameter we define, right?
That's a bug.  Fixed. Thanks.

> 22)
>
> In the Note about {+uri}, I would suggest deleting:
>
>>   To prevent this, '#' and '&' characters in the target-URI may be replaced with %23 and %26 respectively, before performing the URI template expansion.
> As that sounds like overriding the URI template mechanism with a
> custom escaping. We should better discourage this at all - the note is
> just meant to explain why {+uri} is a bad idea. Most people - like me
> - don't even know what {+uri} means and might instead interpret this
> as "ALWAYS replace blablabla".
Done.
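
To illustrate the difference the note is getting at, here is a rough sketch of the two expansions. It handles only the single variable uri, not the full URI Template syntax, and the function names are hypothetical. With {+uri}, reserved characters such as '#' and '&' pass through unescaped, so they become part of the query URI itself; with {uri}, they are percent-encoded:

```python
import re
from urllib.parse import quote

RESERVED = ":/?#[]@!$&'()*+,;="          # RFC 3986 reserved characters
_PCT = re.compile(r"(%[0-9A-Fa-f]{2})")  # an existing pct-encoded triplet
_VAR = re.compile(r"\{(\+?)uri\}")       # matches {uri} or {+uri} only


def expand_simple(value):
    """{uri}: percent-encode everything except unreserved characters."""
    return quote(value, safe="")


def expand_reserved(value):
    """{+uri}: leave reserved characters and pct-encoded triplets alone."""
    parts = _PCT.split(value)
    return "".join(p if _PCT.fullmatch(p) else quote(p, safe=RESERVED)
                   for p in parts)


def expand_template(template, target_uri):
    """Expand {uri} or {+uri} in template with target_uri."""
    def substitute(match):
        if match.group(1):
            return expand_reserved(target_uri)
        return expand_simple(target_uri)
    return _VAR.sub(substitute, template)
```

For a target-URI such as http://example.org/doc?a=1&b=2#frag, the {+uri} form yields a query URI whose fragment is "frag" and whose query has an extra b parameter, which is the ambiguity being discussed.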
>
> 23) Provenance pingback
>
> As you already know, I'm strongly in favour of keeping this section.
:)
>
> 24)
>> These questions can be opened up to consider provenance information created by unrelated third parties, like:
>>
>> what new resources are based on this resource?
>> what has this resource been used for?
>> who has used it?
>> what other resources are derived from the same sources as this resource?
>> etc.
>
> remove "etc"
Done.
>
>
> 25)
>
>> To facilitate such cooperation, a resource publisher may receive "ping-back"s.
> To explain terminology and set the context (it seemed from LPD that
> some got confused), change to:
>
> "may receive provenance "ping-backs". The mechanism described here is
> inspired by
> <a href="http://www.hixie.ch/specs/pingback/pingback">blog
> pingbacks</a>, but avoids the need for XML-RPC
> and is specific for provenance records.
Done.
>
> 26)
>
>> using a pingback link relation instead of has_provenance.
> Change to
>
>> using a prov:pingback link relation instead of prov:has_provenance.
Done.
> Also see previous comment about introducing #pingback in the section 3.
See earlier response.  Do we need to discuss?
> 27)
>> For example, consider a resource that is published by acme.example.com, and is subsequently used by wile-e.example.org in the construction of some new entity; we might see an exchange along the following lines.
> I have previously commented that these hostnames and example URIs are
> confusing - especially wile-e does not read well for anyone who did not
> use to watch The Road Runner in English.
Oh, sorry, I must have misunderstood previously.  I thought I had responded to 
those concerns.

I'm wary about changing the examples completely at this stage, because of the 
likelihood of messing up the correspondences.  How much of your concern is 
addressed by s/wile-e/coyote/ (which I've done)?

> 28)
>    S: Link: <http://acme.example.org/super-widget/provenance>;
>             rel=http://www.w3.org/ns/prov#has_provenance
>
> Although it is technically correct that rel= doesn't need quotes - it is
> confusing to introduce the no-quote version in section 5. For
> consistency with the earlier sections, use quotes here (and below).
OK.
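
For what it's worth, a small sketch of emitting the header in the quoted form consistently. The helper name format_link_header is hypothetical, and it assumes the URIs contain no double quotes or angle brackets that would need escaping:

```python
def format_link_header(target, rel, anchor=None):
    """Build a Link: header line with rel (and optional anchor) quoted."""
    params = ['rel="%s"' % rel]
    if anchor is not None:
        params.append('anchor="%s"' % anchor)
    return "Link: <%s>; %s" % (target, "; ".join(params))
```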
>
> 29) a client MAY post a pingback request
>
> MAY -> may
Done.
>
> 30) Example 10
>
>
>> C: POST http://acme.example.org/super-widget/pingback HTTP/1.1
>> ..
>> S: 204 No Content
>> S: Link: <http://acme.example.org/super-widget/provenance>;
>>          rel=http://www.w3.org/ns/prov#has_provenance;
>>          anchor="http://acme.example.org/super-widget"
> Remove the Link here. It is confusing, because (as we said in 3.1)
> #has_provenance is not defined beyond GET and HEAD.
Good catch!  Done.
> 31)
>> The client may similarly include has_provenance links to specify provenance records with a different anchor.
>> The provenance-URIs of those headers SHOULD  also be included in the content if the POSTed Content-type is text/uri-list.
> Agree with the TODO - Drop this (but not the previous sentence) -
> that's confusing when the Link: headers have a different anchor.
>
> Instead:
>
>
>> The client may similarly include has_provenance links to specify provenance records when they have a different anchor, in which case those provenance-URIs SHOULD NOT be included in POSTed text/uri-list content.
Why the switch from SHOULD to SHOULD NOT?   I'd prefer to remain silent on this 
point.  I'll just drop it for now, but am open to persuasion.

The previous sentence has already been re-phrased:
[[
The pingback client /MAY/ include extra |has_provenance| links to indicate 
provenance records related to different resources, specified with 
correspondingly different anchor URIs.
]]
>
>
>> In the examples above, the pingback service responds with an empty response body, and links to provenance for the original resource. (Note that the Link: header returned contains an explicit anchor parameter with the URI of the original resource; without this, the link would relate the indicated URI to the pingback URI http://acme.example.org/super-widget/pingback rather than the original resource.)
>
> Change to:
>
>> In the examples above, the pingback service responds positively with 204 No Content and an empty response body. HTTP statuses like 200 OK, 201 Created, 202 Accepted, and 303 See Other might also be appropriate positive responses depending on the domain and application.
The original paragraph has been removed, but I've added your text to another 
paragraph, so we have:

[[
There is no required information in the server response to a pingback POST 
request. In the examples here, the pingback service responds positively with 
|204 No Content| and an empty response body. Other HTTP status values like |200 
OK|, |201 Created|, |202 Accepted|, and |303 See Other| might also be 
appropriate positive responses depending on the domain and application.
]]
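
A sketch of what a pingback client along these lines might look like, using only Python's standard library. The function names are hypothetical, the request is built but not sent, and the set of "positive" statuses simply mirrors the ones listed above rather than anything the document requires:

```python
from urllib.request import Request

# Status codes named above as plausible positive responses.
POSITIVE_STATUSES = {200, 201, 202, 204, 303}


def build_pingback_request(pingback_uri, provenance_uris):
    """Build (but do not send) a pingback POST carrying a text/uri-list body."""
    # text/uri-list: one URI per line, lines terminated by CRLF.
    body = "".join(uri + "\r\n" for uri in provenance_uris).encode("utf-8")
    return Request(
        pingback_uri,
        data=body,
        headers={"Content-Type": "text/uri-list"},
        method="POST",
    )


def is_positive_response(status):
    """True if the status code is one of the positive responses above."""
    return status in POSITIVE_STATUSES
```

Passing the returned object to urllib.request.urlopen would perform the POST; the provenance-URI on coyote.example.org used to exercise it would be whatever the client has actually published.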

>
> 32)
>
>> This leaves open a possibility that the pingback resource may have the same URI as the original resource, provided that the original does not respond to POST in some different way.
> I think we should remove this, as some would read it as a suggestion
> - but it would be a bit odd for a POST on any resource to be specific
> for receiving *provenance* pingbacks.
OK, Done.
>
>
> 33)
>
>> Provenance may present a route for leakage of privacy
>
> I would add a paragraph below:
>
> The <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html">HTTP
> security considerations</a> [RFC2616] generally apply for all of the
> resources and services located through the mechanism in this document.
> Implementations MAY choose to use standard HTTP authorization
> mechanisms to restrict access to resources for instance using 401
> Unauthorized, 403 Forbidden or 404 Not Found.
I've used that material, but split it up slightly
>
> 34) Is CSRF a real threat here? How?
>
> Not CSRF within the PROV-AQ services, but it could be facilitating CSRF.
>
> Imagine there is a browser with a plugin that understands PROV-AQ links.
>
> A malicious server could post a link like this on an innocent looking
> page about kittens:
>
> Link: <https://facebook.com/delete-my-precious-kitten-images>;
> rel="http://www.w3.org/ns/prov#pingback"
>
>
> The client might then be encouraged to share this picture on Twitter;
> the clever browser plugin faithfully POSTs to the #pingback to
> register the derived tweet - but sadly in this case Facebook (which of
> course you are always logged in to) thought this was a POST by
> clicking on the button to delete all the kitten images.
That's a good observation.  I'll try and capture something of it.
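
One conceivable client-side guard, sketched here purely as an assumption and not as anything PROV-AQ specifies: only send an automatic pingback when the pingback URI shares scheme, host and port with the resource that advertised it. The function names and the kittens.example.com host are hypothetical. This would also reject a legitimate pingback service hosted elsewhere, so it is a trade-off rather than a fix:

```python
from urllib.parse import urlsplit


def same_origin(uri_a, uri_b):
    """True if both URIs share scheme, host and port."""
    a, b = urlsplit(uri_a), urlsplit(uri_b)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)


def may_auto_ping(resource_uri, pingback_uri):
    """Hypothetical policy: auto-POST only to a same-origin pingback URI."""
    return same_origin(resource_uri, pingback_uri)
```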
>
>
> 35) https://dvcs.w3.org/hg/prov/raw-file/default/paq/prov-aq.ttl
>
> some of the labels are in the wrong tense, like "hadAnchor" - and this
> also needs to be updated to use the _underscore_style.
>
> What is :aq? This file contains various annotation properties that are
> not relevant to prov-aq.
>
> Many terms from table B are missing, for instance:
> prov:ServiceDescription prov:DirectQueryService
>
There's a separate issue recorded to bring the .ttl file up to date.  For now, 
I'm focusing on the document.

#g
--
Received on Friday, 5 April 2013 17:24:16 UTC