Review of PROV-AQ from Stian Soiland-Reyes on 2013-04-03 (public-prov-wg@w3.org from April 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Wed, 3 Apr 2013 17:14:12 +0100
To: public-prov-wg@w3.org
Message-ID: <CAPRnXt=shiE5fSf1x0ftHPqniyq=McW1SWxU8mm=Q25B72eUxA@mail.gmail.com>
Below is my review of
https://dvcs.w3.org/hg/prov/raw-file/fa9bac23203a/paq/prov-aq.html
(as of 2013-04-03)

I know I promised a non-evil review, which I believe this is.
Unfortunately it is still a bit long, as with a fresh eye I have now
identified some editorial issues. None of these are considered
blocking.



1)
> Status of This Document
> This is the third public working.

This would have to be updated to fourth or Final Note or whatever it is called.

This section should add something like:

> This document is intended to be published as a W3C Note, not as a formal W3C Specification. For clarity, the document does however use the key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY and OPTIONAL as described in [RFC2119].





2)
> A provenance record consumer will need to isolate information about the specific entity or entities of interest. These may be constrained resources identified by separate target-URIs than the original resource, in which case it will need to know about the target-URIs used. The mechanisms defined later allow a provider to expose such URIs.

Confusing. Rewrite to something like:

A provenance record consumer will need to isolate information about
the specific entity or entities of interest. These may be constrained
resources identified by target-URIs that differ from the resource URI,
in which case the consumer needs to discover those target-URIs. The
mechanisms defined later allow a provider to expose such URIs.


3)

> Any resource that is described by some provenance - typically an entity (in the sense of [PROV-DM], but may be an activity).

--> typically an entity (in the sense of [PROV-DM]), but may be of
another type (such as [PROV-DM] activity).



4)

The mechanisms used with HTTP and HTML/RDF are slightly inconsistent
in their approach to specifying target-URI values. In HTTP Link:
headers, an optional anchor= parameter may be supplied for each such
header. In HTML and RDF, separate #has_anchor relations are defined. I


Can we move this note down to 3.2? I've not seen any anchors yet!


5)
> This specification does not define
--> This note


6)
> The presence of a has_provenance link in an HTTP response does not preclude the possibility that other providers may offer provenance records about the same resource.
--> that other providers may also offer ...


7)
> An example request including provenance headers in its response (..)
--> An example HTTP response including provenance headers (..)


8)
> There may be multiple has_query_service link header fields

"may" -> "MAY"

9)
  <html xmlns="http://www.w3.org/1999/xhtml">
     <head>
        <link rel="http://www.w3.org/ns/prov#has_provenance"
href="provenance-URI">
        <link rel="http://www.w3.org/ns/prov#has_anchor" href="target-URI">

If this is meant to be XHTML, then the <link> should be terminated as:
        <link rel="http://www.w3.org/ns/prov#has_provenance"
href="provenance-URI" />
        <link rel="http://www.w3.org/ns/prov#has_anchor" href="target-URI" />

Due to the advent of HTML5, I would however simply remove the xmlns declaration.
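
As a rough illustration of why the choice between <link ...> and
<link ... /> matters little to a tolerant consumer, here is a minimal
Python sketch that collects the PROV-AQ link relations from an HTML
head using only the standard library. The names ProvLinkParser,
prov_links and sample_document, and the example.com URIs, are
assumptions made for this sketch and do not come from PROV-AQ.

```python
from html.parser import HTMLParser

PROV_NS = "http://www.w3.org/ns/prov#"


class ProvLinkParser(HTMLParser):
    """Collect hrefs of <link> elements whose rel is in the PROV namespace."""

    def __init__(self):
        super().__init__()
        self.links = {}  # e.g. {"has_provenance": ["http://..."]}

    def handle_starttag(self, tag, attrs):
        # Self-closed <link ... /> elements also end up here, because the
        # default handle_startendtag() delegates to handle_starttag().
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel is a space-separated list of link types.
        for rel in (attributes.get("rel") or "").split():
            if rel.startswith(PROV_NS) and href is not None:
                self.links.setdefault(rel[len(PROV_NS):], []).append(href)


def prov_links(html_text):
    """Return a dict mapping PROV relation names to lists of hrefs."""
    parser = ProvLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


def sample_document(terminator):
    """Build a small test page; terminator is '>' (HTML5) or ' />' (XHTML)."""
    return (
        "<html><head>\n"
        '<link rel="http://www.w3.org/ns/prov#has_provenance" '
        'href="http://example.com/doc/provenance"' + terminator + "\n"
        '<link rel="http://www.w3.org/ns/prov#has_anchor" '
        'href="http://example.com/doc"' + terminator + "\n"
        "</head><body></body></html>"
    )


if __name__ == "__main__":
    print(prov_links(sample_document(">")))
    print(prov_links(sample_document(" />")))
```

Both spellings yield the same links with this parser; a strict XML
parser would of course reject the unterminated form.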


10) HTML5 says that:

> If the rel attribute is absent, has no keywords, or if none of the keywords used are allowed according to the definitions in this specification, then the element does not create any links.
> Registration of relation types in HTTP Link: header fields is distinct from HTML link types, and thus their semantics can be different from same-named HTML types.
> http://www.w3.org/TR/html5/document-metadata.html#the-link-element

and:

> Extensions to the predefined set of link types may be registered in the microformats wiki existing-rel-values page.
> http://www.w3.org/TR/html5/links.html#other-link-types

However none of the suggested rel's like
http://www.w3.org/ns/prov#has_provenance have been registered at
http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions

My suggestion is to add our relations here, with reference to the
current working draft.



11)  3.2 Resource represented as HTML

Can these link types also be used with <a> and <area>? I assume a
similar disclaimer note like in RDF statements with other subjects
would be appropriate. For instance:

The link relations
<code>http://www.w3.org/ns/prov#has_provenance</code>,
<code>http://www.w3.org/ns/prov#has_anchor</code> and
<code>http://www.w3.org/ns/prov#has_query_service</code> may also be
used on <a> and <area> links within the HTML body, but discussion of
such use is beyond the scope of this document.


12) 3.1 / 3.2 / 3.3 do not mention pingback

If we keep section 5 on Provenance pingback (which I vote for), then
it should be included under 3.1/3.2/3.3, just like the provenance
query service.


13)

> It allows for accessing provenance about a specified target-URI. The query URI to use is described by a URI Template [URI-template] (level 2 or above) in which which the variable uri stands for the target-URI; e.g.
>
> @prefix prov: <http://www.w3c.org/ns/prov#>
> <direct-query-description> a prov:DirectQueryService ;
>  prov:provenanceUriTemplate "query-URI?target={+uri}" .

This is an example, but it is not styled as one, and does not use real URIs.

I don't think we should suggest a "base URI" for the query service, as
there is no requirement in URI templates to use the ?query mechanism -
and neither do we want to mandate ?target=.
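
To make the point concrete that a template need not use the ?query
mechanism, here is a small Python sketch of how a client might expand
such a template. It handles only the single variable uri in its {uri}
and {+uri} forms and is not a full URI Template implementation; the
function name expand_uri_template and the path-style template are
assumptions made for illustration.

```python
import re
from urllib.parse import quote

# Reserved characters of RFC 3986, which {+uri} leaves unescaped.
# Unreserved characters (letters, digits, "-", ".", "_", "~") are never
# escaped by quote().
RESERVED = ":/?#[]@!$&'()*+,;="


def expand_uri_template(template, target_uri):
    """Expand {uri} (simple) and {+uri} (reserved) with target_uri.

    Simplification: a literal '%' in target_uri is always escaped as %25,
    whereas reserved expansion in the URI Template specification lets
    existing %XX triplets pass through unchanged.
    """
    def substitute(match):
        safe = RESERVED if match.group(1) == "+" else ""
        return quote(target_uri, safe=safe)

    return re.sub(r"\{(\+?)uri\}", substitute, template)


if __name__ == "__main__":
    target = "http://example.org/doc"
    # Query-style template, as in the draft's example.
    print(expand_uri_template(
        "http://www.example.com/provenance/service?target={+uri}", target))
    # Hypothetical path-style template: no ?target= involved at all.
    print(expand_uri_template(
        "http://www.example.com/provenance/{uri}", target))
```

Either template is a legitimate value for prov:provenanceUriTemplate as
far as the client is concerned; it only substitutes the variable.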


Change it to something like:

> It allows for accessing provenance about a specified target-URI. The query URI to use is described by a URI Template [URI-template] (level 2 or above) in which the variable uri stands for the target-URI. The URI template is specified as:

> <direct-query-description> a prov:DirectQueryService ;
>  prov:provenanceUriTemplate "uri-template" .
>
> where direct-query-description is any distinct RDF subject node (i.e. a blank node or a URI) and
> uri-template is a URI template [URI-template].

(with italics on direct-query-description and uri-template. )


Then before "MAY recognize additional parameters", insert a new example:

_:direct a prov:DirectQueryService ;
  prov:provenanceUriTemplate
    "http://www.example.com/provenance/service?target={+uri}" .




14) Example 7
sparql-query-description a sd:Service ;
    sd:endpoint <query-URI/sparql/> ;
    sd:supportedLanguage sd:SPARQL11Query .

Here query-URI is in italics - I am unsure whether this is an example or
a pattern. Am I required to use the /sparql suffix? Also
"sparql-query-description" is not valid Turtle.

I would recommend instead to make it match <sparql-query-description>
from earlier, and to drop /sparql:

<sparql-query-description> a sd:Service ;
    sd:endpoint <query-URI> ;
    sd:supportedLanguage sd:SPARQL11Query .


15) Reference for SPARQL-SD says "Work in progress" - but in fact it
has just become "W3C Recommendation 21 March 2013"


16)

> The SPARQL service description may be detailed or sparse, provided that it includes at a minimum the following:
>
> sparql-query-description a sd:Service ;
>     sd:endpoint <(SPARQL service endpoint URI reference)> .


This is another syntax for insert-your-URI-here that I have not
encountered before in the document.  I would move this definition to
ABOVE Example 7 (which is playing a double role at the moment as both
example and definition), and change it to:

<sparql-query-description> a sd:Service ;
    sd:endpoint <query-URI> .


And then change Example 7 to be an actual example:



<http://example.com/prov/service> a prov:ServiceDescription;
    prov:describesService _:sparql .

_:sparql a sd:Service ;
    sd:endpoint <http://example.com/prov/sparql> ;
    sd:supportedLanguage sd:SPARQL11Query .



17) example 8
> @prefix prov:    <http://www.w3c.org/ns/prov#>
> @prefix dcterms: <http://purl.org/dc/terms/>
> @prefix foaf:    <http://xmlns.com/foaf/0.1/>
> @prefix sd:      <http://www.w3.org/ns/sparql-service-description#>

A terminating "." is missing from each of these lines.


18)
    sd:resultFormat <http://www.w3.org/ns/formats/RDF_XML> ,
                    <http://www.w3.org/ns/formats/Turtle> ,
                    <http://www.w3.org/ns/formats/SPARQL_Results_XML> ,
                    <http://www.w3.org/ns/formats/SPARQL_Results_JSON> ,
                    <http://www.w3.org/ns/formats/SPARQL_Results_CSV> ,
                    <http://www.w3.org/ns/formats/SPARQL_Results_TSV>

Would this not read better with a prefix for formats?


@prefix format: <http://www.w3.org/ns/formats/> .

# (..)

    sd:resultFormat format:RDF_XML,
                    format:Turtle,
                    format:SPARQL_Results_XML,
                    format:SPARQL_Results_JSON,
                    format:SPARQL_Results_CSV,
                    format:SPARQL_Results_TSV .



19) 4.2 Direct HTTP query service invocation

This section comes a bit abruptly. Did we not just finish reading about this?

I think 4.1.1 should be made smaller, and the main content moved down
to 4.2 with a forward reference. Some duplicate content may then have
to be removed. Currently there is not a single forward reference to 4.2.

Similarly, 4.2 should start with something like:

> This section explains the mechanism of the prov:DirectQueryService introduced in section 4.1.1.



20)

> Any server that implements this protocol and receives a request URI in this form

What does "in this form" mean? That I have to use ?target={uri}?
Please generalize or specify. (I would hope for the first).


21)  the request URI corresponding to {var}

Where does {var} come from? Change to {uri}? That is the only
parameter we define, right?


22)

In the Note about {+uri}, I would suggest deleting:

>  To prevent this, '#' and '&' characters in the target-URI may be replaced with %23 and %26 respectively, before performing the URI template expansion.

That sounds like overriding the URI template mechanism with custom
escaping. We had better discourage this altogether - the note is
just meant to explain why {+uri} is a bad idea. Most people - like me
- don't even know what {+uri} means and might instead interpret this
as "ALWAYS replace blablabla".
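
The problem that the note tries to explain can be shown with a few
lines of Python. This is only a sketch under assumptions: it uses the
query-style template from the draft's example, a made-up target-URI,
and urllib.parse.quote as a stand-in for the two kinds of template
expansion.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Reserved characters of RFC 3986, which {+uri} leaves unescaped.
RESERVED = ":/?#[]@!$&'()*+,;="

SERVICE = "http://www.example.com/provenance/service?target="
target = "http://example.org/doc?a=1&b=2#section"  # hypothetical target-URI

# Approximation of expanding ...?target={+uri} and ...?target={uri}.
reserved_form = SERVICE + quote(target, safe=RESERVED)
simple_form = SERVICE + quote(target, safe="")


def received_target(query_uri):
    """Return the 'target' parameter as a server would decode it."""
    return parse_qs(urlsplit(query_uri).query)["target"][0]


if __name__ == "__main__":
    # With {+uri} the '&' starts a new parameter and the '#' starts the
    # fragment of the query URI, so part of the target-URI is lost.
    print(received_target(reserved_form))
    # With {uri} everything is percent-encoded and survives intact.
    print(received_target(simple_form))
```

With these inputs the {+uri} form delivers only
http://example.org/doc?a=1 to the service, which is why plain {uri}
seems the safer choice in a query parameter, with no custom escaping
needed.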


23) Provenance pingback

As you already know, I'm strongly in favour of keeping this section.


24)
> These questions can be opened up to consider provenance information created by unrelated third parties, like:
>
> what new resources are based on this resource?
> what has this resource been used for?
> who has used it?
> what other resources are derived from the same sources as this resource?
> etc.


remove "etc"


25)

> To facilitate such cooperation, a resource publisher may receive "ping-back"s.

To explain terminology and set the context (it seemed from LPD that
some got confused), change to:

"may receive provenance "ping-backs". The mechanism described here is
inspired by
<a href="http://www.hixie.ch/specs/pingback/pingback">blog
pingbacks</a>, but avoids the need for XML-RPC
and is specific to provenance records."


26)

> using a pingback link relation instead of has_provenance.

Change to

> using a prov:pingback link relation instead of prov:has_provenance.

Also see previous comment about introducing #pingback in the section 3.


27)
> For example, consider a resource that is published by acme.example.com, and is subsequently used by wile-e.example.org in the construction of some new entity; we might see an exchange along the following lines.

I have previously commented that these hostnames and example URIs are
confusing - especially wile-e, which does not read well for anyone who
did not grow up watching The Road Runner in English.



28)
  S: Link: <http://acme.example.org/super-widget/provenance>;
           rel=http://www.w3.org/ns/prov#has_provenance

Although it is technically correct that rel= doesn't need quotes, it is
confusing to introduce the no-quote version in section 5. For
consistency with the earlier sections, use quotes here (and below).
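
For what it is worth, a consumer has to cope with both spellings
anyway. The following Python sketch parses a single link-value and
accepts rel with or without quotes. The function name parse_link_value
is an assumption for this sketch, and the parsing is deliberately
simplified: one link-value per header, and no ";" inside quoted
parameter values.

```python
import re


def parse_link_value(value):
    """Parse one link-value such as '<uri>; rel="x"; anchor="y"'.

    Returns a (target, params) tuple, where params maps lower-cased
    parameter names to their unquoted values.
    """
    match = re.match(r"\s*<([^>]*)>(.*)$", value, re.DOTALL)
    if match is None:
        raise ValueError("not a link-value: %r" % (value,))
    target, rest = match.groups()
    params = {}
    for piece in rest.split(";"):
        name, separator, raw = piece.partition("=")
        if not separator:
            continue  # empty piece or parameter without a value
        raw = raw.strip()
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            raw = raw[1:-1]  # strip the surrounding quotes
        params[name.strip().lower()] = raw
    return target, params


if __name__ == "__main__":
    unquoted = ("<http://acme.example.org/super-widget/provenance>;\n"
                "           rel=http://www.w3.org/ns/prov#has_provenance")
    quoted = ('<http://acme.example.org/super-widget/provenance>; '
              'rel="http://www.w3.org/ns/prov#has_provenance"; '
              'anchor="http://acme.example.org/super-widget"')
    print(parse_link_value(unquoted))
    print(parse_link_value(quoted))
```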


29) a client MAY post a pingback request

MAY -> may


30) Example 10


> C: POST http://acme.example.org/super-widget/pingback HTTP/1.1
> ..
> S: 204 No Content
> S: Link: <http://acme.example.org/super-widget/provenance>;
>         rel=http://www.w3.org/ns/prov#has_provenance;
>         anchor="http://acme.example.org/super-widget"

Remove the Link here. It is confusing, as we said in 3.1 that
#has_provenance is not defined for requests other than GET and HEAD.



31)
> The client may similarly include has_provenance links to specify provenance records with a different anchor.
> The provenance-URIs of those headers SHOULD  also be included in the content if the POSTed Content-type is text/uri-list.

Agree with the TODO - Drop this (but not the previous sentence) -
that's confusing when the Link: headers have a different anchor.

Instead:


> The client may similarly include has_provenance links to specify provenance records when they have a different anchor, in which case those provenance-URIs SHOULD NOT be included in POSTed text/uri-list content.



> In the examples above, the pingback service responds with an empty response body, and links to provenance for the original resource. (Note that the Link: header returned contains an explicit anchor parameter with the URI of the original resource; without this, the link would relate the indicated URI to the pingback URI http://acme.example.org/super-widget/pingback rather than the original resource.)


Change to:

> In the examples above, the pingback service responds positively with 204 No Content and an empty response body. HTTP statuses like 200 OK, 201 Created, 202 Accepted, and 303 See Other might also be appropriate positive responses depending on the domain and application.
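
To illustrate the exchange from the client side, here is a hedged
Python sketch that builds (but does not send) a pingback request with
a text/uri-list body, and classifies the response statuses listed in
the suggested text above. The function names and the provenance URI of
the derived entity are assumptions made for this sketch; the pingback
URI is the one from the draft's example.

```python
from urllib.request import Request

# Statuses named above as plausible positive responses.
POSITIVE_STATUSES = {200, 201, 202, 204, 303}


def build_pingback_request(pingback_uri, provenance_uris):
    """Build a POST request whose body lists provenance-URIs, one per line.

    text/uri-list terminates each line with CRLF.
    """
    body = "".join(uri + "\r\n" for uri in provenance_uris).encode("ascii")
    return Request(
        pingback_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "text/uri-list"},
    )


def pingback_accepted(status):
    """Return True if the status code counts as a positive response here."""
    return status in POSITIVE_STATUSES


if __name__ == "__main__":
    request = build_pingback_request(
        "http://acme.example.org/super-widget/pingback",
        ["http://wile-e.example.org/contraption/provenance"],  # hypothetical
    )
    print(request.get_method(), request.full_url)
    print(request.data)
    print(pingback_accepted(204), pingback_accepted(404))
```

Passing the request to urllib.request.urlopen would actually send it;
that step is left out here so the sketch has no network dependency.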


32)

> This leaves open a possibility that the pingback resource may have the same URI as the original resource, provided that the original does not respond to POST in some different way.

I think we should remove this, as some would read it as a suggestion
- but it would be a bit odd for a POST on any resource to be specific
to receiving *provenance* pingbacks.



33)

> Provenance may present a route for leakage of privacy


I would add a paragraph below:

The <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html">HTTP
security considerations</a> [RFC2616] generally apply to all of the
resources and services located through the mechanisms in this document.
Implementations MAY choose to use standard HTTP authorization
mechanisms to restrict access to resources, for instance using 401
Unauthorized, 403 Forbidden or 404 Not Found.



34) Is CSRF a real threat here? How?

Not CSRF within the PROV-AQ services, but it could be facilitating CSRF.

Imagine there is a browser with a plugin that understands PROV-AQ links.

A malicious server could post a link like this on an innocent looking
page about kittens:

Link: <https://facebook.com/delete-my-precious-kitten-images>;
      rel="http://www.w3.org/ns/prov#pingback"


The client might then be encouraged to share this picture on Twitter;
the clever browser plugin faithfully POSTs to the #pingback to
register the derived tweet - but sadly in this case Facebook (which of
course you are always logged in to) thought this was a POST by
clicking on the button to delete all the kitten images.
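
One conceivable client-side precaution, which is neither proposed in
PROV-AQ nor a complete defence, would be for such a plugin to POST
automatically only when the pingback URI shares an origin with the
resource that advertised it. The Python sketch below shows that check;
the function names and the kittens.example.com URI are assumptions
made for illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(uri):
    """Return the (scheme, host, port) triple identifying a URI's origin."""
    parts = urlsplit(uri)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def may_auto_pingback(resource_uri, pingback_uri):
    """Allow an unattended POST only to the advertising resource's origin."""
    return origin(resource_uri) == origin(pingback_uri)


if __name__ == "__main__":
    # The kitten page advertising a pingback on another site is refused.
    print(may_auto_pingback(
        "http://kittens.example.com/cute.html",
        "https://facebook.com/delete-my-precious-kitten-images"))
    # A pingback service on the same origin as the resource is allowed.
    print(may_auto_pingback(
        "http://acme.example.org/super-widget",
        "http://acme.example.org/super-widget/pingback"))
```

Such a rule would also block legitimate pingback services hosted
elsewhere, so a real client might rather ask the user, or send the
POST without cookies or other credentials.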


35) https://dvcs.w3.org/hg/prov/raw-file/default/paq/prov-aq.ttl

Some of the labels are in the wrong tense, like "hadAnchor" - and this
also needs to be updated to use the _underscore_style.

What is :aq? This file contains various annotation properties that are
not relevant to prov-aq.

Many terms from table B are missing, for instance:
prov:ServiceDescription, prov:DirectQueryService










-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
Received on Wednesday, 3 April 2013 16:15:01 UTC