Re: partial review from Phil Archer on 2016-04-19 (public-dwbp-wg@w3.org from April 2016)

From: Phil Archer <phila@w3.org>
Date: Tue, 19 Apr 2016 12:53:08 +0100
To: Annette Greiner <amgreiner@lbl.gov>, DWBP Public List <public-dwbp-wg@w3.org>
Message-ID: <57161C24.6080401@w3.org>
OK, coming back to this...

>
> On 15/04/2016 03:12, Annette Greiner wrote:
[..]

>>
>> 10. Persistent URIs as identifiers
>> --

>>
>> The example uses the city domain instead of the transport agency's
>> domain, which is not realistic for a large city. The agency domain is
>> likely to persist as long as the information it makes available is
>> relevant. Try Googling "transit agency" and see what comes up for domain
>> names. The issue depends on how stable the transit service is. For a
>> small town, the transit function might not be given over to a separate
>> agency, and the guidance would be right, but for a big city, where the
>> transit function is run by an independent agency, it's not realistic.

I've no doubt that's the case, but we do need to get people away from 
putting agency names in domain names if they are to persist. Transport 
agencies are forever changing their name. UK gov provides an example of 
domain names tied to function rather than agency names.

>>
>> The example is rather redundant. It is data.mycity..., and yet /dataset
>> also appears in the path. The path also contains /bus as well as
>> /bus-stops. It's unlikely that the agency has so many transit modes that
>> they need to be split between road and rail and water.

True, but a government will. And a national science data service (like 
Australia's ANDS) covers multiple disciplines.

>> The same info is
>> conveyed as well by the much shorter
>> http://data.mycitytransit.example.org/bus/stops

Hmm, maybe.

http://data.mycity.example.org/id/bus/stops/01
would be an ID for a bus stop, by which I mean the physical thing.

http://data.mycity.example.org/doc/bus/stops/01

would be the ID for a document about the bus stop.

http://data.mycity.example.org/dataset/bus/stops/

would be the ID for a dataset describing all the bus stops.

So I'd say that, OK, we could possibly remove the /public-transport path 
segment, and /bus/, but the shorter the URI, the more chance there is of 
it being a problem when a new dataset comes along.
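
To make the distinction concrete, the three patterns above could be generated by one helper. This is only a minimal sketch: the function name make_uri and its collection flag are made up for illustration, and only the /id/, /doc/ and /dataset/ prefixes and the example host come from the URIs above.

```python
# Hypothetical helper; only the host and the three prefixes come from the thread.
BASE = "http://data.mycity.example.org"
KINDS = ("id", "doc", "dataset")


def make_uri(kind, *segments, collection=False):
    """Build a URI such as BASE/id/bus/stops/01.

    kind       -- "id" (the real-world thing), "doc" (a document about it)
                  or "dataset" (a dataset describing many things)
    segments   -- path segments that follow the kind prefix
    collection -- if True, end the URI with a trailing slash
    """
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    uri = "/".join([BASE, kind, *segments])
    return uri + "/" if collection else uri


if __name__ == "__main__":
    print(make_uri("id", "bus", "stops", "01"))
    print(make_uri("doc", "bus", "stops", "01"))
    print(make_uri("dataset", "bus", "stops", collection=True))
```

Keeping the kind as the first path segment means a new dataset only needs new trailing segments, which is the collision concern raised above.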

The URIs from this BP are used throughout the doc so changing them would 
mean changing them all. Search and replace could take care of a lot of 
that but there may be awkward edge cases in the doc.

I leave it to the editors to decide whether to use the shorter URIs.


>>
>> We say "Ideally, the relevant Web site includes a description of the
>> process..." I think we mean a controlled scheme.

In context, I'd say the current text is OK:

"Check that each dataset in question is identified using a URI that has 
been assigned under a controlled process as set out in the previous 
section. Ideally, the relevant Web site includes a description of the 
process and a credible pledge of persistence should the publisher no 
longer be able to maintain the URI space themselves."



>>
>>
>> 11. Persistent URIs within datasets
>> --
>>
>> The word "affordances" is misused. Affordances are how we know what
>> something is intended to do, not what the thing does. Affordances do not
>> act on things, they inform.

That's what comes of me writing text about something that, as you 
instantly notice, I don't know a lot about.

We could remove the word affordance altogether like:

"These ideas are at the heart of the 5 Stars of Linked Data where one 
data point links to another, and of Hypermedia where links may be to 
further data or to services that can act on or relate to the data in 
some way."

>>
>> The intended outcome should be a free-standing piece of text. Starting
>> with "that one item" is confusing.

OK, I have removed the word 'that' so it just reads:

One data item can be related to others across the Web creating a global 
information space accessible to humans and machines alike.


>>
>> Much of the implementation section is about minting new URIs, which is
>> the subject of the previous BP. It is off topic here. Everything from
>> "If you can't find an existing set of identifiers that meet your needs,
>> you'll need to create your own" down to the end of the example doesn't
>> belong in a BP that is about using other people's identifiers.

Hmm, I disagree a little. The first BP is about persistent URIs for 
datasets, the second about persistent URIs within the data itself. It 
talks about using other people's sets for obvious things and then goes 
into aspects of URI design that are not covered by, or relevant to, the 
previous BP. There's info in that second one that I wouldn't like to see 
lost but, since I wrote it, I am too close to be a good judge.


>>
>> The last paragraph of the example is almost exactly the same as the last
>> paragraph before the example.

Correct. I have deleted it in my native speaker review copy.

>>
>>
>> 12.  URIs for versions and series
>> --

My suggested rewording for the intended outcome:

To enable references to a specific version of a dataset and to concepts 
such as a 'dataset series' and 'the latest version'.


>>
>> This BP is confusing two issues. One is the use of a shorter URI for the
>> latest version of a dataset while also assigning a version-specific URI
>> for it. The other issue is making a landing page for a collection of
>> datasets. The initial intent was the former.

I don't see any reference to landing pages for collections of datasets. 
I do think that the example could be improved slightly though, like this:

<p>Suppose that a new bus stop is created. To keep 
<code>bus-stops-2015-05-05</code> up to date, a new version of the 
dataset (<code>bus-stops-2015-12-17</code>) is created. 
<code>bus-stops-2015-12-17</code> includes all the data from 
<code>bus-stops-2015-05-05</code> plus the data about the new bus stop. 
The two versions can be identified by the following URIs:</p>
<p><code>http://data.mycity.example.com/public-transport/road/bus/dataset/bus-stops-2015-05-05</code> 
is the versioned URI of the first version of the dataset</p>
<p><code>http://data.mycity.example.com/public-transport/road/bus/dataset/bus-stops-2015-12-17</code> 
is the versioned URI of the updated version of the dataset</p>
<p><code>http://data.mycity.example.com/public-transport/road/bus/dataset/bus-stops</code> 
always resolves to the latest version, so it resolved to 
<code>bus-stops-2015-05-05</code> <em>until</em> 17 December 2015, when 
the server configuration was updated to point that URL to 
<code>bus-stops-2015-12-17</code>.</p>
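
As a rough illustration of what "always resolves to the latest version" means, here is a minimal sketch. It assumes that each version name ends in an ISO date, as in the example; the function names and the in-memory list are hypothetical, and a real server would do this in its configuration or routing layer.

```python
from datetime import date

# Version names taken from the example; the list itself is an assumption.
VERSIONS = ["bus-stops-2015-05-05", "bus-stops-2015-12-17"]


def version_date(name):
    """Parse the trailing YYYY-MM-DD of a version name into a date."""
    return date.fromisoformat(name[-10:])


def resolve_latest(versions, on):
    """Return the newest version published on or before the date `on`."""
    published = [v for v in versions if version_date(v) <= on]
    if not published:
        raise LookupError(f"no version published on or before {on}")
    return max(published, key=version_date)


if __name__ == "__main__":
    # Before the update the unversioned URI points at the May version,
    # from 17 December 2015 onwards at the December version.
    print(resolve_latest(VERSIONS, date(2015, 12, 16)))
    print(resolve_latest(VERSIONS, date(2015, 12, 17)))
```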

>>
>> The examples in the Why aren't series or groups except for the first
>> item, yet they are introduced as examples of series or groups.

True, I offer this as a better alternative:

<ul>
   <li>bus stops in My City (that change over time);</li>
   <li>a list of elected officials in My City;</li>
   <li>evolving versions of a document through to completion.</li>
</ul>

I suggest this sentence "<p>In different circumstances, it will be 
appropriate to refer separately to each of these examples (and many like 
them).</p>" is replaced with

<p>In different circumstances, it will be appropriate to refer to the 
current situation (the current set of bus stops, the current elected 
officials etc.). In others, it may be appropriate to refer to the 
situation as it exists/existed at a specific time.</p>

>>
>> How to Test says to check "that logical groups of datasets are also
>> identifiable." That is vague. It should say "that a URI is also provided
>> for the latest version or most recent real-time value."

I would phrase it as:

Check that each version of a dataset has its own URI, and that there is 
also a 'latest version' URI.
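
That check could be automated along these lines. This is a minimal sketch assuming the date-suffix convention from the bus stops example (a versioned URI ends in -YYYY-MM-DD and the 'latest version' URI is the same URI without that suffix); the function name is hypothetical and other versioning schemes would need a different pattern.

```python
import re

# A versioned URI is assumed to end in "-YYYY-MM-DD".
VERSION_SUFFIX = re.compile(r"^(?P<base>.+)-(?P<date>\d{4}-\d{2}-\d{2})$")


def missing_latest_uris(uris):
    """Return the unversioned URIs that are implied by versioned URIs
    in `uris` but are not themselves present."""
    known = set(uris)
    missing = set()
    for uri in known:
        match = VERSION_SUFFIX.match(uri)
        if match and match.group("base") not in known:
            missing.add(match.group("base"))
    return sorted(missing)


if __name__ == "__main__":
    root = "http://data.mycity.example.com/public-transport/road/bus/dataset/"
    published = [
        root + "bus-stops-2015-05-05",
        root + "bus-stops-2015-12-17",
        root + "bus-stops",
    ]
    # An empty list means every versioned dataset also has a latest URI.
    print(missing_latest_uris(published))
```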


>>
>> I don't think this applies to time series. What we're talking about here
>> is use of dates for version identifiers.
>>
>> The example is incomplete; it doesn't say what the latest version URI
>> would be.
>>

Yep, that's what I fixed above.

OK, I think we're done with this round.

Thanks again Annette - that kind of careful review is critical.

Phil.

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Tuesday, 19 April 2016 11:53:13 UTC