RE: Webby Data from Makx Dekkers on 2015-10-11 (public-dwbp-comments@w3.org from October 2015)

From: Makx Dekkers <mail@makxdekkers.com>
Date: Sun, 11 Oct 2015 12:12:51 +0200
To: "'Jeremy Tandy'" <jeremy.tandy@gmail.com>, "'Phil Archer'" <phila@w3.org>, "'Erik Wilde'" <dret@berkeley.edu>, <public-dwbp-comments@w3.org>
Message-ID: <000501d1040d$63ea7210$2bbf5630$@makxdekkers.com>
Jeremy,

I think that help with the subject of versioning would be very welcome.

I looked at https://github.com/UKGovLD/registry-core/wiki/Principles-and-concepts#history-and-versioning, and this could work on both levels: for versions of datasets in a dataset catalogue in the model of DCAT (http://www.w3.org/TR/vocab-dcat/#vocabulary-overview) as well as for versions of items within datasets. However, the approach for the Linked Data Registry is based on strong control – involving formal agreement by a registry manager – which may not always be doable in an environment where datasets and data items are exchanged, shared, modified, merged.

A main question that people are struggling with is how much change is allowed for an entity to be considered ‘the same’. Are there any hard rules in the LDR to determine this?

Makx.

From: Jeremy Tandy [mailto:jeremy.tandy@gmail.com] 
Sent: 10 October 2015 13:11
To: Phil Archer <phila@w3.org>; Erik Wilde <dret@berkeley.edu>; public-dwbp-comments@w3.org
Subject: Re: Webby Data

 

Phil- 

 

those changes look fine. Happy to help with the subject of versioning; Dave Reynolds and I spent some time working through the strategy implemented in the Linked Data Registry. It works in all the cases I have found so far.

 

Regards, Jeremy

On Sat, 10 Oct 2015 at 11:43 Phil Archer <phila@w3.org> wrote:



On 10/10/2015 10:12, Jeremy Tandy wrote:
> Phil- thanks for drafting this update. It makes sense to me.
>
> There are 3 minor changes I would suggest ... and then there's Erik's
> concern that 'webby data' is necessary but not sufficient for hypermedia.
>
> Starting with the three things:
>
> 1) your reference to the CSV on the Web method of assigning URIs to things
> that within a dataset only have locally scoped identifiers; I would suggest
> you point folks directly to URI Template Properties [1] and the 'aboutUrl'
> property [2]

Done at
http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets

It now says:
URIs can be long. In a dataset of even moderate size, storing each URI
is likely to be repetitive and obviously wasteful. Instead, define
locally unique identifiers for each element and provide data that allows
them to be converted to globally unique URIs programmatically. The
Metadata Vocabulary for Tabular Data [tabular-metadata] provides
mechanisms for doing this within tabular data such as CSV files, in
particular using URI template properties such as the about URL property.
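
As a rough illustration of the paragraph above, here is a minimal Python sketch of turning locally unique identifiers into global URIs with a template. The template, the column names and the `expand_about_url` helper are all hypothetical, and the substitution only handles simple `{column}` placeholders; it is not a full URI Template (RFC 6570) or CSVW implementation.

```python
import re
from urllib.parse import quote

def expand_about_url(template: str, row: dict) -> str:
    """Replace each {column} placeholder with the percent-encoded cell value.

    Simplified sketch: only plain {name} variables are supported.
    """
    def substitute(match):
        return quote(str(row[match.group(1)]), safe="")
    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)

# Hypothetical template and rows, for illustration only.
ABOUT_URL = "http://example.org/id/country/{code}"
rows = [
    {"code": "NL", "name": "Netherlands"},
    {"code": "GB", "name": "United Kingdom"},
]

# The dataset stores only the short codes; URIs are derived on demand.
uris = [expand_about_url(ABOUT_URL, row) for row in rows]
```

The point of the design is that the template is stored once in the metadata while the data itself keeps only the short local identifiers.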


>
> 2) you talk about 'confirming the versioning policy' ... a bit thorny this
> one.

Indeed. I've removed that bullet point, lazily copied from LD-BP.


> In my opinion, only information resources can be versioned. Real-world
> resources can't be. For example, if I replace my car with another that is
> just like it, is this a new version of my car? No, it's a different car
> with a different identifier. Using version numbers in URIs means that you
> can only create durable links to that specific version ... and when a new
> version is released, your links are broken. That said, you might want to
> refer to a specific version of a document (or other information resource)
> as the basis of an analysis. I'm guessing that you need a section on the
> merits of when and where to use versioned URIs over and above what is
> already stated in http://www.w3.org/TR/dwbp/#dataVersioning (BTW, I agree
> that if you are going to use versioning, you should provide a version
> history, and that datasets, as information resources, are great candidates
> to be versioned). By way of example, please refer to the Linked Data
> Registry [3] that makes a distinction between versioned and non-versioned
> things [4]. You can see this in a live example [5]; the concept
> 'AGRICULTURE - SITE DRAINAGE' [6] is not versioned but the register item
> [7] that binds that concept into a controlled list (the register) is
> versioned (each version of a register item refers to a graph of information
> about the registered concept, so that the information held about the
> concept can be updated). Furthermore, we use a syntax (add a suffix `:n`
> where n is the version number) to allow people to access specific versions
> (see example [8] - although not very interesting as it only has one version
> ... in other examples you can traverse the version history). In the UI of
> the Linked Data Registry you can find the versions by clicking on the
> 'History' link.

That's really helpful info. The editors are struggling a little with the
issue of versioning so this should help us make progress. I'll need to
look at it too to see if it should be in this particular BP or elsewhere
in the doc.
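
The `:n` version suffix mentioned above can be sketched in Python as follows. This is only an illustration of the URI convention as described in this thread, not the Linked Data Registry's own code; the helper names `split_version` and `versioned_uri` are hypothetical, and the sketch assumes URIs without a query string or fragment.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

def split_version(uri: str) -> Tuple[str, Optional[int]]:
    """Split 'base:n' into (base, n); return (uri, None) if there is no suffix."""
    parts = urlsplit(uri)
    if parts.query or parts.fragment:
        return uri, None  # out of scope for this sketch
    # Look only at the last path segment so a port such as ':8080' is ignored.
    last_segment = parts.path.rsplit("/", 1)[-1]
    match = re.fullmatch(r"(.+):(\d+)", last_segment)
    if match is None:
        return uri, None
    suffix_length = len(match.group(2)) + 1  # digits plus the colon
    return uri[:-suffix_length], int(match.group(2))

def versioned_uri(base: str, version: int) -> str:
    """Build the URI of a specific version of a register item."""
    return f"{base}:{version}"

item = "http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/_AE"
base, version = split_version(versioned_uri(item, 1))
```

A link to `item` stays valid across updates, while a link to `versioned_uri(item, 1)` pins one specific version, which is the trade-off Jeremy describes.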


>
> 3) in the 'How to test' section you say "Check that the URIs are
> resolvable". Now, IMHO, it's certainly best practice to have these URIs for
> data points resolve (I suppose even if it is only to the description of the
> dataset within which they're defined?), but there are cases where it's
> equally valid to use them just as (globally scoped) identifiers rather than
> URLs. This still adds value when you're trying to merge information from
> disparate datasets that you have downloaded and are working with, say, in a
> local triple store.

Fixed. It now says:

Check that within the dataset, references to things that don't change or
that change slowly, such as countries, regions, organizations and
people, are referred to by URIs or by short identifiers that can be
appended to a URI stub. Ideally the URIs should resolve; however, they
have value as globally scoped variables whether they resolve or not.
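
To illustrate the last sentence, here is a small Python sketch of merging two downloaded datasets using URIs purely as shared keys, without ever dereferencing them. The URIs, property names and values are invented for illustration, and the `merge_by_uri` helper is hypothetical.

```python
def merge_by_uri(*datasets: dict) -> dict:
    """Combine property dictionaries that share the same URI key.

    If two datasets give the same property for a URI, the later one wins.
    """
    merged: dict = {}
    for dataset in datasets:
        for uri, properties in dataset.items():
            merged.setdefault(uri, {}).update(properties)
    return merged

# Two hypothetical datasets from different publishers, keyed by the same URIs.
populations = {
    "http://example.org/id/country/NL": {"population": 17000000},
    "http://example.org/id/country/GB": {"population": 65000000},
}
capitals = {
    "http://example.org/id/country/NL": {"capital": "Amsterdam"},
}

merged = merge_by_uri(populations, capitals)
```

No HTTP request is made at any point: the URIs work as globally scoped identifiers whether or not they resolve.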



>
> ----
>
> Now, Erik's point [9] is that there is a "difference between 'web data
> only' and the 'web of hypermedia-driven services'" and that "'webby data'
> is a necessary but not sufficient condition to have hypermedia. [which
> requires providing navigational affordances to get things done with that
> data]."
>
> I see that in the vast majority of cases, the data is accessed via a
> service end-point ... even if it is a trivial HTTP GET. But there are cases
> where (as I said in point #3 above) you simply want to use URIs as
> identifiers. This clearly is not hypermedia. I wonder if there are two
> levels of requirements here? At this point, I'm unable to unpick this
> distinction further, but I'm sure it will be relevant in the Spatial Data
> on the Web WG.

I've given my first pass answer to Erik - let's see how it goes.

Thanks for the review - much appreciated.

Phil.


>
> More thinking required.
>
> Jeremy
>
>
> [1]: http://www.w3.org/TR/tabular-metadata/#uri-template-properties
> [2]: http://www.w3.org/TR/tabular-metadata/#cell-aboutUrl
> [3]: https://github.com/UKGovLD/registry-core
> [4]: https://github.com/UKGovLD/registry-core/wiki/Principles-and-concepts#versioned-types
> [5]: http://environment.data.gov.uk/registry/
> [6]: http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/AE
> [7]: http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/_AE
> [8]: http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/_AE:1
> [9]: https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Oct/0026.html
>
> On Sat, 10 Oct 2015 at 08:53 Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
>
>>
>>
>> -----Original Message-----
>> From: Phil Archer [mailto:phila@w3.org]
>> Sent: 09 October 2015 22:29
>> To: Public DWBP WG
>> Cc: Erik Wilde; Tandy, Jeremy
>> Subject: Webby Data
>>
>> Dear all,
>>
>> As the WG is well aware, Erik has been flying the flag for Webby
>> data/hypermedia.
>>
>> It took me a while to work out just what Erik was getting at, mainly
>> because I have been somewhat word blind. When you've seen a document as
>> much as we've seen the BP doc, you think things are there that aren't and
>> vice versa.
>>
>> It was Jeremy Tandy (SDW and CSV WG) who pointed out to me last week what was
>> missing - which is what I think Erik has been saying for a while.
>> Erik says it differently but I dare to hope that what I've suggested as a
>> new BP addresses his issue.
>>
>> We had a BP that said "use persistent URIs as identifiers". And then it
>> said *Datasets* must be identified by persistent URIs. What it didn't say
>> was that data points within the data should also be URIs where possible.
>>
>> I've drafted a BP to cover this, see
>> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>>
>> For those who were there, this is the short form of my over-long talk in
>> Sao Paulo the other day ;-)
>>
>> The BP emphasises the importance of links between things that are
>> identified. It does this with reference to the Web in general and then
>> cites *both* 5 stars of linked data and Erik's words on hypermedia as
>> examples of what this means.
>>
>> @Erik - is that doc going to stay on GitHub? Any chance it might find a
>> more stable/permanent home? I really don't like linking to GH in a W3C Rec
>> track document.
>>
>> I very much doubt this BP will go through unchanged, but I've had a go at
>> drafting it and have created the pull request. I hope the WG will discuss
>> it and not just merge it.
>>
>> HTH
>>
>> Phil.
>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>

Received on Sunday, 11 October 2015 10:13:27 UTC