
RE: Webby Data

From: Makx Dekkers <mail@makxdekkers.com>
Date: Sun, 11 Oct 2015 14:48:58 +0200
To: "'Jeremy Tandy'" <jeremy.tandy@gmail.com>, "'Phil Archer'" <phila@w3.org>, "'Erik Wilde'" <dret@berkeley.edu>, <public-dwbp-comments@w3.org>
Message-ID: <001101d10423$33389e10$99a9da30$@makxdekkers.com>
Yes, you are right, sameness is always subjective. That’s obviously why discussions that I was involved in did not come to a common point of view.

 

SKOS concepts might be on the easier end of the scale, while datasets might be more complicated, as they don’t describe a single ‘thing’ but are rather a more indirect observation or representation of some real-world phenomenon.

 

There is also the question of which incarnations you want to keep accessible. If you have dataset A that contains errors, you might say that after correcting the errors it would still be the same A and, to prevent ‘wrong data’ from propagating, you would overwrite the dataset under the same URI. On the other hand, if someone used the A with errors as part of an argument (such as in a published article), you really want to keep the A with errors and the A with corrections separate, so that the published document keeps referring to the erroneous data that was used to make the argument.

 

So I read the material on versioning in the LDR as guidance on HOW to do versioning, not on WHEN or WHY to do versioning. Correct?

 

Makx.

 

 

 

From: Jeremy Tandy [mailto:jeremy.tandy@gmail.com] 
Sent: 11 October 2015 12:43
To: Makx Dekkers <mail@makxdekkers.com>; Phil Archer <phila@w3.org>; Erik Wilde <dret@berkeley.edu>; public-dwbp-comments@w3.org
Subject: Re: Webby Data

 

Hi Makx

> how much change is allowed for an entity to be considered ‘the same’. Are there any hard rules in the LDR to determine this?

This is the crucial question! There are no hard rules, but the essence is described below. 

I think that it is fine to add new information about the entity, e.g. additional properties, new translations of labels (different languages), to fix errors etc. This is changing the information about the entity, but still referring to the _same_ entity. 

Things that change the entity? When dealing with (SKOS) Concepts you can often get away with broadening a definition and still treating it as the same, because all the data "in the wild" that references the concept still works. This is not true if you narrow the definition. In that case I would (in a registry context) deprecate the current definition (keep it available but say "don't use") and mint a new concept for the narrower concept. Another example, this time a physical thing: think of a sports stadium. It might be completely rebuilt but people still call it by the same name ... Identity is a social construct. In this case you could consider the new and old to be the same sports stadium, with different information attributes. But if the stadium was moved (etc) people often give it a new name. In this case, I see these as two different entities. 
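
As a rough illustration of the broaden-versus-narrow rule for concepts, here is a minimal Python sketch. It is not how the Linked Data Registry is implemented; the `Concept` class, the `status` values, the `replaced_by` field and the example.org URIs are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Concept:
    uri: str
    definition: str
    status: str = "valid"              # hypothetical status values: "valid" / "deprecated"
    replaced_by: Optional[str] = None  # set when a successor concept is minted


def broaden(registry: Dict[str, Concept], uri: str, definition: str) -> Concept:
    """Broadening keeps the same URI; only the information about the concept changes."""
    concept = registry[uri]
    concept.definition = definition
    return concept


def narrow(registry: Dict[str, Concept], old_uri: str, new_uri: str, definition: str) -> Concept:
    """Narrowing deprecates the old concept (it stays available) and mints a new one."""
    old = registry[old_uri]
    old.status = "deprecated"
    old.replaced_by = new_uri
    registry[new_uri] = Concept(new_uri, definition)
    return registry[new_uri]


# Illustrative data only.
OLD = "http://example.org/def/venue"
NEW = "http://example.org/def/football-venue"
registry = {OLD: Concept(OLD, "A sports venue")}

broaden(registry, OLD, "A sports or events venue")   # same entity, same URI
narrow(registry, OLD, NEW, "A venue for football")   # different entity, new URI
```

Existing references to the old URI keep resolving to the deprecated concept, while new data can point at the newly minted one.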

Sameness, as you can see, will always be subjective. But hopefully this gives you some ideas. 

Jeremy

On Sun, 11 Oct 2015 at 11:12, Makx Dekkers <mail@makxdekkers.com> wrote:

Jeremy,

 

I think that help with the subject of versioning would be very welcome.

 

I looked at https://github.com/UKGovLD/registry-core/wiki/Principles-and-concepts#history-and-versioning, and this could work on both levels: for versions of datasets in a dataset catalogue in the model of DCAT (http://www.w3.org/TR/vocab-dcat/#vocabulary-overview) as well as for versions of items within datasets. However, the approach for the Linked Data Registry is based on strong control – involving formal agreement by a registry manager – which may not always be doable in an environment where datasets and data items are exchanged, shared, modified, merged.

 

A main question that people are struggling with is how much change is allowed for an entity to be considered ‘the same’. Are there any hard rules in the LDR to determine this?

 

Makx.

 

 

 

 

From: Jeremy Tandy [mailto:jeremy.tandy@gmail.com] 
Sent: 10 October 2015 13:11
To: Phil Archer <phila@w3.org>; Erik Wilde <dret@berkeley.edu>; public-dwbp-comments@w3.org
Subject: Re: Webby Data

 

Phil- 

 

those changes look fine. Happy to help with the subject of versioning; Dave Reynolds and I spent some time working through the strategy implemented in the Linked Data Registry. It works in all the cases I have found so far.

 

Regards, Jeremy

On Sat, 10 Oct 2015 at 11:43 Phil Archer <phila@w3.org> wrote:



On 10/10/2015 10:12, Jeremy Tandy wrote:
> Phil- thanks for drafting this update. It makes sense to me.
>
> There are 3 minor changes I would suggest ... and then there's Erik's
> concern that 'webby data' is necessary but not sufficient for hypermedia.
>
> Starting with the three things:
>
> 1) your reference to the CSV on the Web method of assigning URIs to things
> that within a dataset only have locally scoped identifiers; I would suggest
> you point folks directly to URI Template Properties [1] and the 'aboutUrl'
> [2]

Done at
http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets

It now says:
URIs can be long. In a dataset of even moderate size, storing each URI
is likely to be repetitive and obviously wasteful. Instead, define
locally unique identifiers for each element and provide data that allows
them to be converted to globally unique URIs programmatically. The
Metadata Vocabulary for Tabular Data [tabular-metadata] provides
mechanisms for doing this within tabular data such as CSV files, in
particular using URI template properties such as the about URL property.
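
To make the idea concrete, here is a minimal Python sketch of turning locally unique identifiers into global URIs with a template. It only handles simple `{column}` placeholders and is not a full URI Template processor; the template, the `expand` helper and the example.org URIs are assumptions made for illustration, not taken from [tabular-metadata].

```python
import re
from urllib.parse import quote

# Hypothetical template, in the style of an "about URL" URI template property.
ABOUT_URL = "http://example.org/id/country/{code}"


def expand(template: str, row: dict) -> str:
    """Replace each {column} placeholder with the percent-encoded cell value."""
    def substitute(match: re.Match) -> str:
        return quote(str(row[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


# Illustrative rows: only the short local identifier is stored per row.
rows = [
    {"code": "NL", "name": "Netherlands"},
    {"code": "GB", "name": "United Kingdom"},
]
uris = [expand(ABOUT_URL, row) for row in rows]
```

The dataset stores `NL` and `GB` once each, and the full URIs are derived on demand from the single template.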


>
> 2) you talk about 'confirming the versioning policy' ... a bit thorny this
> one.

Indeed. I've removed that bullet point, lazily copied from LD-BP.


> In my opinion, only information resources can be versioned. Real-world
> resources can't be. For example, if I replace my car with another that is
> just like it, is this a new version of my car? No, it's a different car
> with a different identifier. Using version numbers in URIs means that you
> can only create durable links to that specific version ... and when a new
> version is released, your links are broken. That said, you might want to
> refer to a specific version of a document (or other information resource)
> as the basis of an analysis. I'm guessing that you need a section on the
> merits of when and where to use versioned URIs over and above what is
> already stated in http://www.w3.org/TR/dwbp/#dataVersioning (BTW, I agree
> that if you are going to use versioning, you should provide a version
> history, and that datasets, as information resources, are great candidates
> to be versioned). By way of example, please refer to the Linked Data
> Registry [3] that makes a distinction between versioned and non-versioned
> things [4]. You can see this in a live example [5]; the concept
> 'AGRICULTURE - SITE DRAINAGE' [6] is not versioned but the register item
> [7] that binds that concept into a controlled list (the register) is
> versioned (each version of a register item refers to a graph of information
> about the registered concept, so that the information held about the
> concept can be updated). Furthermore, we use a syntax (add a suffix `:n`
> where n is the version number) to allow people to access specific versions
> (see example [8] - although not very interesting as it only has one version
> ... in other examples you can traverse the version history). In the UI of
> the Linked Data Registry you can find the versions by clicking on the
> 'History' link.
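
The `:n` suffix convention described above can be sketched in a few lines of Python. This is only an illustration of splitting a versioned register item URI into its unversioned URI and version number; the `split_version` helper is hypothetical and is not part of the Linked Data Registry code.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def split_version(uri: str) -> Tuple[str, Optional[int]]:
    """Return (unversioned URI, version) for URIs whose path ends in ':n'.

    Only the path is inspected, so a port number such as ':8080' in the
    authority is never mistaken for a version suffix.
    """
    parts = urlsplit(uri)
    head, sep, tail = parts.path.rpartition(":")
    if sep and tail.isdecimal():
        return urlunsplit(parts._replace(path=head)), int(tail)
    return uri, None


ITEM = "http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/_AE"
base, version = split_version(ITEM + ":1")
```

A link to `base` always refers to the register item as such, while a link that includes the suffix pins one specific version of it.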

That's really helpful info. The editors are struggling a little with the
issue of versioning so this should help us make progress. I'll need to
look at it too to see if it should be in this particular BP or elsewhere
in the doc.


>
> 3) in the 'How to test' section you say "Check that the URIs are
> resolvable". Now, IMHO, it's certainly best practice to have these URIs for
> data points resolve (I suppose even if it is only to the description of the
> dataset within which they're defined?), but there are cases where it's
> equally valid to use them just as (globally scoped) identifiers rather than
> URLs. This still adds value when you're trying to merge information from
> disparate datasets that you have downloaded and are working with, say, in a
> local triple store.

Fixed. It now says:

Check that within the dataset, references to things that don't change or
that change slowly, such as countries, regions, organizations and
people, are referred to by URIs or by short identifiers that can be
appended to a URI stub. Ideally the URIs should resolve, however, they
have value as globally scoped variables whether they resolve or not.
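
The value of URIs as pure identifiers can be shown with a small Python sketch: two downloaded datasets are merged locally on their URI keys and nothing is ever dereferenced. The data, the example.org URIs and the `merge_by_uri` helper are all made up for illustration.

```python
from typing import Dict

NL = "http://example.org/id/country/NL"
GB = "http://example.org/id/country/GB"
FR = "http://example.org/id/country/FR"

# Two hypothetical local extracts that happen to use the same URIs for countries.
labels: Dict[str, dict] = {NL: {"label": "Netherlands"}, GB: {"label": "United Kingdom"}}
codes: Dict[str, dict] = {NL: {"iso_code": "NL"}, FR: {"iso_code": "FR"}}


def merge_by_uri(*datasets: Dict[str, dict]) -> Dict[str, dict]:
    """Combine properties from several datasets, keyed on the shared URI."""
    merged: Dict[str, dict] = {}
    for dataset in datasets:
        for uri, properties in dataset.items():
            merged.setdefault(uri, {}).update(properties)
    return merged


merged = merge_by_uri(labels, codes)
```

No HTTP request is made at any point; the URIs work as join keys simply because they are globally unique.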



>
> ----
>
> Now, Eric's point [9] is that there is a "difference between 'web data
> only' and the 'web of hypermedia-driven services'" and that "'webby data'
> is a necessary but not sufficient condition to have hypermedia, [which]
> requires providing navigational affordances to get things done with that
> data."
>
> I see that in the vast majority of cases, the data is accessed via a
> service end-point ... even if it is a trivial HTTP Get. But there are cases
> where (as I said in point #3 above) you simply want to use URIs as
> identifiers. This clearly is not hypermedia. I wonder if there are two
> levels of requirements here? At this point, I'm unable to unpick this
> distinction further, but I'm sure it will be relevant in the Spatial Data
> on the Web WG.

I've given my first pass answer to Erik - let's see how it goes.

Thanks for the review - much appreciated.

Phil.


>
> More thinking required.
>
> Jeremy
>
>
> [1]: http://www.w3.org/TR/tabular-metadata/#uri-template-properties
> [2]: http://www.w3.org/TR/tabular-metadata/#cell-aboutUrl
> [3]: https://github.com/UKGovLD/registry-core
> [4]:
> https://github.com/UKGovLD/registry-core/wiki/Principles-and-concepts#versioned-types
>
> [5]: http://environment.data.gov.uk/registry/
> [6]:
> http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/AE
>
> [7]:
> http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/_AE
> [8]:
> http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/_AE:1
>
> [9]: https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Oct/0026.html
>
> On Sat, 10 Oct 2015 at 08:53 Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
> wrote:
>
>>
>>
>> -----Original Message-----
>> From: Phil Archer [mailto:phila@w3.org]
>> Sent: 09 October 2015 22:29
>> To: Public DWBP WG
>> Cc: Erik Wilde; Tandy, Jeremy
>> Subject: Webby Data
>>
>> Dear all,
>>
>> As the WG is well aware, Erik has been flying the flag for Webby
>> data/hypermedia.
>>
>> It took me a while to work out just what Erik was getting at, mainly
>> because I have been somewhat word blind. When you've seen a document as
>> much as we've seen the BP doc, you think things are there that aren't and
>> vice versa.
>>
>> It was Jeremy Tandy (SDW and CSV WG) who pointed out to me last week what was
>> missing - which is what I think Erik has been saying for a while.
>> Erik says it differently but I dare to hope that what I've suggested as a
>> new BP addresses his issue.
>>
>> We had a BP that said "use persistent URIs as identifiers". And then it
>> said *Datasets* must be identified by persistent URIs. What it didn't say
>> was that data points within the data should also be URIs where possible.
>>
>> I've drafted a BP to cover this, see
>> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>>
>> For those who were there, this is the short form of my over-long talk in
>> Sao Paulo the other day ;-)
>>
>> The BP emphasises the importance of links between things that are
>> identified. It does this with reference to the Web in general and then
>> cites *both* 5 stars of linked data and Erik's words on hypermedia as
>> examples of what this means.
>>
>> @Erik - is that doc going to stay on GitHub? Any chance it might find a
>> more stable/permanent home? I really don't like linking to GH in a W3C Rec
>> track document.
>>
>> I very much doubt this BP will go through unchanged, but I've had a go at
>> drafting it and have created the pull request. I hope the WG will discuss
>> it and not just merge it.
>>
>> HTH
>>
>> Phil.
>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>

--


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Sunday, 11 October 2015 12:49:37 UTC
