- From: Jeremy Tandy <jeremy.tandy@gmail.com>
- Date: Sun, 11 Oct 2015 14:08:51 +0000
- To: Makx Dekkers <mail@makxdekkers.com>, Phil Archer <phila@w3.org>, Erik Wilde <dret@berkeley.edu>, public-dwbp-comments@w3.org
- Message-ID: <CADtUq_1d=VdHZWODx8kjpt2OrJv5QZQ=GCT7tRO2B4jHDistVg@mail.gmail.com>
> So I read the stuff of versioning in the LDR as guidance on HOW to do versioning, not on WHEN or WHY to do versioning. Correct?

I'd agree with that.

FWIW, I think that datasets _should_ be versioned. As information resources they can evolve state over time.

WRT datasets, I would like to see a durable identifier (URI) for the dataset that covers its entire lifetime and other URIs for each version, plus links between the versions. This is similar to the register item behaviour in the Linked Data Registry.

BR. Jeremy

On Sun, 11 Oct 2015 at 13:49, Makx Dekkers <mail@makxdekkers.com> wrote:

> Yes, you are right, sameness is always subjective. That’s obviously why discussions that I was involved in did not come to a common point of view.
>
> SKOS concepts might be on the easier end of the scale, while datasets might be more complicated, as they don’t describe a single ‘thing’ but are rather a more indirect observation or representation of some real-world phenomenon.
>
> There is also the question of which incarnations you want to keep accessible. If you have dataset A that contains errors, you might say that after correcting errors it would still be the same A and, to stop ‘wrong data’ from propagating, you would overwrite the dataset under the same URI; on the other hand, if someone used the A with errors as part of an argument (such as in a published article), you really want to keep both the A with errors and the A with corrections separate, so that the published document keeps on referring to the erroneous data that was used to make the argument.
>
> So I read the stuff of versioning in the LDR as guidance on HOW to do versioning, not on WHEN or WHY to do versioning. Correct?
>
> Makx.
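A rough illustration of the pattern proposed in the reply above: one durable URI for the dataset, one URI per version, and links between adjacent versions. This is only a sketch under stated assumptions. The `:n` suffix follows the Linked Data Registry convention described further down this thread; the example.org URI, the `VersionRecord` class and the function names are hypothetical and not part of any registry API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VersionRecord:
    uri: str                   # URI of this specific version
    dataset: str               # durable URI covering the dataset's whole lifetime
    previous: Optional[str]    # link to the preceding version, if any
    following: Optional[str]   # link to the next version, if any


def version_uri(dataset_uri: str, n: int) -> str:
    """Build a version URI by appending ':n' (assumed suffix convention)."""
    if n < 1:
        raise ValueError("version numbers start at 1")
    return f"{dataset_uri}:{n}"


def version_history(dataset_uri: str, count: int) -> List[VersionRecord]:
    """Return one record per version, each linked to its neighbours."""
    records = []
    for n in range(1, count + 1):
        records.append(VersionRecord(
            uri=version_uri(dataset_uri, n),
            dataset=dataset_uri,
            previous=version_uri(dataset_uri, n - 1) if n > 1 else None,
            following=version_uri(dataset_uri, n + 1) if n < count else None,
        ))
    return records


# Hypothetical dataset URI, used for illustration only.
history = version_history("http://example.org/dataset/rainfall", 3)
for record in history:
    print(record.uri, "previous:", record.previous, "following:", record.following)
```

Citing the unversioned URI gives a link that survives new releases, while citing a `:n` URI pins an analysis to the exact data it used.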
>
> *From:* Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
> *Sent:* 11 October 2015 12:43
> *To:* Makx Dekkers <mail@makxdekkers.com>; Phil Archer <phila@w3.org>; Erik Wilde <dret@berkeley.edu>; public-dwbp-comments@w3.org
> *Subject:* Re: Webby Data
>
> Hi Makx
>
> > how much change is allowed for an entity to be considered ‘the same’. Are there any hard rules in the LDR to determine this?
>
> This is the crucial question! There are no hard rules, but the essence is described below.
>
> I think that it is fine to add new information about the entity, e.g. additional properties, new translations of labels (different languages), to fix errors etc. This is changing the information about the entity, but still referring to the _same_ entity.
>
> Things that change the entity? When dealing with (SKOS) Concepts you can often get away with broadening a definition and still treating it as the same, because all the data "in the wild" that references the concept still works. This is not true if you narrow the definition. In that case I would (in a registry context) deprecate the current definition (keep it available but say "don't use") and mint a new concept for the narrower concept. Another example; this time a physical thing. Think of a sports stadium. It might be completely rebuilt but people still call it by the same name ... Identity is a social construct. In this case you could consider the new and old to be the same sports stadium; with different information attributes. But if the stadium was moved (etc) people often give it a new name. In this case, I see these as two different entities.
>
> Sameness, as you can see, will always be subjective. But hopefully this gives you some ideas.
>
> Jeremy
>
> On Sun, 11 Oct 2015 at 11:12, Makx Dekkers <mail@makxdekkers.com> wrote:
>
> Jeremy,
>
> I think that help with the subject of versioning would be very welcome.
>
> I looked at https://github.com/UKGovLD/registry-core/wiki/Principles-and-concepts#history-and-versioning, and this could work on both levels: for versions of datasets in a dataset catalogue in the model of DCAT (http://www.w3.org/TR/vocab-dcat/#vocabulary-overview) as well as for versions of items within datasets. However, the approach for the Linked Data Registry is based on strong control (involving formal agreement by a registry manager), which may not always be doable in an environment where datasets and data items are exchanged, shared, modified, merged.
>
> A main question that people are struggling with is how much change is allowed for an entity to be considered ‘the same’. Are there any hard rules in the LDR to determine this?
>
> Makx.
>
> *From:* Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
> *Sent:* 10 October 2015 13:11
> *To:* Phil Archer <phila@w3.org>; Erik Wilde <dret@berkeley.edu>; public-dwbp-comments@w3.org
> *Subject:* Re: Webby Data
>
> Phil-
>
> those changes look fine. Happy to help with the subject of versioning; Dave Reynolds and I spent some time working through the strategy implemented in the Linked Data Registry. It works in all the cases I have found so far.
>
> Regards, Jeremy
>
> On Sat, 10 Oct 2015 at 11:43 Phil Archer <phila@w3.org> wrote:
>
> On 10/10/2015 10:12, Jeremy Tandy wrote:
> > Phil- thanks for drafting this update. It makes sense to me.
> >
> > There are 3 minor changes I would suggest ... and then there's Erik's concerns that 'webby data' is necessary but not sufficient for hypermedia.
> >
> > Starting with the three things:
> >
> > 1) your reference to the CSV on the Web (CSVW) method of assigning URIs to things that within a dataset only have locally scoped identifiers; would suggest you point folks directly to URI Template Properties [1] and the 'aboutUrl' [2]
>
> Done at http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>
> It now says:
> URIs can be long. In a dataset of even moderate size, storing each URI is likely to be repetitive and obviously wasteful. Instead, define locally unique identifiers for each element and provide data that allows them to be converted to globally unique URIs programmatically. The Metadata Vocabulary for Tabular Data [tabular-metadata] provides mechanisms for doing this within tabular data such as CSV files, in particular using URI template properties such as the about URL property.
>
> > 2) you talk about 'confirming the versioning policy' ... a bit thorny this one.
>
> Indeed. I've removed that bullet point, lazily copied from LD-BP.
>
> > In my opinion, only information resources can be versioned. Real-world resources can't be. For example, if I replace my car with another that is just like it, is this a new version of my car? No, it's a different car with a different identifier. Using version numbers in URIs means that you can only create durable links to that specific version ... and when a new version is released, your links are broken. That said, you might want to refer to a specific version of a document (or other information resource) as the basis of an analysis.
> > I'm guessing that you need a section on the merits of when and where to use versioned URIs over and above what is already stated in http://www.w3.org/TR/dwbp/#dataVersioning (BTW, I agree that if you are going to use versioning, you should provide a version history, and that datasets, as information resources, are great candidates to be versioned). By way of example, please refer to the Linked Data Registry [3] that makes a distinction between versioned and non-versioned things [4]. You can see this in a live example [5]; the concept 'AGRICULTURE - SITE DRAINAGE' [6] is not versioned but the register item [7] that binds that concept into a controlled list (the register) is versioned (each version of a register item refers to a graph of information about the registered concept, so that the information held about the concept can be updated). Furthermore, we use a syntax (add a suffix `:n` where n is the version number) to allow people to access specific versions (see example [8] - although not very interesting as it only has one version ... in other examples you can traverse the version history). In the UI of the Linked Data Registry you can find the versions by clicking on the 'History' link.
>
> That's really helpful info. The editors are struggling a little with the issue of versioning so this should help us make progress. I'll need to look at it too to see if it should be in this particular BP or elsewhere in the doc.
>
> > 3) in the 'How to test' section you say "Check that the URIs are resolvable". Now, IMHO, it's certainly best practice to have these URIs for data points resolve (I suppose even if it is only to the description of the dataset within which they're defined?), but there are cases where it's equally valid to use them just as (globally scoped) identifiers rather than URLs.
> > This still adds value when you're trying to merge information from disparate datasets that you have downloaded and are working with, say, in a local triple store.
>
> Fixed. It now says:
>
> Check that within the dataset, references to things that don't change or that change slowly, such as countries, regions, organizations and people, as referred to by URIs or by short identifiers that can be appended to a URI stub. Ideally the URIs should resolve, however, they have value as globally scoped variables whether they resolve or not.
>
> > ----
> >
> > Now, Erik's point [9] is that there is a "difference between 'web data only' and the 'web of hypermedia-driven services'" and that "'webby data' is a necessary but not sufficient condition to have hypermedia [which requires providing navigational affordances to get things done with that data]."
> >
> > I see that in the vast majority of cases, the data is accessed via a service end-point ... even if it is a trivial HTTP GET. But there are cases where (as I said in point #3 above) you simply want to use URIs as identifiers. This clearly is not hypermedia. I wonder if there are two levels of requirements here? At this point, I'm unable to unpick this distinction further, but I'm sure it will be relevant in the Spatial Data on the Web WG.
>
> I've given my first pass answer to Erik - let's see how it goes.
>
> Thanks for the review - much appreciated.
>
> Phil.
>
> > More thinking required.
> >
> > Jeremy
> >
> > [1]: http://www.w3.org/TR/tabular-metadata/#uri-template-properties
> > [2]: http://www.w3.org/TR/tabular-metadata/#cell-aboutUrl
> > [3]: https://github.com/UKGovLD/registry-core
> > [4]: https://github.com/UKGovLD/registry-core/wiki/Principles-and-concepts#versioned-types
> > [5]: http://environment.data.gov.uk/registry/
> > [6]: http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/AE
> > [7]: http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/_AE
> > [8]: http://environment.data.gov.uk/registry/def/water-quality/sampling_point_types/_AE:1
> > [9]: https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Oct/0026.html
> >
> > On Sat, 10 Oct 2015 at 08:53 Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk> wrote:
> >
> >> -----Original Message-----
> >> From: Phil Archer [mailto:phila@w3.org]
> >> Sent: 09 October 2015 22:29
> >> To: Public DWBP WG
> >> Cc: Erik Wilde; Tandy, Jeremy
> >> Subject: Webby Data
> >>
> >> Dear all,
> >>
> >> As the WG is well aware, Erik has been flying the flag for Webby data/hypermedia.
> >>
> >> It took me a while to work out just what Erik was getting at, mainly because I have been somewhat word blind. When you've seen a document as much as we've seen the BP doc, you think things are there that aren't and vice versa.
> >>
> >> It was Jeremy Tandy (SDW and CSV WG) who pointed out to me last week what was missing - which is what I think Erik has been saying for a while. Erik says it differently but I dare to hope that what I've suggested as a new BP addresses his issue.
> >>
> >> We had a BP that said "use persistent URIs as identifiers". And then it said *Datasets* must be identified by persistent URIs. What it didn't say was that data points within the data should also be URIs where possible.
> >>
> >> I've drafted a BP to cover this, see http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
> >>
> >> For those who were there, this is the short form of my over-long talk in Sao Paulo the other day ;-)
> >>
> >> The BP emphasises the importance of links between things that are identified. It does this with reference to the Web in general and then cites *both* 5 stars of linked data and Erik's words on hypermedia as examples of what this means.
> >>
> >> @Erik - is that doc going to stay on GitHub? Any chance it might find a more stable/permanent home? I really don't like linking to GH in a W3C Rec track document.
> >>
> >> I very much doubt this BP will go through unchanged, but I've had a go at drafting it and have created the pull request. I hope the WG will discuss it and not just merge it.
> >>
> >> HTH
> >>
> >> Phil.
> >>
> >> --
> >>
> >> Phil Archer
> >> W3C Data Activity Lead
> >> http://www.w3.org/2013/data/
> >>
> >> http://philarcher.org
> >> +44 (0)7887 767755
> >> @philarcher1
>
> --
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
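The BP text quoted in this thread recommends storing short, locally unique identifiers and converting them to globally unique URIs programmatically. A minimal sketch of that idea follows, assuming a simple `{name}` placeholder syntax. It is not a full URI Template processor and not the CSV on the Web `aboutUrl` algorithm; the `expand` function, the example.org template and the second row are hypothetical, and only the code `AE` comes from the registry example cited above.

```python
import re
from urllib.parse import quote


def expand(template: str, row: dict) -> str:
    """Replace each {name} in the template with the percent-encoded row value."""
    def substitute(match):
        name = match.group(1)
        # Encode every reserved character so the value stays one path segment.
        return quote(str(row[name]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


# Hypothetical template: the dataset stores only the short code per row.
ABOUT_URL = "http://example.org/sampling-point-type/{code}"

rows = [
    {"code": "AE", "label": "AGRICULTURE - SITE DRAINAGE"},
    {"code": "B 2", "label": "hypothetical entry with a space in its code"},
]

uris = [expand(ABOUT_URL, row) for row in rows]
for uri in uris:
    print(uri)
```

As noted in the thread, the resulting URIs are useful as globally scoped identifiers, for example when merging downloaded datasets in a local triple store, whether or not they resolve.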
Received on Sunday, 11 October 2015 14:09:31 UTC