Comments (with use case) from Hugh Glaser on 2016-12-29 (public-dwbp-comments@w3.org from December 2016)

From: Hugh Glaser <hugh@glasers.org>
Date: Thu, 29 Dec 2016 11:32:22 +0000
To: public-dwbp-comments@w3.org
Message-Id: <8454B8E5-51BD-4EBC-99DA-CD782A0AD0F9@glasers.org>

As suggested by Phil Archer (when I posted this to public-LOD), I am reposting here.
(I read Hadley's Facebook post to mean that only W3C members could comment now.)

Repost:
https://www.w3.org/TR/dwbp/

Hi.
I have just seen a reference to this on Facebook, posted by Hadley - many thanks.

I guess it is all too late (sorry!), but thought I would raise one issue, in case someone here feels they can take it up.
And it is sort of interesting for this list.

As far as I can see (really sorry if I have missed it), there is no suggestion of splitting datasets for licence purposes.
There is a bit on it in BP18 for different users and use cases.

The use case I am thinking about is the NYT (New York Times) LD release, all those years ago.
There was a bunch of data they had made into LD, and wanted to make it public; they also wanted to make the links that they had established to other datasets public.
So they gathered it all together, and put it in one dataset, with the appropriate licence, etc.
This would conform (if they did some more) with the Best Practices here.

However, this is probably not the best thing for them.
The basic dataset that they wanted to publish came with a bunch of licence restrictions - it is in some sense their treasure map, and they don't want to lose control of it.
The linkage, on the other hand, is exactly the stuff they want people to take away and do whatever they like with - after all, it is the very information that people need to find their data in the dataset; in SEO terms, it is driving traffic to their site.

(In my case, in very practical terms, I want to be able to harvest the owl:sameAs triples and put them in sameAs.org, safe in the knowledge that I am not violating any conditions.
And, I think, the NYT very much wants me to do that, so that their dataset gets found.)
In addition, in a related issue about splitting datasets, the provenance of the linkage is actually usually quite different from the provenance of the dataset. It may be that the linkage is the result of an intern spending the summer doing some work, whereas the rest of the dataset is in fact the result of decades of work (as was the case of the NYT).

DBpedia very helpfully splits out this sort of data - not for licence reasons, I think (at least at the moment, although it might be the case that there should be different licences), but for convenience, with a very large dataset.
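
To make the kind of split described above concrete, here is a minimal sketch that separates owl:sameAs triples from the rest of an N-Triples dump, so that the two parts could be published under different licences. It is only an illustration: the function name split_linkage and the example.org / example.com URIs are hypothetical, and the line splitting is a naive shortcut rather than a full N-Triples parser.

```python
# Predicate IRI for owl:sameAs, as it appears in N-Triples.
OWL_SAMEAS = "<http://www.w3.org/2002/07/owl#sameAs>"


def split_linkage(ntriples_text):
    """Split N-Triples text into (linkage_lines, other_lines).

    Naive assumption: one triple per line, with subject and predicate
    containing no whitespace (true for IRIs and blank node labels).
    """
    linkage, other = [], []
    for line in ntriples_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split(None, 2)  # subject, predicate, rest
        if len(parts) == 3 and parts[1] == OWL_SAMEAS:
            linkage.append(stripped)
        else:
            other.append(stripped)
    return linkage, other


# Hypothetical sample data, not taken from any real dataset.
sample = (
    "<http://example.org/a> <http://www.w3.org/2002/07/owl#sameAs> <http://example.com/b> .\n"
    '<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "A" .\n'
)
linkage, other = split_linkage(sample)
```

The two resulting lists could then be written to separate files, each carrying its own licence and provenance statement.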

An additional use case:
Many of the libraries of the world are making their catalogue subject data available. They have also established links between their catalogue and other catalogues. Using these links, I was able to build http://sameas.org/store/kelle/ , which enables the closures of quite a few of the catalogue equivalences.
The libraries were all very happy to give me this linkage information - had this information been bundled up with the catalogue data, the process of allowing me free use would have been much more problematic, and indeed I might not have got any data.
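
As an illustration of what a closure over equivalence links means, here is a small union-find sketch that groups identifiers connected by sameAs-style pairs into bundles. It is not a description of how sameAs.org is actually built; the function name sameas_closure and the short identifiers are hypothetical.

```python
def sameas_closure(pairs):
    """Group identifiers into bundles, given (a, b) equivalence pairs."""
    parent = {}

    def find(x):
        # Return the representative of x's bundle, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two bundles

    bundles = {}
    for node in list(parent):
        bundles.setdefault(find(node), set()).add(node)
    return list(bundles.values())


# Hypothetical identifiers standing in for catalogue URIs.
bundles = sameas_closure([("a", "b"), ("b", "c"), ("d", "e")])
```

Here "a", "b" and "c" end up in one bundle even though "a" and "c" were never linked directly, which is the point of taking the closure. Only the linkage pairs are needed for this, not the catalogue data itself.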

So, is there any scope for comments somewhere about this?
I think it would be great if the idea of providing linkage with a separate licence (even if it is in the same physical distribution of the dataset) could be included.

Best, and season's greetings to you all.

Hugh

Received on Thursday, 29 December 2016 11:32:57 UTC