Re: Comments (with use case) from Bernadette Farias Lóscio on 2017-01-11 (public-dwbp-comments@w3.org from January 2017)

From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Date: Tue, 10 Jan 2017 21:55:18 -0300
To: Hugh Glaser <hugh@glasers.org>
Cc: Phil Archer <phila@w3.org>, W3C DWBP WG - Comments <public-dwbp-comments@w3.org>, POE Comment list <public-poe-comments@w3.org>
Message-ID: <CANx1PzxSkpzeRFUKKMy3XTZjbfrKW9cgNCLCimqu_j70UBigdg@mail.gmail.com>
Hi Hugh and Phil,

Thanks for the amazing discussion!

In general, the DWBP provide guidance to publishers who aim to publish data
on the Web. However, they don't give details about how to do it. We just
give possible approaches to implementation. We did this because the DWBP
should be technology agnostic.

However, I fully agree that we need more discussion about topics like
subsets, provenance, license and versioning. I am very interested in these
topics and I would be glad to continue this discussion. Maybe we can
work on some complementary material for the DWBP.

Cheers,
Bernadette



2017-01-06 11:05 GMT-03:00 Hugh Glaser <hugh@glasers.org>:

> Hi Phil,
>
> > On 5 Jan 2017, at 11:22, Phil Archer <phila@w3.org> wrote:
> >
> > + POE WG
> >
> > Thanks again for posting this, Hugh.
> Pleasure.
> >
> > Now that I've actually read and thought about what you've said here, I
> think DWBP covers the topic as far as it will. BP4 [1] simply stresses the
> importance of including licence info. In the NYT example, yes, different
> licence info applies to different subsets, but the high level advice in the
> DWBP doc still obtains.
> Hmm. You are right about the licence issue.
> But I think there is an access issue.
> That's why I'm looking at BP18.
> It says things like "Another way to subset a dataset" - (this actually
> follows a paragraph that doesn't really tell you how, by the way - it talks
> about how to access, not how to actually subset).
> So the advice on how to subset is rather meagre - basically it says "split
> it into smaller units".
> The example of course helps to flesh it out in one way.
> So I go back to what DBpedia does, as a very commonly accessed large
> dataset.
> They provide http://wiki.dbpedia.org/downloads-2016-04
> Which is exactly the sort of thing I need.
> [Sorry - I should have said that this doesn't just relate to owl:sameAs -
> there are other predicates I would like to be able to get:
> owl:differentFrom (to power http://differentfrom.org ), and then all the
> SKOS and other vocabularies that have things like closeMatch, exactMatch
> etc..]
>
> I guess my concern is that you wouldn't arrive at the DBpedia solution if
> you used this document.
> All the suggestions in BP18 are about giving dynamic access to a stored
> dataset.
> I can't see where any advice on serving static files is given.
> The RDF Data Cube Vocabulary and the Example are both about retrieving
> subsets from a store.
> So, especially for a large dataset, there is little advice on how to
> subset the data into files, so that it can be served efficiently.
> Maybe that is because server efficiency is not in the list of aimed
> Benefits - although Processability might be thought to include that?
>
> Getting quite specific: :-)
> I realise that I have a problem with the paragraph in BP18 that starts
> "Consider the expected use cases for your dataset and determine what types
> of subsets are likely to be most useful. "
> I sort of expect it to go on to discuss how to do the considering of the
> types of subsets.
> But it doesn't.
> It goes on to discuss how to do it.
> So, I would split that paragraph after the first sentence.
> And I would add, after the first sentence (in the new first para), some
> words about how it might be by place, by time, by common meaning (my bit!).
>
> Anyway, thanks for engaging with me - very enjoyable.
>
> >
> > *However*
> >
> > I think your use case is very pertinent to the Permissions and
> Obligations Expression WG (hence adding the additional list). Their use
> case doc includes the kind of thing you're after - I hope. See
> http://w3c.github.io/poe/ucr/#POE.UC.06 and the following one (which is
> from the news industry).
> Yes, the licence issue is probably a use case for the POE - although it
> wasn't the primary issue for me.
> >
> > As ever, I imagine that this will all come down to identifiers, subsets
> and, where relevant, named graphs, but the point about provenance and
> licensing being closely linked is well made.
> Thanks - I think it is most important in the Linked Data world, where
> people are encouraged to add links all the time, and the licence and
> provenance (and of course I should have mentioned trust!) just get lumped
> into the same dataset metadata.
>
> Very best
> Hugh
> >
> > Cheers
> >
> > Phil
> >
> > [1] https://www.w3.org/TR/dwbp/#DataLicense
> >
> > On 29/12/2016 11:32, Hugh Glaser wrote:
> >> As suggested by Phil Archer (when I posted this to public-LOD), I am
> reposting here.
> >> (I read Hadley's Facebook post to mean that only W3C members could
> comment now.)
> >>
> >> Repost:
> >> https://www.w3.org/TR/dwbp/
> >>
> >> Hi.
> >> I have just seen a reference to this on Facebook, posted by Hadley -
> many thanks.
> >>
> >> I guess it is all too late (sorry!), but thought I would raise one
issue, in case someone here feels they can take it up.
> >> And it is sort of interesting for this list.
> >>
> >> As far as I can see (really sorry if I have missed it), there is no
> suggestion of splitting datasets for licence purposes.
> >> There is a bit on it in BP18 for different users and use cases.
> >>
> >> The use case I am thinking about is the NYT (New York Times) LD
> release, all those years ago.
> >> There was a bunch of data they had made into LD, and wanted to make it
> public; they also wanted to make the links that they had established to
> other datasets public.
> >> So they gathered it all together, and put it in one dataset, with the
appropriate licence, etc.
> >> This would conform (if they did some more) with the Best Practices
> here.
> >>
> >> However, this is probably not the best thing for them.
> >> The basic dataset that they wanted to publish came with a bunch of
> licence restrictions - it is in some sense their treasure map, and they
> don't want to lose control of it.
> >> The linkage, on the other hand, is exactly the stuff they want people
> to take away and do whatever they like with - after all, it is the very
> information that people need to find their data in the dataset; in SEO
> terms, it is driving traffic to their site.
> >>
> >> (In my case, in very practical terms, I want to be able to harvest the
> owl:sameAs triples and put them in sameAs.org, safe in the knowledge that I
> am not violating any conditions.
> >> And, I think, the NYT very much wants me to do that, so that their
> dataset gets found.)
> >> In addition, in a related issue about splitting datasets, the
> provenance of the linkage is actually usually quite different from the
> provenance of the dataset. It may be that the linkage is the result of an
> intern spending the summer doing some work, whereas the rest of the dataset
> is in fact the result of decades of work (as was the case of the NYT).
> >>
> >> DBpedia very helpfully splits out this sort of data - not for licence
> reasons, I think (at least at the moment, although it might be the case
> that there should be different licences), but for convenience, with a very
> large dataset.
> >>
> >> An additional use case:
> >> Many of the libraries of the world are making their catalogue subject data
> available. They have also established links between their catalogue and
> other catalogues. Using these links, I was able to build
> http://sameas.org/store/kelle/ , which enables the closures of quite a
> few of the catalogue equivalences.
> >> The libraries were all very happy to give me this linkage information -
> had this information been bundled up with the catalogue data, the process
> of allowing me free use would have been much more problematic, and indeed I
> might not have got any data.
> >>
> >> So, is there any scope for comments somewhere about this?
> >> I think it would be great if the idea of providing linkage with a
> separate licence (even if it is in the same physical distribution of the
> dataset) could be included.
> >>
> >> Best, and season's greetings to you all.
> >>
> >> Hugh
> >>
> >
> > --
> >
> >
> > Phil Archer
> > Data Strategist, W3C
> > http://www.w3.org/
> >
> > http://philarcher.org
> > +44 (0)7887 767755
> > @philarcher1
> >
>
>
>
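A minimal sketch of the static-file subsetting Hugh describes in the thread
above: pulling linkage triples (owl:sameAs, owl:differentFrom, SKOS matches)
out of a larger dump so that each subset can be served, and licensed, as its
own file. It assumes the dump is well-formed N-Triples with one triple per
line. The function names, the "core" bucket and the output file names are
hypothetical; they are not taken from DBpedia or from the DWBP document.

```python
from collections import defaultdict

OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Linkage predicates mentioned in the thread, mapped to a (hypothetical)
# subset name. Extend this table for other vocabularies as needed.
LINK_PREDICATES = {
    OWL + "sameAs": "sameAs",
    OWL + "differentFrom": "differentFrom",
    SKOS + "closeMatch": "closeMatch",
    SKOS + "exactMatch": "exactMatch",
}


def split_by_predicate(lines, link_predicates=LINK_PREDICATES):
    """Group N-Triples lines into one bucket per linkage predicate.

    Triples whose predicate is not in link_predicates go to "core".
    Assumes one well-formed triple per line; this is not a full parser.
    """
    buckets = defaultdict(list)
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split(None, 2)  # subject, predicate, rest
        if len(parts) < 3:
            continue  # not a triple
        predicate = parts[1].strip("<>")
        buckets[link_predicates.get(predicate, "core")].append(stripped)
    return dict(buckets)


def write_subsets(buckets, prefix="dataset"):
    """Write each bucket to its own file, e.g. dataset-sameAs.nt."""
    for name, triples in buckets.items():
        with open(f"{prefix}-{name}.nt", "w", encoding="utf-8") as out:
            out.write("\n".join(triples) + "\n")
```

Each resulting file could then carry its own licence and provenance
metadata, which is the separation the thread argues for. Whether the split
is done once at publication time or on each release is a design choice this
sketch leaves open.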


-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Wednesday, 11 January 2017 00:56:13 UTC