- From: Laufer <laufer@globo.com>
- Date: Thu, 26 Mar 2015 14:01:06 -0300
- To: Steven Adler <adler1@us.ibm.com>
- Cc: Christophe Guéret <christophe.gueret@dans.knaw.nl>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Eric Stephan <ericphb@gmail.com>, Phil Archer <phila@w3.org>, DWBP WG <public-dwbp-wg@w3.org>
- Message-ID: <CA+pXJihmwh4=+mLTeVFZc=V-xxmr1-8rx90cvjKXR-_VN72Gnw@mail.gmail.com>
Hi all,

I started this thread because of the misunderstanding about the LOD 5-star scale and how people are using it to classify the quality of data published on the Web. I think that having different axes of quality, each with its own 5-star scale, could confuse people even more when someone attaches a number of stars to a dataset. Besides that, there will be certifications around these issues, probably taking several axes of quality into account; ODI already has a certification process. So I think we must be very careful with this subject and be very clear in the texts of our documents.

Cheers,
Laufer

On Thursday, 26 March 2015, Steven Adler <adler1@us.ibm.com> wrote:

> I like that approach, but that 5-star is not a Data Quality rating system,
> which I still think we need as part of BP.
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>
> From: Christophe Guéret <christophe.gueret@dans.knaw.nl>
> To: Steven Adler/Somers/IBM@IBMUS
> Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, DWBP WG <public-dwbp-wg@w3.org>, Eric Stephan <ericphb@gmail.com>
> Date: 03/25/2015 09:53 PM
> Subject: Re: The 5 stars path
> ------------------------------
>
> BTW, speaking about stars and feedback, we may want to have a look at the
> 5-star scheme for community engagement from Tim Davies:
> http://www.opendataimpacts.net/engagement/
>
> We could probably do something with it, if only linking to it somewhere.
>
> Cheers,
> Christophe
>
> --
> Sent with difficulties. Sorry for the brevity and typos...
>
> On 24 Mar 2015 07:18, "Steven Adler" <adler1@us.ibm.com> wrote:
>
> Rating a dataset is only valuable if records within the dataset have
> ratings whose sum or average validates the dataset rating. That is, there
> has to be provenance to the ratings.
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>
> From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
> To: Eric Stephan <ericphb@gmail.com>
> Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>, Christophe Guéret <christophe.gueret@dans.knaw.nl>, DWBP WG <public-dwbp-wg@w3.org>
> Date: 03/24/2015 10:11 AM
> Subject: Re: The 5 stars path
> ------------------------------
>
> Hi all,
>
> Thanks for the great discussion!
>
> I like the idea of having a star rating discussion, but we need to be
> aware that publishing data on the Web is more than just publishing data and
> metadata. It also concerns issues like data access and feedback.
>
> I've been thinking a lot about this rating system, and it would be
> great to consider all aspects related to data on the Web (e.g. data format,
> metadata, identifiers, data access, feedback, versioning...), but I'm not
> sure if this is the best choice. Maybe we can have a rating system based
> just on data and metadata, which is similar to Phil's initial proposal.
>
> Cheers,
> Bernadette
>
> 2015-03-22 18:38 GMT-03:00 Eric Stephan <ericphb@gmail.com>:
>
> Wow, what a wonderful thread to read. Thank you, Phil! Many, many
> thanks for this wonderful note of clarity!
>
> >> if Eric and Annette can provide similar examples for NetCDF that
> >> would be terrific (I'm out of my depth here).
>
> Yes, I think we can show this quite easily. Just off the top of my
> head:
>
> NetCDF:
> - is an open format for storing multi-dimensional data streams [NETCDF];
> - can be annotated with self-describing metadata (called attributes);
> - has existing conventions for representing different forms of data,
>   e.g. the CF convention;
> - has a CF vocabulary [CFNAMES] of curated climate and forecasting
>   terminology;
> - in addition, the climate community within the Earth System Grid (ESG)
>   has adopted fully documented protocols [CMIP5] that show how regional
>   and climate model datasets must be organized so that they can be
>   inter-related to support regional and global climate studies;
> - leverages existing ISO standards used in the geospatial, Dublin Core
>   and metadata communities;
> - finally, NASA JPL developed an ontology called SWEET [SWEET], and
>   previous research shows how the CF terms can be inter-related.
>
> I would submit that, in terms of open data, the climate community is
> already at 5 stars even without the ontology.
>
> Eric
>
> References
>
> [NETCDF] http://en.wikipedia.org/wiki/NetCDF
> [CFNAMES] http://cfconventions.org/Data/cf-standard-names/28/build/cf-standard-name-table.html
> [CMIP5] http://cmip-pcmdi.llnl.gov/cmip5/
> [SWEET] https://sweet.jpl.nasa.gov/
>
> On Sun, Mar 22, 2015 at 10:45 AM, Phil Archer <phila@w3.org> wrote:
>
> We are in full agreement.
>
> One of my hopes for this WG is that we can indeed lead people to
> publish formats like CSV in the best way (i.e. with good quality metadata)
> without them feeling somehow inferior.
>
> If that leads us to define our own star rating system, I
> wouldn't mind.
> Something like:
>
> * It's available on the Web in an open format with a declared
>   licence (anything less is all but useless).
>
> ** As level 1, with good quality discovery metadata (we might
>    refer to the DCAT Application Profile work as an example).
>
> *** All the above, plus structural metadata in the relevant
>     format (e.g. CSV+ for CSV, VoID for RDF, etc.).
>
> This doesn't include quality metrics (which it should) or contact
> details (which it should), but they might be defined at level 2?
>
> Maybe a start anyway.
>
> Phil.
>
> On 22/03/2015 13:50, Laufer wrote:
>
> I agree, Phil.
>
> What I want to reinforce is that it would be nice if we could make clear
> in the document that 5-star LD (or OD?) is not a scale of how well a
> dataset is published on the Web. We can have, for example, a "CSV
> dataset" (3 stars) that is better published than an "LD dataset" (5
> stars). Or maybe we can avoid using the 5 stars when what we want to say
> is that a dataset is being published in CSV format.
>
> If we say that one dataset is 3 stars and another is 5 stars, people get
> the idea that the 5-star one is better than the 3-star one (as in reviews
> or hotel ratings, for example).
>
> We probably will not define our own scale, but I hope that our set of BPs
> can help people publish "Well Published Data on the Web".
>
> Best Regards,
> Laufer
>
> On Sunday, 22 March 2015, Christophe Guéret <christophe.gueret@dans.knaw.nl> wrote:
>
> +1!
>
> Christophe
>
> --
> Sent with difficulties. Sorry for the brevity and typos...
>
> On 22 Mar 2015 08:47, "Phil Archer" <phila@w3.org> wrote:
>
> I've just been reading through Friday's minutes and I see that this was
> the hot topic of the day. As ever, I'm sorry I wasn't able to be there.
>
> Let me add my 2 cents.
>
> LD forms a small part of the available data on the Web.
> It would be silly of us to push for everyone to convert their data into
> perfectly linked 5-star data before they make it available publicly or
> behind a pay-wall of some kind.
>
> What we *can* do, IMO, is:
>
> - promote the publication of human-readable metadata, as Laufer has
>   described;
>
> - promote the publication of machine-readable metadata, and then show how
>   this can be (and is) done with RDF, using DCAT as an example;
>
> - promote the publication of structural metadata, for which, for CSV at
>   least, we have a very clear route: use the CSV on the Web work;
>
> - if Eric and Annette can provide similar examples for NetCDF, that would
>   be terrific (I'm out of my depth here);
>
> - we can leave it to the Spatial Data on the Web WG to handle spatial
>   stuff (as they are leaving some of their generic issues to this group).
>
> As an aside, the CSV WG has resolved its issues now and is expecting to
> publish pretty much the stable version of its specs in the first week of
> April.
>
> If you publish data in your favourite format plus structural metadata in
> whatever format goes with it (and the CSV WG is using JSON for its
> metadata), then you are providing a route through which your users can
> readily create 5-star data if they so wish. They may or may not use LD
> themselves, but the concept behind it is, I hope, clear enough to readers?
>
> From what I've read of Friday's minutes and the list since then, I dare to
> hope this is in line with the general mood of the WG?
>
> Phil.
>
> On 20/03/2015 18:09, Laufer wrote:
>
> Thank you, Eric.
>
> Cheers,
> Laufer
>
> 2015-03-20 12:31 GMT-03:00 Eric Stephan <ericphb@gmail.com>:
>
> Laufer and Bernadette,
>
> I raised an issue relating to this, asking the question: can we use 5
> stars as a metric and not a path?
> http://www.w3.org/2013/dwbp/track/issues/148
>
> Eric S.
>
> On Fri, Mar 20, 2015 at 7:54 AM, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:
>
> Hi Laufer,
>
> Thanks for the message! It is a very useful explanation!
>
> I fully agree with you: "In this dataset publishing I can see the idea of
> publishing metadata and using standard vocabularies, but is not a LD
> dataset."
>
> IMHO, we can use vocabularies to publish metadata, but then we are not
> doing linked data, i.e., there are no links between resources.
>
> I also agree that "we should differentiate the idea of a Best Practice of
> a non LD dataset of the idea of an implicit Best Practice to go to a LD
> dataset, that is what the 5 stars scale says."
>
> If we have a BP whose implementation proposes using the RDF model to
> publish data, then we are moving towards the 5 stars. It is important to
> note that publishing data using the RDF model may be just one of the
> proposed implementation approaches, i.e., we may show other ways of
> publishing data without using RDF.
>
> Cheers,
> Bernadette
>
> 2015-03-20 11:32 GMT-03:00 Laufer <laufer@globo.com>:
>
> Hi all,
>
> I will start my comment with an example.
>
> Someone publishes a page with links to 2 files:
> - a CSV file with a dataset;
> - a text file that explains the structure of the dataset in natural
>   language (metadata).
>
> The page also provides a lot of metadata in natural language, for example
> an overview of the dataset, licence, organization, version, creator,
> rights, etc.
>
> At the same time, the page has an embedded DCAT instance, using RDFa,
> with information about the dataset, the distribution, etc.
>
> What I want to say is that here we have the metadata concept mixed with
> Semantic Web concepts, and it is a way of publishing data that, if
> everything is well described, could be very useful to society.
>
> In this dataset publishing I can see the idea of publishing metadata and
> using standard vocabularies, but is not a LD dataset.
>
> What I was discussing in the last meeting is: will we support, in the
> document, the idea that the best way to publish is LD? I am not saying
> whether I am for or against the idea; I am in favour of LD. But we should
> differentiate the idea of a Best Practice of a non LD dataset of the idea
> of an implicit Best Practice to go to a LD dataset, that is what the 5
> stars scale says.
>
> Maybe this is too much care with the words; sorry about this.
>
> Best Regards,
> Laufer
>
> --
> . . . .. . .
> . . . ..
> . .. .
>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
> ----------------------------------------------------------------------------
>
> --
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1

--
. . . .. . .
. . . ..
. .. .
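Phil's tentative scheme in his 22 March message is cumulative: each level includes everything required by the levels below it. As a hypothetical illustration only (the `DatasetDescription` fields and the `star_level` function are invented names, not part of any W3C or ODI specification), that structure could be written as a simple checklist. Judging whether a format is "open" or whether discovery metadata is of "good quality" is deliberately left outside the code, as it was in the thread.

```python
from dataclasses import dataclass


@dataclass
class DatasetDescription:
    """Hypothetical summary of what a publisher provides for one dataset."""
    on_web: bool               # retrievable at a Web URL
    open_format: bool          # e.g. CSV, NetCDF, an RDF serialisation
    declared_licence: bool     # a licence is explicitly stated
    discovery_metadata: bool   # e.g. a description following the DCAT Application Profile
    structural_metadata: bool  # e.g. CSV on the Web metadata for CSV, VoID for RDF


def star_level(d: DatasetDescription) -> int:
    """Return 0-3 stars; each level requires every level below it."""
    if not (d.on_web and d.open_format and d.declared_licence):
        return 0  # "anything less is all but useless"
    if not d.discovery_metadata:
        return 1
    if not d.structural_metadata:
        return 2
    return 3


csv_with_metadata = DatasetDescription(True, True, True, True, True)
csv_no_licence = DatasetDescription(True, True, False, True, True)
print(star_level(csv_with_metadata))  # 3
print(star_level(csv_no_licence))     # 0
```

Quality metrics and contact details, which Phil notes are still missing, could be added as further checks at level 2 without changing the cumulative shape. A score of 3 here says nothing about whether the data is Linked Data, which is the distinction Laufer asks the document to keep clear.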
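Phil points out that publishing data together with its structural metadata (JSON, in the CSV WG's case) gives users a route to 5-star data. A minimal sketch of building such a metadata file for a CSV distribution follows. It assumes the metadata vocabulary that W3C later published in the CSV on the Web Recommendations (the drafts under discussion in March 2015 may differ in detail); the helper name `csvw_metadata`, the file name, the column names and the licence URL are all hypothetical.

```python
import json


def csvw_metadata(csv_url: str, title: str, licence: str,
                  columns: list[tuple[str, str, str]]) -> dict:
    """Build a minimal CSVW-style table description (hypothetical helper).

    columns: (name, human-readable title, XSD datatype name) triples.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "dc:title": title,
        "dc:license": {"@id": licence},
        "tableSchema": {
            "columns": [
                {"name": name, "titles": col_title, "datatype": dtype}
                for name, col_title, dtype in columns
            ]
        },
    }


meta = csvw_metadata(
    "observations.csv",
    "Hypothetical hourly observations",
    "http://opendefinition.org/licenses/cc-by/",
    [("station", "Station", "string"), ("temp_c", "Temperature (°C)", "decimal")],
)
# By CSVW convention this could be published next to the CSV file,
# e.g. as observations.csv-metadata.json.
print(json.dumps(meta, indent=2, ensure_ascii=False))
```

The CSV itself is left untouched; the JSON file only describes it, which matches Phil's point that publishers can keep their favourite format and still offer a machine-readable route onwards.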
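Steven's point that a dataset rating is only meaningful if it has provenance in record-level ratings can be sketched as an aggregate that keeps track of its inputs. In this hypothetical example the `RecordRating` type and the `dataset_rating` function are invented for illustration, and the average is used (Steven mentions sum or average); returning the record identifiers alongside the score is what lets someone check the dataset-level figure.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RecordRating:
    record_id: str  # identifier of the rated record (hypothetical)
    stars: float    # rating given to that record


def dataset_rating(ratings: list[RecordRating]) -> tuple[float, list[str]]:
    """Average the record-level ratings and return the record ids used,
    so the dataset-level figure can be traced back to its inputs."""
    if not ratings:
        raise ValueError("no record ratings: a dataset rating would have no provenance")
    return mean(r.stars for r in ratings), [r.record_id for r in ratings]


score, provenance = dataset_rating([
    RecordRating("row-1", 4.0),
    RecordRating("row-2", 5.0),
    RecordRating("row-3", 3.0),
])
print(score, provenance)
```

Refusing to rate a dataset with no rated records, rather than defaulting to some value, is the design choice that reflects "there has to be provenance to the ratings".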
Received on Thursday, 26 March 2015 17:01:35 UTC