- From: Laufer <laufer@globo.com>
- Date: Thu, 26 Mar 2015 15:27:48 -0300
- To: Steven Adler <adler1@us.ibm.com>
- Cc: Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Christophe Guéret <christophe.gueret@dans.knaw.nl>, Eric Stephan <ericphb@gmail.com>, Phil Archer <phila@w3.org>, DWBP WG <public-dwbp-wg@w3.org>
- Message-ID: <CA+pXJiiO85dq2pJKWPnFSsYCAkopw+FtGK-nx9Vuso5zu0G8dQ@mail.gmail.com>
I agree, Steve. And I am just pointing to the ODI certificates (not evaluating them).

Best,
Laufer

On Thursday, 26 March 2015, Steven Adler <adler1@us.ibm.com> wrote:

> Laufer,
>
> I agree we need to be careful and the discussion here is helping to
> clarify the issues. I have been working with Data Quality for over 10
> years and have not seen any really good DQ rating systems in use beyond
> very small scale enterprise deployments. I am not sure that ODI is a
> source we need to rely on for guidance in this matter as their bench of DQ
> experts is quite narrow.
>
> I would recommend that we continue to discuss this together and seek out
> simple methods that can be easily implemented. It is easier to start
> simple with something no one has today and then add to it as we gain
> insights into usage patterns from use cases that emerge over time.
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>
> From: Laufer <laufer@globo.com>
> To: Steven Adler/Somers/IBM@IBMUS
> Cc: Christophe Guéret <christophe.gueret@dans.knaw.nl>,
>     Bernadette Farias Lóscio <bfl@cin.ufpe.br>,
>     Eric Stephan <ericphb@gmail.com>, Phil Archer <phila@w3.org>,
>     DWBP WG <public-dwbp-wg@w3.org>
> Date: 03/26/2015 01:02 PM
> Subject: The 5 stars path
> ------------------------------
>
> Hi all,
>
> I've started this thread because of the
> misunderstanding about the LOD 5 stars scale, and how people are using it
> as a way of classifying the quality of data published on the web.
>
> I think that different axes of quality, each one with its own 5 stars
> scale, could confuse people even more when someone attaches a number of
> stars to a dataset. Besides that, there will be certificates around these
> issues, probably taking into account several axes of quality. ODI already
> has a certification process.
>
> So, I think we must be very careful with this subject and be very clear in
> our texts in the documents.
>
> Warm regards,
> Laufer
>
> On Thursday, 26 March 2015, Steven Adler <adler1@us.ibm.com> wrote:
>
> I like that approach, but that 5-star scheme is not a Data Quality rating
> system, which I still think we need as part of BP.
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>
> From: Christophe Guéret <christophe.gueret@dans.knaw.nl>
> To: Steven Adler/Somers/IBM@IBMUS
> Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>,
>     Bernadette Farias Lóscio <bfl@cin.ufpe.br>,
>     DWBP WG <public-dwbp-wg@w3.org>, Eric Stephan <ericphb@gmail.com>
> Date: 03/25/2015 09:53 PM
> Subject: Re: The 5 stars path
> ------------------------------
>
> BTW, speaking about stars and feedback we may want to have a look
> at the 5 star scheme for community engagement from Tim Davies:
> http://www.opendataimpacts.net/engagement/
>
> We could probably do something with it, if only linking to it
> somewhere.
>
> Cheers,
> Christophe
>
> --
> Sent with difficulties. Sorry for the brevity and typos...
>
> On 24 Mar 2015 at 07:18, "Steven Adler" <adler1@us.ibm.com> wrote:
>
> Rating a dataset is only valuable if records within the dataset
> have ratings whose sum or average validates the dataset rating. That is,
> there has to be provenance to the ratings.
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>
> From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
> To: Eric Stephan <ericphb@gmail.com>
> Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>,
>     Christophe Guéret <christophe.gueret@dans.knaw.nl>,
>     DWBP WG <public-dwbp-wg@w3.org>
> Date: 03/24/2015 10:11 AM
> Subject: Re: The 5 stars path
> ------------------------------
>
> Hi all,
>
> Thanks for the great discussion!
>
> I like the idea of having a star rating discussion, but we need to
> be aware that publishing data on the Web is more than just publishing data
> and metadata. It also concerns issues like data access and feedback.
>
> I've been thinking a lot about this rating system and it would be
> great to consider all aspects related to data on the Web (e.g. data format,
> metadata, identifiers, data access, feedback, versioning...), but I'm not
> sure if this is the best choice. Maybe we can have a rating system based
> just on data and metadata, which is similar to the initial proposal of Phil.
>
> Cheers,
> Bernadette
>
> 2015-03-22 18:38 GMT-03:00 Eric Stephan <ericphb@gmail.com>:
>
> Wow what a wonderful thread to read. Thank you Phil! Many many
> thanks for this wonderful note of clarity!
>
> >>if Eric and Annette can provide similar examples for NetCDF
> >>that would be terrific (I'm out of my depth here).
>
> Yes I think we can show this quite easily. Just off the top of
> my head.
>
> NetCDF:
> - is an open format for storing multi-dimensional data
>   streams [NETCDF]
> - can be annotated with self-describing metadata (called
>   attributes)
> - has existing conventions for representing different forms
>   of data, e.g. the CF convention.
> - has a CF vocabulary [CFNAMES] for curated climate and
>   forecasting terminology.
> - In addition the climate community within the Earth System
>   Grid (ESG) has adopted fully documented protocols [CMIP5] to show how
>   regional and climate model datasets must be organized so that they can be
>   inter-related to support regional and global climate studies.
> - Leverages existing ISO standards used in the geospatial,
>   Dublin Core, and metadata communities.
> - Finally an ontology was developed by NASA JPL called SWEET
>   [SWEET]; there is previous research showing how the CF terms can be
>   inter-related.
>
> I would submit that even without the ontology in terms of open
> data, the climate community is already at 5 star.
>
> Eric
>
> References
>
> [NETCDF] http://en.wikipedia.org/wiki/NetCDF
> [CFNAMES] http://cfconventions.org/Data/cf-standard-names/28/build/cf-standard-name-table.html
> [CMIP5] http://cmip-pcmdi.llnl.gov/cmip5/
> [SWEET] https://sweet.jpl.nasa.gov/
>
> On Sun, Mar 22, 2015 at 10:45 AM, Phil Archer <phila@w3.org> wrote:
>
> We are in full agreement.
>
> One of my hopes for this WG is that we can indeed lead people
> to publish formats like CSV in the best way (i.e. with good quality
> metadata) without them feeling somehow inferior.
>
> If that leads us to define our own star rating system, I
> wouldn't mind. Something like:
>
> * It's available on the Web in an open format with a declared
> licence (anything less is all but useless).
>
> ** As level 1 with good quality discovery metadata (we might
> refer to the DCAT Application profile work as an example).
>
> *** All the above plus structural metadata in the relevant
> format (e.g. CSV+ for CSV, VoID for RDF etc).
>
> This doesn't include quality metrics (which it should), and
> contact details (which it should) - but they might be defined at level 2?
>
> Maybe a start anyway.
>
> Phil.
>
> On 22/03/2015 13:50, Laufer wrote:
>
> I agree, Phil.
>
> What I want to reinforce is that it would be nice if we could make clear in
> the document that 5 stars LD (or OD?) is not a scale of how well a dataset
> is published on the web. We can have, for example, a "CSV dataset" (3
> stars) that is better published than an "LD dataset" (5 stars). Or, maybe,
> we can avoid using the 5 stars when what we want to say is that a dataset
> is being published in a CSV format.
>
> If we say that one dataset is 3 stars and another is 5 stars, people get
> the idea that the 5 one is better than the 3 one (as in reviews of hotels,
> for example).
>
> We probably will not define our own scale but I hope that our set of BPs
> could help people to publish "Well Published Data on The Web".
>
> Best Regards,
> Laufer
>
> On Sunday, 22 March 2015, Christophe Guéret
> <christophe.gueret@dans.knaw.nl> wrote:
>
> +1!
>
> Christophe
>
> --
> Sent with difficulties. Sorry for the brevity and typos...
>
> On 22 Mar 2015 at 08:47, "Phil Archer" <phila@w3.org> wrote:
>
> I've just been reading through Friday's minutes and I see that this was
> the hot topic of the day. As ever, I'm sorry I wasn't able to be there.
>
> Let me add my 2 cents.
>
> LD forms a small part of the available data on the Web. It would be
> silly of us to push for everyone to convert their data into perfectly
> linked 5 star data before they make it available publicly or behind a
> pay-wall of some kind.
>
> What we *can* do IMO is:
>
> - Promote the publication of human readable metadata as Laufer has
>   described;
>
> - promote the publication of machine readable metadata and then show how
>   this can be (and is) done with RDF using DCAT as an example;
>
> - promote the publication of structural metadata which, for CSV at
>   least, we have a very clear route - use the CSV on the Web work;
>
> - if Eric and Annette can provide similar examples for NetCDF that would
>   be terrific (I'm out of my depth here).
>
> - We can leave it to the Spatial Data on the Web WG to handle spatial
>   stuff (as they are leaving some of their generic issues to this group).
>
> As an aside, the CSV WG has resolved its issues now and is expecting to
> publish pretty much the stable version of its specs in the first week of
> April.
>
> If you publish data in your favourite format + structural metadata in
> whatever format goes with that (and the CSV WG is using JSON for its
> metadata) then you are providing a route through which your users can
> readily create 5 star data if they so wish. They may or may not use LD
> themselves but the concept behind it is, I hope, clear enough to readers?
>
> From what I've read of Friday and the list since then, I dare to hope
> this is in line with the general mood of the WG?
>
> Phil.
>
> On 20/03/2015 18:09, Laufer wrote:
>
> Thank you, Eric.
>
> Warm regards,
> Laufer
>
> 2015-03-20 12:31 GMT-03:00 Eric Stephan <ericphb@gmail.com>:
>
> Laufer and Bernadette,
>
> I raised an issue relating to this, asking the question: can we use 5 star
> as a metric and not a path?
> http://www.w3.org/2013/dwbp/track/issues/148
>
> Eric S.
>
> On Fri, Mar 20, 2015 at 7:54 AM, Bernadette Farias Lóscio
> <bfl@cin.ufpe.br> wrote:
>
> Hi Laufer,
>
> Thanks for the message! It is a very useful explanation!
>
> I fully agree with you: "In this dataset publishing I can see the idea of
> publishing metadata and using standard vocabularies, but is not a LD
> dataset."
>
> IMHO, we can use vocabularies to publish metadata, but we are not doing
> linked data, i.e., there are no links between resources.
>
> I also agree that "we should differentiate the idea of a Best Practice of
> a non LD dataset of the idea of an implicit Best Practice to go to a LD
> dataset, that is what the 5 stars scale says.".
>
> If we have a BP whose implementation proposes the use of the RDF model to
> publish data, then we are moving towards the 5 stars. It is important to
> note that publishing data using the RDF model may be just one of the
> proposed approaches for implementation, i.e., we may show other ways of
> publishing data without using RDF.
>
> Cheers,
> Bernadette
>
> 2015-03-20 11:32 GMT-03:00 Laufer <laufer@globo.com>:
>
> Hi all,
>
> I will start my comment using an example:
>
> Someone publishes a page where there are links to 2 files:
> a csv file with a dataset;
> a text file that explains the structure of the dataset, in natural
> language (metadata).
>
> In the page there is a lot of metadata provided in natural language, as
> for example, an overview of the dataset, license, organization, version,
> creator, rights, etc...
>
> At the same time, the page has an embedded DCAT instance using RDFa
> where there is info about the dataset, the distribution, etc.
>
> What I want to say is that we have here the metadata concept mixed with
> semantic web concepts, and it is a way of publishing data that, if all the
> things are well described, could be very useful to society.
>
> In this dataset publishing I can see the idea of publishing metadata and
> using standard vocabularies, but is not a LD dataset.
>
> What I was discussing in the last meeting is: will we support in the
> document the idea that the best way to publish is LD? I am not saying that
> I am against the idea. I am in favour of LD. But we should
> differentiate the idea of a Best Practice of a non LD dataset of the idea
> of an implicit Best Practice to go to a LD dataset, that is what the 5
> stars scale says.
>
> Maybe this is too much care with the words, sorry about this.
>
> Best Regards,
> Laufer
>
> --
> . . . .. . .
> . . . ..
> . .. .
>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
> ----------------------------------------------------------------------------
>
> --
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1

--
. . . .. . .
. . . ..
. .. .
Attachments
- image/gif attachment: ecblank.gif
- image/gif attachment: graycol.gif
Received on Thursday, 26 March 2015 18:28:18 UTC