Re: The 5 stars path

From: Laufer <laufer@globo.com>
Date: Thu, 26 Mar 2015 15:27:48 -0300
Message-ID: <CA+pXJiiO85dq2pJKWPnFSsYCAkopw+FtGK-nx9Vuso5zu0G8dQ@mail.gmail.com>
To: Steven Adler <adler1@us.ibm.com>
Cc: Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Christophe Guéret <christophe.gueret@dans.knaw.nl>, Eric Stephan <ericphb@gmail.com>, Phil Archer <phila@w3.org>, DWBP WG <public-dwbp-wg@w3.org>
I agree, Steve.

And I am just pointing to the ODI certificates (not evaluating them).

Best,
Laufer

On Thursday, 26 March 2015, Steven Adler <adler1@us.ibm.com>
wrote:

> Laufer,
>
> I agree we need to be careful and the discussion here is helping to
> clarify the issues.  I have been working with Data Quality for over 10
> years and have not seen any really good DQ rating systems in use beyond
> very small scale enterprise deployments.  I am not sure that ODI is a
> source we need to rely on for guidance in this matter as their bench of DQ
> experts is quite narrow.
>
> I would recommend that we continue to discuss this together and seek out
> simple methods that can be easily implemented.  It is easier to start
> simple with something no one has today and then add to it as we gain
> insights into usage patterns from use cases that emerge over time.
>
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>
>
>
>
>    From: Laufer <laufer@globo.com>
>    To: Steven Adler/Somers/IBM@IBMUS
>    Cc: Christophe Guéret <christophe.gueret@dans.knaw.nl>,
>    Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Eric Stephan <ericphb@gmail.com>,
>    Phil Archer <phila@w3.org>, DWBP WG <public-dwbp-wg@w3.org>
>    Date: 03/26/2015 01:02 PM
>    Subject: The 5 stars path
> ------------------------------
>
>
>
> Hi all,
>
> I started this thread because of the misunderstanding about the LOD 5-star
> scale and the way people are using it to classify the quality of data
> published on the Web.
>
> I think that having different axes of quality, each one with its own 5-star
> scale, could confuse people even more when someone attaches a number of
> stars to a dataset. Besides that, there will be certificates around these
> issues, probably taking several axes of quality into account. ODI already
> has a certification process.
>
> So, I think we must be very careful with this subject and be very clear in
> the text of our documents.
>
> Regards,
> Laufer
>
> On Thursday, 26 March 2015, Steven Adler <adler1@us.ibm.com>
> wrote:
>
>    I like that approach, but that 5-star scheme is not a Data Quality rating
>    system, which I still think we need as part of the BP.
>
>
>    Best Regards,
>
>    Steve
>
>    Motto: "Do First, Think, Do it Again"
>
>
>
>
>    From: Christophe Guéret <christophe.gueret@dans.knaw.nl>
>    To: Steven Adler/Somers/IBM@IBMUS
>    Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>,
>    Bernadette Farias Lóscio <bfl@cin.ufpe.br>, DWBP WG <public-dwbp-wg@w3.org>,
>    Eric Stephan <ericphb@gmail.com>
>    Date: 03/25/2015 09:53 PM
>    Subject: Re: The 5 stars path
>    ------------------------------
>
>
>
>    BTW, speaking about stars and feedback we may want to have a look at
>    the 5 star scheme for community engagement from Tim Davies:
>    http://www.opendataimpacts.net/engagement/
>
>    We could probably do something with it, if only linking to it
>    somewhere.
>
>    Cheers,
>    Christophe
>
>    --
>    Sent with difficulties. Sorry for the brevity and typos...
>
>    On 24 Mar 2015 07:18, "Steven Adler" <adler1@us.ibm.com> wrote:
>
>       Rating a dataset is only valuable if records within the dataset
>       have ratings whose sum or average validates the dataset rating.  That is,
>       there has to be provenance to the ratings.
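Steve's point about provenance can be made concrete with a small sketch. It assumes, purely for illustration, that each record carries a numeric rating and that the dataset rating is the average of those ratings (a sum would work the same way on a different scale). The function names and the 0.5 tolerance are hypothetical and do not come from any DWBP document.

```python
from statistics import mean


def dataset_rating_from_records(record_ratings):
    """Derive a dataset-level rating by averaging per-record ratings.

    Hypothetical helper: it refuses to produce a rating when there are no
    record ratings, because such a rating would have no provenance.
    """
    if not record_ratings:
        raise ValueError("no record ratings: the dataset rating has no provenance")
    return mean(record_ratings)


def claimed_rating_is_supported(claimed, record_ratings, tolerance=0.5):
    """Check whether a published dataset rating is backed by its records.

    The tolerance is an arbitrary illustrative choice.
    """
    derived = dataset_rating_from_records(record_ratings)
    return abs(claimed - derived) <= tolerance


if __name__ == "__main__":
    records = [4, 5, 3, 4]  # made-up per-record ratings
    print(dataset_rating_from_records(records))     # 4
    print(claimed_rating_is_supported(4, records))  # True
    print(claimed_rating_is_supported(5, records))  # False
```

A claimed rating of 5 is rejected here because nothing in the records supports it, which is the sense in which the rating needs provenance.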
>
>
>       Best Regards,
>
>       Steve
>
>       Motto: "Do First, Think, Do it Again"
>
>       From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
>       To: Eric Stephan <ericphb@gmail.com>
>       Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>,
>       Christophe Guéret <christophe.gueret@dans.knaw.nl>,
>       DWBP WG <public-dwbp-wg@w3.org>
>       Date: 03/24/2015 10:11 AM
>       Subject: Re: The 5 stars path
>       ------------------------------
>
>
>
>       Hi all,
>
>       Thanks for the great discussion!
>
>       I like the idea of having a star rating discussion, but we need to
>       be aware that publishing data on the Web is more than just publishing data
>       and metadata. It also concerns issues like data access and feedback.
>
>       I've been thinking a lot about this rating system, and it would be
>       great to consider all aspects related to data on the Web (e.g. data format,
>       metadata, identifiers, data access, feedback, versioning...), but I'm not
>       sure this is the best choice. Maybe we can have a rating system based
>       just on data and metadata, which is similar to Phil's initial proposal.
>
>       Cheers,
>       Bernadette
>
>       2015-03-22 18:38 GMT-03:00 Eric Stephan <ericphb@gmail.com>:
>          Wow what a wonderful thread to read.  Thank you Phil!  Many many
>          thanks for this wonderful note of clarity!
>
>          >>if Eric and Annette can provide similar examples for NetCDF
>          that would be terrific (I'm out of my depth here).
>
>          Yes, I think we can show this quite easily. Just off the top of
>          my head:
>
>          NetCDF:
>             - is an open format for storing multi-dimensional data
>          streams [NETCDF]
>             - can be annotated with self describing metadata (called
>          attributes)
>             - has existing conventions for representing different forms
>          of data, e.g. the CF conventions.
>             - has a CF vocabulary [CFNAMES] for curated climate and
>          forecasting terminology.
>             - In addition, the climate community within the Earth System
>          Grid (ESG) has adopted fully documented protocols [CMIP5] to show how
>          regional and climate model datasets must be organized so that they can be
>          inter-related to support regional and global climate studies.
>             - leverages existing ISO standards used in the geospatial,
>          Dublin Core, and metadata communities.
>             - Finally, NASA JPL developed an ontology called SWEET
>          [SWEET], and previous research has shown how the CF terms can be
>          inter-related.
>
>          I would submit that, in terms of open data, the climate community
>          is already at 5 stars, even without the ontology.
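Eric's points about self-describing attributes and the CF standard-name vocabulary can be illustrated without any NetCDF library. The sketch below assumes that a variable's attributes have already been read into a plain dictionary. The choice of attributes to look for, the tiny stand-in for the CF standard name table, and the function names are illustrative assumptions, not rules or an API defined by the CF conventions.

```python
# Small stand-in for the CF standard name table [CFNAMES]; the real table is far larger.
KNOWN_STANDARD_NAMES = {"air_temperature", "sea_surface_temperature", "precipitation_flux"}

# Illustrative choice of attributes to look for on each data variable.
EXPECTED_ATTRIBUTES = ("standard_name", "units")


def missing_attributes(var_attrs):
    """Return the expected attribute names that a variable lacks, in order."""
    return [name for name in EXPECTED_ATTRIBUTES if name not in var_attrs]


def uses_cf_vocabulary(var_attrs, known_names=KNOWN_STANDARD_NAMES):
    """True if the variable's standard_name comes from the controlled vocabulary."""
    return var_attrs.get("standard_name") in known_names


if __name__ == "__main__":
    # Hypothetical attributes of one variable in a NetCDF file.
    tas = {
        "standard_name": "air_temperature",
        "units": "K",
        "long_name": "Near-surface air temperature",
    }
    print(missing_attributes(tas))                    # []
    print(uses_cf_vocabulary(tas))                    # True
    print(missing_attributes({"long_name": "temp"}))  # ['standard_name', 'units']
```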
>
>
>
>          Eric
>
>
>          References
>
>          [NETCDF] http://en.wikipedia.org/wiki/NetCDF
>          [CFNAMES] http://cfconventions.org/Data/cf-standard-names/28/build/cf-standard-name-table.html
>          [CMIP5] http://cmip-pcmdi.llnl.gov/cmip5/
>          [SWEET] https://sweet.jpl.nasa.gov/
>
>
>          On Sun, Mar 22, 2015 at 10:45 AM, Phil Archer <phila@w3.org> wrote:
>             We are in full agreement.
>
>             One of my hopes for this WG is that we can indeed lead people
>             to publish formats like CSV in the best way (i.e. with good quality
>             metadata) without them feeling somehow inferior.
>
>             If that leads us to define our own star rating system, I
>             wouldn't mind. Something like:
>
>             * It's available on the Web in an open format with a declared
>             licence (anything less is all but useless).
>
>             ** As level 1 with good quality discovery metadata (we might
>             refer to the DCAT Application profile work as an example).
>
>             *** All the above plus structural metadata in the relevant
>             format (e.g. CSV+ for CSV, VoID for RDF etc).
>
>             This doesn't include quality metrics or contact details (both of
>             which it should include), but perhaps they could be defined at level 2?
>
>             Maybe a start anyway.
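Phil's three proposed levels are cumulative, so they can be read as a simple check. The sketch below encodes them under the assumption that each criterion is a yes/no flag. The class and field names are hypothetical, and a boolean hides the hard part, which is judging whether the metadata is actually of good quality.

```python
from dataclasses import dataclass


@dataclass
class PublishedDataset:
    # Hypothetical yes/no flags for the criteria in the proposal above.
    on_web_in_open_format: bool
    declared_licence: bool
    discovery_metadata: bool   # e.g. a record following the DCAT Application Profile
    structural_metadata: bool  # e.g. CSV+ metadata for CSV, VoID for RDF


def proposed_star_level(d: PublishedDataset) -> int:
    """Return 0 to 3 stars; each level requires every level below it."""
    if not (d.on_web_in_open_format and d.declared_licence):
        return 0  # "anything less is all but useless"
    if not d.discovery_metadata:
        return 1
    if not d.structural_metadata:
        return 2
    return 3


if __name__ == "__main__":
    csv_with_metadata = PublishedDataset(True, True, True, True)
    unlicensed = PublishedDataset(True, False, True, True)
    print(proposed_star_level(csv_with_metadata))  # 3
    print(proposed_star_level(unlicensed))         # 0
```

Quality metrics and contact details could be added as further flags checked at level 2, along the lines Phil suggests.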
>
>             Phil.
>
>             On 22/03/2015 13:50, Laufer wrote:
>                I agree, Phil.
>
>                What I want to reinforce is that it would be nice if we
>                could make clear in the document that the 5 stars LD (or
>                OD?) scheme is not a scale of how well a dataset is
>                published on the Web. For example, a "CSV dataset" (3
>                stars) can be better published than an "LD dataset" (5
>                stars). Or maybe we can avoid using the 5 stars when what
>                we want to say is that a dataset is published in CSV
>                format.
>
>                If we say that one dataset has 3 stars and another has 5
>                stars, people get the idea that the 5-star one is better
>                than the 3-star one (as with reviews or hotels, for
>                example).
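Laufer's point becomes visible when the 5-star Linked Open Data scheme is written down as the cumulative path it is: open licence, machine-readable structure, non-proprietary format, URIs to identify things, links to other data. The sketch below is a simplified, hypothetical encoding of those steps. Nothing about metadata quality is an input, so a carefully documented CSV file stops at 3 stars however good its metadata is.

```python
# The five steps of the 5-star scheme, in order (simplified wording).
STEPS = (
    "open_licence",
    "machine_readable",
    "non_proprietary_format",
    "uses_uris",
    "links_to_other_data",
)


def five_star_level(properties):
    """Count consecutive satisfied steps; the first missing step ends the path."""
    level = 0
    for step in STEPS:
        if not properties.get(step, False):
            break
        level += 1
    return level


if __name__ == "__main__":
    # Hypothetical, well-documented CSV file. The scheme has no input for
    # metadata quality, so "rich_metadata" cannot raise the count.
    documented_csv = {
        "open_licence": True,
        "machine_readable": True,
        "non_proprietary_format": True,
        "rich_metadata": True,  # ignored by five_star_level
    }
    print(five_star_level(documented_csv))  # 3
```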
>
>                We probably will not define our own scale, but I hope that
>                our set of BPs can help people publish "Well Published Data
>                on the Web".
>
>                Best Regards,
>                Laufer
>
>                On Sunday, 22 March 2015, Christophe Guéret
>                <christophe.gueret@dans.knaw.nl> wrote:
>                   +1!
>
>                   Christophe
>
>                   On 22 Mar 2015 08:47, "Phil Archer" <phila@w3.org> wrote:
>                      I've just been reading through Friday's minutes and
>                      I see that this was
>                      the hot topic of the day. As ever, I'm sorry I
>                      wasn't able to be there.
>
>                      Let me add my 2 cents.
>
>                      LD forms a small part of the available data on the
>                      Web. It would be
>                      silly of us to push for everyone to convert their
>                      data into perfectly
>                      linked 5 star data before they make it available
>                      publicly or behind a
>                      pay-wall of some kind.
>
>                      What we *can* do IMO is:
>
>                      - Promote the publication of human readable metadata
>                      as Laufer has
>                      described;
>
>                      - promote the publication of machine readable
>                      metadata and then show how
>                      this can be (and is) done with RDF using DCAT as an
>                      example;
>
>                      - promote the publication of structural metadata, for
>                      which, for CSV at least, we have a very clear route: use
>                      the CSV on the Web work;
>
>                      - if Eric and Annette can provide similar examples
>                      for NetCDF that would
>                      be terrific (I'm out of my depth here).
>
>                      - We can leave it to the Spatial Data on the Web WG
>                      to handle spatial
>                      stuff (as they are leaving some of their generic
>                      issues to this group).
>
>                      As an aside, the CSV WG has resolved its issues now
>                      and is expecting to
>                      publish pretty much the stable version of its specs
>                      in the first week of
>                      April.
>
>                      If you publish data in your favourite format +
>                      structural metadata in
>                      whatever format goes with that (and the CSV WG is
>                      using JSON for its
>                      metadata) then you are providing a route through
>                      which your users can
>                      readily create 5 star data if they so wish. They may
>                      or may not use LD
>                      themselves but the concept behind it is, I hope,
>                      clear enough to readers?
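As an illustration of Phil's "favourite format plus structural metadata" route, here is a minimal sketch that builds JSON metadata for a CSV file in the shape the CSV on the Web work uses (`@context`, `url`, `tableSchema`, `columns`). It follows the vocabulary as later published, which may differ in detail from the March 2015 drafts; the file name and columns are hypothetical.

```python
import json


def csvw_metadata(csv_url, columns):
    """Build a minimal CSVW-style table description.

    columns: list of (name, title, datatype) tuples,
    e.g. ("temp_c", "Temperature (C)", "decimal").
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {"name": name, "titles": title, "datatype": datatype}
                for name, title, datatype in columns
            ]
        },
    }


if __name__ == "__main__":
    meta = csvw_metadata(
        "stations.csv",  # hypothetical CSV file
        [("station_id", "Station ID", "string"), ("temp_c", "Temperature (C)", "decimal")],
    )
    print(json.dumps(meta, indent=2))
```

The published recommendation describes looking for such metadata next to the CSV file, at the CSV URL with `-metadata.json` appended (here `stations.csv-metadata.json`), which is one way users could find it and, if they wish, turn the table into RDF.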
>
>                      From what I've read of Friday and the list since then, I
>                      dare to hope this is in line with the general mood of the WG?
>
>                      Phil.
>
>
>
>                      On 20/03/2015 18:09, Laufer wrote:
>                         Thank you, Eric.
>
>                         Regards,
>                         Laufer
>
>                         2015-03-20 12:31 GMT-03:00 Eric Stephan <ericphb@gmail.com>:
>
>                            Laufer and Bernadette,
>
>                            I raised an issue related to this, asking the
>                            question: can we use 5 star as a metric and not
>                            a path?
>                            http://www.w3.org/2013/dwbp/track/issues/148
>
>                         Eric S.
>
>                         On Fri, Mar 20, 2015 at 7:54 AM, Bernadette Farias Lóscio
>                         <bfl@cin.ufpe.br> wrote:
>                            Hi Laufer,
>
>                            Thanks for the message! It is a very useful
>                            explanation!
>
>                         I fully agree with you: "In this dataset publication I
>                         can see the idea of publishing metadata and using
>                         standard vocabularies, but it is not an LD dataset."
>
>                         IMHO, we can use vocabularies to publish metadata, but
>                         we are not doing linked data, i.e., there are no links
>                         between resources.
>
>                         I also agree that "we should differentiate the idea of
>                         a Best Practice for a non-LD dataset from the idea of an
>                         implicit Best Practice to move to an LD dataset, which
>                         is what the 5 stars scale says."
>
>                         If we have a BP whose implementation proposes the use
>                         of the RDF model to publish data, then we are moving
>                         towards the 5 stars. It is important to note that
>                         publishing data using the RDF model may be just one of
>                         the proposed approaches for implementation, i.e., we
>                         may show other ways of publishing data without using RDF.
>
>                         Cheers,
>                         Bernadette
>
>
>
>
>                         2015-03-20 11:32 GMT-03:00 Laufer <laufer@globo.com>:
>
>                         Hi all,
>
>                            I will start my comment using an example:
>
>                            Someone publishes a page with links to 2 files:
>                            - a CSV file with a dataset;
>                            - a text file that explains the structure of the
>                            dataset in natural language (metadata).
>
>                            The page also provides a lot of metadata in
>                            natural language, for example an overview of the
>                            dataset, license, organization, version, creator,
>                            rights, etc.
>
>                         At the same time, the page has an embedded DCAT
>                         instance in RDFa that carries information about the
>                         dataset, the distribution, etc.
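Laufer's example embeds the DCAT description in the page using RDFa. To keep all examples in Python, the sketch below swaps in JSON-LD, a different RDF serialization that can also be embedded in an HTML page, while using the same DCAT and Dublin Core terms. The title, licence, download URL and media type IRI are placeholders chosen for illustration.

```python
import json

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def dcat_description(title, licence_url, download_url, media_type_iri):
    """Describe one dataset and one distribution with DCAT terms, as JSON-LD."""
    return {
        "@context": {"dcat": DCAT, "dct": DCT},
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dct:license": {"@id": licence_url},
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": {"@id": media_type_iri},
        },
    }


if __name__ == "__main__":
    doc = dcat_description(
        "Example dataset",  # placeholder values throughout
        "https://creativecommons.org/licenses/by/4.0/",
        "https://example.org/data/example.csv",
        "https://www.iana.org/assignments/media-types/text/csv",
    )
    print(json.dumps(doc, indent=2))
```

This description uses standard vocabularies but contains no links to other datasets, which matches the distinction drawn in the thread between publishing metadata with vocabularies and publishing Linked Data.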
>
>                         What I want to say is that here we have the metadata
>                         concept mixed with Semantic Web concepts, and it is a
>                         way of publishing data that, if everything is well
>                         described, could be very useful to society.
>
>                         In this dataset publication I can see the idea of
>                         publishing metadata and using standard vocabularies,
>                         but it is not an LD dataset.
>
>                         What I was discussing in the last meeting is whether
>                         the document will support the idea that the best way
>                         to publish is LD. I am not saying whether I am for or
>                         against the idea; I am in favor of LD. But we should
>                         differentiate the idea of a Best Practice for a non-LD
>                         dataset from the idea of an implicit Best Practice to
>                         move to an LD dataset, which is what the 5 stars
>                         scale says.
>
>                         Maybe this is too much care with words; sorry about
>                         that.
>
>                         Best Regards,
>                         Laufer
>
>                         --
>                         .  .  .  .. .  .
>                         .        .   . ..
>                         .     ..       .
>
>
>
>                         --
>                         Bernadette Farias Lóscio
>                         Centro de Informática
>                         Universidade Federal de Pernambuco - UFPE, Brazil
>                      ----------------------------------------------------------------------------
>
>
>                      --
>
>
>                      Phil Archer
>                      W3C Data Activity Lead
>                      http://www.w3.org/2013/data/
>
>                      http://philarcher.org
>                      +44 (0)7887 767755
>                      @philarcher1
>
>
>
>
>
>
>
>
>
>




Received on Thursday, 26 March 2015 18:28:18 UTC
