The 5 stars path from Laufer on 2015-03-26 (public-dwbp-wg@w3.org from March 2015)

From: Laufer <laufer@globo.com>
Date: Thu, 26 Mar 2015 14:01:06 -0300
To: Steven Adler <adler1@us.ibm.com>
Cc: Christophe Guéret <christophe.gueret@dans.knaw.nl>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Eric Stephan <ericphb@gmail.com>, Phil Archer <phila@w3.org>, DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CA+pXJihmwh4=+mLTeVFZc=V-xxmr1-8rx90cvjKXR-_VN72Gnw@mail.gmail.com>
Hi all,

I've started this thread because of the misunderstanding about the LOD 5 stars
scale, and how people are using it as a way of classifying the quality of
data published on the web.

I think that different axes of quality, each one with its own 5 stars
scale, could confuse people even more when someone attaches a number of stars
to a dataset. Besides that, there will be certificates around these
issues, probably taking into account several axes of quality. ODI already
has a certification process.

So, I think we must be very careful with this subject and be very clear in
our texts in the documents.

Best Regards,
Laufer

On Thursday, 26 March 2015, Steven Adler <adler1@us.ibm.com> wrote:

> I like that approach, but that 5-star scale is not a Data Quality rating
> system, which I still think we need as part of the BP.
>
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>
> From: Christophe Guéret <christophe.gueret@dans.knaw.nl>
> To: Steven Adler/Somers/IBM@IBMUS
> Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>, Bernadette Farias
>     Lóscio <bfl@cin.ufpe.br>, DWBP WG <public-dwbp-wg@w3.org>, Eric Stephan
>     <ericphb@gmail.com>
> Date: 03/25/2015 09:53 PM
> Subject: Re: The 5 stars path
> ------------------------------
>
>
>
> BTW, speaking about stars and feedback, we may want to have a look at the 5
> star scheme for community engagement from Tim Davies:
> http://www.opendataimpacts.net/engagement/
>
> We could probably do something with it, if only linking to it somewhere.
>
> Cheers,
> Christophe
>
> --
> Sent with difficulties. Sorry for the brevity and typos...
>
> On 24 Mar 2015 07:18, "Steven Adler" <adler1@us.ibm.com> wrote:
>
>    Rating a dataset is only valuable if records within the dataset have
>    ratings whose sum or average validates the dataset rating.  That is, there
>    has to be provenance to the ratings.
>
>
>    Best Regards,
>
>    Steve
>
>    Motto: "Do First, Think, Do it Again"
>
>    From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
>    To: Eric Stephan <ericphb@gmail.com>
>    Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>, Christophe
>        Guéret <christophe.gueret@dans.knaw.nl>, DWBP WG <public-dwbp-wg@w3.org>
>    Date: 03/24/2015 10:11 AM
>    Subject: Re: The 5 stars path
>    ------------------------------
>
>
>
>    Hi all,
>
>    Thanks for the great discussion!
>
>    I like the idea of having a star rating discussion, but we need to be
>    aware that publishing data on the Web is more than just publishing data and
>    metadata. It also concerns issues like data access and feedback.
>
>    I've been thinking a lot about this rating system and it would be
>    great to consider all aspects related to data on the Web (e.g. data format,
>    metadata, identifiers, data access, feedback, versioning...), but I'm not
>    sure if this is the best choice. Maybe we can have a rating system based
>    just on data and metadata, which is similar to Phil's initial proposal.
>
>    Cheers,
>    Bernadette
>
>    2015-03-22 18:38 GMT-03:00 Eric Stephan <ericphb@gmail.com>:
>       Wow what a wonderful thread to read.  Thank you Phil!  Many many
>       thanks for this wonderful note of clarity!
>
>       >>if Eric and Annette can provide similar examples for NetCDF that
>       would be terrific (I'm out of my depth here).
>
>       Yes, I think we can show this quite easily.  Just off the top of my
>       head.
>
>       NetCDF:
>          - is an open format for storing multi-dimensional data streams
>            [NETCDF]
>          - can be annotated with self-describing metadata (called
>            attributes)
>          - has existing conventions for representing different forms of
>            data, e.g. the CF convention.
>          - has a CF vocabulary [CFNAMES] for curated climate and
>            forecasting terminology.
>          - In addition, the climate community within the Earth System Grid
>            (ESG) has adopted fully documented protocols [CMIP5] to show how
>            regional and climate model datasets must be organized so that
>            they can be inter-related to support regional and global climate
>            studies.
>          - leverages existing ISO standards used in the geospatial, Dublin
>            Core, and metadata communities.
>          - Finally, an ontology called SWEET [SWEET] was developed by NASA
>            JPL, and there is previous research showing how the CF terms can
>            be inter-related.
>
>       I would submit that even without the ontology in terms of open
>       data, the climate community is already at 5 star.
>
>
>
>       Eric
>
>
>       References
>
>       [NETCDF] http://en.wikipedia.org/wiki/NetCDF
>       [CFNAMES] http://cfconventions.org/Data/cf-standard-names/28/build/cf-standard-name-table.html
>       [CMIP5] http://cmip-pcmdi.llnl.gov/cmip5/
>       [SWEET] https://sweet.jpl.nasa.gov/
>
>
>       On Sun, Mar 22, 2015 at 10:45 AM, Phil Archer <phila@w3.org> wrote:
>          We are in full agreement.
>
>          One of my hopes for this WG is that we can indeed lead people to
>          publish formats like CSV in the best way (i.e. with good quality metadata)
>          without them feeling somehow inferior.
>
>          If that leads us to define our own star rating system, I
>          wouldn't mind. Something like:
>
>          * It's available on the Web in an open format with a declared
>          licence (anything less is all but useless).
>
>          ** As level 1 with good quality discovery metadata (we might
>          refer to the DCAT Application profile work as an example).
>
>          *** All the above plus structural metadata in the relevant
>          format (e.g. CSV+ for CSV, VoID for RDF etc).
>
>          This doesn't include quality metrics (which it should), and
>          contact details (which it should) - but they might be defined at level 2?
>
>          Maybe a start anyway.
>
>          Phil.
>
>          On 22/03/2015 13:50, Laufer wrote:
>             I agree, Phil.
>
>             What I want to reinforce is that it would be nice if we could
>             make clear in the document that 5 stars LD (or OD?) is not a
>             scale of how well a dataset is published on the web. We can
>             have, for example, a "CSV dataset" (3 stars) that is better
>             published than a "LD dataset" (5 stars). Or, maybe, we can
>             avoid using the 5 stars when what we want to say is that a
>             dataset is being published in a CSV format.
>
>             If we say that one dataset is 3 stars and another is 5 stars,
>             people get the idea that the 5 one is better than the 3 one
>             (as in reviews or hotels, for example).
>
>             We probably will not define our own scale, but I hope that our
>             set of BPs could help people to publish "Well Published Data
>             on The Web".
>
>             Best Regards,
>             Laufer
>
>             On Sunday, 22 March 2015, Christophe Guéret
>             <christophe.gueret@dans.knaw.nl> wrote:
>              +1!
>
>                Christophe
>
>                --
>                Sent with difficulties. Sorry for the brevity and typos...
>                On 22 Mar 2015 08:47, "Phil Archer" <phila@w3.org> wrote:
>                   I've just been reading through Friday's minutes and I
>                   see that this was the hot topic of the day. As ever,
>                   I'm sorry I wasn't able to be there.
>
>                   Let me add my 2 cents.
>
>                   LD forms a small part of the available data on the Web.
>                   It would be silly of us to push for everyone to convert
>                   their data into perfectly linked 5 star data before
>                   they make it available publicly or behind a pay-wall of
>                   some kind.
>
>                   What we *can* do IMO is:
>
>                   - promote the publication of human readable metadata as
>                     Laufer has described;
>
>                   - promote the publication of machine readable metadata
>                     and then show how this can be (and is) done with RDF
>                     using DCAT as an example;
>
>                   - promote the publication of structural metadata, for
>                     which, for CSV at least, we have a very clear route -
>                     use the CSV on the Web work;
>
>                   - if Eric and Annette can provide similar examples for
>                     NetCDF that would be terrific (I'm out of my depth
>                     here);
>
>                   - leave it to the Spatial Data on the Web WG to handle
>                     spatial stuff (as they are leaving some of their
>                     generic issues to this group).
>
>                   As an aside, the CSV WG has resolved its issues now and
>                   is expecting to publish pretty much the stable version
>                   of its specs in the first week of April.
>
>                   If you publish data in your favourite format +
>                   structural metadata in whatever format goes with that
>                   (and the CSV WG is using JSON for its metadata) then
>                   you are providing a route through which your users can
>                   readily create 5 star data if they so wish. They may or
>                   may not use LD themselves but the concept behind it is,
>                   I hope, clear enough to readers?
>
>                   From what I've read of Friday and the list since then,
>                   I dare to hope this is in line with the general mood of
>                   the WG?
>
>                   Phil.
>
>
>
>                   On 20/03/2015 18:09, Laufer wrote:
>                      Thank you, Eric.
>
>                      Best Regards,
>                      Laufer
>
>                      2015-03-20 12:31 GMT-03:00 Eric Stephan
>                      <ericphb@gmail.com>:
>                         Laufer and Bernadette,
>
>                         I raised an issue relating to this, asking the
>                         question: can we use 5 star as a metric and not
>                         a path?
>                         http://www.w3.org/2013/dwbp/track/issues/148
>
>                      Eric S.
>
>                      On Fri, Mar 20, 2015 at 7:54 AM, Bernadette Farias
>                      Lóscio <bfl@cin.ufpe.br> wrote:
>                         Hi Laufer,
>
>                         Thanks for the message! It is a very useful
>                         explanation!
>
>                         I fully agree with you: "In this dataset
>                         publishing I can see the idea of publishing
>                         metadata and using standard vocabularies, but is
>                         not a LD dataset."
>
>                         IMHO, we can use vocabularies to publish
>                         metadata, but we are not doing linked data, i.e.,
>                         there are no links between resources.
>
>                         I also agree that "we should differentiate the
>                         idea of a Best Practice of a non LD dataset of
>                         the idea of an implicit Best Practice to go to a
>                         LD dataset, that is what the 5 stars scale says.".
>
>                         If we have a BP whose implementation proposes the
>                         use of the RDF model to publish data, then we are
>                         moving towards the 5 stars. It is important to
>                         note that publishing data using the RDF model may
>                         be just one of the proposed approaches for
>                         implementation, i.e., we may show other ways of
>                         publishing data without using RDF.
>
>                         Cheers,
>                         Bernadette
>
>
>
>
>                      2015-03-20 11:32 GMT-03:00 Laufer <laufer@globo.com>:
>
>                         Hi all,
>
>                         I will start my comment using an example:
>
>                         Someone publishes a page with links to 2 files:
>                         - a CSV file with a dataset;
>                         - a text file that explains the structure of the
>                           dataset, in natural language (metadata).
>
>                         The page provides a lot of metadata in natural
>                         language, for example an overview of the dataset,
>                         license, organization, version, creator, rights,
>                         etc.
>
>                         At the same time, the page has an embedded DCAT
>                         instance using RDFa with information about the
>                         dataset, the distribution, etc.
>
>                         What I want to say is that we have here the
>                         metadata concept mixed with semantic web
>                         concepts, and it is a way of publishing data
>                         that, if all the things are well described, could
>                         be very useful to society.
>
>                         In this dataset publishing I can see the idea of
>                         publishing metadata and using standard
>                         vocabularies, but is not a LD dataset.
>
>                         What I was discussing in the last meeting is:
>                         will we support in the document the idea that the
>                         best way to publish is LD? I am not saying
>                         whether I am against the idea or not. I am in
>                         favor of LD. But we should differentiate the idea
>                         of a Best Practice of a non LD dataset of the
>                         idea of an implicit Best Practice to go to a LD
>                         dataset, that is what the 5 stars scale says.
>
>                         Maybe this is too much care with the words; sorry
>                         about this.
>
>                         Best Regards,
>                         Laufer
>
>                      --
>                      .  .  .  .. .  .
>                      .        .   . ..
>                      .     ..       .
>
>
>
>                      --
>                      Bernadette Farias Lóscio
>                      Centro de Informática
>                      Universidade Federal de Pernambuco - UFPE, Brazil
>                       ----------------------------------------------------------------------------
>
>
>
>                   --
>
>
>                   Phil Archer
>                   W3C Data Activity Lead
>                   http://www.w3.org/2013/data/
>
>                   http://philarcher.org
>                   +44 (0)7887 767755
>                   @philarcher1
>
>          --
>
>
>          Phil Archer
>          W3C Data Activity Lead
>          http://www.w3.org/2013/data/
>
>          http://philarcher.org
>          +44 (0)7887 767755
>          @philarcher1
>
>
>
>    --
>    Bernadette Farias Lóscio
>    Centro de Informática
>    Universidade Federal de Pernambuco - UFPE, Brazil
>
>    ----------------------------------------------------------------------------
>
>
>
>

-- 
.  .  .  .. .  .
.        .   . ..
.     ..       .
Received on Thursday, 26 March 2015 17:01:35 UTC