Re: The 5 stars path from Steven Adler on 2015-03-26 (public-dwbp-wg@w3.org from March 2015)

From: Steven Adler <adler1@us.ibm.com>
Date: Thu, 26 Mar 2015 09:48:17 -0400
To: Christophe Guéret <christophe.gueret@dans.knaw.nl>
Cc: Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Eric Stephan <ericphb@gmail.com>, Laufer <laufer@globo.com>, Phil Archer <phila@w3.org>, DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <OF25D847A9.9328820B-ON85257E14.004BBAC9-85257E14.004BD4EE@us.ibm.com>
I like that approach, but that 5-star scheme is not a data quality rating
system, which I still think we need as part of the BP.


Best Regards,

Steve

Motto: "Do First, Think, Do it Again"


From: Christophe Guéret <christophe.gueret@dans.knaw.nl>
To: Steven Adler/Somers/IBM@IBMUS
Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, DWBP WG <public-dwbp-wg@w3.org>, Eric Stephan <ericphb@gmail.com>
Date: 03/25/2015 09:53 PM
Subject: Re: The 5 stars path

BTW, speaking about stars and feedback we may want to have a look at the 5
star scheme for community engagement from Tim Davies:
http://www.opendataimpacts.net/engagement/


We could probably do something with it, if only linking to it somewhere.


Cheers,
Christophe


--
Sent with difficulties. Sorry for the brevity and typos...


On 24 Mar 2015 07:18, "Steven Adler" <adler1@us.ibm.com> wrote:
  Rating a dataset is only valuable if records within the dataset have
  ratings whose sum or average validates the dataset rating.  That is,
  there has to be provenance to the ratings.
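
A minimal sketch of the point above, assuming each record carries a numeric rating and that the dataset rating is taken as their average. The function names and the 0.5 tolerance are illustrative assumptions, not anything the WG has defined.

```python
from statistics import mean


def dataset_rating(record_ratings):
    """Average the per-record ratings into a dataset-level rating."""
    if not record_ratings:
        raise ValueError("no record ratings to aggregate")
    return mean(record_ratings)


def rating_is_supported(claimed, record_ratings, tolerance=0.5):
    """Check a claimed dataset rating against the record-level evidence.

    The tolerance is an arbitrary illustrative threshold.
    """
    return abs(dataset_rating(record_ratings) - claimed) <= tolerance


# Hypothetical record ratings for one dataset.
ratings = [4, 5, 3, 4]
print(dataset_rating(ratings))
print(rating_is_supported(4, ratings))
print(rating_is_supported(5, ratings))
```

Here the record ratings act as the provenance: a claimed dataset rating that the records do not back up is rejected.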


  Best Regards,

  Steve

  Motto: "Do First, Think, Do it Again"

  From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
  To: Eric Stephan <ericphb@gmail.com>
  Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>, Christophe Guéret <christophe.gueret@dans.knaw.nl>, DWBP WG <public-dwbp-wg@w3.org>
  Date: 03/24/2015 10:11 AM
  Subject: Re: The 5 stars path

  Hi all,

  Thanks for the great discussion!

  I like the idea of having a star rating discussion, but we need to be
  aware that publishing data on the Web is more than just publishing data
  and metadata. It also concerns issues like data access and feedback.

  I've been thinking a lot about this rating system and it would be great
  to consider all aspects related to data on the Web (e.g. data format,
  metadata, identifiers, data access, feedback, versioning...), but I'm not
  sure if this is the best choice. Maybe we can have a rating system based
  just on data and metadata, which is similar to Phil's initial proposal.

  Cheers,
  Bernadette

  2015-03-22 18:38 GMT-03:00 Eric Stephan <ericphb@gmail.com>:
        Wow what a wonderful thread to read.  Thank you Phil!  Many many
        thanks for this wonderful note of clarity!

        >>if Eric and Annette can provide similar examples for NetCDF that
        would be terrific (I'm out of my depth here).

        Yes, I think we can show this quite easily.  Just off the top of my
        head.

        NetCDF:
           - is an open format for storing multi-dimensional data streams
             [NETCDF]
           - can be annotated with self-describing metadata (called
             attributes)
           - has existing conventions for representing different forms of
             data, e.g. the CF convention
           - has a CF vocabulary [CFNAMES] of curated climate and
             forecasting terminology
           - In addition, the climate community within the Earth System Grid
             (ESG) has adopted fully documented protocols [CMIP5] that show
             how regional and climate model datasets must be organized so
             that they can be inter-related to support regional and global
             climate studies.
           - leverages existing ISO standards used in the geospatial, Dublin
             Core, and metadata communities
           - Finally, an ontology called SWEET [SWEET] was developed by NASA
             JPL, and there is previous research showing how the CF terms
             can be inter-related.

        I would submit that, even without the ontology, the climate
        community is already at 5 stars in terms of open data.



        Eric


        References

        [NETCDF] http://en.wikipedia.org/wiki/NetCDF
        [CFNAMES] http://cfconventions.org/Data/cf-standard-names/28/build/cf-standard-name-table.html
        [CMIP5] http://cmip-pcmdi.llnl.gov/cmip5/
        [SWEET] https://sweet.jpl.nasa.gov/


        On Sun, Mar 22, 2015 at 10:45 AM, Phil Archer <phila@w3.org> wrote:
              We are in full agreement.

              One of my hopes for this WG is that we can indeed lead people
              to publish data in formats like CSV in the best way (i.e. with
              good quality metadata) without them feeling somehow inferior.

              If that leads us to define our own star rating system, I
              wouldn't mind. Something like:

              * It's available on the Web in an open format with a declared
              licence (anything less is all but useless).

              ** As level 1 with good quality discovery metadata (we might
              refer to the DCAT Application profile work as an example).

              *** All the above plus structural metadata in the relevant
              format (e.g. CSV+ for CSV, VoID for RDF etc).

              This doesn't include quality metrics (which it should), and
              contact details (which it should) - but they might be defined
              at level 2?
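
The three levels proposed above can be sketched as a simple cumulative check. This is only an illustration: the dictionary keys (`on_web`, `open_format`, `licence`, `discovery_metadata`, `structural_metadata`) are hypothetical names chosen for the example, and quality metrics and contact details are left out, as in the proposal.

```python
def star_level(dataset):
    """Return 0-3 stars for a dataset description (a plain dict).

    Levels are cumulative: each one requires everything below it.
    All key names are hypothetical, chosen for this sketch only.
    """
    level_one = bool(
        dataset.get("on_web")
        and dataset.get("open_format")
        and dataset.get("licence")
    )
    if not level_one:
        return 0
    if not dataset.get("discovery_metadata"):
        return 1
    if not dataset.get("structural_metadata"):
        return 2
    return 3


# A CSV file with a licence and discovery metadata, but no structural metadata.
csv_dataset = {
    "on_web": True,
    "open_format": True,
    "licence": "http://example.org/licence",
    "discovery_metadata": True,
    "structural_metadata": False,
}
print(star_level(csv_dataset))
```

Because the check never looks at the data format itself, a well-described CSV file and a well-described RDF dump can reach the same level.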

              Maybe a start anyway.

              Phil.

              On 22/03/2015 13:50, Laufer wrote:
                    I agree, Phil.

                    What I want to reinforce is that it would be nice if we
                    could make clear in the document that the 5 stars of LD
                    (or OD?) are not a scale of how well a dataset is
                    published on the Web. We can have, for example, a "CSV
                    dataset" (3 stars) that is better published than an "LD
                    dataset" (5 stars). Or, maybe, we can avoid using the 5
                    stars when what we want to say is that a dataset is being
                    published in CSV format.

                    If we say that one dataset is 3 stars and another is 5
                    stars, people get the idea that the 5-star one is better
                    than the 3-star one (as with reviews or hotels, for
                    example).

                    We probably will not define our own scale, but I hope
                    that our set of BPs can help people to publish "Well
                    Published Data on the Web".

                    Best Regards,
                    Laufer

                    On Sunday, 22 March 2015, Christophe Guéret
                    <christophe.gueret@dans.knaw.nl> wrote:

                          +1!

                          Christophe

                          --
                          Sent with difficulties. Sorry for the brevity
                          and typos...
                          On 22 Mar 2015 08:47, "Phil Archer"
                          <phila@w3.org> wrote:
                                I've just been reading through Friday's minutes and I see
                                that this was the hot topic of the day. As ever, I'm sorry
                                I wasn't able to be there.

                                Let me add my 2 cents.

                                LD forms a small part of the available data on the Web. It
                                would be silly of us to push for everyone to convert their
                                data into perfectly linked 5 star data before they make it
                                available publicly or behind a pay-wall of some kind.

                                What we *can* do IMO is:

                                - promote the publication of human readable metadata as
                                  Laufer has described;

                                - promote the publication of machine readable metadata and
                                  then show how this can be (and is) done with RDF using
                                  DCAT as an example;

                                - promote the publication of structural metadata, for
                                  which, for CSV at least, we have a very clear route: use
                                  the CSV on the Web work;

                                - if Eric and Annette can provide similar examples for
                                  NetCDF that would be terrific (I'm out of my depth here);

                                - leave it to the Spatial Data on the Web WG to handle
                                  spatial stuff (as they are leaving some of their generic
                                  issues to this group).
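
As a rough illustration of the machine readable metadata point in the list above, a DCAT description of a CSV file can be written as JSON-LD and serialized with the standard library alone. The `dcat:` and `dct:` terms are real vocabulary terms; the dataset, its title, and every `example.org` URL are invented for the sketch.

```python
import json

# Illustrative DCAT description; all example.org identifiers are made up.
dataset_description = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "http://example.org/dataset/bus-stops",
    "@type": "dcat:Dataset",
    "dct:title": "Bus stops (illustrative)",
    "dcat:distribution": {
        "@type": "dcat:Distribution",
        "dcat:downloadURL": {"@id": "http://example.org/data/bus-stops.csv"},
        "dcat:mediaType": "text/csv",
        "dct:license": {"@id": "http://example.org/licence"},
    },
}

# The resulting string could be published alongside the CSV file.
serialized = json.dumps(dataset_description, indent=2)
print(serialized)
```

The data stays in CSV; only the description uses RDF, which is the separation the list above argues for.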

                                As an aside, the CSV WG has resolved its issues now and is
                                expecting to publish pretty much the stable version of its
                                specs in the first week of April.

                                If you publish data in your favourite format + structural
                                metadata in whatever format goes with that (and the CSV WG
                                is using JSON for its metadata) then you are providing a
                                route through which your users can readily create 5 star
                                data if they so wish. They may or may not use LD themselves
                                but the concept behind it is, I hope, clear enough to
                                readers?

                                From what I've read of Friday and the list since then, I
                                dare to hope this is in line with the general mood of the
                                WG?

                                Phil.



                                On 20/03/2015 18:09, Laufer wrote:
                                      Thank you, Eric.

                                      Regards,
                                      Laufer

                                      2015-03-20 12:31 GMT-03:00 Eric Stephan <ericphb@gmail.com>:
                                            Laufer and Bernadette,

                                            I raised an issue relating to this, asking the
                                            question: can we use 5 star as a metric and not a
                                            path?
                                            http://www.w3.org/2013/dwbp/track/issues/148

                                            Eric S.

                                      On Fri, Mar 20, 2015 at 7:54 AM, Bernadette Farias Lóscio
                                      <bfl@cin.ufpe.br> wrote:
                                            Hi Laufer,

                                            Thanks for the message! It is a very useful
                                            explanation!

                                            I fully agree with you: "In this dataset publishing
                                            I can see the idea of publishing metadata and using
                                            standard vocabularies, but is not a LD dataset."

                                            IMHO, we can use vocabularies to publish metadata,
                                            but we are not doing linked data, i.e., there are
                                            no links between resources.

                                            I also agree that "we should differentiate the idea
                                            of a Best Practice of a non LD dataset of the idea
                                            of an implicit Best Practice to go to a LD dataset,
                                            that is what the 5 stars scale says.".

                                            If we have a BP whose implementation proposes the
                                            use of the RDF model to publish data, then we are
                                            moving towards the 5 stars. It is important to note
                                            that publishing data using the RDF model may be
                                            just one of the proposed approaches for
                                            implementation, i.e., we may show other ways of
                                            publishing data without using RDF.

                                            Cheers,
                                            Bernadette




                                      2015-03-20 11:32 GMT-03:00 Laufer <laufer@globo.com>:

                                            Hi all,

                                            I will start my comment using an example:

                                            Someone publishes a page where there are links to
                                            2 files:
                                            - a csv file with a dataset;
                                            - a text file that explains the structure of the
                                              dataset, in natural language (metadata).

                                            In the page there is a lot of metadata provided in
                                            natural language, for example an overview of the
                                            dataset, license, organization, version, creator,
                                            rights, etc...

                                            At the same time, the page has an embedded DCAT
                                            instance using RDFa, with information about the
                                            dataset, the distribution, etc.

                                            What I want to say is that we have here the
                                            metadata concept mixed with semantic web concepts,
                                            and it is a way of publishing data that, if all the
                                            things are well described, could be very useful to
                                            society.

                                            In this dataset publishing I can see the idea of
                                            publishing metadata and using standard vocabularies,
                                            but is not a LD dataset.

                                            What I was discussing in the last meeting is: will
                                            we support in the document the idea that the best
                                            way to publish is LD? I am not saying whether I am
                                            for or against the idea. I am in favour of LD. But
                                            we should differentiate the idea of a Best Practice
                                            of a non LD dataset of the idea of an implicit Best
                                            Practice to go to a LD dataset, that is what the 5
                                            stars scale says.

                                            Maybe this is too much care with the words, sorry
                                            about this.

                                            Best Regards,
                                            Laufer

                                      --
                                      .  .  .  .. .  .
                                      .        .   . ..
                                      .     ..       .



                                      --
                                      Bernadette Farias Lóscio
                                      Centro de Informática
                                      Universidade Federal de Pernambuco -
                                      UFPE, Brazil

                                ----------------------------------------------------------------------------




                                --


                                Phil Archer
                                W3C Data Activity Lead
                                http://www.w3.org/2013/data/

                                http://philarcher.org
                                +44 (0)7887 767755
                                @philarcher1


              --


              Phil Archer
              W3C Data Activity Lead
              http://www.w3.org/2013/data/

              http://philarcher.org
              +44 (0)7887 767755
              @philarcher1



  --
  Bernadette Farias Lóscio
  Centro de Informática
  Universidade Federal de Pernambuco - UFPE, Brazil
  ----------------------------------------------------------------------------
Received on Thursday, 26 March 2015 15:32:54 UTC