Re: The 5 stars path

Rating a dataset is only valuable if records within the dataset have
ratings whose sum or average validates the dataset rating.  That is, there
has to be provenance to the ratings.
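
As a rough illustration of that point, here is a minimal Python sketch. It
assumes each record carries a numeric rating on the same scale as the dataset,
and the function name and the 0.5 tolerance are hypothetical choices, not
anything agreed by the WG.

```python
from statistics import mean


def dataset_rating_is_supported(dataset_rating, record_ratings, tolerance=0.5):
    """Return True when the average record rating backs the dataset rating.

    `tolerance` is an assumed threshold for how far the claimed dataset
    rating may drift from the average of its record ratings.
    """
    if not record_ratings:
        # No record ratings means no provenance for the dataset rating.
        return False
    return abs(mean(record_ratings) - dataset_rating) <= tolerance


record_ratings = [4, 5, 3, 4]  # hypothetical per-record ratings, average 4.0
print(dataset_rating_is_supported(4, record_ratings))  # True
print(dataset_rating_is_supported(5, record_ratings))  # False
```

A sum could be used instead of the average; the check stays the same in
spirit: the dataset rating has to be derivable from the record ratings.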


Best Regards,

Steve

Motto: "Do First, Think, Do it Again"


From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
To: Eric Stephan <ericphb@gmail.com>
Cc: Phil Archer <phila@w3.org>, Laufer <laufer@globo.com>, Christophe Guéret <christophe.gueret@dans.knaw.nl>, DWBP WG <public-dwbp-wg@w3.org>
Date: 03/24/2015 10:11 AM
Subject: Re: The 5 stars path





Hi all,

Thanks for the great discussion!

I like the idea of having a star rating discussion, but we need to be aware
that publishing data on the Web is more than just publishing data and
metadata. It also concerns issues like data access and feedback.

I've been thinking a lot about this rating system and it would be great to
consider all aspects related to data on the Web (e.g. data format, metadata,
identifiers, data access, feedback, versioning...), but I'm not sure if
this is the best choice. Maybe we can have a rating system based just on
data and metadata, which is similar to Phil's initial proposal.

Cheers,
Bernadette

2015-03-22 18:38 GMT-03:00 Eric Stephan <ericphb@gmail.com>:
  Wow what a wonderful thread to read.  Thank you Phil!  Many many thanks
  for this wonderful note of clarity!

  >>if Eric and Annette can provide similar examples for NetCDF that would
  be terrific (I'm out of my depth here).

  Yes, I think we can show this quite easily.  Just off the top of my head:

  NetCDF:
     - is an open format for storing multi-dimensional data streams
       [NETCDF]
     - can be annotated with self-describing metadata (called attributes)
     - has existing conventions for representing different forms of data,
       e.g. the CF convention
     - has a CF vocabulary [CFNAMES] for curated climate and forecasting
       terminology
     - In addition, the climate community within the Earth System Grid (ESG)
       has adopted fully documented protocols [CMIP5] to show how regional and
       climate model datasets must be organized so that they can be
       inter-related to support regional and global climate studies.
     - Leverages existing ISO standards used in the geospatial, Dublin Core,
       and metadata communities.
     - Finally, an ontology called SWEET [SWEET] was developed by NASA JPL,
       and there is previous research showing how the CF terms can be
       inter-related.

  I would submit that, in terms of open data, the climate community is
  already at 5 stars even without the ontology.



  Eric


  References

  [NETCDF] http://en.wikipedia.org/wiki/NetCDF
  [CFNAMES] http://cfconventions.org/Data/cf-standard-names/28/build/cf-standard-name-table.html
  [CMIP5] http://cmip-pcmdi.llnl.gov/cmip5/
  [SWEET] https://sweet.jpl.nasa.gov/


  On Sun, Mar 22, 2015 at 10:45 AM, Phil Archer <phila@w3.org> wrote:
   We are in full agreement.

   One of my hopes for this WG is that we can indeed lead people to publish
   formats like CSV in the best way (i.e. with good quality metadata)
   without them feeling somehow inferior.

   If that leads us to define our own star rating system, I wouldn't mind.
   Something like:

   * It's available on the Web in an open format with a declared licence
   (anything less is all but useless).

   ** As level 1 with good quality discovery metadata (we might refer to
   the DCAT Application profile work as an example).

   *** All the above plus structural metadata in the relevant format (e.g.
   CSV+ for CSV, VoID for RDF etc).

   This doesn't include quality metrics (which it should), and contact
   details (which it should) - but they might be defined at level 2?
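
   The three levels above are cumulative, so they can be read as a simple
   checklist. A minimal sketch in Python, assuming each criterion has already
   been judged as a yes/no flag (the `Publication` class and `star_level`
   function are hypothetical names, not part of any WG document):

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Yes/no judgements about one published dataset (hypothetical model)."""
    on_web: bool
    open_format: bool
    declared_licence: bool
    discovery_metadata: bool   # e.g. a DCAT-style description
    structural_metadata: bool  # e.g. CSV+ for CSV, VoID for RDF


def star_level(p: Publication) -> int:
    """Return 0-3 stars; each level requires all the levels below it."""
    if not (p.on_web and p.open_format and p.declared_licence):
        return 0
    if not p.discovery_metadata:
        return 1
    if not p.structural_metadata:
        return 2
    return 3


csv_with_metadata = Publication(True, True, True, True, True)
csv_without_licence = Publication(True, True, False, True, True)
print(star_level(csv_with_metadata))    # 3
print(star_level(csv_without_licence))  # 0
```

   Note that the format itself (CSV, RDF, NetCDF) does not appear in the
   checklist; only openness, licence and metadata do. Quality metrics and
   contact details are left out here, as in the list above.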

   Maybe a start anyway.

   Phil.

   On 22/03/2015 13:50, Laufer wrote:
     I agree, Phil.

     What I want to reinforce is that it would be nice if we could make
     clear in the document that 5 stars LD (or OD?) is not a measure of how
     well a dataset is published on the Web. We can have, for example, a
     "CSV dataset" (3 stars) that is better published than an "LD dataset"
     (5 stars). Or, maybe, we can avoid using the 5 stars when what we want
     to say is that a dataset is being published in a CSV format.

     If we say that one dataset is 3 stars and another is 5 stars, people
     get the idea that the 5-star one is better than the 3-star one (as in
     reviews or hotels, for example).

     We probably will not define our own scale, but I hope that our set of
     BPs can help people to publish "Well Published Data on the Web".

     Best Regards,
     Laufer

     On Sunday, 22 March 2015, Christophe Guéret
     <christophe.gueret@dans.knaw.nl> wrote:


      +1!

      Christophe

      --
      Sent with difficulties. Sorry for the brevity and typos...
      On 22 Mar 2015 08:47, "Phil Archer" <phila@w3.org> wrote:

        I've just been reading through Friday's minutes and I see that this
        was the hot topic of the day. As ever, I'm sorry I wasn't able to be
        there.

        Let me add my 2 cents.

        LD forms a small part of the available data on the Web. It would be
        silly of us to push for everyone to convert their data into perfectly
        linked 5 star data before they make it available publicly or behind a
        pay-wall of some kind.

        What we *can* do IMO is:

        - Promote the publication of human readable metadata as Laufer has
        described;

        - promote the publication of machine readable metadata and then show
        how this can be (and is) done with RDF using DCAT as an example;

        - promote the publication of structural metadata which, for CSV at
        least, we have a very clear route - use the CSV on the Web work;

        - if Eric and Annette can provide similar examples for NetCDF that
        would be terrific (I'm out of my depth here).

        - We can leave it to the Spatial Data on the Web WG to handle spatial
        stuff (as they are leaving some of their generic issues to this
        group).

        As an aside, the CSV WG has resolved its issues now and is expecting
        to publish pretty much the stable version of its specs in the first
        week of April.

        If you publish data in your favourite format + structural metadata in
        whatever format goes with that (and the CSV WG is using JSON for its
        metadata) then you are providing a route through which your users can
        readily create 5 star data if they so wish. They may or may not use LD
        themselves but the concept behind it is, I hope, clear enough to
        readers?
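
        To make "your favourite format + structural metadata" concrete, here
        is a minimal Python sketch that derives a JSON description of a CSV
        file's columns. The key names (`@context`, `url`, `tableSchema`,
        `columns`, `datatype`) follow my understanding of the CSV on the Web
        metadata vocabulary and should be treated as an assumption; the file
        name, sample data and `describe_csv` helper are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a published CSV file.
CSV_TEXT = "station,date,temp_c\nRecife,2015-03-20,29.5\n"


def describe_csv(csv_text, url, datatypes):
    """Build a JSON-serialisable structural description of a CSV file.

    Column names come from the header row; `datatypes` maps a column name
    to a datatype label, defaulting to "string" when none is given.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": url,
        "tableSchema": {
            "columns": [
                {"name": name, "titles": name,
                 "datatype": datatypes.get(name, "string")}
                for name in header
            ]
        },
    }


metadata = describe_csv(
    CSV_TEXT,
    "observations.csv",
    {"date": "date", "temp_c": "number"},
)
print(json.dumps(metadata, indent=2))
```

        The CSV file itself is untouched; the JSON document sits alongside it
        and gives a consumer enough structure to convert the rows into
        another model, such as RDF, if they want to.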

        From what I've read of Friday and the list since then, I dare to hope
        this is in line with the general mood of the WG?

        Phil.



        On 20/03/2015 18:09, Laufer wrote:
          Thank you, Eric.

          Best wishes,
          Laufer

          2015-03-20 12:31 GMT-03:00 Eric Stephan <ericphb@gmail.com>:

           Laufer and Bernadette,

           I raised an issue relating to this, asking the question: can we
           use 5 star as a metric and not a path?
           http://www.w3.org/2013/dwbp/track/issues/148

           Eric S.

           On Fri, Mar 20, 2015 at 7:54 AM, Bernadette Farias Lóscio
           <bfl@cin.ufpe.br> wrote:

             Hi Laufer,

             Thanks for the message! It is a very useful explanation!

             I fully agree with you: "In this dataset publishing I can see
             the idea of publishing metadata and using standard vocabularies,
             but is not a LD dataset."

             IMHO, we can use vocabularies to publish metadata, but we are
             not doing linked data, i.e., there are no links between
             resources.

             I also agree that "we should differentiate the idea of a Best
             Practice of a non LD dataset of the idea of an implicit Best
             Practice to go to a LD dataset, that is what the 5 stars scale
             says.".

             If we have a BP whose implementation proposes the use of the RDF
             model to publish data, then we are moving towards the 5 stars.
             It is important to note that publishing data using the RDF model
             may be just one of the proposed approaches for implementation,
             i.e., we may show other ways of publishing data without using
             RDF.

             Cheers,
             Bernadette




             2015-03-20 11:32 GMT-03:00 Laufer <laufer@globo.com>:

             Hi all,

              I will start my comment using an example:

              Someone publishes a page where there are links to 2 files:
              - a CSV file with a dataset;
              - a text file that explains the structure of the dataset, in
                natural language (metadata).

              In the page there is a lot of metadata provided in natural
              language, for example an overview of the dataset, license,
              organization, version, creator, rights, etc.

              At the same time, the page has an embedded DCAT instance using
              RDFa where there is info about the dataset, the distribution,
              etc.
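
              A DCAT description of such a page could look roughly like the
              Python sketch below. It emits JSON-LD rather than RDFa, purely
              to keep the example short; the DCAT and Dublin Core terms are
              real vocabulary terms, while the `dcat_description` helper and
              all example.org URLs are hypothetical.

```python
import json


def dcat_description(title, licence_url, csv_url):
    """Describe one dataset with a single CSV distribution using DCAT terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:license": {"@id": licence_url},
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": csv_url},
                "dcat:mediaType": "text/csv",
            }
        ],
    }


description = dcat_description(
    "Example dataset",                   # hypothetical title
    "http://example.org/licence",        # hypothetical licence URL
    "http://example.org/data/file.csv",  # hypothetical download URL
)
print(json.dumps(description, indent=2))
```

              The description uses standard vocabularies for the metadata,
              but the data itself stays a plain CSV file with no links
              between resources.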

              What I want to say is that we have here the metadata concept
              mixed with semantic web concepts, and it is a way of publishing
              data that, if all the things are well described, could be very
              useful to society.

              In this dataset publishing I can see the idea of publishing
              metadata and using standard vocabularies, but is not a LD
              dataset.

              What I was discussing in the last meeting is: will we support
              in the document the idea that the best way to publish is LD? I
              am not saying whether I am against the idea or not. I am
              favorable to LD. But we should differentiate the idea of a Best
              Practice of a non LD dataset of the idea of an implicit Best
              Practice to go to a LD dataset, that is what the 5 stars scale
              says.

              Maybe this is too much care with the words, sorry about this.

              Best Regards,
              Laufer

              --
              .  .  .  .. .  .
              .        .   . ..
              .     ..       .




             --
             Bernadette Farias Lóscio
             Centro de Informática
             Universidade Federal de Pernambuco - UFPE, Brazil


        ----------------------------------------------------------------------------






        --


        Phil Archer
        W3C Data Activity Lead
        http://www.w3.org/2013/data/

        http://philarcher.org
        +44 (0)7887 767755
        @philarcher1








--
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------

Received on Tuesday, 24 March 2015 14:19:18 UTC