
Re: Best Practice 4 (Document Metadata) - I agree to suppress it

From: Carlos Iglesias <contact@carlosiglesias.es>
Date: Thu, 22 Jan 2015 01:24:13 +0100
Message-ID: <CAAa1Xz=Ygpsb59yRkONQBwADx=KnAfW9xcov-RZvJbiroSRX-A@mail.gmail.com>
To: Annette Greiner <amgreiner@lbl.gov>
Cc: Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Laufer <laufer@globo.com>, "contact@carlosiglesias.es" <contact@carlosiglesias.es>, DWBP WG <public-dwbp-wg@w3.org>
Hi Annette,

The first change should be to start talking about data models and not
vocabularies, because vocabularies are just a subset of data models, one
that, for example, nobody working with APIs has ever used.
(I know I'm starting to be quite tiresome, on purpose, with this, and I
warn you all that I will keep doing so until we rectify this or somebody
gives me a good argument for keeping things as they currently are.)

Beyond that, I agree that we should put more emphasis on reusing reference
models instead of developing new ones. Unfortunately, the number of
available reference models is still limited (especially wrt vocabularies), so
it is not so infrequent that you need to develop your own, and thus the
rest of the BPs remain relevant and necessary. So I think we should keep them,
but maybe explain that they should be used only when you can't re-use
an existing model and are forced to create your own (or extend an
existing one).

With respect to the differences between BP3 and BP11, we are discussing that
extensively in another thread, but in brief the difference is between reusing
specific terms for your metadata and reusing full data models for your
data. I agree it should be a SHOULD for BP3. In fact it already is in the BP
title, but not in the content.
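
As a rough illustration of the metadata side of this distinction, the minimal Python sketch below separates reused metadata terms from ad-hoc ones. The two namespace URIs are the public Dublin Core Terms and FOAF namespaces; the dataset description, the `example.org` term, and the `classify_terms` helper are hypothetical and do not come from any BP text.

```python
# Namespaces of widely reused metadata vocabularies (public URIs).
REUSED_NAMESPACES = {
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Hypothetical dataset description: two reused terms and one ad-hoc term.
dataset_metadata = {
    "http://purl.org/dc/terms/title": "Air quality measurements",
    "http://purl.org/dc/terms/license": "http://creativecommons.org/licenses/by/4.0/",
    "http://example.org/terms/sensorBatch": "B-17",
}


def classify_terms(metadata):
    """Split metadata keys into reused terms and ad-hoc terms."""
    reused, ad_hoc = [], []
    for term in metadata:
        if any(term.startswith(ns) for ns in REUSED_NAMESPACES.values()):
            reused.append(term)
        else:
            ad_hoc.append(term)
    return reused, ad_hoc


reused, ad_hoc = classify_terms(dataset_metadata)
print("reused terms:", reused)
print("ad-hoc terms (these need their own documentation):", ad_hoc)
```

Only the ad-hoc terms would need documentation written by the publisher; the reused ones are already documented by whoever maintains the vocabulary.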

Best,
 CI.


On 21 January 2015 at 21:31, Annette Greiner <amgreiner@lbl.gov> wrote:

> I’m concerned that we are maybe getting out of scope with all the detail
> about vocabularies. Creating new vocabularies is a different task from
> publishing data that uses them, unless you are talking about custom
> controlled vocabularies. Of course, publishers need to document any custom
> controlled vocabularies they are using, but the best practices we have seem
> to be written for people inventing large standardized ones. Creating large
> standardized vocabularies is not something we expect data publishers to do
> per se. In fact, the more we emphasize information about creating
> vocabularies, the more we seem to be suggesting that data publishers should
> be doing that regularly. I would rather we de-emphasized creating new
> vocabularies and instead emphasized re-using existing vocabularies.
>
> Thus, I think we should reconsider whether each of the BPs in 7.4 is in
> scope. BP12, BP13, and BP15 seem to me meant for people developing large
> standard vocabularies. BP14 is just the reverse way of saying the same
> thing as BP3. BP11 should only address custom controlled vocabularies.
> (Publishers should not produce new, and possibly conflicting, documentation
> for existing vocabularies; that task falls to the creators of the
> vocabulary.)
>
> I also think BP3 should only be a SHOULD. I wouldn’t want someone to avoid
> publishing because they felt they had to use standard vocabularies to do
> that. Many datasets in the sciences are overwhelmingly filled with data
> that has no standard vocabulary (because the domain is too new).
>
> The document needs editing by a native English speaker. Is someone already
> in line to do that?
> -Annette
>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
>
> On Jan 21, 2015, at 10:16 AM, Bernadette Farias Lóscio <bfl@cin.ufpe.br>
> wrote:
>
> > Hello Carlos,
> >
> > Thanks for your comments!
> >
> > When I said that the Document Metadata BP was redundant with the
> > Document Vocabularies BP, I was considering the BP definition and not
> > the real intention of the BP.
> >
> > If we take it that "BP4 is about documenting what metadata terms
> > (reused or ad-hoc) you are finally using", then Document Metadata is
> > not redundant with the Document Vocabularies BP.
> >
> > In this case, the real meaning of "documenting" should be made
> > clearer. If documenting means to "provide a document that describes
> > the metadata", then I think that the BP on human vs. machine
> > readable metadata covers this requirement. On the other hand, if
> > documenting metadata means maintaining documentation for the
> > metadata, then maybe we should have a different BP. In this case,
> > there will be three BPs:
> >
> > 1. Document metadata BP: data publishers SHOULD maintain
> > documentation of the metadata that describes their data. This BP
> > concerns something that has to be done by the data publisher, but this
> > action doesn't have a direct impact on data consumers. There is
> > another BP (Provide metadata) to say that this documentation should be
> > provided to data consumers. This BP should be more general than the
> > Document Vocabularies BP. The metadata documentation should just state
> > which vocabularies are used, instead of providing complete
> > documentation for the vocabularies.
> >
> > 2. Provide metadata for both humans and machines BP: data publishers
> > SHOULD document metadata in such a way that both humans and machines
> > can read it. This BP complements the previous one because it says how
> > metadata should be documented.
> >
> > 3. Provide metadata BP: data publishers SHOULD provide metadata
> > documentation to data consumers. When you have the documentation, give
> > it to the data consumers.
> >
> > Does this make sense to you?
> >
> > Cheers,
> > Bernadette
> >
> >
> > 2015-01-20 21:51 GMT-03:00 Laufer <laufer@globo.com>:
> >> Hi, Carlos,
> >>
> >>> BP4 is about documenting what metadata terms you are finally using
> >>
> >> Terms are parts of a vocabulary.
> >>
> >> And we will have a whole section about vocabularies.
> >>
> >> Metadata is documenting data. Then, metadata should be documented. These
> >> documents about metadata are metadata of metadata. We should be careful
> >> about an infinite chain.
> >>
> >> If we talk about documents for machines, we are talking about
> >> vocabularies. And section 7.4 will take care of this.
> >>
> >> If we are talking about humans, metadata is the documentation. Having
> >> documentation about metadata is mandatory. If metadata does not have
> >> documentation, it does not have a meaning. For example, if one says that
> >> the dataset has a GNU license, how can this be understood by a human if
> >> GNU is not documented? The meaning is the documentation and must exist if
> >> someone decides to refer to it.
> >>
> >> With respect to code lists (maybe this is not the formal definition), I
> >> think they are a kind of type, or even a kind of vocabulary. Again, I think
> >> section 7.4 is a better candidate to talk about this.
> >>
> >> Best regards,
> >> Laufer
> >>
> >>
> >>
> >> On Tuesday, 20 January 2015, Carlos Iglesias
> >> <contact@carlosiglesias.es> wrote:
> >>
> >>> Hello everyone,
> >>>
> >>> Here goes my view on this:
> >>>
> >>> - I tend to disagree on (former) BP4 being derived from BP1+2+3
> >>>
> >>> BP1 is on metadata availability (provide metadata)
> >>> BP2 is on human vs. machine readable metadata (how to present metadata)
> >>> BP3 is on reusing generic standard metadata terms when possible (e.g. dc,
> >>> foaf and the like)
> >>> BP4 is about documenting what metadata terms (reused or ad-hoc) you are
> >>> finally using
> >>>
> >>> I don't see overlap between any of the above.
> >>>
> >>> - WRT BP11 Document vocabularies
> >>>
> >>> I don't see any overlap with (former) BP4 either, as:
> >>>
> >>> BP4 is about documenting what metadata terms you are finally using
> >>> BP11 is about documenting your data (not metadata) models (or
> >>> "vocabularies") in the case you are developing new ones.
> >>>
> >>> - Finally, WRT Annette's comments, I think there is a missing point here:
> >>> BPXX Document your data
> >>>
> >>> This is about the "data codebooks" that should accompany our data as
> >>> additional documentation but unfortunately are rarely available, making
> >>> working with 3rd party data a pain. These "codebooks" usually document all
> >>> the information that Annette is referring to in her message, and more.
> >>>
> >>> Best,
> >>> CI.
> >>>
> >>> On 20 January 2015 at 21:00, Annette Greiner <amgreiner@lbl.gov> wrote:
> >>>>
> >>>> Here are a few things that come to mind as needing to be documented in
> >>>> metadata.
> >>>> Units, for any measure that is not unitless.
> >>>> For responses to a survey question, the question itself and how it was
> >>>> coded. (This is where code lists come in.)
> >>>> Meaning of nulls, zeroes, NA, etc.
> >>>> language, locale (we have this one covered elsewhere, but probably it
> >>>> should be included under the more general BP.)
> >>>>
> >>>> I think the metadata information right now is a little bit redundant.
> >>>> Documenting metadata is really the same as providing metadata. When we
> >>>> have generalized the BP about documenting, it will be even more like the
> >>>> one about providing metadata. In both cases, we are talking about using
> >>>> good metadata to describe the data and making it available to data
> >>>> consumers.
> >>>> -Annette
> >>>> --
> >>>> Annette Greiner
> >>>> NERSC Data and Analytics Services
> >>>> Lawrence Berkeley National Laboratory
> >>>> 510-495-2935
> >>>>
> >>>> On Jan 20, 2015, at 5:16 AM, Bernadette Farias Lóscio <bfl@cin.ufpe.br>
> >>>> wrote:
> >>>>
> >>>>>
> >>>>> The Document metadata BP should be rewritten to become more general,
> >>>>> i.e., not just vocabularies should be documented. In this case, what
> >>>>> else should be documented when talking about metadata?
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> ---
> >>>
> >>> Carlos Iglesias.
> >>> Internet & Web Consultant.
> >>> +34 687 917 759
> >>> contact@carlosiglesias.es
> >>> @carlosiglesias
> >>> http://es.linkedin.com/in/carlosiglesiasmoro/en
> >>
> >>
> >>
> >> --
> >> .  .  .  .. .  .
> >> .        .   . ..
> >> .     ..       .
> >
> >
> >
> > --
> > Bernadette Farias Lóscio
> > Centro de Informática
> > Universidade Federal de Pernambuco - UFPE, Brazil
> >
> ----------------------------------------------------------------------------
>
>


-- 
---

Carlos Iglesias.
Internet & Web Consultant.
+34 687 917 759
contact@carlosiglesias.es
@carlosiglesias
http://es.linkedin.com/in/carlosiglesiasmoro/en
Received on Thursday, 22 January 2015 00:24:43 UTC
