Re: Intrinsic vs extrinsic metadata (my action #54)

Hi Laufer,

Thanks for your comments! I'll try to answer them below:

> I have a doubt about why you defined distribution in a different way from,
> for example, license. In the same way that a data/dataset has a
> distribution that has metadata, a data/dataset has a license that has
> metadata. Why is distribution not simply a metadata type?
>

As proposed by DCAT [1], my initial idea was to describe a dataset
independently from its distributions. In the diagram, a dataset has a
collection of data and it is described by different types of metadata (the
ones illustrated in the diagram). Following the DCAT description of a
dataset, I also consider that a dataset may have one or more distributions,
where a distribution is a possible way of publishing the collection of data
of a given dataset, for example a file or an API. In this context, I don't
see a distribution as a type of metadata.

On the other hand, in the diagram, a distribution is also described by
metadata. I am not sure if a distribution will have the same metadata
that a dataset has. I am also not sure if access metadata should be related
to a dataset or to a specific distribution.
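
To make the distinction concrete, here is a minimal sketch in Python of how I
picture it. It is only an illustration, not DCAT itself: the class names, the
metadata keys and the example.org URLs are all hypothetical, and the open
question about where access metadata belongs is deliberately left undecided.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    """One way of publishing a dataset's data, e.g. a file or an API."""
    access_url: str
    media_type: str
    # Metadata describing this particular distribution (hypothetical keys).
    metadata: dict = field(default_factory=dict)


@dataclass
class Dataset:
    """A collection of data, described independently of how it is published."""
    title: str
    # Metadata describing the dataset itself (hypothetical keys).
    metadata: dict = field(default_factory=dict)
    # The distributions that publish this same collection of data.
    distributions: list = field(default_factory=list)


spending = Dataset(
    title="Local government spending",
    metadata={"license": "http://example.org/licence"},
)
spending.distributions.append(
    Distribution("http://example.org/spending.csv", "text/csv")
)
spending.distributions.append(
    Distribution("http://example.org/api/spending", "application/json")
)

# Adding distributions does not touch the dataset-level metadata.
print(spending.metadata)
print([d.media_type for d in spending.distributions])
```

In this sketch a distribution is an object related to the dataset, with its
own metadata, rather than one more entry in the dataset's metadata.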


> Another thing that I think that could be represented in the diagram are
> the relationships that could exist among the diverse data/dataset metadata.
> So, metadata has a relation with metadata.
>

I don't see how we could relate the different types of metadata. Could you
please give an example?


Thanks again!

kind regards,
Bernadette

[1] http://www.w3.org/TR/vocab-dcat/




>
> Thank you.
>
> Best regards,
> Laufer
>
>
> 2014-07-01 20:51 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:
>
> Hi Mark,
>>
>> Thanks again for the explanation! These examples are really helpful for
>> the understanding of the role of the different types of metadata.
>> I think that examples like these will be very useful to illustrate the
>> best practices. After having some feedback from the group, it could be nice
>> to update the wiki page with the diagrams together with a brief explanation
>> and an example for each type of metadata. What do you think?
>>
>> I'm sending attached a pdf version of the updated diagram. Since I am
>> using PowerPoint to create the diagrams, I am including the ppt version as
>> well. If you have suggestions for other tools that may help the
>> collaborative work, please let me know.
>>
>> It has been a great discussion! Thanks!
>>
>> kind regards,
>> Bernadette
>>
>>
>>
>>
>> 2014-07-01 20:09 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
>>
>>  Hi Bernadette,
>>>
>>>  Thanks for the further discussion and updates to your diagram.
>>>
>>>  I also like the vertical continuum in the other diagram to express how
>>> intrinsic / extrinsic these different kinds of metadata are.
>>>
>>>  I'd say that scope and granularity are distinct and not
>>> interchangeable.
>>> Scope defines the dimensions and location of the 'bounding box' or
>>> 'envelope' in time and space, whereas granularity is a measure of how many
>>> sample points there are *within* that bounding box or envelope.
>>>
>>>  A simple example could be weather observation data, where the scope
>>> defines that the dataset has a coverage of the United Kingdom for the month
>>> of June 2014 and the granularity is dependent on how closely spaced the
>>> weather observation stations are and how frequently a new data point is
>>> recorded for wind speed, rainfall, barometric pressure etc. - e.g. is it
>>> per day, per hour, per minute or per second?  They both have temporal and
>>> geospatial dimensions, but I'd redraw that part of the diagram like this.
>>>
>>>  By the way - just a suggestion:  can we try to export any diagrams
>>> like this as vector graphics, either in SVG or PDF?  That makes it much
>>> easier for us all to make modifications fairly easily, rather than having
>>> to kludge bitmap modifications in Photoshop or Gimp.
>>>
>>>  Best wishes,
>>>
>>>  - Mark
>>>
>>>
>>>
>>> On 1 Jul 2014, at 22:52, Bernadette Farias Lóscio <bfl@cin.ufpe.br>
>>> wrote:
>>>
>>> Hi Mark,
>>>
>>> Thank you very much for your explanation!
>>>
>>> After reading your examples, I agree with you that scope is an intrinsic
>>> property, since it provides a better understanding of the meaning of
>>> the data itself (this was my initial idea about intrinsic metadata).  In
>>> the Data on the Web context, structural information is not enough to
>>> provide the semantics of the data, we need more information, like the scope
>>> of the data.
>>>
>>> Instead of removing the classification, I suggest having two categories
>>> of intrinsic metadata: scope/granularity and structural. Do you think that
>>> scope and granularity can be considered together as a single category?
>>>
>>> I also agree that "these characteristics really fit on a sliding scale
>>> between Very Intrinsic and Very Extrinsic, with some middle ground in
>>> between". I created a figure that tries to illustrate this idea. This figure
>>> is attached.
>>>
>>> I'm sending attached another version of the diagram with the idea of a
>>> new classification.
>>>
>>> Yes, this discussion is very interesting and it is really important for
>>> best practices identification and definition :)
>>>
>>> Thanks again!
>>>
>>> Kind regards,
>>> Bernadette
>>>
>>>
>>>
>>>
>>> 2014-07-01 17:51 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
>>> Hello Bernadette,
>>>
>>> Thanks for your updated diagram.
>>>
>>> I don't mind if we have slightly different opinions about where to draw
>>> the boundary between 'intrinsic' and 'extrinsic'.
>>>
>>> We both agree that structural metadata (what kind of data is it?) is
>>> intrinsic.
>>>
>>> I think the scope metadata is perhaps on the boundary between intrinsic
>>> and extrinsic, in the sense that even if you transform the data into
>>> another format or provide it through a different access method, the scope
>>> remains invariant.
>>>
>>> For example, consider local government spending data.
>>>
>>> At one level, you need intrinsic structural metadata that says 'this is
>>> spending per year on this expenditure category in this region', and we use
>>> classes and predicates from controlled vocabularies to express that so
>>> that anyone looking for that kind of data can find it, no matter which
>>> local government authority published it.  There may be domain-specific data
>>> publishing guidelines that recommend specific vocabularies to use.  Some
>>> will be core W3C vocabularies.  Others may be more domain-specific but
>>> ideally globally defined and multi-lingual.
>>>
>>> At another level, you want to be able to identify a particular dataset
>>> by its temporal and spatial scope.  I consider this to be intrinsic to the
>>> dataset, even though it's not a structural description.  If a dataset of
>>> local government spending data is published for a particular city and a
>>> particular fiscal year, the data contained within that dataset has that
>>> scope.  We can transform that set of data into different formats and
>>> provide additional methods to access it - and that temporal+spatial scope
>>> remains invariant under those changes.  We can't transform the spending
>>> data for London in 1999 into the spending data for Paris in 2013.  They are
>>> distinguishing characteristics of the data itself that distinguish one
>>> set of data from another set of data, even when they share the same
>>> structural semantics.  That's why I think of temporal/spatial scope as
>>> being intrinsic to the dataset and its data, because they are (in my
>>> opinion) equally important to the meaning of the data - they're
>>> effectively expressing what the data is about (i.e. its subject or scope),
>>> whereas the intrinsic structural metadata says 'this is government spending
>>> for a particular city or region and a particular time interval'.  You
>>> actually need both.
>>>
>>> At another level, you want to explain which formats are available, how
>>> you can access it, which licence applies for usage of the data.  Those
>>> things feel much more extrinsic, because they can change over time -
>>> e.g. additional formats and access methods can be provided, other formats
>>> or access methods might be deprecated or withdrawn.  A licence might be
>>> changed to a more liberal licence - or a more restrictive licence.
>>>
>>> However, we can agree to differ about the boundary between intrinsic and
>>> extrinsic - and as I wrote, it's probably something of a continuum or
>>> sliding scale, rather than consisting of only two possibilities with a
>>> very clearly defined boundary between them.
>>>
>>> The main issue is to use this exercise as a way to explore all the
>>> useful dimensions of metadata and identify the best practice ways of
>>> expressing those - and it seems that this discussion is helping to make
>>> some additional progress in that direction.
>>>
>>> I like your updated diagram.  Maybe it's easier for everyone to agree on
>>> it if we remove the words 'intrinsic' and 'extrinsic' from the diagram but
>>> just use them internally for the thought processes that try to make it as
>>> complete as possible.
>>>
>>> Best wishes,
>>>
>>> - Mark
>>>
>>>
>>>
>>>
>>> On 1 Jul 2014, at 20:51, Bernadette Farias Lóscio <bfl@cin.ufpe.br>
>>>  wrote:
>>>
>>> > Hello Mark,
>>> >
>>> > Thank you very much for sharing your thoughts about metadata
>>> definition.
>>> >
>>> > I read your notes on the wiki page and I have some comments:
>>> >
>>> > - I agree with you that a dataset may be described by two types of
>>> metadata: the metadata that describes the data itself (intrinsic metadata) and
>>> the metadata that describes the dataset (extrinsic metadata). In the
>>> diagram that I showed in the last meeting, I called them structural and
>>> descriptive metadata.
>>> >
>>> > - I believe that intrinsic properties are the ones that describe the
>>> meaning of the data itself, like concepts, classes and properties.
>>> Intrinsic metadata has a role similar to a database schema and should be
>>> described by a domain vocabulary.
>>> >
>>> > - In this case, Scope (temporal and geographic) and Granularity
>>> (temporal and spatial) should be considered extrinsic properties, since they
>>> describe the dataset instead of the meaning of data. Extrinsic properties
>>> should be described by standard vocabularies like DCAT, PROV and the
>>> Quality and Data Usage vocabularies.
>>> >
>>> > Maybe I'm being too strict with this classification, but on the other
>>> hand I think this may help the understanding of the different types of
>>> metadata and their roles on describing a dataset.
>>> >
>>> > I'm sending attached a new version of the diagram that I showed on our
>>> last meeting. In this new version, I included more subclasses (access,
>>> granularity and scope) for the extrinsic metadata. I believe that now it
>>> is possible to define the properties (intrinsic and extrinsic) described in
>>> your notes.
>>> >
>>> > It would be great if you could take a look at the diagram and tell me
>>> if these ideas make sense to you.
>>> >
>>> > Thanks again!
>>> >
>>> > kind regards,
>>> > Bernadette
>>> >
>>> >
>>> >
>>> > 2014-07-01 10:29 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
>>> > Dear DWBP colleagues,
>>> >
>>> > I've added a section to the DWBP wiki with some thoughts about
>>> intrinsic vs extrinsic metadata, in response to my action #54 from last
>>> Friday's call and the initial discussion there.
>>> >
>>> > I've now added that section at
>>> >
>>> >
>>> https://www.w3.org/2013/dwbp/wiki/Guidance_on_the_Provision_of_Metadata#Intrinsic_vs_Extrinsic_Metadata
>>> >
>>> > Maybe it's not the best place for it - in which case, I'm happy for
>>> the editors to move it to a better location in the Wiki.
>>> >
>>> > It's not definitive either - more of a discussion about the kinds of
>>> metadata that are intrinsic to the data itself (irrespective of format or
>>> access mechanism) and other kinds of metadata that are extrinsic (e.g.
>>> depends on a particular format, access mechanism or licence).
>>> >
>>> > Please feel free to modify this and extend it.
>>> >
>>> > I hope that it's useful for the discussions that Bernadette and I were
>>> having last week, as well as the work Hadley is writing about alternative
>>> approaches to data catalogues.
>>> >
>>> > At least it might help us to ensure that we explore the various
>>> 'dimensions' of metadata that might be used by data consumers when
>>> searching for datasets or discovering related datasets.  I have also
>>> included some ideas about capturing feedback about data usage (e.g. in
>>> applications, websites, mash-ups), including links to related datasets that
>>> add some valuable context.
>>> >
>>> > Feel free to develop this further if you think it is useful.
>>> >
>>> > Best wishes,
>>> >
>>> > - Mark
>>> >
>>> >
>>> > CONFIDENTIALITY / DISCLAIMER: The contents of this e-mail are
>>> confidential and are not to be regarded as a contractual offer or
>>> acceptance from GS1 (registered in Belgium). If you are not the addressee,
>>> or if this has been copied or sent to you in error, you must not use data
>>> herein for any purpose, you must delete it, and should inform the sender.
>>> GS1 disclaims liability for accuracy or completeness, and opinions
>>> expressed are those of the author alone. GS1 may monitor communications.
>>> Third party rights acknowledged. (c) 2013.
>>> >
>>> >
>>> >
>>> > --
>>> > Bernadette Farias Lóscio
>>> > Centro de Informática
>>> > Universidade Federal de Pernambuco - UFPE, Brazil
>>> >
>>> ----------------------------------------------------------------------------
>>> > <DWBP_metadata.jpg>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Bernadette Farias Lóscio
>>> Centro de Informática
>>> Universidade Federal de Pernambuco - UFPE, Brazil
>>>
>>> ----------------------------------------------------------------------------
>>>
>>>
>>> <DWBP_metadata_v02.jpg><Extrinsic x Intrinsec.jpg>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Bernadette Farias Lóscio
>> Centro de Informática
>> Universidade Federal de Pernambuco - UFPE, Brazil
>> ----------------------------------------------------------------------------
>>
>>
>
>
>
> --
> .  .  .  .. .  .
> .        .   . ..
> .     ..       .
>



-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------

Received on Monday, 7 July 2014 13:28:21 UTC