Re: Intrinsic vs extrinsic metadata (my action #54)

Hi Laufer,

Thanks for your comments! Please find some comments below.


> I think that a DCAT description (extended by the DWBP WG) would have
> pointers to different metadata types of a specific Dataset, and
> distribution would be one of them. Maybe distribution could have a special
> status, but I can't see why.
>

If we consider the DCAT definitions of dataset and distribution, then
"distribution represents an accessible form of a dataset as for example a
downloadable file, an RSS feed or a web service that provides the data". In
this sense, a distribution is not a type of metadata, but a way of
publishing the data. However, a distribution may be associated with metadata
as well: a distribution also needs to have a description. Please take a
look at the diagram that I included in the wiki page about best practices
guidelines [2].
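
To illustrate the distinction, here is a minimal Python sketch. It assumes a
toy in-memory model of my own: the `Dataset` and `Distribution` classes, the
`to_triples` helper, the `ex:` identifiers and the example.org URLs are all
hypothetical, and RDF objects are simplified to plain strings. Only the class
and property names (`dcat:Dataset`, `dcat:Distribution`, `dcat:distribution`,
`dcat:accessURL`, `dcat:mediaType`, `dct:title`) are taken from DCAT. The
point is that each distribution is a separate resource with its own
description, linked to the dataset through `dcat:distribution`, rather than
one more metadata property of the dataset.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Distribution:
    # Hypothetical model: one accessible form of a dataset (file, feed, API),
    # which has its own description.
    uri: str
    access_url: str
    media_type: str


@dataclass
class Dataset:
    # Hypothetical model: the dataset is described independently of its
    # distributions and simply points to them.
    uri: str
    title: str
    distributions: list[Distribution] = field(default_factory=list)


def to_triples(ds: Dataset) -> list[tuple[str, str, str]]:
    """Flatten the model into simplified (subject, predicate, object) triples
    using DCAT / Dublin Core names. Objects are plain strings here."""
    triples = [
        (ds.uri, "rdf:type", "dcat:Dataset"),
        (ds.uri, "dct:title", ds.title),
    ]
    for dist in ds.distributions:
        # The distribution is its own node, linked to the dataset...
        triples.append((ds.uri, "dcat:distribution", dist.uri))
        # ...and carries its own metadata.
        triples.append((dist.uri, "rdf:type", "dcat:Distribution"))
        triples.append((dist.uri, "dcat:accessURL", dist.access_url))
        triples.append((dist.uri, "dcat:mediaType", dist.media_type))
    return triples


# Usage with hypothetical example data.
spending = Dataset("ex:spending-2013", "Local government spending 2013", [
    Distribution("ex:spending-2013-csv",
                 "http://example.org/spending-2013.csv", "text/csv"),
    Distribution("ex:spending-2013-api",
                 "http://example.org/api/spending/2013", "application/json"),
])
for triple in to_triples(spending):
    print(triple)
```

Adding a new format only adds a new `Distribution` node; the dataset's own
description does not change.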


> >> I don't see how we could relate the different types of metadata. Could
> >> you please give an example?
> Intrinsic metadata related to specific distributions (for example, a CSV
> file);
> Different types of licenses/credentials could allow access to different
> subsets of the Dataset, implying different intrinsic metadata.
>
>
I think that intrinsic metadata is related to the data itself and not to a
specific distribution. In my opinion, the structure of the data, the scope
and granularity of the data should be independent of data format and data
access. On the other hand, as I mentioned before, each distribution may
have its own metadata.
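
A rough sketch of this separation, again with a hypothetical model (the class
names, fields and example values are illustrative assumptions, not taken from
DCAT or any other vocabulary): intrinsic metadata (structure, scope,
granularity) is attached once to the dataset, each distribution carries only
its own metadata, and the full description of any distribution reuses the
same intrinsic part.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class IntrinsicMetadata:
    # Hypothetical: describes the data itself, independently of format
    # or access method.
    structure: tuple[str, ...]  # e.g. property or column names
    temporal_scope: str
    spatial_scope: str
    granularity: str


@dataclass(frozen=True)
class DistributionMetadata:
    # Hypothetical: describes one particular way of publishing the data.
    media_type: str
    access_url: str


@dataclass
class Dataset:
    intrinsic: IntrinsicMetadata
    distributions: list[DistributionMetadata] = field(default_factory=list)

    def describe(self, dist: DistributionMetadata) -> dict:
        """Full description of one distribution: the shared intrinsic
        metadata plus that distribution's own metadata."""
        if dist not in self.distributions:
            raise ValueError("not a distribution of this dataset")
        return {
            "intrinsic": self.intrinsic,
            "media_type": dist.media_type,
            "access_url": dist.access_url,
        }


# Usage with hypothetical example values.
weather = Dataset(IntrinsicMetadata(("station", "time", "rainfall_mm"),
                                    "2014-06", "United Kingdom", "hourly"))
csv = DistributionMetadata("text/csv", "http://example.org/weather.csv")
api = DistributionMetadata("application/json", "http://example.org/api/weather")
weather.distributions += [csv, api]

# Both distributions share exactly the same intrinsic description.
print(weather.describe(csv)["intrinsic"] == weather.describe(api)["intrinsic"])
```

Adding or withdrawing a distribution only changes the `distributions` list,
so under this model the intrinsic metadata stays invariant across formats
and access methods.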

I'm sorry if I didn't understand your point, but I am still not convinced
that we should have relations between different types of metadata. However,
I agree that metadata should also be described by metadata, as proposed by
Andrea Perego [1]. In this case, it is important to find a way to model
this recursive relationship.
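
One possible way to model this recursive relationship is sketched below,
assuming that a metadata record is itself an identifiable resource that other
records can describe. `MetadataRecord`, `meta_levels` and the `ex:`
identifiers are hypothetical; the traversal just lists each level of
metadata about metadata and stops when it meets a record it has already
visited. DCAT's `dcat:CatalogRecord` class, which describes the entry of a
dataset in a catalog, might be one existing starting point for this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    # Hypothetical model: a metadata record is itself a resource,
    # so other metadata records can describe it.
    record_id: str
    about: str  # identifier of the described resource
    statements: dict[str, str]
    described_by: list[MetadataRecord] = field(default_factory=list)


def meta_levels(record: MetadataRecord) -> list[tuple[int, str]]:
    """Return (depth, record_id) pairs for the record and every record that
    (recursively) describes it, depth-first, skipping visited records."""
    seen: set[str] = set()
    out: list[tuple[int, str]] = []

    def visit(rec: MetadataRecord, depth: int) -> None:
        if rec.record_id in seen:
            return  # guard against cycles in the "describes" graph
        seen.add(rec.record_id)
        out.append((depth, rec.record_id))
        for meta in rec.described_by:
            visit(meta, depth + 1)

    visit(record, 0)
    return out


# Usage: metadata about a dataset, itself described by metadata about
# who created it and when (hypothetical values).
record_meta = MetadataRecord("ex:rec1-meta", "ex:rec1",
                             {"dct:creator": "ex:alice", "dct:modified": "2014-07"})
dataset_record = MetadataRecord("ex:rec1", "ex:spending-2013",
                                {"dct:title": "Local government spending 2013"},
                                [record_meta])
print(meta_levels(dataset_record))  # [(0, 'ex:rec1'), (1, 'ex:rec1-meta')]
```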

Thank you!
Bernadette

[1] http://lists.w3.org/Archives/Public/public-dwbp-wg/2014Jul/0035.html
[2] https://www.w3.org/2013/dwbp/wiki/Best_practices_guidelines



> Thank you.
>
> Kind regards,
> Laufer
>
>
> 2014-07-07 10:27 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:
>
> Hi Laufer,
>>
>> Thanks for your comments! I'll try to answer below:
>>
>>> I have a question about why you defined distribution differently from,
>>> for example, license. In the same way that a data/dataset has a
>>> distribution that has metadata, a data/dataset has a license that has
>>> metadata. Why isn't distribution simply a metadata type?
>>>
>>
>> As proposed by DCAT [1], my initial idea was to describe a dataset
>> independently from its distributions. In the diagram, a dataset has a
>> collection of data and it is described by different types of metadata (the
>> ones illustrated in the diagram). Following the DCAT description of a
>> dataset, I also consider that a dataset may have one or more distributions,
>> where a distribution is a possible way of publishing the collection of data
>> of a given dataset, for example a file or an API. In this context, I don't
>> see distribution as a type of metadata.
>>
>> On the other hand, in the diagram, a distribution is also described by
>> metadata. I am not sure if a distribution will have the same metadata
>> that a dataset has. I am also not sure if access metadata should be related
>> to a dataset or to a specific distribution.
>>
>>
>>> I think the diagram could also represent the relationships that
>>> could exist among the various data/dataset metadata.
>>> So, metadata has a relation with metadata.
>>>
>>
>> I don't see how we could relate the different types of metadata. Could
>> you please give an example?
>>
>>
>> Thanks again!
>>
>> kind regards,
>> Bernadette
>>
>> [1] http://www.w3.org/TR/vocab-dcat/
>>
>>
>>
>>
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Laufer
>>>
>>>
>>> 2014-07-01 20:51 GMT-03:00 Bernadette Farias Lóscio <bfl@cin.ufpe.br>:
>>>
>>> Hi Mark,
>>>>
>>>> Thanks again for the explanation! These examples are really helpful for
>>>> the understanding of the role of the different types of metadata.
>>>> I think that examples like these will be very useful to illustrate the
>>>> best practices. After having some feedback from the group, it could be nice
>>>> to update the wiki page with the diagrams together with a brief explanation
>>>> and an example for each type of metadata. What do you think?
>>>>
>>>> I'm sending attached a pdf version of the updated diagram. Since I am
>>>> using PowerPoint to create the diagrams, I am including the ppt version as
>>>> well. If you have suggestions for other tools that may help the
>>>> collaborative work, please let me know.
>>>>
>>>> It has been a great discussion! Thanks!
>>>>
>>>> kind regards,
>>>> Bernadette
>>>>
>>>>
>>>>
>>>>
>>>> 2014-07-01 20:09 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
>>>>
>>>>  Hi Bernadette,
>>>>>
>>>>>  Thanks for the further discussion and updates to your diagram.
>>>>>
>>>>>  I also like the vertical continuum in the other diagram to express
>>>>> how intrinsic / extrinsic these different kinds of metadata are.
>>>>>
>>>>>  I'd say that scope and granularity are distinct and not
>>>>> interchangeable.
>>>>> Scope defines the dimensions and location of the 'bounding box' or
>>>>> 'envelope' in time and space, whereas granularity is a measure of how many
>>>>> sample points there are *within* that bounding box or envelope.
>>>>>
>>>>>  A simple example could be weather observation data, where the scope
>>>>> defines that the dataset has a coverage of the United Kingdom for the month
>>>>> of June 2014 and the granularity is dependent on how closely spaced the
>>>>> weather observation stations are and how frequently a new data point is
>>>>> recorded for wind speed, rainfall, barometric pressure etc. - e.g. is it
>>>>> per day, per hour, per minute or per second?  They both have temporal and
>>>>> geospatial dimensions, but I'd redraw that part of the diagram like this.
>>>>>
>>>>>  By the way - just a suggestion:  can we try to export any diagrams
>>>>> like this as vector graphics, either in SVG or PDF?  That makes it much
>>>>> easier for all of us to make modifications, rather than having
>>>>> to kludge bitmap modifications in Photoshop or Gimp.
>>>>>
>>>>>  Best wishes,
>>>>>
>>>>>  - Mark
>>>>>
>>>>>
>>>>>
>>>>> On 1 Jul 2014, at 22:52, Bernadette Farias Lóscio <bfl@cin.ufpe.br>
>>>>> wrote:
>>>>>
>>>>> Hi Mark,
>>>>>
>>>>> Thank you very much for your explanation!
>>>>>
>>>>> After reading your examples, I agree with you that scope is an
>>>>> intrinsic property, since it provides a better understanding of the
>>>>> meaning of the data itself (this was my initial idea about intrinsic
>>>>> metadata).  In the Data on the Web context, structural information is not
>>>>> enough to provide the semantics of the data; we need more information, like
>>>>> the scope of the data.
>>>>>
>>>>> Instead of removing the classification, I suggest having two
>>>>> categories of intrinsic metadata: scope/granularity and structural. Do you
>>>>> think that scope and granularity can be considered together as a single
>>>>> category?
>>>>>
>>>>> I also agree that "these characteristics really fit on a sliding scale
>>>>> between Very Intrinsic and Very Extrinsic, with some middle ground in
>>>>> between". I created a figure that tries to illustrate this idea. This figure
>>>>> is attached.
>>>>>
>>>>> I'm sending attached another version of the diagram with the idea of a
>>>>> new classification.
>>>>>
>>>>> Yes, this discussion is very interesting and it is really important
>>>>> for best practices identification and definition :)
>>>>>
>>>>> Thanks again!
>>>>>
>>>>> Kind regards,
>>>>> Bernadette
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-07-01 17:51 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
>>>>> Hello Bernadette,
>>>>>
>>>>> Thanks for your updated diagram.
>>>>>
>>>>> I don't mind if we have slightly different opinions about where to
>>>>> draw the boundary between 'intrinsic' and 'extrinsic'.
>>>>>
>>>>> We both agree that structural metadata (what kind of data is it?) is
>>>>> intrinsic.
>>>>>
>>>>> I think the scope metadata is perhaps on the boundary between
>>>>> intrinsic and extrinsic, in the sense that even if you transform the data
>>>>> into another format or provide it through a different access method, the
>>>>> scope remains invariant.
>>>>>
>>>>> For example, consider local government spending data.
>>>>>
>>>>> At one level, you need intrinsic structural metadata that says 'this
>>>>> is spending per year on this expenditure category in this region', and we
>>>>> use classes and predicates from controlled vocabularies to express that so
>>>>> that anyone looking for that kind of data can find it, no matter which
>>>>> local government authority published it.  There may be domain-specific data
>>>>> publishing guidelines that recommend specific vocabularies to use.  Some
>>>>> will be core W3C vocabularies.  Others may be more domain-specific but
>>>>> ideally globally defined and multi-lingual.
>>>>>
>>>>> At another level, you want to be able to identify a particular dataset
>>>>> by its temporal and spatial scope.  I consider this to be intrinsic to the
>>>>> dataset, even though it's not a structural description.  If a dataset of
>>>>> local government spending data is published for a particular city and a
>>>>> particular fiscal year, the data contained within that dataset has that
>>>>> scope.  We can transform that set of data into different formats and
>>>>> provide additional methods to access it - and that temporal+spatial scope
>>>>> remains invariant under those changes.  We can't transform the spending
>>>>> data for London in 1999 into the spending data for Paris in 2013.  Place
>>>>> and time are characteristics of the data itself that distinguish one
>>>>> set of data from another set of data, even when they share the same
>>>>> structural semantics.  That's why I think of temporal/spatial scope as
>>>>> being intrinsic to the dataset and its data, because they are (in my
>>>>> opinion) equally important to the meaning of the data - they're
>>>>> effectively expressing what the data is about (i.e. its subject or scope),
>>>>> whereas the intrinsic structural metadata says 'this is government spending
>>>>> for a particular city or region and a particular time interval'.  You
>>>>> actually need both.
>>>>>
>>>>> At another level, you want to explain which formats are available, how
>>>>> you can access it, which licence applies for usage of the data.  Those
>>>>> things feel much more extrinsic, because they can change over time -
>>>>> e.g. additional formats and access methods can be provided, other formats
>>>>> or access methods might be deprecated or withdrawn.  A licence might be
>>>>> changed to a more liberal licence - or a more restrictive licence.
>>>>>
>>>>> However, we can agree to differ about the boundary between intrinsic
>>>>> and extrinsic - and as I wrote, it's probably something of a continuum or
>>>>> sliding scale, rather than consisting of only two possibilities with a
>>>>> very clearly defined boundary between them.
>>>>>
>>>>> The main issue is to use this exercise as a way to explore all the
>>>>> useful dimensions of metadata and identify the best practice ways of
>>>>> expressing those - and it seems that this discussion is helping to make
>>>>> some additional progress in that direction.
>>>>>
>>>>> I like your updated diagram.  Maybe it's easier for everyone to agree
>>>>> on it if we remove the words 'intrinsic' and 'extrinsic' from the diagram
>>>>> but just use them internally for the thought processes that try to make it
>>>>> as complete as possible.
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> - Mark
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 1 Jul 2014, at 20:51, Bernadette Farias Lóscio <bfl@cin.ufpe.br>
>>>>>  wrote:
>>>>>
>>>>> > Hello Mark,
>>>>> >
>>>>> > Thank you very much for sharing your thoughts about metadata
>>>>> definition.
>>>>> >
>>>>> > I read your notes on the wiki page and I have some comments:
>>>>> >
>>>>> > - I agree with you that a dataset may be described by two types of
>>>>> metadata: the metadata that describes the data itself (intrinsic metadata) and
>>>>> the metadata that describes the dataset (extrinsic metadata). In the
>>>>> diagram that I showed in the last meeting, I called them structural and
>>>>> descriptive metadata.
>>>>> >
>>>>> > - I believe that intrinsic properties are the ones that describe the
>>>>> meaning of the data itself, like concepts, classes and properties.
>>>>> Intrinsic metadata plays a role similar to that of a database schema and should be
>>>>> described by a domain vocabulary.
>>>>> >
>>>>> > - In this case,  Scope (temporal and geographic) and Granularity
>>>>> (temporal and spatial) should be considered extrinsic properties, since they
>>>>> describe the dataset instead of the meaning of data. Extrinsic properties
>>>>> should be described by standard vocabularies like DCAT, PROV and the
>>>>> Quality and Data Usage vocabularies.
>>>>> >
>>>>> > Maybe I'm being too strict with this classification, but on the
>>>>> other hand I think this may help the understanding of the different types
>>>>> of metadata and their roles in describing a dataset.
>>>>> >
>>>>> > I'm sending attached a new version of the diagram that I showed on
>>>>> our last meeting. In this new version, I included more subclasses (access,
>>>>> granularity and scope) for the extrinsic metadata. I believe that now it
>>>>> is possible to define the properties (intrinsic and extrinsic) described in
>>>>> your notes.
>>>>> >
>>>>> > It would be great if you could take a look at the diagram and tell
>>>>> me if these ideas make sense to you.
>>>>> >
>>>>> > Thanks again!
>>>>> >
>>>>> > kind regards,
>>>>> > Bernadette
>>>>> >
>>>>> >
>>>>> >
>>>>> > 2014-07-01 10:29 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
>>>>> > Dear DWBP colleagues,
>>>>> >
>>>>> > I've added a section to the DWBP wiki with some thoughts about
>>>>> intrinsic vs extrinsic metadata, in response to my action #54 from last
>>>>> Friday's call and the initial discussion there.
>>>>> >
>>>>> > I've now added that section at
>>>>> >
>>>>> >
>>>>> https://www.w3.org/2013/dwbp/wiki/Guidance_on_the_Provision_of_Metadata#Intrinsic_vs_Extrinsic_Metadata
>>>>> >
>>>>> > Maybe it's not the best place for it - in which case, I'm happy for
>>>>> the editors to move it to a better location in the Wiki.
>>>>> >
>>>>> > It's not definitive either - more of a discussion about the kinds of
>>>>> metadata that are intrinsic to the data itself (irrespective of format or
>>>>> access mechanism) and other kinds of metadata that are extrinsic (e.g.
>>>>> depends on a particular format, access mechanism or licence).
>>>>> >
>>>>> > Please feel free to modify this and extend it.
>>>>> >
>>>>> > I hope that it's useful for the discussions that Bernadette and I
>>>>> were having last week, as well as the work Hadley is writing about
>>>>> alternative approaches to data catalogues.
>>>>> >
>>>>> > At least it might help us to ensure that we explore the various
>>>>> 'dimensions' of metadata that might be used by data consumers when
>>>>> searching for datasets or discovering related datasets.  I have also
>>>>> included some ideas about capturing feedback about data usage (e.g. in
>>>>> applications, websites, mash-ups), including links to related datasets that
>>>>> add some valuable context.
>>>>> >
>>>>> > Feel free to develop this further if you think it is useful.
>>>>> >
>>>>> > Best wishes,
>>>>> >
>>>>> > - Mark
>>>>> >
>>>>> >
>>>>> > CONFIDENTIALITY / DISCLAIMER: The contents of this e-mail are
>>>>> confidential and are not to be regarded as a contractual offer or
>>>>> acceptance from GS1 (registered in Belgium). If you are not the addressee,
>>>>> or if this has been copied or sent to you in error, you must not use data
>>>>> herein for any purpose, you must delete it, and should inform the sender.
>>>>> GS1 disclaims liability for accuracy or completeness, and opinions
>>>>> expressed are those of the author alone. GS1 may monitor communications.
>>>>> Third party rights acknowledged. (c) 2013.
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Bernadette Farias Lóscio
>>>>> > Centro de Informática
>>>>> > Universidade Federal de Pernambuco - UFPE, Brazil
>>>>> >
>>>>> ----------------------------------------------------------------------------
>>>>> > <DWBP_metadata.jpg>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Bernadette Farias Lóscio
>>>>> Centro de Informática
>>>>> Universidade Federal de Pernambuco - UFPE, Brazil
>>>>>
>>>>> ----------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> <DWBP_metadata_v02.jpg><Extrinsic x Intrinsec.jpg>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Bernadette Farias Lóscio
>>>> Centro de Informática
>>>> Universidade Federal de Pernambuco - UFPE, Brazil
>>>> ----------------------------------------------------------------------------
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> .  .  .  .. .  .
>>> .        .   . ..
>>> .     ..       .
>>>
>>
>>
>>
>> --
>> Bernadette Farias Lóscio
>> Centro de Informática
>> Universidade Federal de Pernambuco - UFPE, Brazil
>> ----------------------------------------------------------------------------
>>
>>
>
>
>
> --
> .  .  .  .. .  .
> .        .   . ..
> .     ..       .
>



-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------

Received on Friday, 11 July 2014 12:37:43 UTC