Re: Intrinsic vs extrinsic metadata (my action #54)

Hi Bernadette,

Thanks for the further discussion and updates to your diagram.

I also like the vertical continuum in the other diagram to express how intrinsic / extrinsic these different kinds of metadata are.

I'd say that scope and granularity are distinct and not interchangeable.
Scope defines the dimensions and location of the 'bounding box' or 'envelope' in time and space, whereas granularity is a measure of how many sample points there are *within* that bounding box or envelope.

A simple example could be weather observation data, where the scope defines that the dataset has a coverage of the United Kingdom for the month of June 2014 and the granularity is dependent on how closely spaced the weather observation stations are and how frequently a new data point is recorded for wind speed, rainfall, barometric pressure etc. - e.g. is it per day, per hour, per minute or per second?  They both have temporal and geospatial dimensions, but I'd redraw that part of the diagram like this.
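
In case it helps to make the distinction concrete, here is a rough Python sketch of the weather example. The class names, the field names and the figure of 200 stations are only illustrative assumptions on my part, not taken from any vocabulary or real network.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Scope:
    """The 'bounding box' of a dataset in space and time."""
    region: str
    start: datetime  # inclusive
    end: datetime    # exclusive


@dataclass(frozen=True)
class Granularity:
    """How densely the data is sampled *within* the scope."""
    sampling_interval: timedelta  # time between readings at one station
    station_count: int            # hypothetical number of stations


def readings_in_scope(scope: Scope, granularity: Granularity) -> int:
    """Readings expected if every station reports on schedule."""
    per_station = (scope.end - scope.start) // granularity.sampling_interval
    return per_station * granularity.station_count


# Same scope (UK, June 2014), two different granularities.
uk_june_2014 = Scope("United Kingdom", datetime(2014, 6, 1), datetime(2014, 7, 1))
hourly = Granularity(timedelta(hours=1), station_count=200)
daily = Granularity(timedelta(days=1), station_count=200)

print(readings_in_scope(uk_june_2014, hourly))  # 144000
print(readings_in_scope(uk_june_2014, daily))   # 6000
```

The scope is identical in both cases; only the number of sample points inside it changes, which is why I would keep the two as separate properties.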

By the way - just a suggestion:  can we try to export any diagrams like this as vector graphics, either in SVG or PDF?  That makes it much easier for us all to make modifications, rather than having to kludge bitmap edits in Photoshop or GIMP.

Best wishes,

- Mark

[Inline image: redrawn scope / granularity part of the diagram]


On 1 Jul 2014, at 22:52, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:

Hi Mark,

Thank you very much for your explanation!

After reading your examples, I agree with you that scope is an intrinsic property, since it provides a better understanding of the meaning of the data itself (this was my initial idea about intrinsic metadata).  In the Data on the Web context, structural information is not enough to provide the semantics of the data; we need more information, like the scope of the data.

Instead of removing the classification, I suggest having two categories of intrinsic metadata: scope/granularity and structural. Do you think that scope and granularity can be considered together as a single category?

I also agree that "these characteristics really fit on a sliding scale between Very Intrinsic and Very Extrinsic, with some middle ground in between". I created a figure that tries to illustrate this idea; it is attached.

I'm also attaching another version of the diagram with the idea of a new classification.

Yes, this discussion is very interesting and it is really important for best practices identification and definition :)

Thanks again!

Kind regards,
Bernadette




2014-07-01 17:51 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
Hello Bernadette,

Thanks for your updated diagram.

I don't mind if we have slightly different opinions about where to draw the boundary between 'intrinsic' and 'extrinsic'.

We both agree that structural metadata (what kind of data is it?) is intrinsic.

I think the scope metadata is perhaps on the boundary between intrinsic and extrinsic, in the sense that even if you transform the data into another format or provide it through a different access method, the scope remains invariant.

For example, consider local government spending data.

At one level, you need intrinsic structural metadata that says 'this is spending per year on this expenditure category in this region', and we use classes and predicates from controlled vocabularies to express that so that anyone looking for that kind of data can find it, no matter which local government authority published it.  There may be domain-specific data publishing guidelines that recommend specific vocabularies to use.  Some will be core W3C vocabularies.  Others may be more domain-specific but ideally globally defined and multi-lingual.

At another level, you want to be able to identify a particular dataset by its temporal and spatial scope.  I consider this to be intrinsic to the dataset, even though it's not a structural description.  If a dataset of local government spending data is published for a particular city and a particular fiscal year, the data contained within that dataset has that scope.  We can transform that set of data into different formats and provide additional methods to access it - and that temporal+spatial scope remains invariant under those changes.  We can't transform the spending data for London in 1999 into the spending data for Paris in 2013.  The temporal and spatial scope are characteristics of the data itself that distinguish one set of data from another, even when they share the same structural semantics.  That's why I think of temporal/spatial scope as being intrinsic to the dataset and its data: it is (in my opinion) just as important to the meaning of the data as the structural metadata.  The scope effectively expresses what the data is about (i.e. its subject), whereas the intrinsic structural metadata says 'this is government spending for a particular city or region and a particular time interval'.  You actually need both.

At another level, you want to explain which formats are available, how you can access it, which licence applies for usage of the data.  Those things feel much more extrinsic, because they can change over time - e.g. additional formats and access methods can be provided, other formats or access methods might be deprecated or withdrawn.  A licence might be changed to a more liberal licence - or a more restrictive licence.
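
To illustrate the three levels above, here is a minimal Python sketch under my own assumptions: the class and field names, the example.org URLs and the licence labels are all hypothetical, and a real description would use vocabularies such as DCAT rather than ad hoc classes. The intrinsic part (structure plus scope) is frozen, while formats, access methods and licences can be added or changed freely.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import List


@dataclass(frozen=True)
class IntrinsicMetadata:
    kind: str    # structural: what kind of data it is
    region: str  # spatial scope
    period: str  # temporal scope


@dataclass
class Distribution:
    media_type: str  # extrinsic: format
    access_url: str  # extrinsic: access method (hypothetical URL)
    licence: str     # extrinsic: licence (hypothetical label)


@dataclass
class Dataset:
    intrinsic: IntrinsicMetadata
    distributions: List[Distribution] = field(default_factory=list)


spending = Dataset(IntrinsicMetadata("local government spending", "London", "1999"))
scope_before = spending.intrinsic

# Extrinsic metadata can change over time: new formats, a new licence.
spending.distributions.append(
    Distribution("text/csv", "https://example.org/spending.csv", "licence-A"))
spending.distributions.append(
    Distribution("application/json", "https://example.org/spending.json", "licence-A"))
spending.distributions[0].licence = "licence-B"

# The intrinsic description is unaffected by any of those changes.
assert spending.intrinsic == scope_before

# London 1999 cannot be turned into Paris 2013; that would be a different dataset.
try:
    spending.intrinsic.region = "Paris"
    scope_changed = True
except FrozenInstanceError:
    scope_changed = False

print(scope_changed)  # False
```

Freezing the intrinsic part is just one way to express the invariance; the point is only that nothing done to the distributions touches it.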

However, we can agree to differ about the boundary between intrinsic and extrinsic - and as I wrote, it's probably something of a continuum or sliding scale, rather than consisting of only two possibilities with a very clearly defined boundary between them.

The main issue is to use this exercise as a way to explore all the useful dimensions of metadata and identify the best practice ways of expressing those - and it seems that this discussion is helping to make some additional progress in that direction.

I like your updated diagram.  Maybe it's easier for everyone to agree on it if we remove the words 'intrinsic' and 'extrinsic' from the diagram but just use them internally for the thought processes that try to make it as complete as possible.

Best wishes,

- Mark




On 1 Jul 2014, at 20:51, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:

> Hello Mark,
>
> Thank you very much for sharing your thoughts about metadata definition.
>
> I read your notes on the wiki page and I have some comments:
>
> - I agree with you that a dataset may be described by two types of metadata: the metadata that describes the data itself (intrinsic metadata) and the metadata that describes the dataset (extrinsic metadata). In the diagram that I showed at the last meeting, I called them structural and descriptive metadata.
>
> - I believe that intrinsic properties are the ones that describe the meaning of the data itself, like concepts, classes and properties. Intrinsic metadata has a role similar to that of a database schema and should be described by a domain vocabulary.
>
> - In this case, Scope (temporal and geographic) and Granularity (temporal and spatial) should be considered extrinsic properties, since they describe the dataset rather than the meaning of the data. Extrinsic properties should be described by standard vocabularies like DCAT, PROV and the Quality and Data Usage vocabularies.
>
> Maybe I'm being too strict with this classification, but on the other hand I think it may help in understanding the different types of metadata and their roles in describing a dataset.
>
> I'm attaching a new version of the diagram that I showed at our last meeting. In this new version, I included more subclasses (access, granularity and scope) for the extrinsic metadata. I believe that it is now possible to define the properties (intrinsic and extrinsic) described in your notes.
>
> It would be great if you could take a look at the diagram and tell me if these ideas make sense to you.
>
> Thanks again!
>
> kind regards,
> Bernadette
>
>
>
> 2014-07-01 10:29 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
> Dear DWBP colleagues,
>
> I've added a section to the DWBP wiki with some thoughts about intrinsic vs extrinsic metadata, in response to my action #54 from last Friday's call and the initial discussion there.
>
> I've now added that section at
>
> https://www.w3.org/2013/dwbp/wiki/Guidance_on_the_Provision_of_Metadata#Intrinsic_vs_Extrinsic_Metadata
>
> Maybe it's not the best place for it - in which case, I'm happy for the editors to move it to a better location in the Wiki.
>
> It's not definitive either - more of a discussion about the kinds of metadata that are intrinsic to the data itself (irrespective of format or access mechanism) and other kinds of metadata that are extrinsic (e.g. depend on a particular format, access mechanism or licence).
>
> Please feel free to modify this and extend it.
>
> I hope that it's useful for the discussions that Bernadette and I were having last week, as well as the work Hadley is writing about alternative approaches to data catalogues.
>
> At least it might help us to ensure that we explore the various 'dimensions' of metadata that might be used by data consumers when searching for datasets or discovering related datasets.  I have also included some ideas about capturing feedback about data usage (e.g. in applications, websites, mash-ups), including links to related datasets that add some valuable context.
>
> Feel free to develop this further if you think it is useful.
>
> Best wishes,
>
> - Mark
>
>
> CONFIDENTIALITY / DISCLAIMER: The contents of this e-mail are confidential and are not to be regarded as a contractual offer or acceptance from GS1 (registered in Belgium). If you are not the addressee, or if this has been copied or sent to you in error, you must not use data herein for any purpose, you must delete it, and should inform the sender. GS1 disclaims liability for accuracy or completeness, and opinions expressed are those of the author alone. GS1 may monitor communications. Third party rights acknowledged. (c) 2013.
>
>
>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
> ----------------------------------------------------------------------------
> <DWBP_metadata.jpg>





--
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------


<DWBP_metadata_v02.jpg><Extrinsic x Intrinsec.jpg>


Received on Tuesday, 1 July 2014 23:09:48 UTC