- From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
- Date: Wed, 9 Jul 2014 13:01:12 -0300
- To: Andrea Perego <andrea.perego@jrc.ec.europa.eu>
- Cc: Mark Harrison <mark.harrison@gs1.org>, public-dwbp-wg <public-dwbp-wg@w3.org>, Hadley Beeman <hadley@linkedgov.org>, Phil Archer <phila@w3.org>
- Message-ID: <CANx1Pzx=eiiFM+T_CupbqBJcgrpFOOs9sbZUEGSTALzG6KbsOA@mail.gmail.com>
Dear Andrea,

Thank you very much for your comments! I have some comments about your questions:

> 1. My first impression, looking at the diagram you prepared, Bernadette, is that this classification is merging DCAT (see the notions of dataset and distribution) and VoID (see the notions of general access and structural metadata). Is this correct? I'm asking just to better understand the actual semantics of the different entities in the diagram.

When I prepared the diagram I had the DCAT definitions in mind. However, I also see some intersection with the VoID definitions. In my opinion, the dataset and distribution definitions should be "general", i.e. independent of the data model, for example.

> 2. From the diagram it is not clear if some metadata elements are specific to data, datasets or their distributions, or, rather, they can be used for all of them. E.g., "access metadata" are just for distributions or also for data/sets?

The initial idea was to identify metadata to describe datasets. I included access metadata as part of the classification, but I'm not sure if this type of metadata should be used to describe datasets or distributions. Moreover, it is not clear to me what types of metadata should be used to describe the distributions. For example, should we use the same ones that we use to describe datasets?

> 3. I wonder whether structural metadata are meant to describe only the structure (database schema) or also the content (database instances)? Actually, in VoID structural metadata are doing both.

Structural metadata should describe the data itself. They should provide an interpretation for the dataset content (i.e. the data). They can be seen as the vocabulary (ontology) that describes the data. I think this idea is different from the structural metadata proposed by VoID. If you have an RDF distribution for a given dataset, maybe you can have a VoID description for this specific distribution.

> 4. The diagram does not model the fact that metadata are, in turn, data. As such, metadata records may be available in different formats (metadata distributions) and they can be described by other metadata (this scheme is, in theory, recursive). A real world example is given by INSPIRE [1], where we have "metadata on metadata", providing information concerning the provenance of a metadata record (responsible, language, creation/publication/modification dates).

Yes, this is a good observation! I agree that metadata itself may have some properties (metadata). Maybe we can consider that these properties will be associated with the class metadata and will be inherited by the subclasses. Does it make sense to you?

> Thanks!
>
> Andrea

Thanks!
Bernadette

> ----
> [1] http://inspire.ec.europa.eu/
>
> On Wed, Jul 2, 2014 at 1:51 AM, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:
>
>> Hi Mark,
>>
>> Thanks again for the explanation! These examples are really helpful for understanding the role of the different types of metadata. I think that examples like these will be very useful to illustrate the best practices. After having some feedback from the group, it could be nice to update the wiki page with the diagrams together with a brief explanation and an example for each type of metadata. What do you think?
>>
>> I'm sending attached a pdf version of the updated diagram. Since I am using PowerPoint to create the diagrams, I am including the ppt version as well. If you have suggestions for other tools that may help the collaborative work, please let me know.
>>
>> It has been a great discussion! Thanks!
>>
>> kind regards,
>> Bernadette
>>
>> 2014-07-01 20:09 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
>>
>>> Hi Bernadette,
>>>
>>> Thanks for the further discussion and updates to your diagram.
>>>
>>> I also like the vertical continuum in the other diagram to express how intrinsic / extrinsic these different kinds of metadata are.
>>>
>>> I'd say that scope and granularity are distinct and not interchangeable. Scope defines the dimensions and location of the 'bounding box' or 'envelope' in time and space, whereas granularity is a measure of how many sample points there are *within* that bounding box or envelope.
>>>
>>> A simple example could be weather observation data, where the scope defines that the dataset has a coverage of the United Kingdom for the month of June 2014 and the granularity is dependent on how closely spaced the weather observation stations are and how frequently a new data point is recorded for wind speed, rainfall, barometric pressure etc. - e.g. is it per day, per hour, per minute or per second? They both have temporal and geospatial dimensions, but I'd redraw that part of the diagram like this.
>>>
>>> By the way - just a suggestion: can we try to export any diagrams like this as vector graphics, either in SVG or PDF? That makes it much easier for us all to make modifications, rather than having to kludge bitmap modifications in Photoshop or Gimp.
>>>
>>> Best wishes,
>>>
>>> - Mark
>>>
>>> On 1 Jul 2014, at 22:52, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:
>>>
>>> Hi Mark,
>>>
>>> Thank you very much for your explanation!
>>>
>>> After reading your examples, I agree with you that scope is an intrinsic property, since it provides a better understanding of the meaning of the data itself (this was my initial idea about intrinsic metadata). In the Data on the Web context, structural information is not enough to provide the semantics of the data; we need more information, like the scope of the data.
>>>
>>> Instead of removing the classification, I suggest having two categories of intrinsic metadata: scope/granularity and structural. Do you think that scope and granularity can be considered together as a single category?
>>>
>>> I also agree that "these characteristics really fit on a sliding scale between Very Intrinsic and Very Extrinsic, with some middle ground in between". I created a figure that tries to illustrate this idea. This figure is attached.
>>>
>>> I'm sending attached another version of the diagram with the idea of a new classification.
>>>
>>> Yes, this discussion is very interesting and it is really important for best practices identification and definition :)
>>>
>>> Thanks again!
>>>
>>> Kind regards,
>>> Bernadette
>>>
>>> 2014-07-01 17:51 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
>>>
>>> Hello Bernadette,
>>>
>>> Thanks for your updated diagram.
>>>
>>> I don't mind if we have slightly different opinions about where to draw the boundary between 'intrinsic' and 'extrinsic'.
>>>
>>> We both agree that structural metadata (what kind of data is it?) is intrinsic.
>>>
>>> I think the scope metadata is perhaps on the boundary between intrinsic and extrinsic, in the sense that even if you transform the data into another format or provide it through a different access method, the scope remains invariant.
>>>
>>> For example, consider local government spending data.
>>>
>>> At one level, you need intrinsic structural metadata that says 'this is spending per year on this expenditure category in this region', and we use classes and predicates from controlled vocabularies to express that so that anyone looking for that kind of data can find it, no matter which local government authority published it. There may be domain-specific data publishing guidelines that recommend specific vocabularies to use. Some will be core W3C vocabularies. Others may be more domain-specific but ideally globally defined and multi-lingual.
>>>
>>> At another level, you want to be able to identify a particular dataset by its temporal and spatial scope. I consider this to be intrinsic to the dataset, even though it's not a structural description. If a dataset of local government spending data is published for a particular city and a particular fiscal year, the data contained within that dataset has that scope. We can transform that set of data into different formats and provide additional methods to access it - and that temporal+spatial scope remains invariant under those changes. We can't transform the spending data for London in 1999 into the spending data for Paris in 2013. They are distinguishing characteristics of the data itself that distinguish one set of data from another set of data, even when they share the same structural semantics. That's why I think of temporal/spatial scope as being intrinsic to the dataset and its data, because they are (in my opinion) equally important to the meaning of the data - they're effectively expressing what the data is about (i.e. its subject or scope), whereas the intrinsic structural metadata says 'this is government spending for a particular city or region and a particular time interval'. You actually need both.
>>>
>>> At another level, you want to explain which formats are available, how you can access it, which licence applies for usage of the data. Those things feel much more extrinsic, because they can change over time - e.g. additional formats and access methods can be provided, other formats or access methods might be deprecated or withdrawn. A licence might be changed to a more liberal licence - or a more restrictive licence.
>>>
>>> However, we can agree to differ about the boundary between intrinsic and extrinsic - and as I wrote, it's probably something of a continuum or sliding scale, rather than consisting of only two possibilities with a very clearly defined boundary between them.
>>>
>>> The main issue is to use this exercise as a way to explore all the useful dimensions of metadata and identify the best practice ways of expressing those - and it seems that this discussion is helping to make some additional progress in that direction.
>>>
>>> I like your updated diagram. Maybe it's easier for everyone to agree on it if we remove the words 'intrinsic' and 'extrinsic' from the diagram but just use them internally for the thought processes that try to make it as complete as possible.
>>>
>>> Best wishes,
>>>
>>> - Mark
>>>
>>> On 1 Jul 2014, at 20:51, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:
>>>
>>> > Hello Mark,
>>> >
>>> > Thank you very much for sharing your thoughts about metadata definition.
>>> >
>>> > I read your notes on the wiki page and I have some comments:
>>> >
>>> > - I agree with you that a dataset may be described by two types of metadata: the metadata that describes the data itself (intrinsic metadata) and the metadata that describes the dataset (extrinsic metadata). In the diagram that I showed in the last meeting, I called them structural and descriptive metadata.
>>> >
>>> > - I believe that intrinsic properties are the ones that describe the meaning of the data itself, like concepts, classes and properties. Intrinsic metadata has a role similar to that of a database schema and should be described by a domain vocabulary.
>>> >
>>> > - In this case, Scope (temporal and geographic) and Granularity (temporal and spatial) should be considered extrinsic properties, since they describe the dataset instead of the meaning of the data. Extrinsic properties should be described by standard vocabularies like DCAT, PROV and the Quality and Data Usage vocabularies.
>>> >
>>> > Maybe I'm being too strict with this classification, but on the other hand I think this may help the understanding of the different types of metadata and their roles in describing a dataset.
>>> >
>>> > I'm sending attached a new version of the diagram that I showed at our last meeting. In this new version, I included more subclasses (access, granularity and scope) for the extrinsic metadata. I believe that now it is possible to define the properties (intrinsic and extrinsic) described in your notes.
>>> >
>>> > It would be great if you could take a look at the diagram and tell me if these ideas make sense to you.
>>> >
>>> > Thanks again!
>>> >
>>> > kind regards,
>>> > Bernadette
>>> >
>>> > 2014-07-01 10:29 GMT-03:00 Mark Harrison <mark.harrison@gs1.org>:
>>> >
>>> > Dear DWBP colleagues,
>>> >
>>> > I've added a section to the DWBP wiki with some thoughts about intrinsic vs extrinsic metadata, in response to my action #54 from last Friday's call and the initial discussion there.
>>> >
>>> > I've now added that section at
>>> >
>>> > https://www.w3.org/2013/dwbp/wiki/Guidance_on_the_Provision_of_Metadata#Intrinsic_vs_Extrinsic_Metadata
>>> >
>>> > Maybe it's not the best place for it - in which case, I'm happy for the editors to move it to a better location in the Wiki.
>>> >
>>> > It's not definitive either - more of a discussion about the kinds of metadata that are intrinsic to the data itself (irrespective of format or access mechanism) and other kinds of metadata that are extrinsic (e.g. dependent on a particular format, access mechanism or licence).
>>> >
>>> > Please feel free to modify this and extend it.
>>> >
>>> > I hope that it's useful for the discussions that Bernadette and I were having last week, as well as the work Hadley is writing about alternative approaches to data catalogues.
>>> >
>>> > At least it might help us to ensure that we explore the various 'dimensions' of metadata that might be used by data consumers when searching for datasets or discovering related datasets. I have also included some ideas about capturing feedback about data usage (e.g. in applications, websites, mash-ups), including links to related datasets that add some valuable context.
>>> >
>>> > Feel free to develop this further if you think it is useful.
>>> >
>>> > Best wishes,
>>> >
>>> > - Mark
>>> >
>>> > CONFIDENTIALITY / DISCLAIMER: The contents of this e-mail are confidential and are not to be regarded as a contractual offer or acceptance from GS1 (registered in Belgium). If you are not the addressee, or if this has been copied or sent to you in error, you must not use data herein for any purpose, you must delete it, and should inform the sender. GS1 disclaims liability for accuracy or completeness, and opinions expressed are those of the author alone. GS1 may monitor communications. Third party rights acknowledged. (c) 2013.
>>> >
>>> > --
>>> > Bernadette Farias Lóscio
>>> > Centro de Informática
>>> > Universidade Federal de Pernambuco - UFPE, Brazil
>>> > ----------------------------------------------------------------------------
>>> > <DWBP_metadata.jpg>
>>>
>>> --
>>> Bernadette Farias Lóscio
>>> Centro de Informática
>>> Universidade Federal de Pernambuco - UFPE, Brazil
>>> ----------------------------------------------------------------------------
>>>
>>> <DWBP_metadata_v02.jpg><Extrinsic x Intrinsec.jpg>
>>
>> --
>> Bernadette Farias Lóscio
>> Centro de Informática
>> Universidade Federal de Pernambuco - UFPE, Brazil
>> ----------------------------------------------------------------------------
>
> --
> Andrea Perego, Ph.D.
> European Commission DG JRC
> Institute for Environment & Sustainability
> Unit H06 - Digital Earth & Reference Data
> Via E. Fermi, 2749 - TP 262
> 21027 Ispra VA, Italy
>
> https://ec.europa.eu/jrc/
>
> ----
> The views expressed are purely those of the writer and may not in any circumstances be regarded as stating an official position of the European Commission.

--
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Wednesday, 9 July 2014 16:02:03 UTC