Re: Glossary defn of dataset, metadata

Thanks Annette, yes, it's too repetitive. Let me try again (if you can 
shorten it further, please do):

<h2>Data, Datasets, Metadata, Publishers and Re-Users</h2>

<p>When discussing the publication and use of data on the Web, terms 
like data, dataset and metadata are commonplace. In a <em>specific 
context</em>, the differences between the terms can be clear. For 
example, if a CSV file contains a series of numerical values, those 
values are the data, the totality of the data is the dataset and the 
column and row headings are the metadata. In this context, the simple 
'metadata is data about data' definition works. But imagine a system 
that scrapes the Web site of an online shop, adds extra pictures and 
details and then publishes the resulting information through an API. As 
far as the online shop is concerned, the original data is metadata about 
the products on sale, but to the person scraping the site, that metadata 
is the data, and the enriched result must in turn be described with new 
metadata as part of the API documentation. In this sequence, the data 
consumer also becomes a data publisher.</p>

<dl>
   <dt id="dataset">Dataset</dt>
   <dd>From <a 
href="http://www.w3.org/TR/vocab-dcat/#class-dataset"><abbr title="Data 
Catalog Vocabulary">DCAT</abbr></a>: A collection of data, published or 
curated by a single agent, and available for access or download in one 
or more formats.</dd>
   <dt id="data">Data</dt>
   <dd>A number, string or other object that is processed by software.</dd>
   <dt id="metadata">Metadata</dt>
   <dd>Data that describes the data to be processed but is not itself 
processed.</dd>
</dl>

<p>As noted above, and to borrow a sentence from 1997:</p>
<blockquote>The distinction between "data" and "metadata" is not an 
absolute one; it is a distinction created primarily by a particular 
application, and many times the same resource will be interpreted in 
both ways simultaneously. [RDF-INTRO]</blockquote>


"RDF-INTRO":{
         "authors":["Ora Lassila"],
         "href":"http://www.w3.org/TR/NOTE-rdf-simple-intro",
         "title":"Introduction to RDF Metadata",
         "status":"Note",
         "publisher":"W3C",
         "date":"13 November 1997"
        }
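To make the CSV example concrete, here is a minimal sketch in Python using only the standard library `csv` module. The sample table and the functions `split_table` and `index_headings` are hypothetical, invented purely for illustration. It shows one application treating the row and column headings as metadata around the numeric data, and a second application treating those same headings as the data it processes, with new metadata of its own describing its output.

```python
import csv
import io

# Invented sample table (not from any real dataset): a corner cell,
# two column headings, two row headings and four numeric values.
SAMPLE_CSV = """item,2013,2014
apples,12,15
pears,7,9
"""


def split_table(text):
    """From the viewpoint of an application that processes the numbers,
    return (column_headings, row_headings, values)."""
    rows = list(csv.reader(io.StringIO(text)))
    column_headings = rows[0][1:]              # metadata: skip the corner cell
    row_headings = [r[0] for r in rows[1:]]    # metadata: label of each row
    values = [[int(v) for v in r[1:]] for r in rows[1:]]  # the data
    return column_headings, row_headings, values


def index_headings(column_headings, row_headings):
    """Hypothetical second application, e.g. a catalogue that indexes
    headings. For it, the headings ARE the data it processes, and its
    output is described by new metadata of its own."""
    data = {"columns": column_headings, "rows": row_headings}
    metadata = {
        "description": "Headings extracted from a CSV file",
        "fields": sorted(data),  # names of the fields in the new output
    }
    return data, metadata


cols, row_heads, values = split_table(SAMPLE_CSV)
print(cols, row_heads, values)
# ['2013', '2014'] ['apples', 'pears'] [[12, 15], [7, 9]]

indexed, description = index_headings(cols, row_heads)
print(description["fields"])
# ['columns', 'rows']
```

The same bytes (the heading cells) end up as metadata in the first function and as data in the second, which is the context dependence the quoted 1997 sentence describes.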




On 24/04/2015 15:05, Annette Greiner wrote:
> I think this is great. I really like the way you describe the example. However, the bit about the overlap between data and metadata is a large amount of text for a very fine point. Could we keep that bit to one or two sentences at most? Right now I feel like the single biggest barrier to use of our document is its length.
> -Annette
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
>
> On Apr 24, 2015, at 6:33 AM, Phil Archer <phila@w3.org> wrote:
>
>> Eating some of my own dogfood...
>>
>> Yaso asked me for comment on her work on the mental models in the glossary [1].
>>
>> I sent this suggested text:
>>
>> <h2>Data, Datasets, Metadata, Publishers and Re-Users</h2>
>>
>> <p>When discussing the publication and use of data on the Web, terms like data, dataset and metadata are commonplace. In a <em>specific context</em>, the differences between the terms can be clear. For example, if a CSV file contains a series of numerical values those values are the data, the totality of the data is the dataset and the column and row headings are the metadata. Again emphasizing the context, the simple 'metadata is data about data' definition works. But, to recycle a sentence from 1997:</p>
>> <blockquote>The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application, and many times the same resource will be interpreted in both ways simultaneously. [RDF-INTRO]</blockquote>
>> <p>Imagine a system that scrapes the Web site of an online shop, adds extra pictures and details and then publishes the resulting information through an API. As far as the online shop is concerned, the original data is metadata about the products on sale, but to the person scraping the site, the metadata is now the data and the enriched data must now be described with new metadata as part of the API documentation. In this sequence, the data consumer becomes a data publisher too of course.</p>
>> <p><strong>Therefore</strong>, in order to present a coherent set of best practices, the working group takes the view that the same artifacts (the same bytes) may be thought of as data in one context, metadata in another, or indeed both simultaneously. Any re-user may be a publisher, again, perhaps simultaneously. However, in context:</p>
>>
>> Data...
>>
>> Metadata...
>>
>>
>> "RDF-INTRO":{
>>         "authors":["Ora Lassila"],
>>         "href":"http://www.w3.org/TR/NOTE-rdf-simple-intro",
>>         "title":"Introduction to RDF Metadata",
>>         "status":"Note",
>>         "publisher":"W3C",
>>         "date":"13 November 1997"
>>        }
>>
>>
>>
>> [1] http://yaso.is/dwbp/glossary.html
>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Friday, 24 April 2015 16:29:41 UTC