Re: Glossary defn of dataset, metadata from Phil Archer on 2015-04-25 (public-dwbp-wg@w3.org from April 2015)

From: Phil Archer <phila@w3.org>
Date: Sat, 25 Apr 2015 09:01:42 +0100
To: "yaso@nic.br" <yaso@nic.br>, public-dwbp-wg@w3.org
Message-ID: <553B49E6.7070900@w3.org>
On 24/04/2015 20:26, yaso@nic.br wrote:
> Yes, I liked that too!
>
> Phil, can I give you access for my fork of dwbp?

I'd rather not. If you add the glossary doc to the WG's own repo I can 
get it and edit it there. Don't be shy! If you want me to I can add the 
ReSpec stuff and the IDs for the <dt> elements - which are essential so 
that when terms are used in the other docs we can link to the term. But, 
see below...

> I'm still with
> questions about the right place to put your text, if in the BP doc or in
> the Glossary.

I wrote that text with the glossary in mind, not the BP doc.

>
> And, I still miss more connection between our lifecycle and these mental
> models.

In my mind - and it is only my mind - the examples you wrote, one of 
which I used, are the mental models. I hope the text written today 
actually shows all we need to show to prove that one person's metadata 
is another person's data and that consumers become publishers. For me 
that's enough - that and the basic CSV example *are* the mental models 
and that's all we need for the glossary which is meant to just be a set 
of terms.

I'd leave discussion of the lifecycle to the BP doc.


> For me, the scope delineated by Deirdre needs to be explicitly
> connected with the lifecycle that is cited in the BP doc at:
>
> "This section contains the best practices to be used by data publishers
> in order to help them and data consumers to overcome the different
> challenges faced during the data on the Web lifecycle."
>
> BTW, shouldn't the word lifecycle contain a link to the image proposed
> by Bernadette [1]? It is difficult to identify which lifecycle we are
> referring to...

I thought we took the lifecycle out of the BP doc, no? If Berna finds it 
useful she can put it back but, again, I don't think any of that belongs 
in the glossary which is just a list of terms and definitions - no?

But... we could up the geek stuff here. How about creating a JSON object 
with the definitions and putting that on the Web separately? Then we can 
easily use it to auto-generate the glossary and to create mouseovers for 
the terms when they're used in the other docs.
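
A minimal sketch of that idea, in Python. Everything named here is an 
assumption for illustration only: the placeholder definitions, the "dfn-" 
ID prefix, the glossary.html URL and the function names. None of it has 
been agreed by the WG.

```python
import html
import json

# Hypothetical glossary data. The definitions are placeholders,
# not the working group's agreed wording.
GLOSSARY_JSON = """
{
  "dataset": "Placeholder definition of dataset.",
  "metadata": "Placeholder definition of metadata."
}
"""


def term_id(term):
    """Derive a fragment identifier from a term (the 'dfn-' prefix is assumed)."""
    return "dfn-" + "-".join(term.lower().split())


def render_glossary(definitions):
    """Build a <dl> in which every <dt> carries an id other docs can link to."""
    lines = ["<dl>"]
    for term in sorted(definitions):
        lines.append('  <dt id="{}">{}</dt>'.format(term_id(term), html.escape(term)))
        lines.append("  <dd>{}</dd>".format(html.escape(definitions[term])))
    lines.append("</dl>")
    return "\n".join(lines)


def render_term_link(term, definitions, glossary_url="glossary.html"):
    """Link a term used in another doc to the glossary; the title attribute is the mouseover."""
    return '<a href="{}#{}" title="{}">{}</a>'.format(
        glossary_url,
        term_id(term),
        html.escape(definitions[term]),
        html.escape(term),
    )


definitions = json.loads(GLOSSARY_JSON)
print(render_glossary(definitions))
print(render_term_link("metadata", definitions))
```

The same JSON feeds both outputs, so the glossary page and the mouseovers 
in the other docs cannot drift apart.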

WDYT?

Phil


>
>
>
> Yaso
>
> [1] https://github.com/w3c/dwbp/blob/gh-pages/images/lifecyclesvg.svg
>
>
>
> On 04/24/2015 11:05 AM, Annette Greiner wrote:
>> I think this is great. I really like the way you describe the example.
>> However, the bit about the overlap between data and metadata is a
>> large amount of text for a very fine point. Could we keep that bit to
>> one or two sentences at most? Right now I feel like the single biggest
>> barrier to use of our document is its length.
>> -Annette
>> --
>> Annette Greiner
>> NERSC Data and Analytics Services
>> Lawrence Berkeley National Laboratory
>> 510-495-2935
>>
>> On Apr 24, 2015, at 6:33 AM, Phil Archer <phila@w3.org> wrote:
>>
>>> Eating some of my own dogfood...
>>>
>>> Yaso asked me for comment on her work on the mental models in the
>>> glossary [1].
>>>
>>> I sent this suggested text:
>>>
>>> <h2>Data, Datasets, Metadata, Publishers and Re-Users</h2>
>>>
>>> <p>When discussing the publication and use of data on the Web, terms
>>> like data, dataset and metadata are commonplace. In a <em>specific
>>> context</em>, the differences between the terms can be clear. For
>>> example, if a CSV file contains a series of numerical values those
>>> values are the data, the totality of the data is the dataset and the
>>> column and row headings are the metadata. Again emphasizing the
>>> context, the simple 'metadata is data about data' definition works.
>>> But, to recycle a sentence from 1997:</p>
>>> <blockquote>The distinction between "data" and "metadata" is not an
>>> absolute one; it is a distinction created primarily by a particular
>>> application, and many times the same resource will be interpreted in
>>> both ways simultaneously. [RDF-INTRO]</blockquote>
>>> <p>Imagine a system that scrapes the Web site of an online shop, adds
>>> extra pictures and details and then publishes the resulting
>>> information through an API. As far as the online shop is concerned,
>>> the original data is metadata about the products on sale, but to the
>>> person scraping the site, the metadata is now the data and the
>>> enriched data must now be described with new metadata as part of the
>>> API documentation. In this sequence, the data consumer becomes a data
>>> publisher too of course.</p>
>>> <p><strong>Therefore</strong>, in order to present a coherent set of
>>> best practices, the working group takes the view that the same
>>> artifacts (the same bytes) may be thought of as data in one context,
>>> metadata in another, or indeed both simultaneously. Any re-user may
>>> be a publisher, again, perhaps simultaneously. However, in context:</p>
>>>
>>> Data...
>>>
>>> Metadata...
>>>
>>>
>>> "RDF-INTRO":{
>>>         "authors":["Ora Lassila"],
>>>         "href":"http://www.w3.org/TR/NOTE-rdf-simple-intro",
>>>         "title":"Introduction to RDF Metadata",
>>>         "status":"Note",
>>>         "publisher":"W3C",
>>>         "date":"13 November 1997"
>>>        }
>>>
>>>
>>>
>>> [1] http://yaso.is/dwbp/glossary.html
>>>
>>> --
>>>
>>>
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>>
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
>>>
>>
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Saturday, 25 April 2015 08:01:44 UTC