Re: Scoping: "Tabular Data" from Ceolin, D. on 2014-03-02 (public-csv-wg@w3.org from March 2014)

From: Ceolin, D. <d.ceolin@vu.nl>
Date: Sun, 2 Mar 2014 17:47:48 +0000
To: Jeni Tennison <jeni@jenitennison.com>
CC: Dan Brickley <danbri@google.com>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <F62C3F7D-2B02-4357-A8D1-BCCD2DC589E5@vu.nl>
Hi Jeni,

that's clear, thanks. What about the meaning of each row? (sorry for being pedantic...)
Best,

Davide

On 1 Mar 2014, at 23:31, Jeni Tennison wrote:

> Davide,
> 
> I think the upshot of the discussion was that we came to an agreement that in *tabular* data, each column has a consistent meaning across all rows.
> 
> I’m not sure that conclusion addresses your query.
> 
> Jeni
> 
> ------------------------------------------------------
> From: Ceolin, D. d.ceolin@vu.nl
> Reply: Ceolin, D. d.ceolin@vu.nl
> Date: 28 February 2014 at 11:42:11
> To: Jeni Tennison jeni@jenitennison.com
> Subject:  Re: Scoping: "Tabular Data"
> 
>> 
>> Hi all,
>> 
>> I'm adding Tim's use case to the "use case and requirements doc",  
>> and I was wondering what conclusion we drew from this discussion,  
>> if any.
>> In particular, I'd say that not only in Tim's case “Each row is  
>> a statement”, but also "Each row is a statement and possibly one  
>> or more annotations about that statement".
>> This may add some ambiguity (e.g. is the confidence related only  
>> to the triple or to the triple and its provenance?), but offers  
>> also an easy way to annotate statements (and, BTW, how would that  
>> be translated into RDF? By means of reification or else? I'm very  
>> interested in trust value representations and related).
>> Also, I'm not sure if these issues are fully covered by the PrimaryKey  
>> and SemanticTypeDefinition requirements.
>> Cheers,
>> 
>> Davide
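
One way to picture the reification option raised above is the following minimal sketch. It uses the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object); the `ex:confidence` and `ex:source` properties, the `_:s5` node label and the `reify` helper are hypothetical names chosen for illustration, not anything the group has agreed on.

```python
# Sketch: reify one row (a statement plus annotations) as Turtle-like triples.
# Assumption: "ex:confidence" and "ex:source" are made-up annotation properties.

def reify(stmt_id, subject, predicate, obj, annotations):
    """Return triples describing the statement itself, then its annotations."""
    triples = [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", subject),
        (stmt_id, "rdf:predicate", predicate),
        (stmt_id, "rdf:object", obj),
    ]
    # Annotations hang off the statement node, not off the subject entity.
    triples += [(stmt_id, prop, value) for prop, value in annotations.items()]
    return triples


row = reify("_:s5", ":e4", "per:age", '"10"',
            {"ex:confidence": '"0.9"', "ex:source": '"D00124"'})
for s, p, o in row:
    print(f"{s} {p} {o} .")
```

Because both annotations attach to the same statement node, this layout by itself does not say whether the confidence also covers the provenance, which is the ambiguity noted above.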
>> 
>> 
>>> In your bitmap case, you can say:
>>> 
>>> “Each row is a *row of a bitmap* and the columns are the *first
>>> pixel*, *second pixel*, *third pixel*... of the row.”
>>> 
>>> Conversely, in Tim’s case, you can say “Each row is a statement”,
>>> but you can’t name the columns in a regular way in terms of being
>>> a property of each statement.
>>> 
>>> Cheers,
>>> 
>>> Jeni
>>> 
>>> (*) or “represents” or “contains information about” or whatever
>>> you want to say to be more semantically accurate
>>> 
>>> ------------------------------------------------------  
>>> From: Dan Brickley danbri@google.com
>>> Reply: Dan Brickley danbri@google.com
>>> Date: 23 February 2014 at 16:09:18
>>> To: Jeni Tennison jeni@theodi.org
>>> Subject: Re: Scoping: "Tabular Data"
>>> 
>>>> 
>>>> On 23 February 2014 15:19, Jeni Tennison wrote:
>>>>> Hi,
>>>>> 
>>>>> Another scoping question, raised by Tim Finin’s example at:
>>>>> 
>>>>> https://www.w3.org/2013/csvw/wiki/Use_Cases#Representing_entitles_and_facts_extracted_from_text  
>>>>> 
>>>>> 1> :e4 type PER
>>>>> 2> :e4 mention "Bart" D00124 283-286
>>>>> 3> :e4 mention "JoJo" D00124 145-149 0.9
>>>>> 4> :e4 per:siblings :e7 D00124 283-286 173-179 274-281
>>>>> 5> :e4 per:age "10" D00124 180-181 173-179 182-191 0.9
>>>>> 6> :e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210
>>>>> ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
>>>>> 1 2 3 4 5 6 7 8 9 10 11
>>>>> 
>>>>> (I’ve added numbers for the implied columns.)
>>>>> 
>>>>> To me, this looks like a text-based format in which each line has
>>>>> a defined format, but where there isn’t the commonality between
>>>>> values in a single column that I would normally expect in what I
>>>>> would consider a tabular format.
>>>>> 
>>>>> So for example, column 6 contains a certainty value on line 3 and
>>>>> an offset range in lines 4-6, while column 8 contains a certainty
>>>>> value on line 5 and a document ID on line 6.
>>>>> 
>>>>> If the data looked like (comma separators added for clarity):  
>>>>> 
>>>>> :e4, type, PER, ,
>>>>> :e4, mention, "Bart", D00124 283-286,
>>>>> :e4, mention, "JoJo", D00124 145-149, 0.9
>>>>> :e4, per:siblings, :e7, D00124 283-286 173-179 274-281,
>>>>> :e4, per:age, "10", D00124 180-181 173-179 182-191, 0.9
>>>>> :e4, per:parent, :e9, D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
>>>>> ^ ^ ^ ^ ^
>>>>> 1 2 3 4 5
>>>>> 
>>>>> then I would consider it tabular data and could add headers:  
>>>>> 
>>>>> 1: subject
>>>>> 2: predicate
>>>>> 3: object
>>>>> 4: location
>>>>> 5: certainty
>>>>> 
>>>>> Can/should we define tabular data as data where all values in a
>>>>> given column have a common meaning?
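
The regrouping described above, from ragged lines to five columns with a common meaning, can be sketched roughly as follows. This is only an illustration under stated assumptions: the first three whitespace-separated tokens are taken as subject, predicate and object, a trailing token that looks like a number between 0 and 1 is taken as the certainty, and everything in between is treated as the location. The `to_row` helper and the `CERTAINTY` pattern are hypothetical names, not part of any CSVW draft.

```python
import re

HEADERS = ["subject", "predicate", "object", "location", "certainty"]

# Assumption: a certainty is a bare number in [0, 1], e.g. "0.9".
CERTAINTY = re.compile(r"^(0(\.\d+)?|1(\.0+)?)$")


def to_row(line):
    """Regroup one ragged line into the five named columns."""
    subject, predicate, obj, *rest = line.split()
    certainty = ""
    if rest and CERTAINTY.match(rest[-1]):
        certainty = rest.pop()
    # Whatever is left (document IDs and offset ranges) becomes the location.
    return dict(zip(HEADERS, [subject, predicate, obj, " ".join(rest), certainty]))


lines = [
    ':e4 type PER',
    ':e4 mention "JoJo" D00124 145-149 0.9',
    ':e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210',
]
table = [to_row(line) for line in lines]
for row in table:
    print(row)
```

After this step every value under `certainty` is either empty or a certainty, which is the per-column regularity the question asks about.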
>>>> 
>>>> In this last form, you might argue that when relationship typing is
>>>> pushed down into cell values, i.e. potentially a different predicate
>>>> in each row, then that column does not really have a "common
>>>> meaning". Or you might say the column does have a broader fixed
>>>> meaning: it carries information about how values from other columns
>>>> relate to each other.
>>>> 
>>>> For the sake of thought experiment I find it useful to come back to
>>>> pixel-style representation. Consider a 640x480 grid in which
>>>> red-ness, green-ness and blue-ness values are packed into each cell.
>>>> Perhaps with a sub-notation using ':', on a 0-1 scale for now:
>>>> 
>>>> So,
>>>> 
>>>> 0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0
>>>> 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0 ...
>>>>
>>>> might give us a fragment of such a grid, with neon, black, white
>>>> etc. cells.
>>>> 
>>>> Q: Do these columns have regular meaning?
>>>> A: Yes; they stand for a column of pixels in a bitmap
>>>> A: No; each row-column combination stands for a distinct entity  
>>>> (pixel value)
>>>> 
>>>> Q: Is it useful to use W3C CSVW's work to describe this?
>>>> A: Sure. It can help us get the syntax details right (whitespace,
>>>> quotes, newlines) between tools; and it can provide arbitrary
>>>> per-file metadata. For example the metadata might tell us that the
>>>> grid of colours comes from Dan's security camera photo at
>>>> such-and-so a date.
>>>> 
>>>> Q: Isn't this iffy, since there are much better binary
>>>> representations for such data? (e.g. digital image formats)
>>>> A: Yes, but that can be true for more obviously factual data too.
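
To make the pixel-grid thought experiment concrete, here is a small sketch that reads cells of the form red:green:blue. It assumes comma-separated cells with ':' between channels on a 0-1 scale, as in the fragment above; `parse_row` is a hypothetical helper name.

```python
# Sketch: read a grid whose cells pack red:green:blue on a 0-1 scale.
# Assumption: cells are comma-separated and use ':' between channels.

def parse_row(line):
    """Split one grid row into (red, green, blue) float tuples."""
    pixels = []
    for cell in line.split(","):
        red, green, blue = (float(part) for part in cell.strip().split(":"))
        pixels.append((red, green, blue))
    return pixels


grid = [parse_row(line) for line in (
    "0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0",
    "1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0",
)]

# Reading "down" a column gives one vertical strip of the bitmap.
third_column = [row[2] for row in grid]
```

Both answers to the first question show up here: `third_column` is a regular "column of pixels", yet each `grid[r][c]` is a distinct pixel value in its own right.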
>>>> 
>>>> Maybe what I'm getting at here is that I'm not sure what "a common
>>>> meaning" for columns might mean. On the last call I tried to talk
>>>> about columns being "homogeneous" but that was more in terms of
>>>> low-level data-typing. For example, a column might always contain
>>>> ISO-8601-style dates, i.e. YYYY-MM-DD. But what they *mean*
>>>> (birthdate, deathdate, date hired, favourite date, ...) could be
>>>> fixed by the meaning of a different column. So the column could be
>>>> datatype-homogeneous but the nature of its per-cell meaning could
>>>> vary per cell.
>>>> 
>>>> Dan
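
The distinction drawn above, a column that is uniform in datatype but not in meaning, can be illustrated with a few invented rows. The names and dates below are made up for the example; only `datetime.date.fromisoformat` is a real standard-library call (Python 3.7+).

```python
from datetime import date

# Hypothetical rows: the third column is always an ISO 8601 date, but what
# the date means is fixed by the second column.
rows = [
    ("alice", "birthdate", "1970-01-31"),
    ("alice", "date hired", "1995-06-01"),
    ("bob", "birthdate", "1982-11-15"),
]

# Datatype check: every value in the column parses as a date.
dates = [date.fromisoformat(value) for _, _, value in rows]

# Meaning check: the same column carries more than one kind of fact.
meanings = {kind for _, kind, _ in rows}

print(len(dates), sorted(meanings))
```

A datatype-level test alone would accept this column as regular; only looking at the neighbouring column reveals that its cells mean different things.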
>>>> 
>>>> 
>>>> 
>>> 
>>> --
>>> Jeni Tennison
>>> http://www.jenitennison.com/
>>> 
>> 
>> 
>> 
>> 
> 
> --  
> Jeni Tennison
> http://www.jenitennison.com/
Received on Sunday, 2 March 2014 17:48:18 UTC