Re: Scoping: "Tabular Data" from Jeni Tennison on 2014-02-24 (public-csv-wg@w3.org from February 2014)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sun, 23 Feb 2014 17:03:24 -0800
To: Dan Brickley <danbri@google.com>
Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <etPan.530a9a5c.79e2a9e3.10f@jenit.local>
Dan,

But in the case where you have:

  person1,birthdate,1912-04-23
  person1,deathdate,1993-03-30
  ...

you can still label the columns in a regular way (entity, property, value). You can fill in a statement that says:

  “Each row is(*) an X and the columns are the A, B, C… of the X”

ie

  “Each row is a *statement* and the columns are the *entity*, *property* and *value* of the statement.”
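
As a rough sketch of that reading (Python, standard library only): the column names entity, property and value are the labels suggested above, and the helper name `statements_by_entity` is made up purely for illustration.

```python
import csv
import io
from collections import defaultdict

# The two sample rows from this message. There is no header line,
# so the column labels are supplied explicitly.
DATA = "person1,birthdate,1912-04-23\nperson1,deathdate,1993-03-30\n"


def statements_by_entity(text):
    """Group entity/property/value rows into one dict per entity.

    Hypothetical helper: each row is read as one statement, and the
    meaning of the 'value' cell is fixed by the 'property' cell.
    """
    reader = csv.DictReader(
        io.StringIO(text), fieldnames=["entity", "property", "value"]
    )
    grouped = defaultdict(dict)
    for row in reader:
        grouped[row["entity"]][row["property"]] = row["value"]
    return dict(grouped)


result = statements_by_entity(DATA)
print(result)
```

Every column keeps one label for the whole file, even though what a given date means depends on the property cell in the same row.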

In your bitmap case, you can say:

  “Each row is a *row of a bitmap* and the columns are the *first pixel*, *second pixel*, *third pixel*... of the row.”

Conversely, in Tim’s case, you can say “Each row is a statement”, but you can’t name the columns in a regular way in terms of being a property of each statement.
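
One way to see the difference, sketched in Python under some assumptions: the lines are split on whitespace, columns are numbered from 1 as in the quoted example below, and the three patterns in `kind` are my guesses at how to tell offsets, certainty values and document IDs apart rather than anything defined by the format.

```python
import re

# Tim's six example lines, as quoted below (line numbers removed).
LINES = [
    ':e4 type PER',
    ':e4 mention "Bart" D00124 283-286',
    ':e4 mention "JoJo" D00124 145-149 0.9',
    ':e4 per:siblings :e7 D00124 283-286 173-179 274-281',
    ':e4 per:age "10" D00124 180-181 173-179 182-191 0.9',
    ':e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 '
    '220-225 230-233 201-210',
]


def kind(field):
    """Classify a field by its shape (assumed patterns, not a spec)."""
    if re.fullmatch(r"\d+-\d+", field):
        return "offsets"
    if re.fullmatch(r"\d*\.\d+", field):
        return "certainty"
    if re.fullmatch(r"D\d+", field):
        return "document"
    return "other"


def kinds_in_column(lines, column):
    """Return the set of field kinds found in a 1-based column."""
    kinds = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= column:
            kinds.add(kind(fields[column - 1]))
    return kinds


col6 = kinds_in_column(LINES, 6)
col8 = kinds_in_column(LINES, 8)
print(sorted(col6), sorted(col8))
```

Column 6 comes out holding both certainty values and offset ranges, and column 8 both a certainty value and a document ID, so neither can be given a single name.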

Cheers,

Jeni

(*) or “represents” or “contains information about” or whatever you want to say to be more semantically accurate

------------------------------------------------------
From: Dan Brickley danbri@google.com
Reply: Dan Brickley danbri@google.com
Date: 23 February 2014 at 16:09:18
To: Jeni Tennison jeni@theodi.org
Subject:  Re: Scoping: "Tabular Data"

>  
> On 23 February 2014 15:19, Jeni Tennison wrote:
> > Hi,
> >
> > Another scoping question, brought up from Tim Finin’s example from:
> >
> > https://www.w3.org/2013/csvw/wiki/Use_Cases#Representing_entitles_and_facts_extracted_from_text  
> >
> > 1> :e4 type PER
> > 2> :e4 mention "Bart" D00124 283-286
> > 3> :e4 mention "JoJo" D00124 145-149 0.9
> > 4> :e4 per:siblings :e7 D00124 283-286 173-179 274-281
> > 5> :e4 per:age "10" D00124 180-181 173-179 182-191 0.9
> > 6> :e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210
> > ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
> > 1 2 3 4 5 6 7 8 9 10 11
> >
> > (I’ve added numbers for the implied columns.)
> >
> > To me, this looks like a text-based format in which each line has a defined format, but where there isn’t the commonality between values in a single column that I would normally expect in what I would consider a tabular format.
> >
> > So for example, column 6 contains a certainty value on line 3 and an offset range in lines 4-6, while column 8 contains a certainty value on line 5 and a document ID on line 6.
> >
> > If the data looked like (comma separators added for clarity):  
> >
> > :e4, type, PER, ,
> > :e4, mention, "Bart", D00124 283-286,
> > :e4, mention, "JoJo", D00124 145-149, 0.9
> > :e4, per:siblings, :e7, D00124 283-286 173-179 274-281,
> > :e4, per:age, "10", D00124 180-181 173-179 182-191, 0.9
> > :e4, per:parent, :e9, D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
> > ^ ^ ^ ^ ^
> > 1 2 3 4 5
> >
> > then I would consider it tabular data and could add headers:  
> >
> > 1: subject
> > 2: predicate
> > 3: object
> > 4: location
> > 5: certainty
> >
> > Can/should we define tabular data as data where all values in a given column have a common meaning?
>  
> In this last form, you might argue that when relationship typing is pushed down into cell values, i.e. potentially a different predicate in each row, then that column does not really have a "common meaning". Or you might say the column does have a broader fixed meaning: it carries information about how values from other columns relate to each other.
>  
> For the sake of thought experiment I find it useful to come back to pixel-style representation. Consider a 640x480 grid in which red-ness, green-ness and blue-ness values are packed into each cell. Perhaps with a sub-notation using ':', on a 0-1 scale for now:
>  
> So,
>  
> 0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0
> 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0
> 
> ... might give us a fragment of such a grid, with neon, black, white etc cells.
>  
> Q: Do these columns have regular meaning?
> A: Yes; they stand for a column of pixels in a bitmap
> A: No; each row-column combination stands for a distinct entity (pixel value)
>  
> Q: Is it useful to use W3C CSVW's work to describe this?
> A: Sure. It can help us get the syntax details right (whitespace, quotes, newlines) between tools; and it can provide arbitrary per-file metadata. For example the metadata might tell us that the grid of colours comes from dan's security camera photo at such-and-so a date.
>  
> Q: Isn't this iffy, since there are much better binary representations for such data? (e.g. digital image formats)
> A: Yes, but that can be true for more obviously factual data too.
>  
> Maybe what I'm getting at here is that I'm not sure what "a common meaning" for columns might mean. On the last call I tried to talk about columns being "homogeneous" but that was more in terms of low-level data-typing. For example, a column might always contain ISO-8601-style dates, i.e. YYYY-MM-DD. But what they *mean* (birthdate, deathdate, date hired, favourite date, ...) could be fixed by the meaning of a different column. So the column could be datatype-homogeneous but the nature of its per-cell meaning could vary per cell.
>  
> Dan
>  
>  
>  

--  
Jeni Tennison
http://www.jenitennison.com/
Received on Monday, 24 February 2014 01:03:54 UTC