Re: Scoping: "Tabular Data" from Jeni Tennison on 2014-03-01 (public-csv-wg@w3.org from March 2014)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sat, 1 Mar 2014 22:31:20 +0000
To: "Ceolin, D." <d.ceolin@vu.nl>
Cc: Dan Brickley <danbri@google.com>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <etPan.53125fb8.2463b9ea.137@jenit.local>
Davide,

I think the upshot of the discussion was an agreement that in *tabular* data, each column has a consistent meaning across all rows.

I’m not sure that conclusion addresses your query.

Jeni

------------------------------------------------------
From: Ceolin, D. d.ceolin@vu.nl
Reply: Ceolin, D. d.ceolin@vu.nl
Date: 28 February 2014 at 11:42:11
To: Jeni Tennison jeni@jenitennison.com
Subject:  Re: Scoping: "Tabular Data"

>  
> Hi all,
>  
> I'm adding Tim's use case to the "use case and requirements doc",
> and I was wondering what conclusion, if any, we drew from this
> discussion.
> In particular, I'd say that in Tim's case it is not only true that
> "Each row is a statement", but also that "Each row is a statement
> and possibly one or more annotations about that statement".
> This may add some ambiguity (e.g. does the confidence relate only
> to the triple, or to the triple and its provenance?), but it also
> offers an easy way to annotate statements (and, by the way, how
> would that be translated into RDF? By means of reification, or
> something else? I'm very interested in trust value representations
> and related topics).
> Also, I'm not sure whether these issues are fully covered by the
> PrimaryKey and SemanticTypeDefinition requirements.
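One way to picture the reification option raised here: a minimal sketch, assuming the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) and treating the document ID, offsets and confidence on line 3 of Tim's example as annotations on the statement node. The ex: properties, the ':' prefix added to the bare predicate and the blank-node label _:st1 are all hypothetical, and prefix declarations are omitted.

```python
# Sketch: express one annotated row as RDF reification triples (Turtle-style).
# Hypothetical: the ":" and "ex:" prefixes, the blank-node label "_:st1",
# and the ex:document / ex:offsets / ex:certainty properties.


def reify(node, subject, predicate, obj, annotations):
    """Return Turtle-style lines for a reified statement plus annotations."""
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]
    # Annotations hang off the statement node, so they describe the
    # statement as a whole rather than its subject or object.
    for prop, value in annotations:
        triples.append((node, prop, value))
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Line 3 of Tim's example; plain whitespace splitting is enough here.
row = ':e4 mention "JoJo" D00124 145-149 0.9'.split()

turtle_lines = reify(
    "_:st1",
    row[0],
    ":" + row[1],  # assumed prefixing of the bare predicate "mention"
    row[2],
    [
        ("ex:document", "ex:" + row[3]),
        ("ex:offsets", f'"{row[4]}"'),
        ("ex:certainty", f'"{row[5]}"^^xsd:decimal'),
    ],
)

for line in turtle_lines:
    print(line)
```

Attaching ex:certainty to the statement node is only one reading of the ambiguity above; attaching it to a separate provenance node would be another.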
> Cheers,
>  
> Davide
>  
>  
> > In your bitmap case, you can say:
> >
> > “Each row is a *row of a bitmap* and the columns are the *first
> > pixel*, *second pixel*, *third pixel*... of the row.”
> >
> > Conversely, in Tim’s case, you can say “Each row is a statement”,
> > but you can’t name the columns in a regular way in terms of being
> > a property of each statement.
> >
> > Cheers,
> >
> > Jeni
> >
> > (*) or “represents” or “contains information about” or whatever
> > you want to say to be more semantically accurate
> >
> > ------------------------------------------------------  
> > From: Dan Brickley danbri@google.com
> > Reply: Dan Brickley danbri@google.com
> > Date: 23 February 2014 at 16:09:18
> > To: Jeni Tennison jeni@theodi.org
> > Subject: Re: Scoping: "Tabular Data"
> >
> >>
> >> On 23 February 2014 15:19, Jeni Tennison wrote:
> >>> Hi,
> >>>
> >>> Another scoping question, brought up by Tim Finin’s example
> >>> from:
> >>>
> >>> https://www.w3.org/2013/csvw/wiki/Use_Cases#Representing_entitles_and_facts_extracted_from_text  
> >>>
> >>> 1> :e4 type PER
> >>> 2> :e4 mention "Bart" D00124 283-286
> >>> 3> :e4 mention "JoJo" D00124 145-149 0.9
> >>> 4> :e4 per:siblings :e7 D00124 283-286 173-179 274-281
> >>> 5> :e4 per:age "10" D00124 180-181 173-179 182-191 0.9
> >>> 6> :e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210
> >>>    ^   ^          ^   ^      ^       ^       ^       ^      ^       ^       ^
> >>>    1   2          3   4      5       6       7       8      9       10      11
> >>>
> >>> (I’ve added numbers for the implied columns.)
> >>>
> >>> To me, this looks like a text-based format in which each line
> >>> has a defined format, but where there isn’t the commonality
> >>> between values in a single column that I would normally expect in
> >>> what I would consider a tabular format.
> >>>
> >>> So, for example, column 6 contains a certainty value on line 3
> >>> and an offset range on lines 4-6, while column 8 contains a
> >>> certainty value on line 5 and a document ID on line 6.
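To make the column 6 / column 8 observation concrete, a small sketch that splits Tim's lines on whitespace and labels what each value in a given column looks like. It assumes no quoted string contains a space, and kind() is a hypothetical heuristic tuned only to the values in this example.

```python
# Tim Finin's example lines as quoted above (line 6 joined onto one line).
# Assumption: no quoted string contains a space, so str.split() is enough.
TIM_LINES = [
    ':e4 type PER',
    ':e4 mention "Bart" D00124 283-286',
    ':e4 mention "JoJo" D00124 145-149 0.9',
    ':e4 per:siblings :e7 D00124 283-286 173-179 274-281',
    ':e4 per:age "10" D00124 180-181 173-179 182-191 0.9',
    ':e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 '
    '220-225 230-233 201-210',
]


def column(lines, n):
    """Return {line_number: value} for the 1-based column n.

    Lines with fewer than n whitespace-separated values are skipped.
    """
    out = {}
    for i, line in enumerate(lines, start=1):
        tokens = line.split()
        if len(tokens) >= n:
            out[i] = tokens[n - 1]
    return out


def kind(value):
    """Hypothetical heuristic classifier, tuned only to this example."""
    if value.replace(".", "", 1).isdigit():
        return "certainty"
    if value.startswith("D") and value[1:].isdigit():
        return "document ID"
    if "-" in value:
        return "offset range"
    return "other"


for n in (6, 8):
    print(n, {i: kind(v) for i, v in column(TIM_LINES, n).items()})
```

For column 6 this reports a certainty on line 3 and offset ranges on lines 4 to 6; for column 8, a certainty on line 5 and a document ID on line 6.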
> >>>
> >>> If the data looked like (comma separators added for clarity):  
> >>>
> >>> :e4, type, PER, ,
> >>> :e4, mention, "Bart", D00124 283-286,
> >>> :e4, mention, "JoJo", D00124 145-149, 0.9
> >>> :e4, per:siblings, :e7, D00124 283-286 173-179 274-281,
> >>> :e4, per:age, "10", D00124 180-181 173-179 182-191, 0.9
> >>> :e4, per:parent, :e9, D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
> >>> ^ ^ ^ ^ ^
> >>> 1 2 3 4 5
> >>>
> >>> then I would consider it tabular data and could add headers:  
> >>>
> >>> 1: subject
> >>> 2: predicate
> >>> 3: object
> >>> 4: location
> >>> 5: certainty
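A sketch of reading the comma-separated form with those five headers, using Python's csv module. It assumes a comma between object and location on every line, including the per:age and per:parent lines; load() and certainty_ok() are hypothetical helper names. Once every row has the same five columns, a single rule (here: each non-empty certainty lies between 0 and 1) can be checked down a whole column.

```python
import csv
import io

# The comma-separated form from the message above, with a comma between
# object and location on every line (an assumption for per:age/per:parent).
DATA = """\
:e4, type, PER, ,
:e4, mention, "Bart", D00124 283-286,
:e4, mention, "JoJo", D00124 145-149, 0.9
:e4, per:siblings, :e7, D00124 283-286 173-179 274-281,
:e4, per:age, "10", D00124 180-181 173-179 182-191, 0.9
:e4, per:parent, :e9, D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
"""

HEADERS = ["subject", "predicate", "object", "location", "certainty"]


def load(text):
    """Parse rows into dicts keyed by HEADERS, rejecting ragged rows."""
    rows = []
    # skipinitialspace drops the blank after each comma, so "Bart" is read
    # as a quoted field and its quote characters are removed.
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != len(HEADERS):
            raise ValueError(f"expected {len(HEADERS)} fields, got {fields}")
        rows.append(dict(zip(HEADERS, fields)))
    return rows


def certainty_ok(rows):
    """Apply one rule to the whole certainty column."""
    return all(0.0 <= float(r["certainty"]) <= 1.0
               for r in rows if r["certainty"])


rows = load(DATA)
print(certainty_ok(rows))
```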
> >>>
> >>> Can/should we define tabular data as data where all values in
> >>> a given column have a common meaning?
> >>
> >> In this last form, you might argue that when relationship typing is
> >> pushed down into cell values, i.e. potentially a different predicate
> >> in each row, then that column does not really have a "common
> >> meaning". Or you might say the column does have a broader fixed
> >> meaning: it carries information about how values from other columns
> >> relate to each other.
> >>
> >> For the sake of thought experiment, I find it useful to come back to
> >> a pixel-style representation. Consider a 640x480 grid in which
> >> red-ness, green-ness and blue-ness values are packed into each cell,
> >> perhaps with a sub-notation using ':', on a 0-1 scale for now.
> >>
> >> So,
> >>
> >> 0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0
> >> 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0 ...
> >> might give us a fragment of such a grid, with neon, black, white,
> >> etc. cells.
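A sketch of decoding that fragment, assuming ':' separates the three components in every cell and ',' separates cells; parse_cell() and parse_grid() are hypothetical helpers. Reading down one column gives the same pixel position in successive rows, which corresponds to the "Yes" answer below.

```python
# Sketch: decode the "r:g:b" cell sub-notation (0-1 scale) described above.
FRAGMENT = """\
0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0
1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0
"""


def parse_cell(cell):
    """Turn 'r:g:b' into a tuple of three floats in [0, 1]."""
    r, g, b = (float(x) for x in cell.strip().split(":"))
    for v in (r, g, b):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"component out of range in {cell!r}")
    return (r, g, b)


def parse_grid(text):
    """One list of RGB tuples per non-blank line."""
    return [[parse_cell(c) for c in line.split(",")]
            for line in text.splitlines() if line.strip()]


grid = parse_grid(FRAGMENT)
# Column j holds pixel j of every row: the same kind of value all the way down.
column_0 = [row[0] for row in grid]
print(column_0)  # [(0.4, 1.0, 0.0), (1.0, 1.0, 1.0)]
```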
> >>
> >> Q: Do these columns have regular meaning?
> >> A: Yes; they stand for a column of pixels in a bitmap
> >> A: No; each row-column combination stands for a distinct entity  
> >> (pixel value)
> >>
> >> Q: Is it useful to use W3C CSVW's work to describe this?
> >> A: Sure. It can help us get the syntax details right (whitespace,
> >> quotes, newlines) between tools, and it can provide arbitrary
> >> per-file metadata. For example, the metadata might tell us that the
> >> grid of colours comes from Dan's security camera photo taken on
> >> such-and-such a date.
> >>
> >> Q: Isn't this iffy, since there are much better binary
> >> representations for such data (e.g. digital image formats)?
> >> A: Yes, but that can be true for more obviously factual data too.
> >>
> >> Maybe what I'm getting at here is that I'm not sure what "a common
> >> meaning" for columns might mean. On the last call I tried to talk
> >> about columns being "homogeneous", but that was more in terms of
> >> low-level data typing. For example, a column might always contain
> >> ISO-8601-style dates, i.e. YYYY-MM-DD. But what they *mean*
> >> (birthdate, deathdate, date hired, favourite date, ...) could be
> >> fixed by the meaning of a different column. So the column could be
> >> datatype-homogeneous, but the nature of its per-cell meaning could
> >> vary per cell.
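A sketch of the distinction between datatype homogeneity and meaning: the date column passes a strict YYYY-MM-DD check on every row, while what each date means comes from the neighbouring kind column. The rows and helper names are made up for illustration.

```python
import re
from datetime import date

# Hypothetical rows: (person, kind, value). The "kind" column fixes what
# the date in the "value" column means.
ROWS = [
    ("alice", "birthdate", "1970-01-31"),
    ("alice", "date hired", "1998-06-01"),
    ("bob", "deathdate", "2011-11-11"),
]

# Strict shape check; date.fromisoformat alone accepts more ISO forms
# (e.g. week dates) on newer Python versions.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(text):
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)  # rejects impossible dates like 2014-02-30
    except ValueError:
        return False
    return True


def column_is_homogeneous(rows, index, check):
    """Datatype homogeneity: every value in the column passes one check."""
    return all(check(row[index]) for row in rows)


homogeneous = column_is_homogeneous(ROWS, 2, is_iso_date)
meanings = sorted({kind for _, kind, _ in ROWS})
print(homogeneous, meanings)
```

Here the date column is homogeneous by datatype, yet it carries three different meanings.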
> >>
> >> Dan
> >>
> >>
> >>
> >
> > --
> > Jeni Tennison
> > http://www.jenitennison.com/
> >

--  
Jeni Tennison
http://www.jenitennison.com/
Received on Saturday, 1 March 2014 22:31:46 UTC