- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Sun, 2 Mar 2014 19:18:18 +0000
- To: "Ceolin, D." <d.ceolin@vu.nl>
- Cc: Dan Brickley <danbri@google.com>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Davide,

I’ve updated the spec here:

http://w3c.github.io/csvw/syntax/

with the definition that I think we agreed to (though I’m happy to continue wordsmithing it), namely that each row contains information about some (one) thing.

Jeni
------------------------------------------------------
From: Ceolin, D. d.ceolin@vu.nl
Reply: Ceolin, D. d.ceolin@vu.nl
Date: 2 March 2014 at 17:52:00
To: Jeni Tennison jeni@jenitennison.com
Subject: Re: Scoping: "Tabular Data"

> Hi Jeni,
>
> that's clear, thanks. What about the meaning of each row? (sorry for being pedantic...)
>
> Best,
>
> Davide
>
> On 1 Mar 2014, at 23:31, Jeni Tennison wrote:
>
> > Davide,
> >
> > I think the upshot of the discussion was that we came to an agreement that in *tabular* data, each column has a consistent meaning across all rows.
> >
> > I’m not sure that conclusion addresses your query.
> >
> > Jeni
> >
> > ------------------------------------------------------
> > From: Ceolin, D. d.ceolin@vu.nl
> > Reply: Ceolin, D. d.ceolin@vu.nl
> > Date: 28 February 2014 at 11:42:11
> > To: Jeni Tennison jeni@jenitennison.com
> > Subject: Re: Scoping: "Tabular Data"
> >
> >> Hi all,
> >>
> >> I'm adding Tim's use case to the "use case and requirements doc", and I was wondering what conclusion we drew from this discussion, if any.
> >> In particular, I'd say that not only in Tim's case “Each row is a statement”, but also "Each row is a statement and possibly one or more annotations about that statement".
> >> This may add some ambiguity (e.g. is the confidence related only to the triple or to the triple and its provenance?), but offers also an easy way to annotate statements (and, BTW, how would that be translated into RDF? By means of reification or else? I'm very interested in trust value representations and related).
> >> Also, I'm not sure if these issues are fully covered by the PrimaryKey and SemanticTypeDefinition requirements.
> >>
> >> Cheers,
> >>
> >> Davide
> >>
> >>> In your bitmap case, you can say:
> >>>
> >>> “Each row is a *row of a bitmap* and the columns are the *first pixel*, *second pixel*, *third pixel*... of the row.”
> >>>
> >>> Conversely, in Tim’s case, you can say “Each row is a statement”, but you can’t name the columns in a regular way in terms of being a property of each statement.
> >>>
> >>> Cheers,
> >>>
> >>> Jeni
> >>>
> >>> (*) or “represents” or “contains information about” or whatever you want to say to be more semantically accurate
> >>>
> >>> ------------------------------------------------------
> >>> From: Dan Brickley danbri@google.com
> >>> Reply: Dan Brickley danbri@google.com
> >>> Date: 23 February 2014 at 16:09:18
> >>> To: Jeni Tennison jeni@theodi.org
> >>> Subject: Re: Scoping: "Tabular Data"
> >>>
> >>>> On 23 February 2014 15:19, Jeni Tennison wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Another scoping question, brought up from Tim Finin’s example from:
> >>>>>
> >>>>> https://www.w3.org/2013/csvw/wiki/Use_Cases#Representing_entitles_and_facts_extracted_from_text
> >>>>>
> >>>>> 1> :e4 type PER
> >>>>> 2> :e4 mention "Bart" D00124 283-286
> >>>>> 3> :e4 mention "JoJo" D00124 145-149 0.9
> >>>>> 4> :e4 per:siblings :e7 D00124 283-286 173-179 274-281
> >>>>> 5> :e4 per:age "10" D00124 180-181 173-179 182-191 0.9
> >>>>> 6> :e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210
> >>>>> ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
> >>>>> 1 2 3 4 5 6 7 8 9 10 11
> >>>>>
> >>>>> (I’ve added numbers for the implied columns.)
> >>>>>
> >>>>> To me, this looks like a text-based format in which each line has a defined format, but where there isn’t the commonality between values in a single column that I would normally expect in what I would consider a tabular format.
> >>>>>
> >>>>> So for example, column 6 contains a certainty value on line 3 and an offset range in lines 4-6, while column 8 contains a certainty value on line 5 and a document ID on line 6.
> >>>>>
> >>>>> If the data looked like (comma separators added for clarity):
> >>>>>
> >>>>> :e4, type, PER, ,
> >>>>> :e4, mention, "Bart", D00124 283-286,
> >>>>> :e4, mention, "JoJo", D00124 145-149, 0.9
> >>>>> :e4, per:siblings, :e7, D00124 283-286 173-179 274-281,
> >>>>> :e4, per:age, "10", D00124 180-181 173-179 182-191, 0.9
> >>>>> :e4, per:parent, :e9, D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
> >>>>> ^ ^ ^ ^ ^
> >>>>> 1 2 3 4 5
> >>>>>
> >>>>> then I would consider it tabular data and could add headers:
> >>>>>
> >>>>> 1: subject
> >>>>> 2: predicate
> >>>>> 3: object
> >>>>> 4: location
> >>>>> 5: certainty
> >>>>>
> >>>>> Can/should we define tabular data as data where all values in a given column have a common meaning?
> >>>>
> >>>> In this last form, you might argue that when relationship typing is pushed down into cell values, i.e. potentially a different predicate in each row, then that column does not really have a "common meaning". Or you might say the column does have a broader fixed meaning: it carries information about how values from other columns relate to each other.
> >>>>
> >>>> For the sake of thought experiment I find it useful to come back to pixel-style representation. Consider a 640x480 grid in which red-ness, green-ness and blue-ness values are packed into each cell.
> >>>> Perhaps with a sub-notation using ':', on a 0-1 scale for now:
> >>>>
> >>>> So,
> >>>>
> >>>> 0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0
> >>>> 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0 ...
> >>>>
> >>>> might give us a fragment of such a grid, with neon, black, white etc cells.
> >>>>
> >>>> Q: Do these columns have regular meaning?
> >>>> A: Yes; they stand for a column of pixels in a bitmap
> >>>> A: No; each row-column combination stands for a distinct entity (pixel value)
> >>>>
> >>>> Q: Is it useful to use W3C CSVW's work to describe this?
> >>>> A: Sure. It can help us get the syntax details right (whitespace, quotes, newlines) between tools; and it can provide arbitrary per-file metadata. For example the metadata might tell us that the grid of colours comes from dan's security camera photo at such-and-so a date.
> >>>>
> >>>> Q: Isn't this iffy, since there are much better binary representations for such data? (e.g. digital image formats)
> >>>> A: Yes, but that can be true for more obviously factual data too.
> >>>>
> >>>> Maybe what I'm getting at here is that I'm not sure what "a common meaning" for columns might mean. On the last call I tried to talk about columns being "homogenous" but that was more in terms of low level data-typing. For example, a column might always contain ISO-8601-style dates, i.e. YYYY-MM-DD. But what they *mean* (birthdate, deathdate, date hired, favourite date, ...) could be fixed by the meaning of a different column. So the column could be datatype-homogenous but the nature of its per-cell meaning could vary per cell.
> >>>>
> >>>> Dan
> >>>>
> >>>
> >>> --
> >>> Jeni Tennison
> >>> http://www.jenitennison.com/
> >>>
> >>
> >
> > --
> > Jeni Tennison
> > http://www.jenitennison.com/
> >

--
Jeni Tennison
http://www.jenitennison.com/
Received on Sunday, 2 March 2014 19:18:42 UTC
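
A minimal Python sketch of the regrouping discussed in the thread, in which the ragged lines from Tim Finin's example are folded into five columns (subject, predicate, object, location, certainty) so that every column carries one meaning across all rows. The rule used to spot a certainty value (a trailing token that parses as a number between 0 and 1) and the names `regroup` and `to_csv` are assumptions made for illustration; they come neither from the use case nor from the CSVW drafts.

```python
import csv
import io
import shlex

# Headers proposed in the thread for the regrouped data.
HEADERS = ["subject", "predicate", "object", "location", "certainty"]

# The six whitespace-separated lines from the use case, without line numbers.
RAGGED_LINES = [
    ':e4 type PER',
    ':e4 mention "Bart" D00124 283-286',
    ':e4 mention "JoJo" D00124 145-149 0.9',
    ':e4 per:siblings :e7 D00124 283-286 173-179 274-281',
    ':e4 per:age "10" D00124 180-181 173-179 182-191 0.9',
    ':e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 '
    '220-225 230-233 201-210',
]


def is_certainty(token):
    """Assumed heuristic: a certainty is a plain number in the range 0..1."""
    try:
        value = float(token)
    except ValueError:
        return False
    return 0.0 <= value <= 1.0


def regroup(line):
    """Fold one ragged line into the five columns named in HEADERS."""
    # shlex.split keeps quoted strings together and drops the quotes.
    subject, predicate, obj, *rest = shlex.split(line)
    certainty = ""
    if rest and is_certainty(rest[-1]):
        certainty = rest.pop()
    # Whatever remains (document IDs and offset ranges) becomes one cell.
    return [subject, predicate, obj, " ".join(rest), certainty]


def to_csv(lines):
    """Serialise the regrouped rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADERS)
    for line in lines:
        writer.writerow(regroup(line))
    return buffer.getvalue()


rows = [regroup(line) for line in RAGGED_LINES]

if __name__ == "__main__":
    print(to_csv(RAGGED_LINES))
```

Note that `shlex.split` removes the quotation marks around literal objects such as "Bart", so the distinction between literals and identifiers is lost in this sketch; a real converter would need to preserve it, for example in an additional column.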