- From: Ceolin, D. <d.ceolin@vu.nl>
- Date: Sun, 2 Mar 2014 17:47:48 +0000
- To: Jeni Tennison <jeni@jenitennison.com>
- CC: Dan Brickley <danbri@google.com>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Hi Jeni,

that's clear, thanks. What about the meaning of each row? (sorry for being pedantic...)

Best,
Davide

On 1 Mar 2014, at 23:31, Jeni Tennison wrote:

> Davide,
>
> I think the upshot of the discussion was that we came to an agreement
> that in *tabular* data, each column has a consistent meaning across
> all rows.
>
> I’m not sure that conclusion addresses your query.
>
> Jeni
>
> ------------------------------------------------------
> From: Ceolin, D. d.ceolin@vu.nl
> Reply: Ceolin, D. d.ceolin@vu.nl
> Date: 28 February 2014 at 11:42:11
> To: Jeni Tennison jeni@jenitennison.com
> Subject: Re: Scoping: "Tabular Data"
>
>>
>> Hi all,
>>
>> I'm adding Tim's use case to the "use case and requirements doc",
>> and I was wondering what conclusion we drew from this discussion,
>> if any.
>> In particular, I'd say that not only in Tim's case “Each row is
>> a statement”, but also "Each row is a statement and possibly one
>> or more annotations about that statement".
>> This may add some ambiguity (e.g. is the confidence related only
>> to the triple or to the triple and its provenance?), but also
>> offers an easy way to annotate statements (and, BTW, how would that
>> be translated into RDF? By means of reification or something else?
>> I'm very interested in trust value representations and related).
>> Also, I'm not sure if these issues are fully covered by the PrimaryKey
>> and SemanticTypeDefinition requirements.
>> Cheers,
>>
>> Davide
>>
>>
>>> In your bitmap case, you can say:
>>>
>>> “Each row is a *row of a bitmap* and the columns are the *first
>>> pixel*, *second pixel*, *third pixel*... of the row.”
>>>
>>> Conversely, in Tim’s case, you can say “Each row is a statement”,
>>> but you can’t name the columns in a regular way in terms of being
>>> a property of each statement.
>>>
>>> Cheers,
>>>
>>> Jeni
>>>
>>> (*) or “represents” or “contains information about” or whatever
>>> you want to say to be more semantically accurate
>>>
>>> ------------------------------------------------------
>>> From: Dan Brickley danbri@google.com
>>> Reply: Dan Brickley danbri@google.com
>>> Date: 23 February 2014 at 16:09:18
>>> To: Jeni Tennison jeni@theodi.org
>>> Subject: Re: Scoping: "Tabular Data"
>>>
>>>>
>>>> On 23 February 2014 15:19, Jeni Tennison wrote:
>>>>> Hi,
>>>>>
>>>>> Another scoping question, brought up from Tim Finin’s example
>>>>> from:
>>>>>
>>>>> https://www.w3.org/2013/csvw/wiki/Use_Cases#Representing_entitles_and_facts_extracted_from_text
>>>>>
>>>>> 1> :e4 type PER
>>>>> 2> :e4 mention "Bart" D00124 283-286
>>>>> 3> :e4 mention "JoJo" D00124 145-149 0.9
>>>>> 4> :e4 per:siblings :e7 D00124 283-286 173-179 274-281
>>>>> 5> :e4 per:age "10" D00124 180-181 173-179 182-191 0.9
>>>>> 6> :e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210
>>>>> ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
>>>>> 1 2 3 4 5 6 7 8 9 10 11
>>>>>
>>>>> (I’ve added numbers for the implied columns.)
>>>>>
>>>>> To me, this looks like a text-based format in which each line
>>>>> has a defined format, but where there isn’t the commonality between
>>>>> values in a single column that I would normally expect in what
>>>>> I would consider a tabular format.
>>>>>
>>>>> So for example, column 6 contains a certainty value on line 3
>>>>> and an offset range in lines 4-6, while column 8 contains a certainty
>>>>> value on line 5 and a document ID on line 6.
>>>>>
>>>>> If the data looked like (comma separators added for clarity):
>>>>>
>>>>> :e4, type, PER, ,
>>>>> :e4, mention, "Bart", D00124 283-286,
>>>>> :e4, mention, "JoJo", D00124 145-149, 0.9
>>>>> :e4, per:siblings, :e7, D00124 283-286 173-179 274-281,
>>>>> :e4, per:age, "10", D00124 180-181 173-179 182-191, 0.9
>>>>> :e4, per:parent, :e9, D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
>>>>> ^ ^ ^ ^ ^
>>>>> 1 2 3 4 5
>>>>>
>>>>> then I would consider it tabular data and could add headers:
>>>>>
>>>>> 1: subject
>>>>> 2: predicate
>>>>> 3: object
>>>>> 4: location
>>>>> 5: certainty
>>>>>
>>>>> Can/should we define tabular data as data where all values in
>>>>> a given column have a common meaning?
>>>>
>>>> In this last form, you might argue that when relationship typing is
>>>> pushed down into cell values, i.e. potentially a different predicate
>>>> in each row, then that column does not really have a "common meaning".
>>>> Or you might say the column does have a broader fixed meaning: it
>>>> carries information about how values from other columns relate to each
>>>> other.
>>>>
>>>> For the sake of thought experiment I find it useful to come back to
>>>> pixel-style representation. Consider a 640x480 grid in which red-ness,
>>>> green-ness and blue-ness values are packed into each cell. Perhaps
>>>> with a sub-notation using ':', on a 0-1 scale for now:
>>>>
>>>> So,
>>>>
>>>> 0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0
>>>> 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0 ...
>>>>
>>>> might give us a fragment of such a grid, with neon, black, white etc
>>>> cells.
>>>>
>>>> Q: Do these columns have regular meaning?
>>>> A: Yes; they stand for a column of pixels in a bitmap
>>>> A: No; each row-column combination stands for a distinct entity
>>>> (pixel value)
>>>>
>>>> Q: Is it useful to use W3C CSVW's work to describe this?
>>>> A: Sure. It can help us get the syntax details right (whitespace,
>>>> quotes, newlines) between tools; and it can provide arbitrary per-file
>>>> metadata. For example the metadata might tell us that the grid of
>>>> colours comes from Dan's security camera photo at such-and-so a date.
>>>>
>>>> Q: Isn't this iffy, since there are much better binary representations
>>>> for such data? (e.g. digital image formats)
>>>> A: Yes, but that can be true for more obviously factual data too.
>>>>
>>>> Maybe what I'm getting at here is that I'm not sure what "a common
>>>> meaning" for columns might mean. On the last call I tried to talk
>>>> about columns being "homogeneous" but that was more in terms of low
>>>> level data-typing. For example, a column might always contain
>>>> ISO-8601-style dates, i.e. YYYY-MM-DD. But what they *mean*
>>>> (birthdate, deathdate, date hired, favourite date, ...) could be fixed
>>>> by the meaning of a different column. So the column could be
>>>> datatype-homogeneous but the nature of its per-cell meaning could vary
>>>> per cell.
>>>>
>>>> Dan
>>>>
>>>
>>> --
>>> Jeni Tennison
>>> http://www.jenitennison.com/
>>>
>>
>
> --
> Jeni Tennison
> http://www.jenitennison.com/
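
The observation quoted in this thread (column 6 holds a certainty value on line 3 but an offset range on lines 4-6, and column 8 holds a certainty value on line 5 but a document ID on line 6) can be checked mechanically. What follows is only a minimal sketch in Python, not anything proposed by the working group: the token categories, the regular expressions and the function names are assumptions made up for illustration, and the rows are the six example lines from the thread with their "1>" labels dropped.

```python
import re
from collections import defaultdict

# The six example rows quoted in the thread, without the "1>".."6>" labels.
ROWS = [
    ':e4 type PER',
    ':e4 mention "Bart" D00124 283-286',
    ':e4 mention "JoJo" D00124 145-149 0.9',
    ':e4 per:siblings :e7 D00124 283-286 173-179 274-281',
    ':e4 per:age "10" D00124 180-181 173-179 182-191 0.9',
    ':e4 per:parent :e9 D00124 180-181 381-380 399-406 '
    'D00101 220-225 230-233 201-210',
]

# Hypothetical token categories, guessed from the shape of the example values.
PATTERNS = [
    ("offset_range", re.compile(r"\d+-\d+")),
    ("certainty", re.compile(r"[01]\.\d+")),
    ("document_id", re.compile(r"D\d+")),
]


def classify(token):
    """Return the first category whose pattern matches the whole token."""
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return kind
    return "other"


def kinds_by_column(rows):
    """Map each 1-based column number to the set of categories seen in it."""
    kinds = defaultdict(set)
    for row in rows:
        # Whitespace splitting is enough here: no example value contains a space.
        for index, token in enumerate(row.split(), start=1):
            kinds[index].add(classify(token))
    return dict(kinds)


def mixed_columns(rows):
    """Return only the columns that hold more than one category of value."""
    return {col: k for col, k in kinds_by_column(rows).items() if len(k) > 1}


if __name__ == "__main__":
    for column, kinds in sorted(mixed_columns(ROWS).items()):
        print(column, sorted(kinds))
```

On these rows the sketch flags exactly columns 6 and 8 as mixed. It only looks at the surface shape of each value, so it says nothing about the harder case raised later in the thread, where a column is datatype-homogeneous but its per-cell meaning depends on another column.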
Received on Sunday, 2 March 2014 17:48:18 UTC