- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Sat, 1 Mar 2014 22:31:20 +0000
- To: "Ceolin, D." <d.ceolin@vu.nl>
- Cc: Dan Brickley <danbri@google.com>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Davide,

I think the upshot of the discussion was that we came to an agreement that in *tabular* data, each column has a consistent meaning across all rows. I’m not sure that conclusion addresses your query.

Jeni

------------------------------------------------------
From: Ceolin, D. d.ceolin@vu.nl
Reply: Ceolin, D. d.ceolin@vu.nl
Date: 28 February 2014 at 11:42:11
To: Jeni Tennison jeni@jenitennison.com
Subject: Re: Scoping: "Tabular Data"

>
> Hi all,
>
> I'm adding Tim's use case to the "use case and requirements doc",
> and I was wondering what conclusion we drew from this discussion,
> if any.
> In particular, I'd say that not only in Tim's case “Each row is
> a statement”, but also "Each row is a statement and possibly one
> or more annotations about that statement".
> This may add some ambiguity (e.g. is the confidence related only
> to the triple or to the triple and its provenance?), but offers
> also an easy way to annotate statements (and, BTW, how would that
> be translated into RDF? By means of reification or else? I'm very
> interested in trust value representations and related).
> Also, I'm not sure if these issues are fully covered by the PrimaryKey
> and SemanticTypeDefinition requirements.
> Cheers,
>
> Davide
>
> > In your bitmap case, you can say:
> >
> > “Each row is a *row of a bitmap* and the columns are the *first
> > pixel*, *second pixel*, *third pixel*... of the row.”
> >
> > Conversely, in Tim’s case, you can say “Each row is a statement”,
> > but you can’t name the columns in a regular way in terms of being
> > a property of each statement.
> >
> > Cheers,
> >
> > Jeni
> >
> > (*) or “represents” or “contains information about” or whatever
> > you want to say to be more semantically accurate
> >
> > ------------------------------------------------------
> > From: Dan Brickley danbri@google.com
> > Reply: Dan Brickley danbri@google.com
> > Date: 23 February 2014 at 16:09:18
> > To: Jeni Tennison jeni@theodi.org
> > Subject: Re: Scoping: "Tabular Data"
> >
> >>
> >> On 23 February 2014 15:19, Jeni Tennison wrote:
> >>> Hi,
> >>>
> >>> Another scoping question, brought up from Tim Finin’s example from:
> >>>
> >>> https://www.w3.org/2013/csvw/wiki/Use_Cases#Representing_entitles_and_facts_extracted_from_text
> >>>
> >>> 1> :e4 type PER
> >>> 2> :e4 mention "Bart" D00124 283-286
> >>> 3> :e4 mention "JoJo" D00124 145-149 0.9
> >>> 4> :e4 per:siblings :e7 D00124 283-286 173-179 274-281
> >>> 5> :e4 per:age "10" D00124 180-181 173-179 182-191 0.9
> >>> 6> :e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210
> >>>    ^   ^          ^   ^      ^       ^       ^       ^      ^       ^       ^
> >>>    1   2          3   4      5       6       7       8      9       10      11
> >>>
> >>> (I’ve added numbers for the implied columns.)
> >>>
> >>> To me, this looks like a text-based format in which each line
> >>> has a defined format, but where there isn’t the commonality
> >>> between values in a single column that I would normally expect
> >>> in what I would consider a tabular format.
> >>>
> >>> So for example, column 6 contains a certainty value on line 3
> >>> and an offset range in lines 4-6, while column 8 contains a
> >>> certainty value on line 5 and a document ID on line 6.
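Jeni's observation about columns 6 and 8 can be checked mechanically. What follows is a minimal Python sketch, not anything defined by CSVW: it assumes the fields are separated by runs of whitespace (true for these six lines, since no value contains a space), drops the 1> to 6> line labels, and relies on a hypothetical kind() helper whose regular expressions are read off this example only.

```python
import re
from collections import defaultdict

# Tim Finin's example with the "1>" ... "6>" labels dropped and line 6
# rejoined onto one line (email wrapping had split it).
LINES = [
    ':e4 type PER',
    ':e4 mention "Bart" D00124 283-286',
    ':e4 mention "JoJo" D00124 145-149 0.9',
    ':e4 per:siblings :e7 D00124 283-286 173-179 274-281',
    ':e4 per:age "10" D00124 180-181 173-179 182-191 0.9',
    ':e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210',
]


def kind(value: str) -> str:
    """Hypothetical, deliberately crude guess at what sort of value a cell holds."""
    if re.fullmatch(r"\d+-\d+", value):
        return "offset range"
    if re.fullmatch(r"D\d+", value):
        return "document ID"
    if re.fullmatch(r"0?\.\d+|1\.0", value):
        return "certainty"
    return "other"  # entities, predicates, types and literals all land here


def column_kinds(lines):
    """Map each 1-based column position to the set of value kinds found there."""
    kinds = defaultdict(set)
    for line in lines:
        for col, value in enumerate(line.split(), start=1):
            kinds[col].add(kind(value))
    return dict(kinds)


if __name__ == "__main__":
    kinds = column_kinds(LINES)
    for col in sorted(kinds):
        print(col, sorted(kinds[col]))
    mixed = [col for col in sorted(kinds) if len(kinds[col]) > 1]
    print("columns mixing kinds:", mixed)  # columns mixing kinds: [6, 8]
```

Run as a script, it prints the set of kinds seen at each column position; only columns 6 and 8 mix kinds (a certainty value alongside offset ranges, and a certainty value alongside a document ID), which is the missing per-column commonality described above.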
> >>>
> >>> If the data looked like (comma separators added for clarity):
> >>>
> >>> :e4, type, PER, ,
> >>> :e4, mention, "Bart", D00124 283-286,
> >>> :e4, mention, "JoJo", D00124 145-149, 0.9
> >>> :e4, per:siblings, :e7, D00124 283-286 173-179 274-281,
> >>> :e4, per:age, "10", D00124 180-181 173-179 182-191, 0.9
> >>> :e4, per:parent, :e9, D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
> >>> ^    ^        ^       ^               ^
> >>> 1    2        3       4               5
> >>>
> >>> then I would consider it tabular data and could add headers:
> >>>
> >>> 1: subject
> >>> 2: predicate
> >>> 3: object
> >>> 4: location
> >>> 5: certainty
> >>>
> >>> Can/should we define tabular data as data where all values in
> >>> a given column have a common meaning?
> >>
> >> In this last form, you might argue that when relationship typing
> >> is pushed down into cell values, i.e. potentially a different
> >> predicate in each row, then that column does not really have a
> >> "common meaning". Or you might say the column does have a broader
> >> fixed meaning: it carries information about how values from other
> >> columns relate to each other.
> >>
> >> For the sake of thought experiment I find it useful to come back
> >> to pixel-style representation. Consider a 640x480 grid in which
> >> red-ness, green-ness and blue-ness values are packed into each
> >> cell. Perhaps with a sub-notation using ':', on a 0-1 scale for now:
> >>
> >> So,
> >>
> >> 0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0
> >> 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0 ...
> >>
> >> might give us a fragment of such a grid, with neon, black, white
> >> etc cells.
> >>
> >> Q: Do these columns have regular meaning?
> >> A: Yes; they stand for a column of pixels in a bitmap
> >> A: No; each row-column combination stands for a distinct entity
> >> (pixel value)
> >>
> >> Q: Is it useful to use W3C CSVW's work to describe this?
> >> A: Sure.
> >> It can help us get the syntax details right (whitespace,
> >> quotes, newlines) between tools; and it can provide arbitrary
> >> per-file metadata. For example the metadata might tell us that
> >> the grid of colours comes from dan's security camera photo at
> >> such-and-so a date.
> >>
> >> Q: Isn't this iffy, since there are much better binary
> >> representations for such data? (e.g. digital image formats)
> >> A: Yes, but that can be true for more obviously factual data too.
> >>
> >> Maybe what I'm getting at here is that I'm not sure what "a common
> >> meaning" for columns might mean. On the last call I tried to talk
> >> about columns being "homogenous" but that was more in terms of
> >> low level data-typing. For example, a column might always contain
> >> ISO-8601-style dates, i.e. YYYY-MM-DD. But what they *mean*
> >> (birthdate, deathdate, date hired, favourite date, ...) could be
> >> fixed by the meaning of a different column. So the column could be
> >> datatype-homogenous but the nature of its per-cell meaning could
> >> vary per cell.
> >>
> >> Dan
> >
> > --
> > Jeni Tennison
> > http://www.jenitennison.com/

--
Jeni Tennison
http://www.jenitennison.com/
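Dan's closing point, that a column can be homogeneous in datatype while the meaning of each cell is fixed by another column, can be illustrated with Jeni's comma-separated version of the example and her five proposed headers. The following is a minimal Python sketch under stated assumptions: the rows are transcribed with straight quotes and with a comma after "10" and after :e9 so that every row has five fields; the LOCATION and CERTAINTY patterns are guesses read off these rows rather than a defined format; and load(), column_is_homogeneous() and objects_by_predicate() are hypothetical helpers.

```python
import csv
import io
import re
from collections import defaultdict

# Jeni's five proposed headers.
HEADERS = ["subject", "predicate", "object", "location", "certainty"]

# Her comma-separated rows, transcribed with straight quotes and with a comma
# after "10" and after :e9 (an assumption) so that every row has five fields.
DATA = '''\
:e4, type, PER, ,
:e4, mention, "Bart", D00124 283-286,
:e4, mention, "JoJo", D00124 145-149, 0.9
:e4, per:siblings, :e7, D00124 283-286 173-179 274-281,
:e4, per:age, "10", D00124 180-181 173-179 182-191, 0.9
:e4, per:parent, :e9, D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
'''

# Assumed datatype patterns, read off the rows above (not a defined format).
LOCATION = r"D\d+(?: \d+-\d+)+(?: D\d+(?: \d+-\d+)+)*"  # doc ID + offset ranges, repeatable
CERTAINTY = r"0(?:\.\d+)?|1(?:\.0+)?|\.\d+"             # a decimal from 0 to 1


def load(text):
    """Parse the rows into dicts keyed by HEADERS."""
    # skipinitialspace drops the blank after each comma; QUOTE_NONE keeps the
    # double quotes that mark literals such as "Bart" instead of stripping them.
    return list(csv.DictReader(io.StringIO(text), fieldnames=HEADERS,
                               skipinitialspace=True, quoting=csv.QUOTE_NONE))


def column_is_homogeneous(rows, column, pattern):
    """True if every non-empty cell in `column` fully matches `pattern`."""
    return all(re.fullmatch(pattern, row[column]) for row in rows if row[column])


def objects_by_predicate(rows):
    """Group object values by the predicate in the same row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["predicate"]].append(row["object"])
    return dict(groups)


if __name__ == "__main__":
    rows = load(DATA)
    print(column_is_homogeneous(rows, "location", LOCATION))    # True
    print(column_is_homogeneous(rows, "certainty", CERTAINTY))  # True
    print(column_is_homogeneous(rows, "object", r":e\d+"))      # False
    for predicate, objects in objects_by_predicate(rows).items():
        print(predicate, objects)
```

The location and certainty columns each pass a single datatype check on every non-empty cell, while the object column does not. Grouping objects by predicate shows why: a type (PER), quoted mentions, entities (:e7, :e9) and an age literal all share that column, and the predicate in the same row says which reading applies.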
Received on Saturday, 1 March 2014 22:31:46 UTC