- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Sun, 23 Feb 2014 17:03:24 -0800
- To: Dan Brickley <danbri@google.com>
- Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Dan,

But in the case where you have:

  person1,birthdate,1912-04-23
  person1,deathdate,1993-03-30
  ...

you can still label the columns in a regular way (entity, property, value). You can fill in a statement that says:

  “Each row is(*) an X and the columns are the A, B, C… of the X”

i.e.

  “Each row is a *statement* and the columns are the *entity*, *property* and *value* of the statement.”

In your bitmap case, you can say:

  “Each row is a *row of a bitmap* and the columns are the *first pixel*, *second pixel*, *third pixel*... of the row.”

Conversely, in Tim’s case, you can say “Each row is a statement”, but you can’t name the columns in a regular way in terms of being a property of each statement.

Cheers,

Jeni

(*) or “represents” or “contains information about” or whatever you want to say to be more semantically accurate

------------------------------------------------------
From: Dan Brickley danbri@google.com
Reply: Dan Brickley danbri@google.com
Date: 23 February 2014 at 16:09:18
To: Jeni Tennison jeni@theodi.org
Subject: Re: Scoping: "Tabular Data"

> On 23 February 2014 15:19, Jeni Tennison wrote:
> > Hi,
> >
> > Another scoping question, brought up from Tim Finin’s example from:
> >
> > https://www.w3.org/2013/csvw/wiki/Use_Cases#Representing_entitles_and_facts_extracted_from_text
> >
> > 1> :e4 type PER
> > 2> :e4 mention "Bart" D00124 283-286
> > 3> :e4 mention "JoJo" D00124 145-149 0.9
> > 4> :e4 per:siblings :e7 D00124 283-286 173-179 274-281
> > 5> :e4 per:age "10" D00124 180-181 173-179 182-191 0.9
> > 6> :e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210
> >    ^   ^          ^   ^      ^       ^       ^       ^      ^       ^       ^
> >    1   2          3   4      5       6       7       8      9       10      11
> >
> > (I’ve added numbers for the implied columns.)
> >
> > To me, this looks like a text-based format in which each line has a defined format, but where there isn’t the commonality between values in a single column that I would normally expect in what I would consider a tabular format.
> >
> > So for example, column 6 contains a certainty value on line 3 and an offset range in lines 4-6, while column 8 contains a certainty value on line 5 and a document ID on line 6.
> >
> > If the data looked like (comma separators added for clarity):
> >
> > :e4, type, PER, ,
> > :e4, mention, "Bart", D00124 283-286,
> > :e4, mention, "JoJo", D00124 145-149, 0.9
> > :e4, per:siblings, :e7, D00124 283-286 173-179 274-281,
> > :e4, per:age, "10", D00124 180-181 173-179 182-191, 0.9
> > :e4, per:parent, :e9, D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
> > ^    ^           ^    ^                                                              ^
> > 1    2           3    4                                                              5
> >
> > then I would consider it tabular data and could add headers:
> >
> > 1: subject
> > 2: predicate
> > 3: object
> > 4: location
> > 5: certainty
> >
> > Can/should we define tabular data as data where all values in a given column have a common meaning?
>
> In this last form, you might argue that when relationship typing is pushed down into cell values, i.e. potentially a different predicate in each row, then that column does not really have a "common meaning". Or you might say the column does have a broader fixed meaning: it carries information about how values from other columns relate to each other.
>
> For the sake of thought experiment I find it useful to come back to pixel-style representation. Consider a 640x480 grid in which red-ness, green-ness and blue-ness values are packed into each cell. Perhaps with a sub-notation using ':', on a 0-1 scale for now:
>
> So,
>
> 0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0
> 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0
> ...
>
> might give us a fragment of such a grid, with neon, black, white etc. cells.
>
> Q: Do these columns have regular meaning?
> A: Yes; they stand for a column of pixels in a bitmap
> A: No; each row-column combination stands for a distinct entity (pixel value)
>
> Q: Is it useful to use W3C CSVW's work to describe this?
> A: Sure. It can help us get the syntax details right (whitespace, quotes, newlines) between tools; and it can provide arbitrary per-file metadata. For example the metadata might tell us that the grid of colours comes from Dan's security camera photo at such-and-so a date.
>
> Q: Isn't this iffy, since there are much better binary representations for such data? (e.g. digital image formats)
> A: Yes, but that can be true for more obviously factual data too.
>
> Maybe what I'm getting at here is that I'm not sure what "a common meaning" for columns might mean. On the last call I tried to talk about columns being "homogenous" but that was more in terms of low level data-typing. For example, a column might always contain ISO-8601-style dates, i.e. YYYY-MM-DD. But what they *mean* (birthdate, deathdate, date hired, favourite date, ...) could be fixed by the meaning of a different column. So the column could be datatype-homogenous but the nature of its per-cell meaning could vary per cell.
>
> Dan

--
Jeni Tennison
http://www.jenitennison.com/
Received on Monday, 24 February 2014 01:03:54 UTC
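
The observation in this thread about Tim Finin's example (column 6 holds a certainty value on line 3 but an offset range on lines 4-6; column 8 holds a certainty value on line 5 but a document ID on line 6) can be checked mechanically. The following is only a minimal sketch: the `kind` function, its category names, and its regular expressions are assumptions made for illustration, and surface shape is at best a rough proxy for the "common meaning" being debated. It is not a proposed definition of tabular data.

```python
import re
from collections import defaultdict

# The six lines of the example, without the "N>" line-number prefixes.
ROWS = [
    ':e4 type PER',
    ':e4 mention "Bart" D00124 283-286',
    ':e4 mention "JoJo" D00124 145-149 0.9',
    ':e4 per:siblings :e7 D00124 283-286 173-179 274-281',
    ':e4 per:age "10" D00124 180-181 173-179 182-191 0.9',
    ':e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 '
    '220-225 230-233 201-210',
]


def kind(token):
    """Guess what a token is from its surface shape (hypothetical categories)."""
    if re.fullmatch(r"\d+-\d+", token):
        return "offset range"
    if re.fullmatch(r"D\d+", token):
        return "document ID"
    if re.fullmatch(r"\d*\.\d+", token):
        return "certainty"
    if token.startswith(":"):
        return "entity"
    if token.startswith('"'):
        return "literal"
    return "name"


def kinds_by_column(rows):
    """Map each 1-based column number to the set of token kinds found in it."""
    columns = defaultdict(set)
    for row in rows:
        for number, token in enumerate(row.split(), start=1):
            columns[number].add(kind(token))
    return dict(columns)


if __name__ == "__main__":
    for number, kinds in sorted(kinds_by_column(ROWS).items()):
        print(number, sorted(kinds))
```

Columns whose set contains more than one kind are the ones that cannot be given a single header. Note that column 3 also comes out mixed under this shape-based check even though the thread labels it "object", which is one way of seeing the gap between datatype homogeneity and common meaning raised in the quoted message.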