Re: Scoping: "Tabular Data"

On 2 Mar 2014 11:18, "Jeni Tennison" <jeni@jenitennison.com> wrote:
>
> Davide,
>
> I’ve updated the spec here:
>
>   http://w3c.github.io/csvw/syntax/
>
> with the definition that I think we agreed to (though I’m happy to
> continue wordsmithing it), namely that each row contains information about
> some (one) thing.

We should allow two common cases:

Each row is about a different entity (or set of entities).

And

Each row is an observation on the state of such an entity (or entities).

If we want, we can say that one entity per row is somehow the primary focus,
though describing that one thing is often achieved by mentioning properties
of others.

The latter allows for log-like and time-series data; the former, for more
entity-relationship structures.

Dan
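One way to tell Dan's two cases apart can be sketched in Python, assuming (hypothetically) that an entity table has an identifier column that is unique per row, while an observation log repeats the entity identifier and only becomes unique when combined with a time column. The `is_key` helper, the column names and the sample rows are invented for illustration; this is a heuristic, not a definition from the spec.

```python
def is_key(rows, columns):
    """Return True if the given columns together identify every row uniquely."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical entity table: one row per person.
people = [
    {"id": "p1", "name": "Alice"},
    {"id": "p2", "name": "Bob"},
]

# Hypothetical observation log: several readings per sensor over time.
readings = [
    {"sensor": "s1", "time": "2014-03-02T10:00", "celsius": "7.5"},
    {"sensor": "s1", "time": "2014-03-02T11:00", "celsius": "8.1"},
    {"sensor": "s2", "time": "2014-03-02T10:00", "celsius": "6.9"},
]

print(is_key(people, ["id"]))                # True: one entity per row
print(is_key(readings, ["sensor"]))          # False: the entity repeats
print(is_key(readings, ["sensor", "time"]))  # True: entity plus time per row
```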

> Jeni
>
> ------------------------------------------------------
> From: Ceolin, D. d.ceolin@vu.nl
> Reply: Ceolin, D. d.ceolin@vu.nl
> Date: 2 March 2014 at 17:52:00
> To: Jeni Tennison jeni@jenitennison.com
> Subject:  Re: Scoping: "Tabular Data"
>
> >
> > Hi Jeni,
> >
> > that's clear, thanks. What about the meaning of each row? (sorry
> > for being pedantic...)
> > Best,
> >
> > Davide
> >
> > On 01 Mar 2014, at 23:31, Jeni Tennison wrote:
> >
> > > Davide,
> > >
> > > I think the upshot of the discussion was that we came to an agreement
> > > that in *tabular* data, each column has a consistent meaning across all
> > > rows.
> > >
> > > I’m not sure that conclusion addresses your query.
> > >
> > > Jeni
> > >
> > > ------------------------------------------------------
> > > From: Ceolin, D. d.ceolin@vu.nl
> > > Reply: Ceolin, D. d.ceolin@vu.nl
> > > Date: 28 February 2014 at 11:42:11
> > > To: Jeni Tennison jeni@jenitennison.com
> > > Subject: Re: Scoping: "Tabular Data"
> > >
> > >>
> > >> Hi all,
> > >>
> > >> I'm adding Tim's use case to the "use case and requirements doc",
> > >> and I was wondering what conclusion we drew from this discussion,
> > >> if any.
> > >> In particular, I'd say that in Tim's case it is not only true that
> > >> "Each row is a statement", but also that "Each row is a statement and
> > >> possibly one or more annotations about that statement".
> > >> This may add some ambiguity (e.g. is the confidence related only to
> > >> the triple, or to the triple and its provenance?), but it also offers
> > >> an easy way to annotate statements (and, BTW, how would that be
> > >> translated into RDF? By means of reification or something else? I'm
> > >> very interested in trust value representations and related topics).
> > >> Also, I'm not sure whether these issues are fully covered by the
> > >> PrimaryKey and SemanticTypeDefinition requirements.
> > >> Cheers,
> > >>
> > >> Davide
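Davide asks how a per-row annotation would map to RDF. One option he names is standard RDF reification (the `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` vocabulary). Below is a minimal Python sketch that emits N-Triples for row 3 of Tim's example (`:e4 mention "JoJo" D00124 145-149 0.9`), assuming a hypothetical base IRI `http://example.org/` for the `:` prefix and a hypothetical `certainty` property. It attaches the confidence to the reified triple only, which is one possible answer to the ambiguity Davide raises, not the only one.

```python
# RDF and XSD are the real W3C namespaces; BASE is a hypothetical
# stand-in for the ':' prefix used in Tim's example.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/"


def reify(statement_id, subject, predicate, obj_literal, certainty):
    """Describe one statement with RDF reification and attach a certainty.

    Returns N-Triples lines. The literal is not escaped, so it must not
    contain quotes, backslashes or newlines.
    """
    stmt = f"<{BASE}{statement_id}>"
    return [
        f"{stmt} <{RDF}type> <{RDF}Statement> .",
        f"{stmt} <{RDF}subject> <{BASE}{subject}> .",
        f"{stmt} <{RDF}predicate> <{BASE}{predicate}> .",
        f'{stmt} <{RDF}object> "{obj_literal}" .',
        # Hypothetical annotation property carrying the row's confidence.
        f'{stmt} <{BASE}certainty> "{certainty}"^^<{XSD}decimal> .',
    ]


# Row 3 of Tim's example: :e4 mention "JoJo" D00124 145-149 0.9
for line in reify("stmt3", "e4", "mention", "JoJo", 0.9):
    print(line)
```

The location columns (`D00124 145-149`) are left out here; attaching them to the same reified statement would make the confidence ambiguous in exactly the way Davide describes.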
> > >>
> > >>
> > >>> In your bitmap case, you can say:
> > >>>
> > >>> “Each row is a *row of a bitmap* and the columns are the *first
> > >>> pixel*, *second pixel*, *third pixel*... of the row.”
> > >>>
> > >>> Conversely, in Tim’s case, you can say “Each row is a statement”,
> > >>> but you can’t name the columns in a regular way in terms of being
> > >>> a property of each statement.
> > >>>
> > >>> Cheers,
> > >>>
> > >>> Jeni
> > >>>
> > >>> (*) or “represents” or “contains information about” or whatever
> > >>> you want to say to be more semantically accurate
> > >>>
> > >>> ------------------------------------------------------
> > >>> From: Dan Brickley danbri@google.com
> > >>> Reply: Dan Brickley danbri@google.com
> > >>> Date: 23 February 2014 at 16:09:18
> > >>> To: Jeni Tennison jeni@theodi.org
> > >>> Subject: Re: Scoping: "Tabular Data"
> > >>>
> > >>>>
> > >>>> On 23 February 2014 15:19, Jeni Tennison wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> Another scoping question, brought up by Tim Finin’s example
> > >>>>> from:
> > >>>>>
> > >>>>>
> > >>>>> https://www.w3.org/2013/csvw/wiki/Use_Cases#Representing_entitles_and_facts_extracted_from_text
> > >>>>>
> > >>>>> 1> :e4 type         PER
> > >>>>> 2> :e4 mention      "Bart" D00124 283-286
> > >>>>> 3> :e4 mention      "JoJo" D00124 145-149 0.9
> > >>>>> 4> :e4 per:siblings :e7    D00124 283-286 173-179 274-281
> > >>>>> 5> :e4 per:age      "10"   D00124 180-181 173-179 182-191 0.9
> > >>>>> 6> :e4 per:parent   :e9    D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210
> > >>>>>    ^   ^            ^      ^      ^       ^       ^       ^      ^       ^       ^
> > >>>>>    1   2            3      4      5       6       7       8      9       10      11
> > >>>>>
> > >>>>> (I’ve added numbers for the implied columns.)
> > >>>>>
> > >>>>> To me, this looks like a text-based format in which each line
> > >>>>> has a defined format, but where there isn’t the commonality
> > >>>>> between values in a single column that I would normally expect
> > >>>>> in what I would consider a tabular format.
> > >>>>>
> > >>>>> So for example, column 6 contains a certainty value on line 3
> > >>>>> and an offset range on lines 4-6, while column 8 contains a
> > >>>>> certainty value on line 5 and a document ID on line 6.
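The column check Jeni describes can be sketched in Python: split each line of Tim's example on whitespace and guess, per cell, what kind of value it holds. The classification rules (a `D` followed by digits is a document ID, `N-N` is an offset range, a decimal is a certainty) are assumptions inferred from the sample, not part of any spec, and `kind`/`column_kinds` are hypothetical helpers.

```python
import re

# Tim Finin's example lines, split on whitespace (assumed separator).
LINES = [
    ':e4 type PER',
    ':e4 mention "Bart" D00124 283-286',
    ':e4 mention "JoJo" D00124 145-149 0.9',
    ':e4 per:siblings :e7 D00124 283-286 173-179 274-281',
    ':e4 per:age "10" D00124 180-181 173-179 182-191 0.9',
    ':e4 per:parent :e9 D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210',
]


def kind(token):
    """Guess what sort of value a token is (heuristic based on the sample)."""
    if re.fullmatch(r"D[0-9]+", token):
        return "document-id"
    if re.fullmatch(r"[0-9]+-[0-9]+", token):
        return "offset-range"
    if re.fullmatch(r"[0-9]*\.[0-9]+", token):
        return "certainty"
    return "other"


def column_kinds(lines, col):
    """Map 1-based line number -> kind of value in 1-based column col."""
    result = {}
    for n, line in enumerate(lines, start=1):
        tokens = line.split()
        if len(tokens) >= col:  # shorter lines simply lack this column
            result[n] = kind(tokens[col - 1])
    return result


print(column_kinds(LINES, 6))
# {3: 'certainty', 4: 'offset-range', 5: 'offset-range', 6: 'offset-range'}
print(column_kinds(LINES, 8))
# {5: 'certainty', 6: 'document-id'}
```

A column whose cells map to more than one kind is a sign that position alone does not determine meaning, which is the property Jeni is using to say the format is not tabular.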
> > >>>>>
> > >>>>> If the data looked like (comma separators added for clarity):
> > >>>>>
> > >>>>> :e4, type,         PER,    ,
> > >>>>> :e4, mention,      "Bart", D00124 283-286,
> > >>>>> :e4, mention,      "JoJo", D00124 145-149,                 0.9
> > >>>>> :e4, per:siblings, :e7,    D00124 283-286 173-179 274-281,
> > >>>>> :e4, per:age,      "10",   D00124 180-181 173-179 182-191, 0.9
> > >>>>> :e4, per:parent,   :e9,    D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
> > >>>>> ^    ^             ^       ^                               ^
> > >>>>> 1    2             3       4                               5
> > >>>>>
> > >>>>> then I would consider it tabular data and could add headers:
> > >>>>>
> > >>>>> 1: subject
> > >>>>> 2: predicate
> > >>>>> 3: object
> > >>>>> 4: location
> > >>>>> 5: certainty
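With the comma-separated layout, a standard CSV reader can attach the five headers to every row. A minimal Python sketch, assuming every row has a comma after its object value and uses plain ASCII double quotes; `skipinitialspace=True` drops the space after each comma. The `read_rows` helper is hypothetical.

```python
import csv
import io

HEADERS = ["subject", "predicate", "object", "location", "certainty"]

# The comma-separated rows, with a comma after every object value.
DATA = """\
:e4, type, PER, ,
:e4, mention, "Bart", D00124 283-286,
:e4, mention, "JoJo", D00124 145-149, 0.9
:e4, per:siblings, :e7, D00124 283-286 173-179 274-281,
:e4, per:age, "10", D00124 180-181 173-179 182-191, 0.9
:e4, per:parent, :e9, D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210,
"""


def read_rows(text):
    """Parse the rows and name each field; reject rows with the wrong width."""
    rows = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for line_no, fields in enumerate(reader, start=1):
        if len(fields) != len(HEADERS):
            raise ValueError(
                f"row {line_no}: expected {len(HEADERS)} fields, got {len(fields)}"
            )
        record = dict(zip(HEADERS, fields))
        # An empty certainty cell means "no certainty given".
        record["certainty"] = float(record["certainty"]) if record["certainty"] else None
        rows.append(record)
    return rows


for row in read_rows(DATA):
    print(row["predicate"], row["object"], row["certainty"])
```

A row missing the comma after its object value parses as four fields, so `read_rows` rejects it rather than shifting the location into the object column.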
> > >>>>>
> > >>>>> Can/should we define tabular data as data where all values in a
> > >>>>> given column have a common meaning?
> > >>>>
> > >>>> In this last form, you might argue that when relationship typing is
> > >>>> pushed down into cell values, i.e. potentially a different predicate
> > >>>> in each row, then that column does not really have a "common
> > >>>> meaning". Or you might say the column does have a broader fixed
> > >>>> meaning: it carries information about how values from other columns
> > >>>> relate to each other.
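The second reading can be made concrete by pivoting: if the predicate column tells us how the other cells relate, each predicate can be moved up into a column header, producing one row per subject in which every column does have a fixed meaning. A minimal Python sketch using the first three columns of Tim's example; keeping multi-valued predicates (the two `mention` values) as lists is one design choice among several, and the `pivot` helper is hypothetical.

```python
from collections import defaultdict

# (subject, predicate, object) from the first three columns of the example.
TRIPLES = [
    (":e4", "type", "PER"),
    (":e4", "mention", "Bart"),
    (":e4", "mention", "JoJo"),
    (":e4", "per:siblings", ":e7"),
    (":e4", "per:age", "10"),
    (":e4", "per:parent", ":e9"),
]


def pivot(triples):
    """Turn predicate values into column headers, one row per subject.

    Returns (columns, rows) where rows maps subject -> {column: [objects]}.
    """
    by_subject = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        by_subject[subject][predicate].append(obj)
    columns = sorted({predicate for _, predicate, _ in triples})
    rows = {
        subject: {column: list(values.get(column, [])) for column in columns}
        for subject, values in by_subject.items()
    }
    return columns, rows


columns, rows = pivot(TRIPLES)
print(columns)
# ['mention', 'per:age', 'per:parent', 'per:siblings', 'type']
print(rows[":e4"]["mention"])
# ['Bart', 'JoJo']
```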
> > >>>>
> > >>>> For the sake of a thought experiment, I find it useful to come back
> > >>>> to a pixel-style representation. Consider a 640x480 grid in which
> > >>>> red-ness, green-ness and blue-ness values are packed into each cell,
> > >>>> perhaps with a sub-notation using ':', on a 0-1 scale for now.
> > >>>>
> > >>>> So,
> > >>>>
> > >>>> 0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0
> > >>>> 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0
> > >>>> ...
> > >>>> might give us a fragment of such a grid, with neon, black, white,
> > >>>> etc. cells.
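Reading the packed cells needs a second, cell-level parse on top of the CSV syntax. A minimal Python sketch, assuming each cell is exactly three ':'-separated numbers in the range 0-1 and that cells are separated by commas; `parse_cell` and `parse_grid` are hypothetical helpers. A malformed cell, such as one with a stray ';', raises `ValueError` instead of being silently misread.

```python
# Two rows of the fragment, with every cell written as r:g:b.
GRID = """\
0.4:1.0:0.0, 0.0:0.0:0.0, 1.0:1.0:1.0, 0.4:1.0:0.0
1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0, 1.0:1.0:1.0"""


def parse_cell(cell):
    """Split one 'r:g:b' cell into a tuple of three floats in [0, 1]."""
    parts = cell.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected r:g:b, got {cell!r}")
    rgb = tuple(float(part) for part in parts)
    if not all(0.0 <= value <= 1.0 for value in rgb):
        raise ValueError(f"component out of range in {cell!r}")
    return rgb


def parse_grid(text):
    """Parse each line into a list of (r, g, b) tuples."""
    return [[parse_cell(cell) for cell in line.split(",")] for line in text.splitlines()]


grid = parse_grid(GRID)
print(grid[0][0])  # (0.4, 1.0, 0.0)
```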
> > >>>>
> > >>>> Q: Do these columns have regular meaning?
> > >>>> A: Yes; they stand for a column of pixels in a bitmap.
> > >>>> A: No; each row-column combination stands for a distinct entity
> > >>>> (pixel value).
> > >>>>
> > >>>> Q: Is it useful to use W3C CSVW's work to describe this?
> > >>>> A: Sure. It can help us get the syntax details right (whitespace,
> > >>>> quotes, newlines) between tools, and it can provide arbitrary
> > >>>> per-file metadata. For example, the metadata might tell us that the
> > >>>> grid of colours comes from Dan's security camera photo from
> > >>>> such-and-such a date.
> > >>>>
> > >>>> Q: Isn't this iffy, since there are much better binary
> > >>>> representations for such data (e.g. digital image formats)?
> > >>>> A: Yes, but that can be true for more obviously factual data too.
> > >>>>
> > >>>> Maybe what I'm getting at here is that I'm not sure what "a common
> > >>>> meaning" for columns might mean. On the last call I tried to talk
> > >>>> about columns being "homogeneous", but that was more in terms of
> > >>>> low-level data typing. For example, a column might always contain
> > >>>> ISO-8601-style dates, i.e. YYYY-MM-DD. But what they *mean*
> > >>>> (birthdate, deathdate, date hired, favourite date, ...) could be
> > >>>> fixed by the meaning of a different column. So the column could be
> > >>>> datatype-homogeneous, but the nature of its per-cell meaning could
> > >>>> vary from cell to cell.
> > >>>>
> > >>>> Dan
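The distinction between datatype homogeneity and meaning can be sketched in Python: checking that every cell in a column is a valid YYYY-MM-DD date is mechanical, while what each date means is carried by another column. The rows, column names and the `is_iso_date` helper are hypothetical. The regular expression pins the YYYY-MM-DD shape because recent Python versions let `date.fromisoformat` accept other ISO 8601 forms as well; `fromisoformat` then rejects impossible calendar dates.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:  # e.g. 2014-02-30
        return False
    return True


# Hypothetical rows: the 'kind' column says what each date means.
ROWS = [
    {"person": "p1", "kind": "birthdate", "date": "1970-05-17"},
    {"person": "p1", "kind": "date hired", "date": "2010-09-01"},
    {"person": "p2", "kind": "deathdate", "date": "2001-12-31"},
]

# The 'date' column is datatype-homogeneous...
print(all(is_iso_date(row["date"]) for row in ROWS))  # True
# ...but its per-cell meaning varies with the 'kind' column.
print(sorted({row["kind"] for row in ROWS}))
# ['birthdate', 'date hired', 'deathdate']
```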
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>> --
> > >>> Jeni Tennison
> > >>> http://www.jenitennison.com/
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >
> > > --
> > > Jeni Tennison
> > > http://www.jenitennison.com/
> >
> >
> >
> >
>
> --
> Jeni Tennison
> http://www.jenitennison.com/

Received on Sunday, 2 March 2014 20:40:44 UTC