Minutes of our meeting, 4 June

From: Ivan Herman <ivan@w3.org>
Date: Wed, 4 Jun 2014 09:15:46 -0400
To: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Meeting minutes are here:


Text version below




              CSV on the Web Working Group Teleconference

04 Jun 2014


   See also: IRC log

   Present
          Phil Archer (philA), Dan Brickley (danbri), Jeni
          Tennison (JeniT), Jeremy Tandy (jtandy), Andy Seaborne
          (AndyS), Eric Stephan (estephan), Ivan Herman (ivan),
          Davide Ceolin (DavideCeolin), Alfonso Noriega (fonso)

   Chair
          Dan Brickley

   Scribe
          Andy Seaborne


     * Topics
         1. Progressing XML & JSON conversions
         2. metadata format
         3. Model for Tabular Data and Metadata on the Web
         4. Use cases and requirements
         5. Generating RDF from Web Tabular Data
         6. publication status
     * Summary of Action Items

   <trackbot> Date: 04 June 2014
   No formal meeting last week.

   <danbri> week before last's meeting:

Progressing XML & JSON conversions

   danbri: need people to champion these formats

   jtandy: we have UC for each format

   jenit: co-chairs intend to reach out to members of the WG

   danbri: test cases - need structure in the repo for
   examples/test cases.
   ... inputs directory for csv, and then folder per output
   ... can compare the different approaches.
   ... go through UCs and get the CSVs then produce outputs
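
   The layout danbri describes (one inputs directory for the CSVs,
   one folder per output format) could be sketched as below. The
   folder names other than "inputs", the format list, and the
   function name plan_outputs are assumptions for illustration;
   the group had not agreed a structure at this point.

```python
from pathlib import Path


def plan_outputs(root, formats=("json", "xml", "rdf")):
    """Map each CSV under root/inputs to one expected-output path per format.

    The folder-per-format naming is a hypothetical convention, not one
    the working group has adopted.
    """
    root = Path(root)
    plan = {}
    for csv_path in sorted((root / "inputs").glob("*.csv")):
        plan[csv_path.name] = {
            fmt: root / fmt / f"{csv_path.stem}.{fmt}" for fmt in formats
        }
    return plan
```

   Keeping every output keyed by the same input file name is what
   would let the different conversion approaches be compared side
   by side.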

   jtandy: makes sense ... UCs already have files for CSV inputs

   danbri: target presentations given?

   jtandy: somewhat but not systematic.
   ... starting point, UC#21, TSV, some possible geojson output
   from that

   danbri: allow different mappings with close position of
   different mappings

   estephan: UC#6 is an interesting one

   ivan: procedure -- if we use UC as test cases, we have to be
   careful because once in /TR/ files are frozen

   <danbri> [slightly muffled audio]

   jenit: in the structure have different metadata for same CSV

   danbri: good point

   <danbri> (maybe a raw_source, canonical_source)

   <danbri> + multiple metadata + mapping files

   <DavideCeolin> ops sorry AndyS, I thought that was last
   assigned id

metadata format


   jenit: conversation with Rufus
   ... metadata specifying parsing options and whether CSV is 1-1
   aligned to the "tabular format"
   ... #1 - standard CSV files , frags work

   <danbri> cf

   jenit: #2 - parsing options - e.g. separator

   <danbri> losing audio

   <danbri> buzzzing noise

   <estephan> aliens landing?

   jenit: #3 padding around the CSV data, maybe more than one
   table in one file

   <JeniT> andys: I think we’ll need to do a bit of #3

   <JeniT> … especially to handle padding

   <JeniT> … because tabular data is made to look nice in Excel

   <JeniT> … the question is how much attention to pay to that

   <Zakim> danbri, you wanted to suggest metadata linking dataset
   packages could work; package A could be TSV with fluff; package
   B could be a downstream transform of A

   danbri: not great if input has appearance
   ... but pointing into cells good
   ... maybe CSV original => CSV cleaned => tabular data

   jenit: that is #1 -- metadata refs cleaned file

   <danbri> andy: risk of losing row/col refs to shape of the
   original file

   andys: reference issue on rows

   jtandy: suggest #1, #2 - agree -- and fixed format forms
   ... re: #3 -- location of data in the file. -- the frag
   relative to that datum.

   jenit: yes - that's the idea in the message
   ... rows, cols references do not match the original file.

   jtandy: provide a tool to extract the data in the processing

   <danbri> [re tooling, I stumbled across
   t yesterday, quite promising]

   jtandy: step one can be extract : pure #3 is potentially

   ivan: is the metadata content generated by parsing, or does it
   instruct the parser?

   jenit: latter - it is input to parser
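
   A minimal sketch of metadata acting as input to the parser,
   covering the separator from #2 and leading padding from #3.
   The option names "delimiter" and "skipRows" and the function
   parse_with_metadata are hypothetical; the minutes do not define
   a metadata vocabulary.

```python
import csv
import io


def parse_with_metadata(text, metadata):
    """Parse delimited text using options taken from a metadata dict."""
    delimiter = metadata.get("delimiter", ",")  # hypothetical option name
    skip_rows = metadata.get("skipRows", 0)     # hypothetical: padding lines above the table

    stream = io.StringIO(text, newline="")
    for _ in range(skip_rows):
        stream.readline()  # discard padding before the header row

    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []
    # Each remaining record becomes a mapping from column name to cell value.
    return [dict(zip(header, row)) for row in reader]
```

   Note that once padding rows are skipped, row numbers in the
   result no longer match the original file, which is the
   reference issue raised above.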

   ivan: tab data model is non-norm on this -- not a charter item
   (in IETF)
   ... need to come up with test cases etc.

   jenit: is more work. Think that #2 + explain how to use the

   <danbri> [there's some difference between defining a kind of
   software component ('parser'), versus giving metadata
   description of mappings from chars to tabular datasets]

   <danbri> JeniT also suggested we discuss 'separating 'schema'
   from 'notes'' under this item

   <Zakim> AndyS, you wanted to ask that an issue is left in doc
   about this for community.

   <danbri> andys: can we have an issue left in the doc about this

   <danbri> ivan: we can't avoid the procedural issue that IETF
   standardize CSV

   jtandy: multistep processing to get tabular data format.
   Provide information tools/advice.

   <danbri> jeni: there are other RFCs for other delimited formats

   jenit: encoding and header are in RFC anyway ... other RFCs for
   other formats ... escape is borderline.
   ... encourages people to create good files.

   danbri: IETF/W3C is not a worrying disconnect -- stuff to do

   subtopic: notes for schemas

   <danbri> vs schemas

   <danbri> "separating 'schema' from 'notes'"

   jenit: separate out the schema that can be reused across files

   <jtandy> +1

   <ivan> +1

   ivan: in current doc - bare JSON and JSON-LD examples - decide?

   jenit: aiming for JSON-LD compatibility

Model for Tabular Data and Metadata on the Web (Jeni)

   <danbri> "renaming 'fields' to 'cells' to avoid culture clash"

   jenit: renaming 'fields' to 'cells' to avoid culture clash --
   "field" is a column

   <ivan> +1

   (what is the CSV RFC terminology?)

   <danbri> +1

   <jtandy> thankyou

   <danbri> record = field *(COMMA field)

   jenit: records and fields in RFC
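
   To illustrate the RFC 4180 terminology quoted above, the sketch
   below splits one record into its fields using Python's csv
   module. The function name split_record and the sample record
   are assumptions; under the proposed renaming, each value
   returned here would be called a cell.

```python
import csv


def split_record(line):
    """Split one CSV record into its fields (RFC 4180: record = field *(COMMA field))."""
    return next(csv.reader([line]))


# A quoted field may itself contain a comma without starting a new field.
fields = split_record('Dan,"Brickley, D.",danbri')
```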

   <danbri> "determine if we are happy that _all_ RTL tabular data
   is logically the same as LTR tabular data - just that it is
   rendered differently"


   <danbri> (does scribe bot have notion of a subtopic?)

   <JeniT> AndyS, no: we will keep ‘column’ as ‘column’

   danbri: need to grab a copy.

   <Zakim> AndyS, you wanted to ask if the intention to use field
   as column?

   jtandy: data is serialized byte 0, byte 1 , ... and RTL is
   about display.
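
   jtandy's point can be illustrated as below: the first field in
   storage order is the first cell, whatever direction the text is
   rendered in. This is only a sketch of that one view; whether it
   holds for every RTL locale is the open question ivan raises
   next. The sample data and the name first_cell are assumptions.

```python
import csv


def first_cell(line):
    """Return the first field of a CSV record in logical (storage) order."""
    return next(csv.reader([line]))[0]


# Hebrew header "name,age", written with escapes so the storage order is explicit.
rtl_line = "\u05e9\u05dd,\u05d2\u05d9\u05dc"
ltr_line = "name,age"

rtl_first = first_cell(rtl_line)
ltr_first = first_cell(ltr_line)

# In both cases the serialized bytes begin with the first cell's bytes.
rtl_bytes_match = rtl_line.encode("utf-8").startswith(rtl_first.encode("utf-8"))
ltr_bytes_match = ltr_line.encode("utf-8").startswith(ltr_first.encode("utf-8"))
```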

   ivan: same every RTL location?
   ... question to Yakov

   jenit: highlighted in doc to be published.

   ivan: remind me to follow up with W3C offices

   <Zakim> danbri, you wanted to comment on RTL and separators

Use cases and requirements

   jtandy: in good shape.
   ... HL7 -- maybe too ambitious - complex encoding. Drop?
   ... currently incomplete and no requirements so just leaving it
   as is is not good.
   ... from James McKinney (sp?)

   jenit: ping contributor?

   <Zakim> danbri, you wanted to suggest an 'other topics' section
   if don't have one

   jenit: issue on additional UC ... if community gives input

   jtandy: ... HL7 not in current except as "known, incomplete"
   ... will put in UC current on list .. I have "a little list"

   <danbri> ACTION: danbri assign more actions [recorded in

   <trackbot> Created ACTION-21 - Assign more actions [on Dan
   Brickley - due 2014-06-11].


   jtandy: UC review by contributors --- currently empty ... draw
   people's attention to that

   <danbri> action everyone read

   <trackbot> Error finding 'everyone'. You can review and
   register nicknames at

   jtandy: issues in UCR ... take to list.

Generating RDF from Web Tabular Data

   <danbri> Andys: I haven't had any time to spend on it lately

   <danbri> danbri: seemed to be some mild disagreements last week
   - can someone summarise?

   <danbri> andys: ivan's concerned about us taking on too much

   <danbri> ivan: what is happening now is that we are exploring
   the whole approach of basing conversion on templates

   <danbri> … this expl means … i try to understand whats going
   on, i'm slow in understanding that, some details unclear to me

   <danbri> … this is where we are , some emails flying around
   between andy and me. Next step as far as I'm concerned (given
   time avail) is to write down a draft spec

   <danbri> … a bit on similar level to what I did a while back as
   'mechanical approach'; then we have to make a decision about
   overall direction. Shouldn't be andy's and mine only.

   <danbri> … back to json, xml — both andy and I have been using
   Turtle examples. At least in my case, main reason is that i'm
   more comfortable with it.

   <danbri> … whole discussion/approach is pretty much generic.
   Andy do you agree?

   <danbri> … that same tmpl approach should work with json, with
   xml … without any significant change

   <danbri> … in this sense not just an rdf thing. sense/structure
   is not rdf specific. some details might be, but whole thing is

   <danbri> AndyS: I can see it working for json. Just don't know
   well enough at xml level to understand one way or the other.

   <danbri> … Jeni - there were outstanding Qs to you from couple
   weeks ago. You mentioned requirements - I was asking for
   concrete requirements.

   <danbri> JeniT: on conditional processing?

   <danbri> … will use dan's example to illustrate that

   <danbri> … Dan's suggestion to focus on some examples and
   target output with specific metadata files, draft templates,
   will be helpful in exploring this

   <danbri> AndyS: a bit confused by that, as there are examples
   already, e.g. jeremy's

   <danbri> JeniT: absolutely; just to have this in a structured
   way in the directory structure, to help unlock progress

   <danbri> ivan: back to procedural side, … dan/jeni will reach
   out to rest of group for the xml/json conversion

   <danbri> … assumption that andy/ivan could edit, though i
   didn't really volunteer, so let's not assume that. Perhaps Andy
   in same situation.

   <danbri> AndyS: yes, I have a time issue. If I don't see
   sufficient support for the approach i'm exploring, unlikely I
   can put more time in.

   <danbri> dan: how close are the other specs, jeni?

publication status

   <danbri> jeni: model spec is close. metadata spec has a lot of
   issues but ok for FPWD

   tab data - nearly ready

   jenit: metadata - happy to publish with issue boxes

   <danbri> practicalities: ivan vacation 4 weeks from 14 July.

   <danbri> dan some July vacation too.

   jenit: tabular data in good shape -- s/field/cell/g

   <danbri> ACTION: dan make a concrete proposal for concrete test
   cases structure [recorded in

   <trackbot> Created ACTION-22 - Make a concrete proposal for
   concrete test cases structure [on Dan Brickley - due

   ivan: UCR looks ready or v close already


Summary of Action Items

   [NEW] ACTION: dan make a concrete proposal for concrete test
   cases structure [recorded in
   [NEW] ACTION: danbri assign more actions [recorded in

   [End of minutes]

    Minutes formatted by David Booth's scribe.perl version
    1.138 (CVS log)
    $Date: 2014-06-04 13:12:14 $

