- From: H.Merijn Brand <h.m.brand@xs4all.nl>
- Date: Mon, 15 Sep 2014 22:36:52 +0200
- To: Kev Kirkland <kev@dataunity.org>
- Cc: "Ingram, William A" <wingram2@illinois.edu>, Jeni Tennison <jeni@jenitennison.com>, Ivan Herman <ivan@w3.org>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-ID: <20140915223652.6690fef7@pc09.procura.nl>
On Mon, 15 Sep 2014 19:53:05 +0100, Kev Kirkland <kev@dataunity.org> wrote:

> MultipleHeadingRows is one that interests me too - it comes up often in the
> Market Research field. The data I had to deal with looks similar to Use
> Case #2 [1].
>
> Python Pandas [2] is the best software package I've found so far for
> dealing with multiple headers. It has 'levels' which show how row/column
> headers are grouped together. In Pandas all rows and columns have an index
> and "hierarchical indexing" is used when the indexes are grouped together
> into bigger hierarchical structures [3].

Being the author of Perl5's Text::CSV_XS [1], I am fully open to *any*
valuable extension I can make to help you out. I implemented RFC7111 [2]
in the week it was released [3].

Beyond the goal of making Text::CSV_XS the fastest, most reliable and
most configurable tool available for CSV parsing, I would also like to
use it to promote Perl: not only perl5, but also perl6. Together with
someone else I have started a perl6 port of this module, which we hope
to have working by April 2015.

If you have any (well-documented) requirements for multiple headers,
I'd be happy to think about the most useful way to implement them so
that they accommodate the W3C's wishes.

[1] http://metacpan.org/module/Text::CSV_XS
[2] http://tools.ietf.org/html/rfc7111
[3] https://metacpan.org/pod/Text::CSV_XS#fragment

> Using Example 1 (in Use Case #2) you might say row 5 is a column level
> called "Measure", row 6 is a level called "Unit", row 7 "Indicator" and row
> 8 "Employment status" (I'm guessing the most appropriate names for the
> levels as I couldn't see the correct terms in the metadata in the zip file).
>
> I'm a bit rusty with Pandas, but I think it lets you specify hierarchical
> indexing when you load a CSV file [4] (see the 'header' parameter). Unlike
> a lot of other systems, 'headers' isn't a simple boolean for present or
> absent, but can add more detail (like the rows which the headers appear on).
>
> It would be great to have this type of information in the CSV on the Web
> metadata as it's very useful for reading files. Pandas has hierarchical
> indexing on both rows and columns so it can deal with data that looks like
> pivot tables (or OLAP style results).
>
> One potential issue with hierarchical indexing with levels is that each
> level is assumed to be homogeneous. In Example 1 (Use Case #2) columns C
> and D have a bit more information (they are total level figures) which
> wouldn't be captured in the level definition.
>
> Thanks,
>
> Kev
>
> [1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-PublicationOfNationalStatistics
> [2] http://pandas.pydata.org/
> [3] http://pandas.pydata.org/pandas-docs/stable/indexing.html#hierarchical-indexing-multiindex
> [4] http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html
>
> On 15 September 2014 18:30, Ingram, William A <wingram2@illinois.edu> wrote:
>
> > I got started on UC 12 -- the chemical structures case (requires:
> > WellFormedCsvCheck, CsvValidation, MultipleHeadingRows, and
> > UnitMeasureDefinition). Perhaps this was not the best case to begin with,
> > but I feel like I'm in too deep to turn back now. :)
> >
> > I hit a wall in trying to describe these as JSON, particularly the multiple
> > header rows. Is there any background reading I should read to learn more
> > about csv to json? Or has no one figured this out yet?
> >
> > Thanks,
> > Bill

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.19   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/
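
To illustrate the idea in the quoted message of declaring which rows hold the headers, rather than a simple present/absent flag, here is a minimal sketch using only Python's standard `csv` module. It is not Pandas and not Text::CSV_XS; the function name `read_multi_header`, the sample data and the forward-fill of blank header cells are all assumptions made for illustration. In Pandas the comparable step would be passing a list of row numbers as the `header` argument of `read_csv`.

```python
import csv
import io


def read_multi_header(text, header_rows):
    """Hypothetical helper: combine several header rows into per-column tuples.

    `header_rows` lists the zero-based row numbers that hold header levels.
    Blank cells in a header row are assumed to continue the label to their
    left (an assumption about how spanning group headers are exported).
    """
    rows = list(csv.reader(io.StringIO(text)))
    levels = []
    for index in header_rows:
        last = ""
        filled = []
        for cell in rows[index]:
            if cell.strip():
                last = cell.strip()
            filled.append(last)
        levels.append(filled)
    # One tuple per column, with one entry per header level.
    columns = list(zip(*levels))
    # Data is taken to start right after the last header row.
    data = rows[max(header_rows) + 1:]
    return columns, data


# Invented sample: two header levels, e.g. "Indicator" and "Unit".
SAMPLE = (
    "Region,Employed,,Unemployed,\n"
    ",Count,Rate,Count,Rate\n"
    "North,120,0.60,80,0.40\n"
    "South,90,0.45,110,0.55\n"
)

columns, data = read_multi_header(SAMPLE, header_rows=[0, 1])
for column in columns:
    print(column)
```

As the quoted message points out, this treats every level as homogeneous, so extra meaning attached to individual columns (such as total-level figures) would still need to be recorded elsewhere.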
Received on Tuesday, 16 September 2014 08:48:50 UTC