- From: Yakov Shafranovich <yakov-ietf@shaftek.org>
- Date: Wed, 11 Jun 2014 08:46:21 -0400
- To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
- Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
To clarify regarding the encoding as per today's call, I agree with the
need to include the encoding but just want to highlight the need to
discuss how to deal with a conflict between encodings in the MIME type
and inside the file.

Thanks

On Wed, Jun 11, 2014 at 8:20 AM, Yakov Shafranovich <yakov-ietf@shaftek.org> wrote:
> Two brief notes:
> 1. Regarding the header count, RFC 4180 has a "header present" feature
> but that will be deprecated specifically because of the work of this
> group. I think we need to include this in the syntax.
>
> 2. Regarding the encoding, the MIME type for CSV will be carrying the
> encoding information. If we plan to embed the encoding information
> within the document like XML and HTML formats do, then we would need
> to deal with cases when they conflict with the MIME type.
>
> For HTML, the MIME encoding overrides the internal encoding:
>
> https://www.iana.org/assignments/media-types/text/html
>
> For XML, they do not (section 8.8):
>
> http://tools.ietf.org/html/draft-ietf-appsawg-xml-mediatypes-10#section-2.2
>
> Yakov
>
> On Wed, Jun 11, 2014 at 6:22 AM, Tandy, Jeremy
> <jeremy.tandy@metoffice.gov.uk> wrote:
>> Hi -
>>
>> The CSV data model defines a number of flags that need to be set when parsing tabular data model (see [Parsing Tabular Data][1]). The list is:
>>
>> """
>> encoding
>>     The character encoding for the file, one of the encodings listed in [encoding]. The default is utf-8.
>> row terminator
>>     The character that is used at the end of a row. The default is CRLF.
>> enclosure character
>>     The character that is used around escaped cells. The default is ".
>> escape character
>>     The character that is used to escape the enclosure character within escaped cells. The default is ".
>> skip rows
>>     The number of rows to skip at the beginning of the file, before a header row or tabular data. The default is 0.
>> comment prefix
>>     A character that, when it appears at the beginning of a skipped row, indicates a comment that should be associated as a comment annotation to the table. The default is #.
>> header row count
>>     The number of header rows (following the skipped rows) in the file. The default is 1.
>> delimiter
>>     The separator between cells. The default is ,.
>> skip columns
>>     The number of columns to skip at the beginning of each row, before any header columns. The default is 0.
>> header column count
>>     The number of header columns (following the skipped columns) in each row. The default is 0.
>> skip blank rows
>>     Indicates whether to ignore wholly empty rows (ie rows in which all the cells are empty). The default is false.
>> trim
>>     Indicates whether to trim whitespace around cells.
>> """
>>
>> I would expect these to be specified as properties in the [metadata vocabulary][2]
>>
>> Am I missing something?
>>
>> Jeremy
>>
>> [1]: http://w3c.github.io/csvw/syntax/#parsing
>> [2]: http://w3c.github.io/csvw/metadata/index.html
>>
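
To make the conflict question concrete, here is a rough Python sketch, not something agreed in the thread. It records the parsing flags from the quoted list with the defaults stated there, and resolves the encoding when the MIME type and an in-document (or metadata) declaration disagree. The names `Dialect`, `resolve_encoding`, and the `prefer` switch are hypothetical, and which side should win is exactly the open question; the sketch only shows that the choice has to be made explicitly.

```python
from dataclasses import dataclass
from email.message import Message
from typing import Optional


@dataclass
class Dialect:
    """Parsing flags from the quoted list, with the defaults stated there.

    The class and field names are illustrative, not part of any spec.
    """
    encoding: str = "utf-8"
    row_terminator: str = "\r\n"          # CRLF
    enclosure_character: str = '"'
    escape_character: str = '"'
    skip_rows: int = 0
    comment_prefix: str = "#"
    header_row_count: int = 1
    delimiter: str = ","
    skip_columns: int = 0
    header_column_count: int = 0
    skip_blank_rows: bool = False
    trim: Optional[bool] = None           # the quoted list states no default


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def resolve_encoding(content_type: str,
                     declared: Optional[str] = None,
                     prefer: str = "mime") -> str:
    """Pick one encoding when the MIME type and the declaration may disagree.

    prefer="mime"     -> the MIME charset wins (the HTML-style rule cited above)
    prefer="declared" -> the in-document/metadata declaration wins
    Both policies are assumptions for illustration only.
    """
    if prefer not in ("mime", "declared"):
        raise ValueError("prefer must be 'mime' or 'declared'")
    from_mime = charset_from_content_type(content_type)
    from_doc = declared.lower() if declared else None
    if prefer == "mime":
        first, second = from_mime, from_doc
    else:
        first, second = from_doc, from_mime
    # Fall back to the default encoding when neither source names one.
    return first or second or Dialect().encoding


if __name__ == "__main__":
    header = "text/csv; charset=ISO-8859-1"
    print(resolve_encoding(header, "utf-8"))                     # MIME wins
    print(resolve_encoding(header, "utf-8", prefer="declared"))  # declaration wins
    print(resolve_encoding("text/csv"))                          # default applies
```

Whichever rule is chosen, the fallback to utf-8 only applies when neither source names an encoding.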
Received on Wednesday, 11 June 2014 12:47:20 UTC