Re: Missing elements from the CSV metadata vocab from Yakov Shafranovich on 2014-06-11 (public-csv-wg@w3.org from June 2014)

From: Yakov Shafranovich <yakov-ietf@shaftek.org>
Date: Wed, 11 Jun 2014 08:46:21 -0400
To: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>
Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <CAPQd5oR=Pf5o9DHwZp=vENVM4AC6-n9h1prQ--p3oTGdsBtLaw@mail.gmail.com>

To clarify regarding the encoding as per today's call, I agree with
the need to include the encoding but just want to highlight the need
to discuss how to deal with a conflict between encodings in the MIME
type and inside the file.
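
To make the conflict concrete, here is a rough sketch of the decision a
parser would face. Nothing here comes from a spec: the function names, the
`mime_wins` switch, and the utf-8 fallback are my own assumptions for
illustration, and the group would still need to decide which side wins.

```python
from email.message import Message
from typing import Optional, Tuple


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    return charset.lower() if isinstance(charset, str) else None


def resolve_encoding(
    content_type: str,
    embedded: Optional[str],
    mime_wins: bool = True,   # hypothetical policy switch (True is the HTML-like rule)
    default: str = "utf-8",   # assumed fallback when neither side declares anything
) -> Tuple[str, bool]:
    """Pick an encoding and report whether the two declarations conflicted."""
    declared = charset_from_content_type(content_type)
    embedded = embedded.lower() if embedded else None
    # Plain string comparison: aliases such as "utf8" vs "utf-8" are not normalized.
    conflict = bool(declared and embedded and declared != embedded)
    if conflict:
        return (declared if mime_wins else embedded), True
    return (declared or embedded or default), False
```

With `mime_wins=True`, a file served as `text/csv; charset=ISO-8859-1` that
declares utf-8 internally would be read as ISO-8859-1; flipping the switch
gives the opposite result. Either way the caller learns that a conflict
occurred.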

Thanks

On Wed, Jun 11, 2014 at 8:20 AM, Yakov Shafranovich
<yakov-ietf@shaftek.org> wrote:
> Two brief notes:
> 1. Regarding the header count, RFC 4180 has a "header present" feature
> but that will be deprecated specifically because of the work of this
> group. I think we need to include this in the syntax.
>
> 2. Regarding the encoding, the MIME type for CSV will be carrying the
> encoding information. If we plan to embed the encoding information
> within the document like XML and HTML formats do, then we would need
> to deal with cases when they conflict with the MIME type.
>
> For HTML, the MIME encoding overrides the internal encoding:
>
> https://www.iana.org/assignments/media-types/text/html
>
> For XML, it does not (section 8.8):
>
> http://tools.ietf.org/html/draft-ietf-appsawg-xml-mediatypes-10#section-2.2
>
> Yakov
>
> On Wed, Jun 11, 2014 at 6:22 AM, Tandy, Jeremy
> <jeremy.tandy@metoffice.gov.uk> wrote:
>> Hi -
>>
>> The CSV data model defines a number of flags that need to be set when parsing tabular data (see [Parsing Tabular Data][1]). The list is:
>>
>> """
>> encoding
>>     The character encoding for the file, one of the encodings listed in [encoding]. The default is utf-8.
>> row terminator
>>     The character that is used at the end of a row. The default is CRLF.
>> enclosure character
>>     The character that is used around escaped cells. The default is ".
>> escape character
>>     The character that is used to escape the enclosure character within escaped cells. The default is ".
>> skip rows
>>     The number of rows to skip at the beginning of the file, before a header row or tabular data. The default is 0.
>> comment prefix
>>     A character that, when it appears at the beginning of a skipped row, indicates a comment that should be associated as a comment annotation to the table. The default is #.
>> header row count
>>     The number of header rows (following the skipped rows) in the file. The default is 1.
>> delimiter
>>     The separator between cells. The default is ,.
>> skip columns
>>     The number of columns to skip at the beginning of each row, before any header columns. The default is 0.
>> header column count
>>     The number of header columns (following the skipped columns) in each row. The default is 0.
>> skip blank rows
>>     Indicates whether to ignore wholly empty rows (i.e. rows in which all the cells are empty). The default is false.
>> trim
>>     Indicates whether to trim whitespace around cells.
>> """
>>
>> I would expect these to be specified as properties in the [metadata vocabulary][2]
>>
>> Am I missing something?
>>
>> Jeremy
>>
>> [1]: http://w3c.github.io/csvw/syntax/#parsing
>> [2]: http://w3c.github.io/csvw/metadata/index.html
>>
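
For reference, the flags quoted above could be carried around as a simple
settings object. The sketch below is only an illustration built on Python's
standard csv module: the `ParserFlags` and `read_table` names are made up,
the `trim` default of False is an assumption (the quoted list gives none),
and the row terminator, comment prefix, and header column count are stored
but not acted on.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ParserFlags:
    """Defaults copied from the quoted list; field names are hypothetical."""
    encoding: str = "utf-8"
    row_terminator: str = "\r\n"      # not applied: csv.reader detects line ends itself
    enclosure_character: str = '"'
    escape_character: str = '"'
    skip_rows: int = 0
    comment_prefix: str = "#"         # not applied in this sketch
    header_row_count: int = 1
    delimiter: str = ","
    skip_columns: int = 0
    header_column_count: int = 0      # not applied in this sketch
    skip_blank_rows: bool = False
    trim: bool = False                # assumed default; the list does not state one


def read_table(
    raw: bytes, flags: Optional[ParserFlags] = None
) -> Tuple[List[List[str]], List[List[str]]]:
    """Return (header_rows, data_rows) parsed from raw bytes under the given flags."""
    flags = flags or ParserFlags()
    text = raw.decode(flags.encoding)
    # When escape and enclosure characters match, the csv module models it as doubling.
    doubled = flags.escape_character == flags.enclosure_character
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=flags.delimiter,
        quotechar=flags.enclosure_character,
        doublequote=doubled,
        escapechar=None if doubled else flags.escape_character,
    )
    rows = list(reader)[flags.skip_rows:]
    if flags.skip_blank_rows:
        rows = [r for r in rows if any(cell != "" for cell in r)]
    rows = [r[flags.skip_columns:] for r in rows]
    if flags.trim:
        rows = [[cell.strip() for cell in r] for r in rows]
    return rows[:flags.header_row_count], rows[flags.header_row_count:]
```

Calling `read_table(data, ParserFlags(skip_rows=1, trim=True))` would drop
one leading row, treat the next row as the header, and strip whitespace
around each cell.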

Received on Wednesday, 11 June 2014 12:47:20 UTC