Re: Missing elements from the CSV metadata vocab

On 11 June 2014 13:46, Yakov Shafranovich <yakov-ietf@shaftek.org> wrote:
> To clarify regarding the encoding as per today's call, I agree with
> the need to include the encoding but just want to highlight the need
> to discuss how to deal with a conflict between encodings in the MIME
> type and inside the file.

Agreed. We have a slight variation/complication of the issue here
(depending on how CSVs are packaged, e.g. zipped vs independent URLs).
With HTML and XML, the choice is "What does the serving HTTP message
say, versus what does the document itself say?". When metadata files
are external, we potentially have a many:many relationship between
metadata files and the underlying CSV files.
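
To make the conflict concrete, here is a minimal sketch of how a consumer
might detect (not resolve) a mismatch between the charset in the HTTP
Content-Type header and an encoding declared elsewhere, e.g. in a metadata
file. The function names are hypothetical, and which source should win is
exactly the open question, so the sketch only reports the disagreement.

```python
import codecs
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def encodings_conflict(content_type, declared_encoding):
    """True if both sources name an encoding and the two differ.

    Hypothetical helper: it only detects the conflict; the precedence
    rule is left to whatever the group decides.
    """
    http_charset = charset_from_content_type(content_type)
    if http_charset is None or declared_encoding is None:
        return False  # nothing to compare against
    # Normalise aliases (e.g. "UTF-8" and "utf8") before comparing.
    # codecs.lookup raises LookupError for names Python does not know.
    return codecs.lookup(http_charset).name != codecs.lookup(declared_encoding).name
```

With many:many relationships, the same check would have to run once per
(metadata file, CSV file) pair, since each pair can disagree independently.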

Dan

> Thanks
>
> On Wed, Jun 11, 2014 at 8:20 AM, Yakov Shafranovich
> <yakov-ietf@shaftek.org> wrote:
>> Two brief notes:
>> 1. Regarding the header count, RFC 4180 has a "header present" feature
>> but that will be deprecated specifically because of the work of this
>> group. I think we need to include this in the syntax.
>>
>> 2. Regarding the encoding, the MIME type for CSV will be carrying the
>> encoding information. If we plan to embed the encoding information
>> within the document like XML and HTML formats do, then we would need
>> to deal with cases when they conflict with the MIME type.
>>
>> For HTML, the MIME encoding overrides the internal encoding:
>>
>> https://www.iana.org/assignments/media-types/text/html
>>
>> For XML, it does not (section 8.8):
>>
>> http://tools.ietf.org/html/draft-ietf-appsawg-xml-mediatypes-10#section-2.2
>>
>> Yakov
>>
>> On Wed, Jun 11, 2014 at 6:22 AM, Tandy, Jeremy
>> <jeremy.tandy@metoffice.gov.uk> wrote:
>>> Hi -
>>>
>>> The CSV data model defines a number of flags that need to be set when parsing tabular data (see [Parsing Tabular Data][1]). The list is:
>>>
>>> """
>>> encoding
>>>     The character encoding for the file, one of the encodings listed in [encoding]. The default is utf-8.
>>> row terminator
>>>     The character that is used at the end of a row. The default is CRLF.
>>> enclosure character
>>>     The character that is used around escaped cells. The default is ".
>>> escape character
>>>     The character that is used to escape the enclosure character within escaped cells. The default is ".
>>> skip rows
>>>     The number of rows to skip at the beginning of the file, before a header row or tabular data. The default is 0.
>>> comment prefix
>>>     A character that, when it appears at the beginning of a skipped row, indicates a comment that should be associated as a comment annotation to the table. The default is #.
>>> header row count
>>>     The number of header rows (following the skipped rows) in the file. The default is 1.
>>> delimiter
>>>     The separator between cells. The default is ,.
>>> skip columns
>>>     The number of columns to skip at the beginning of each row, before any header columns. The default is 0.
>>> header column count
>>>     The number of header columns (following the skipped columns) in each row. The default is 0.
>>> skip blank rows
>>>     Indicates whether to ignore wholly empty rows (ie rows in which all the cells are empty). The default is false.
>>> trim
>>>     Indicates whether to trim whitespace around cells.
>>> """
>>>
>>> I would expect these to be specified as properties in the [metadata vocabulary][2]
>>>
>>> Am I missing something?
>>>
>>> Jeremy
>>>
>>> [1]: http://w3c.github.io/csvw/syntax/#parsing
>>> [2]: http://w3c.github.io/csvw/metadata/index.html
>>>
>
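
For reference, the parsing flags quoted in Jeremy's message could be
carried around as a single record along the following lines. This is only
a sketch: the class and field names are made up for illustration and are
not taken from the metadata vocabulary, and the defaults are copied from
the quoted list. The list gives no default for "trim", so it is left unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseFlags:
    """Hypothetical container for the parsing flags in the quoted list."""
    encoding: str = "utf-8"
    row_terminator: str = "\r\n"      # CRLF
    enclosure_character: str = '"'
    escape_character: str = '"'
    skip_rows: int = 0
    comment_prefix: str = "#"
    header_row_count: int = 1
    delimiter: str = ","
    skip_columns: int = 0
    header_column_count: int = 0
    skip_blank_rows: bool = False
    trim: Optional[bool] = None       # no default stated in the quoted list


# Example: a semicolon-delimited file with two leading rows to skip.
flags = ParseFlags(delimiter=";", skip_rows=2)
```

Expressing each field as a property in the metadata vocabulary would let a
metadata file override any of these defaults per CSV file.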

Received on Wednesday, 11 June 2014 13:01:54 UTC