- From: Richard Cyganiak <richard.cyganiak@deri.org>
- Date: Fri, 03 Jun 2011 15:01:48 -0400
- To: Michael Hausenblas <michael.hausenblas@deri.org>, Erik Wilde <dret@berkeley.edu>
- Cc: uri@w3.org
Hi Erik, hi Michael,

This is a comment on the first draft of “URI Fragment Identifiers for the text/csv Media Type” [1], announced here [2].

Best,
Richard

[1] http://www.ietf.org/id/draft-hausenblas-csv-fragment-00.txt
[2] http://lists.w3.org/Archives/Public/uri/2011Apr/0003.html

Section 2

The draft does not appear to provide a way of addressing the most fundamental part of a CSV file: a cell. I find this confusing, as it seems like a really obvious and surprising use case to me. In fact, you say that one use case is “making assertions about a certain value”. How is this possible given the current design? I guess I'm asking for something like this:

    #cell:temperature,4

to address the value in the temperature column, row 4.

A less critical but perhaps also interesting feature would be Excel-style cell ranges, such as

    #cells:temperature,4:temperature,6

Section 2.1

This is quite fuzzy on the question of header detection. As the draft is currently designed, an implementation has to detect whether a header is present or not, otherwise it cannot determine what part of the table exactly is being addressed.

So is the header=present thing in the media type the only and canonical way of determining presence of headers? If that is the case, then what about non-HTTP protocols, e.g., file:///Users/richard/test.csv#row:0?

What does #head address if the media type does not indicate the presence of a header?

(A possible solution might be to make the addressed part independent of the presence of a header. #head would simply address the first row, regardless of whether it's actually a header. Same for #row:0. #col:2 would be place,Galway,Galway,Galway,Berkeley,Berkeley,Berkeley. If the example table had no header, then #col:2 would be Galway,Galway,Galway,Berkeley,Berkeley,Berkeley. And #col:Galway would be the same. And so on.)

The first paragraph of 2.1 is poorly written.

Section 2.2

How does the row:n format interact with presence/absence of header? If a header is present, does #row:0 address the same as #head?

A handy feature would be to allow addressing of the last row using #row:-1 (and similar for the second-to-last row etc).

What is addressed by #row:1000 if the table has only 10 rows?

What is the use case for the #row:* format? It seems a bit obscure to me and perhaps might better be dropped.

Section 2.3

It appears that the header row, if present, is excluded from #col:xxx addressing. Maybe this can be clarified in the text.

What is addressed by #col:xxx if xxx is neither a number nor a column in the table?

What is addressed by #col:2 if there is a column named "2"?

What is addressed by #col:xxx if no header is present, or if a header is present but not indicated in the media type?

What is addressed by #col:foo if the header contains a duplicate column, like foo,bar,baz,foo?

Section 2.4

I am unconvinced that the slice-based selection is useful as it is described right now. I'd like to understand better what the use case is.

Personally, I can see more use cases for selecting entire rows based on a value match, such as this:

    #row:name=Alice

I would expect the addressed part to be the entire row, including the value that was used for the match. Excluding the matched column seems a bit strange to me and I just have trouble understanding what the motivation is.

Independently from that: The name “slice-based” isn't very appropriate for the current mechanism. “Slice” implies a complete “thin” cut along one dimension. That's how it's used in data warehouse speak, anyways. In that sense, both row-based and column-based selection are slices, but this “slice-based” selection actually is not. More accurate would be “table reduction” or “select+project”, but admittedly these are not very snappy. Perhaps “value-based selection”?

Section 3

URI syntax only allows certain characters. Other characters have to be escaped. CSV cells also allow only certain characters, but a different set, with different escaping rules. I would expect some language here that addresses this. For example, if I have a cell row:

    2011-01-01,1,"Galway, Ireland"

then what exactly would a #where:place=xxx fragment that selects this row look like?
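
To make the Section 3 question concrete, here is a minimal sketch of one possible answer. It assumes that the fragment carries the unescaped CSV cell value (that is, after CSV quoting has been removed) and that the value is then percent-encoded as in generic URI syntax. The draft does not say this; it is only one reading, and the helper names cell_values and where_fragment are made up for illustration.

```python
import csv
import io
from urllib.parse import quote, unquote


def cell_values(csv_line):
    """Parse one CSV line and return its cells with CSV quoting removed."""
    return next(csv.reader(io.StringIO(csv_line)))


def where_fragment(column, value):
    """Build a hypothetical where:column=value fragment.

    Assumption: both parts are percent-encoded, with no characters
    treated as safe beyond the URI unreserved set.
    """
    return "where:" + quote(column, safe="") + "=" + quote(value, safe="")


row = cell_values('2011-01-01,1,"Galway, Ireland"')
fragment = where_fragment("place", row[2])
print(fragment)  # where:place=Galway%2C%20Ireland

# Decoding the value part of the fragment gives back the CSV cell value.
decoded = unquote(fragment.split("=", 1)[1])
print(decoded)  # Galway, Ireland
```

Under these assumptions the comma and the space are both escaped, so the comma cannot be mistaken for a separator inside the fragment. Whether the CSV double quotes should ever appear in the fragment is the part I would like the draft to state explicitly.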
Received on Friday, 3 June 2011 19:01:56 UTC