- From: Michael Hausenblas <michael.hausenblas@deri.org>
- Date: Thu, 28 Apr 2011 17:35:41 +0100
- To: URI IG <uri@w3.org>
- Cc: Erik Wilde <dret@berkeley.edu>, Richard Cyganiak <richard.cyganiak@deri.org>
(forwarding this to list as it seems Richard is not subscribed and hence
this message didn't show up in the archive).

Thanks a lot for the comments, Richard - I'll follow up in a separate
mail, soon.

Cheers,
Michael

On 27 Apr 2011, at 12:05, Richard Cyganiak wrote:

> Hi Erik, hi Michael,
>
> This is a comment on the first draft of “URI Fragment Identifiers
> for the text/csv Media Type” [1], announced here [2].
>
> Best,
> Richard
>
> [1] http://www.ietf.org/id/draft-hausenblas-csv-fragment-00.txt
> [2] http://lists.w3.org/Archives/Public/uri/2011Apr/0003.html
>
>
> Section 2
>
> The draft does not appear to provide a way of addressing the most
> fundamental part of a CSV file: a cell. I find this confusing, as it
> seems like a really obvious and surprising use case to me. In fact,
> you say that one use case is “making assertions about a certain
> value”. How is this possible given the current design?
>
> I guess I'm asking for something like this: #cell:temperature,4 to
> address the value in the temperature column, row 4.
>
> A less critical but perhaps also interesting feature would be Excel-
> style cell ranges, such as #cells:temperature,4:temperature,6.
>
>
> Section 2.1
>
> This is quite fuzzy on the question of header detection. As the
> draft is currently designed, an implementation has to detect whether
> a header is present or not, otherwise it cannot determine what part
> of the table exactly is being addressed. So is the header=present
> thing in the media type the only and canonical way of determining
> presence of headers?
>
> If that is the case, then what with non-HTTP protocols, e.g.,
> file:///Users/richard/test.csv#row:0?
>
> What does #head address if the media type does not indicate the
> presence of a header?
>
> (A possible solution might be to make the addressed part independent
> of the presence of a header. #head would simply address the first
> row, regardless of whether it's actually a header. Same for #row:0.
> #col:2 would be
> place,Galway,Galway,Galway,Berkeley,Berkeley,Berkeley. If the
> example table had no header, then #col:2 would be
> Galway,Galway,Galway,Berkeley,Berkeley,Berkeley. And #col:Galway
> would be the same. And so on.)
>
> The first paragraph of 2.1 is poorly written.
>
>
> Section 2.2
>
> How does the row:n format interact with presence/absence of header?
> If a header is present, does #row:0 address the same as #head?
>
> A handy feature would be to allow addressing of the last row using
> #row:-1 (and similar for the second-to-last row etc).
>
> What is addressed by #row:1000 if the table has only 10 rows?
>
> What is the use case for the #row:* format? It seems a bit obscure
> to me and perhaps might better be dropped.
>
>
> Section 2.3
>
> It appears that the header row, if present, is excluded from
> #col:xxx addressing. Maybe this can be clarified in the text.
>
> What is addressed by #col:xxx if xxx is neither a number nor a
> column in the table?
>
> What is addressed by #col:2 if there is a column named "2"?
>
> What is addressed by #col:xxx if no header is present, or if a
> header is present but not indicated in the media type?
>
> What is addressed by #col:foo if the header contains a duplicate
> column, like foo,bar,baz,foo?
>
>
> Section 2.4
>
> I am unconvinced that the slice-based selection is useful as it is
> described right now. I'd like to understand better what the use case
> is. Personally, I can see more use cases for selecting entire rows
> based on a value match, such as this:
>
> #row:name=Alice
>
> I would expect the addressed part to be the entire row, including
> the value that was used for the match. Excluding the matched column
> seems a bit strange to me and I just have trouble understanding what
> the motivation is.
>
> Independently from that: The name “slice-based” isn't very
> appropriate for the current mechanism. “Slice” implies a complete
> “thin” cut along one dimension.
> That's how it's used in data warehouse speak, anyways. In that
> sense, both row-based and column-based selection are slices, but
> this “slice-based” selection actually is not. More accurate would be
> “table reduction” or “select+project”, but admittedly these are not
> very snappy. Perhaps “value-based selection”?
>
>
> Section 3
>
> URI syntax only allows certain characters. Other characters have to
> be escaped. CSV cells also allow only certain characters, but a
> different set, with different escaping rules. I would expect some
> language here that addresses this. For example, if I have a cell row:
>
> 2011-01-01,1,"Galway, Ireland"
>
> then what exactly would a #where:place=xxx fragment that selects
> this row look like?
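
To make the Section 3 question concrete, here is a minimal Python sketch
of one possible answer. It assumes, purely for illustration, that a
#where:column=value fragment carries the unquoted CSV cell value with
every reserved character percent-encoded. The draft does not specify
this, and the helper names where_fragment and parse_where_fragment are
hypothetical.

```python
import csv
import io
from typing import Tuple
from urllib.parse import quote, unquote

PREFIX = "#where:"  # fragment keyword taken from the example in Section 3


def where_fragment(column: str, value: str) -> str:
    """Build a '#where:column=value' fragment (assumed syntax, not from the draft)."""
    # safe="" percent-encodes reserved characters too, so ',' and '='
    # inside a cell value cannot be confused with fragment syntax.
    return PREFIX + quote(column, safe="") + "=" + quote(value, safe="")


def parse_where_fragment(fragment: str) -> Tuple[str, str]:
    """Split such a fragment back into the decoded (column, value) pair."""
    if not fragment.startswith(PREFIX):
        raise ValueError("not a #where: fragment")
    column, sep, value = fragment[len(PREFIX):].partition("=")
    if not sep:
        raise ValueError("missing '=' in #where: fragment")
    return unquote(column), unquote(value)


# The CSV-level double quotes are removed by the CSV parser; only the
# cell value itself is carried into the fragment.
row = next(csv.reader(io.StringIO('2011-01-01,1,"Galway, Ireland"')))
fragment = where_fragment("place", row[2])
print(fragment)                        # #where:place=Galway%2C%20Ireland
print(parse_where_fragment(fragment))  # ('place', 'Galway, Ireland')
```

Under these assumptions the two escaping layers stay separate: CSV
quoting is undone when the file is parsed, and percent-encoding is
applied only when the value is placed in the URI.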
Received on Thursday, 28 April 2011 16:36:15 UTC