- From: Ivan Herman <ivan@w3.org>
- Date: Thu, 1 May 2014 08:59:02 +0200
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-Id: <6DCBB62B-3C62-4E3C-AB65-4BAA6353D6BC@w3.org>
On 30 Apr 2014, at 19:57 , Jeni Tennison <jeni@jenitennison.com> wrote:

> See http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellValueMicroSyntax
>
> I’d like to have a quick discussion about this requirement because I think it’s covering a wide range of things which we might take different positions on when considering whether they’re in scope.
>
> The use cases show four types of microsyntax:
>
> 1. various date/time syntaxes (not just ISO-8601 ones)
> 2. comma-separated lists of editors within fields in UC-JournalArticleSearch
> 3. embedded structured data (eg XML (VML) in UC-PaloAltoTreeData)
> 4. semi-structured text in UC-PaloAltoTreeData
>
> And I can see four things you might want to do with them:
>
> A. document the microsyntax so that humans can understand what it’s conveying
> B. validate the values to make sure they conform to the microsyntax you expect
> C. label the value as being in a particular microsyntax when converting into JSON/XML/RDF (eg marking an XML value as an XMLLiteral)
> D. process the microsyntax into an appropriate data structure when converting into JSON/XML/RDF (eg mapping the XML value into an appropriate JSON object)
>
> I want to suggest that:
>
> * We should mark as Deferred the intersection of 3 & D — we shouldn’t expect CSV processors to be able to take values that are XML and convert them into RDF or into JSON.
>
> * We should mark as Deferred the intersection of 4 & D — similarly, we shouldn’t expect CSV processors to be able to take arbitrary semi-structured text and convert it into XML/JSON/RDF.

I agree with both. But what about 2 and D? We could say (in JSON or RDF) that this means putting the values into a list, but I am worried about the situation where the result should be a list of a particular datatype, for example. It might be complicated (but we should try).

>
> Otherwise I’m happy to include those requirements.
> WRT to the data model, I don’t think that means we need the data model to say that values in a CSV file *are* lists or object structures; I think we can continue to say that they’re annotated strings, and the annotation (which might include a definition of the format of the string) can be used to validate the string and (in some cases) convert it into a suitable value or data structure.

I am also not sure what 2+'B' means. Do you mean we should have some sort of 'schema'-like description of the structure of a particular microsyntax when converting the CSV file into the Data Model? I.e., that the microsyntax should be a number, followed by a date, followed by something else? I am tempted to push this into a Deferred category as well, i.e., the conversion into the Data Model should be opaque, with a possible human-readable description.

If I put a possible implementer's hat on, I would probably implement the conversion into a Data Model (I actually did something like that in node.js as a JavaScript-learning exercise recently) by giving the user the possibility to add a callback function on cells to make any conversion that is possible. I wonder whether this should remain an implementation-specific trick or something we would describe in the conversion process.

Ivan

>
> Cheers,
>
> Jeni
> --
> Jeni Tennison
> http://www.jenitennison.com/
>

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf
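
For illustration only, the per-cell callback idea mentioned in the message might look roughly like the sketch below. It is not the node.js code referred to above, and it is not anything defined by the Working Group; it is written in Python, and the names `read_cells`, `split_list`, `CellCallback`, the sample column names and the sample data are all hypothetical. It assumes that cells without a callback stay opaque strings, and it shows how the "2 and D" case (a comma-separated list converted to a list of a particular datatype) could be handled by a user-supplied callback.

```python
import csv
import io
from datetime import date
from typing import Any, Callable, Dict, List

# Hypothetical type: a callback takes the raw cell string and returns any value.
CellCallback = Callable[[str], Any]


def split_list(item_type: Callable[[str], Any], sep: str = ",") -> CellCallback:
    """Build a callback that splits a cell on `sep` and converts each item."""
    def convert(raw: str) -> List[Any]:
        # Empty items (e.g. from a trailing separator) are skipped.
        return [item_type(part.strip()) for part in raw.split(sep) if part.strip()]
    return convert


def read_cells(text: str, callbacks: Dict[str, CellCallback]) -> List[Dict[str, Any]]:
    """Read CSV text, applying the per-column callback to each cell.

    Columns without a callback are left as plain strings (opaque values).
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            column: callbacks.get(column, str)(value)
            for column, value in record.items()
        })
    return rows


# Made-up sample data, loosely modelled on a list-of-editors field.
SAMPLE = 'title,editors,published\n"CSV on the Web","Smith, Jones",2014-05-01\n'

rows = read_cells(SAMPLE, {
    "editors": split_list(str),        # microsyntax 2: comma-separated list
    "published": date.fromisoformat,   # microsyntax 1: ISO 8601 date only
})
print(rows[0])
```

In this sketch the conversion logic lives entirely in user code, which corresponds to the "implementation-specific trick" option; describing it in the conversion process would instead require a declarative way to name the separator and item datatype.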
Received on Thursday, 1 May 2014 06:59:31 UTC