Re: R-CellValueMicroSyntax

On 30 Apr 2014, at 19:57 , Jeni Tennison <jeni@jenitennison.com> wrote:

> See http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellValueMicroSyntax
> 
> I’d like to have a quick discussion about this requirement because I think it’s covering a wide range of things which we might take different positions on when considering whether they’re in scope.
> 
> The use cases show four types of microsyntax:
> 
>   1. various date/time syntaxes (not just ISO-8601 ones)
>   2. comma-separated lists of editors within fields in UC-JournalArticleSearch
>   3. embedded structured data (eg XML (VML) in UC-PaloAltoTreeData)
>   4. semi-structured text in UC-PaloAltoTreeData
> 
> And I can see four things you might want to do with them:
> 
>   A. document the microsyntax so that humans can understand what it’s conveying
>   B. validate the values to make sure they conform to the microsyntax you expect
>   C. label the value as being in a particular microsyntax when converting into JSON/XML/RDF (eg marking an XML value as an XMLLiteral)
>   D. process the microsyntax into an appropriate data structure when converting into JSON/XML/RDF (eg mapping the XML value into an appropriate JSON object)
> 
> I want to suggest that:
> 
> * We should mark as Deferred the intersection of 3 & D — we shouldn’t expect CSV processors to be able to take values that are XML and convert them into RDF or into JSON.
> 
> * We should mark as Deferred the intersection of 4 & D — similarly, we shouldn’t expect CSV processors to be able to take arbitrary semi-structured text and convert it into XML/JSON/RDF.

I agree with both.

But what about 2 and D? We could say (in JSON or RDF) that this means putting the values into a list, but I worry about the situation where the result should be a list of a particular datatype, for example. It might be complicated (but we should try).
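
To make the datatype concern concrete, here is a minimal sketch in Python. It is not taken from any draft; the function name split_cell, its parameters, and the sample values are all made up for illustration. It assumes the annotation would somehow supply both a separator and a per-item converter:

```python
from datetime import date
from typing import Callable, List, TypeVar

T = TypeVar("T")


def split_cell(value: str, convert: Callable[[str], T], separator: str = ",") -> List[T]:
    """Split one cell on `separator` and convert each non-empty item.

    Hypothetical helper: `convert` stands in for whatever datatype
    the annotation on the column would declare for the list items.
    """
    items = [item.strip() for item in value.split(separator)]
    return [convert(item) for item in items if item]


# A list of plain strings (made-up editor names).
editors = split_cell("A. Smith, B. Jones", str)

# A list of dates: same splitting, different item datatype.
dates = split_cell("2014-04-30;2014-05-01", date.fromisoformat, separator=";")
```

The splitting itself is trivial; the open question is how the metadata would express the item datatype, and what should happen when one item fails to convert (here the converter would simply raise).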
> 
> Otherwise I’m happy to include those requirements. WRT to the data model, I don’t think that means we need the data model to say that values in a CSV file *are* lists or object structures; I think we can continue to say that they’re annotated strings, and the annotation (which might include a definition of the format of the string) can be used to validate the string and (in some cases) convert it into a suitable value or data structure.


I am also not sure what 2+'B' means. Do you mean we should have some sort of 'schema'-like description of the structure of a particular microsyntax when converting the CSV file into the Data Model? I.e., that the microsyntax should be a number, followed by a date, followed by something else? I am tempted to push this into a Deferred category as well, i.e., the conversion into the Data Model should be opaque, with a possible human-readable description.

If I put a possible implementer's hat on, I would probably implement the conversion into a Data Model (I actually did something like that in node.js as a JavaScript-learning exercise recently) by letting the user add a callback function on cells to make whatever conversion is possible. I wonder whether this should remain an implementation-specific trick or something we would describe in the conversion process.
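
As an illustration of the callback idea, here is a rough sketch in Python. This is not the node.js code mentioned above; the name convert_csv, the per-column callback table, and the sample data are assumptions invented for this example:

```python
import csv
import io
from typing import Any, Callable, Dict, List

# A cell callback takes the raw string and returns any converted value.
CellCallback = Callable[[str], Any]


def convert_csv(text: str, callbacks: Dict[str, CellCallback]) -> List[Dict[str, Any]]:
    """Read CSV text and apply the user-supplied callback for each column.

    Columns without a callback keep their raw string value, so the
    conversion stays opaque unless the user opts in.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        row = {}
        for column, value in record.items():
            callback = callbacks.get(column)
            row[column] = callback(value) if callback else value
        rows.append(row)
    return rows


# Made-up sample: the "editors" cell holds a comma-separated list.
sample = 'title,editors\nSome Article,"A. Smith, B. Jones"\n'
rows = convert_csv(
    sample,
    {"editors": lambda cell: [name.strip() for name in cell.split(",")]},
)
```

With this shape, all knowledge of the microsyntax lives in the callback rather than in the processor, which is exactly why it may be better left as an implementation detail.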

Ivan


> 
> Cheers,
> 
> Jeni
> --  
> Jeni Tennison
> http://www.jenitennison.com/
> 


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
FOAF: http://www.ivan-herman.net/foaf

Received on Thursday, 1 May 2014 06:59:31 UTC