Re: R-CellValueMicroSyntax

Hi Ivan,

From: Ivan Herman <ivan@w3.org>
Date: 1 May 2014 at 08:00:05
> On 30 Apr 2014, at 19:57 , Jeni Tennison wrote:
> > See http://w3c.github.io/csvw/use-cases-and-requirements/#R-CellValueMicroSyntax  
> >
> > I’d like to have a quick discussion about this requirement because I think it’s covering a wide range of things which we might take different positions on when considering whether they’re in scope.
> >
> > The use cases show four types of microsyntax:
> >
> > 1. various date/time syntaxes (not just ISO-8601 ones)
> > 2. comma-separated lists of editors within fields in UC-JournalArticleSearch
> > 3. embedded structured data (eg XML (VML) in UC-PaloAltoTreeData)
> > 4. semi-structured text in UC-PaloAltoTreeData
> >
> > And I can see four things you might want to do with them:
> >
> > A. document the microsyntax so that humans can understand what it’s conveying
> > B. validate the values to make sure they conform to the microsyntax you expect
> > C. label the value as being in a particular microsyntax when converting into JSON/XML/RDF (eg marking an XML value as an XMLLiteral)
> > D. process the microsyntax into an appropriate data structure when converting into JSON/XML/RDF (eg mapping the XML value into an appropriate JSON object)
> >
> > I want to suggest that:
> >
> > * We should mark as Deferred the intersection of 3 & D: we shouldn’t expect CSV processors to be able to take values that are XML and convert them into RDF or into JSON.
> >
> > * We should mark as Deferred the intersection of 4 & D: similarly, we shouldn’t expect CSV processors to be able to take arbitrary semi-structured text and convert it into XML/JSON/RDF.
>  
> I agree with both.
>  
> But what about 2 and D? We could say (in JSON or RDF) that this means putting the values into
> a list, but I am worried about the situation where the result should be a list of a particular
> datatype, for example. It might be complicated (but we should try).

I was assuming, btw, that the list in #2 is of things all of the same type. I think the main complication then will probably come in the conversion to RDF where you have to distinguish between an rdf:List and the normal interpretation (multiple values for a given property). But that’s manageable too.
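
To make the RDF distinction concrete, here is a minimal sketch in Python. The function names, the `ex:editor` property, and the sample cell value are illustrative assumptions, not anything the group has specified; the output is Turtle-like text built by plain string formatting, with no escaping of quotes inside values.

```python
def split_cell(value, separator=","):
    """Split a cell value into trimmed, non-empty items (all of the same type)."""
    return [item.strip() for item in value.split(separator) if item.strip()]


def as_repeated_values(subject, prop, items):
    """Normal interpretation: one triple per item, so order is not recorded."""
    return [f'{subject} {prop} "{item}" .' for item in items]


def as_rdf_list(subject, prop, items):
    """rdf:List interpretation: one triple whose object is an ordered collection."""
    members = " ".join(f'"{item}"' for item in items)
    return f"{subject} {prop} ( {members} ) ."


# Hypothetical cell value from an "editors" column.
editors = split_cell("Smith J, Jones A,  Brown K")
repeated = as_repeated_values("<#article>", "ex:editor", editors)
ordered = as_rdf_list("<#article>", "ex:editor", editors)
```

Here `repeated` holds three separate statements, while `ordered` is a single statement whose object uses Turtle's `( ... )` collection syntax, which is what preserves the order of the editors.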

[snip]
> I am also not sure what 2+'B' means. Do you mean we should have some sort of 'schema'-like
> description of the structure of a particular microsyntax when converting the CSV file
> into the Data Model? Ie, that the microsyntax should be a number, followed by a date, followed
> by something else? I am tempted to push this into a Deferred category as well, ie, the conversion
> into the Data Model should be opaque with a possible human readable description.

I think that the vast majority of microsyntaxes are sufficiently describable using regexps for validation purposes, so I don’t think validation is ever a real issue.
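
As a rough illustration of regexp-based validation, the following Python sketch checks two of the microsyntaxes from the list above. The patterns and names are hypothetical examples chosen for illustration; a real metadata document would supply its own.

```python
import re

# Hypothetical patterns, not taken from any specification.
# A DD/MM/YYYY date, one of the non-ISO-8601 syntaxes mentioned in the use cases.
DATE_DMY = re.compile(r"(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/[0-9]{4}")
# A comma-separated list with no empty entries.
NAME_LIST = re.compile(r"[^,]+(,[^,]+)*")


def is_valid(pattern, value):
    """Return True when the whole cell value matches the microsyntax pattern."""
    return pattern.fullmatch(value) is not None
```

With these patterns, `is_valid(DATE_DMY, "07/05/2014")` is true and `is_valid(DATE_DMY, "2014-05-07")` is false. The check is on the shape of the value only, so `31/02/2014` also passes.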

> If I put a possible implementer's hat on, I would probably implement the conversion into
> a Data Model (I actually did something like that in node.js as a JavaScript-learning
> exercise recently) by letting the user add a callback function on cells to make
> whatever conversion is needed. I wonder whether this should remain an
> implementation-specific trick or something we would describe in the conversion process.

Yes, definitely the processing of microsyntaxes that aren’t described through the declarative metadata should be handled through extension mechanisms in implementations. I’m hoping that the conversion processes will all be able to ‘bug out’ to something sufficiently powerful that that parsing is possible in a standard way rather than being implementation dependent.
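
A per-cell callback mechanism of the kind discussed here could look roughly like this in Python. This is a sketch under assumptions: the `convert` function, the `converters` mapping keyed by column name, and the sample data are all hypothetical, and it is not a reconstruction of the node.js implementation mentioned above.

```python
import csv
import io


def convert(csv_text, converters=None):
    """Parse CSV text into a list of dicts, applying an optional hook per column.

    `converters` maps a column name to a function that takes the raw cell
    string and returns the processed value; other columns stay as strings.
    """
    converters = converters or {}
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {}
        for column, value in row.items():
            hook = converters.get(column)
            record[column] = hook(value) if hook else value
        records.append(record)
    return records


# Hypothetical data: the "editors" cell uses a comma-separated microsyntax.
sample = 'title,editors\nCSV on the Web,"Smith J, Jones A"\n'
result = convert(sample, {"editors": lambda v: [e.strip() for e in v.split(",")]})
```

After this runs, `result[0]["editors"]` is the list `["Smith J", "Jones A"]`, while the `title` cell is passed through unchanged.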

Cheers,

Jeni
--  
Jeni Tennison
http://www.jenitennison.com/

Received on Wednesday, 7 May 2014 16:00:13 UTC