W3C home > Mailing lists > Public > public-csv-wg@w3.org > April 2014

Re: simple weather observation example illustrating complex column mappings (ACTION-11)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Thu, 3 Apr 2014 10:22:46 -0700
Cc: "Tandy, Jeremy" <jeremy.tandy@metoffice.gov.uk>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-Id: <CA07C327-27BD-4C49-8B39-57F1C5344075@greggkellogg.net>
To: Andy Seaborne <andy@apache.org>
On Apr 3, 2014, at 6:31 AM, Andy Seaborne <andy@apache.org> wrote:

> On 03/04/14 14:04, Tandy, Jeremy wrote:
>> Some related thoughts about JSON and RDF conversions:
>> 
>> - Conversion to JSON(sans-LD) would work with the mapping frame as
>> defined by Gregg - there would simply be no @context section in the
>> template
> 
> I think this is not a technology question but a perception question.
> 
> 1/ Whether the appearance of @graph is acceptable.  A simple final step to produce some other JSON is possible

Well, of course @graph can be aliased to something else in the context, if that’s objectionable. I’ve also considered that we might want more flexibility in specifying the mapping, so that there could be some surrounding boilerplate (perhaps including more data about columns or the transformation), and each row would then be added as a value of some other property within that node definition. Just thinking out loud here:

{
  "description": "information about CSV", …
  "rows": {
    "@type": ["@rowTemplate", "{type column}"],
    …
  }
}
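To make the placeholder idea concrete, here is a toy sketch of expanding such a template against one CSV row. This is not the CSV-LD algorithm itself; the `{column name}` substitution syntax is assumed from the sketch above, and the row and template values are invented:

```python
import re

def expand_template(template, row):
    """Recursively replace {column-name} placeholders in a template
    with the corresponding values from a CSV row (a dict)."""
    if isinstance(template, dict):
        return {k: expand_template(v, row) for k, v in template.items()}
    if isinstance(template, list):
        return [expand_template(v, row) for v in template]
    if isinstance(template, str):
        # Unknown placeholders are left intact rather than erased.
        return re.sub(r"\{([^}]+)\}",
                      lambda m: str(row.get(m.group(1), m.group(0))),
                      template)
    return template

row = {"type column": "WeatherObservation", "Air temperature": "11.2"}
template = {"@type": ["@rowTemplate", "{type column}"],
            "temp": "{Air temperature}"}
expanded = expand_template(template, row)
```

Here `expanded` carries the row's type in place of `{type column}`, which is the behaviour the sketch above relies on.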

This is, of course, an evolving process, and I’m sure that the final form will bear scant resemblance to my original proposal; it’s already changed in many ways.

> 2/ Whether JSON-LD processing is then a requirement to get JSON(sans-LD).

Probably not, but algorithms such as JSON-LD framing can be useful, as can other custom processing methods.
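For illustration, the essence of framing is reshaping a flat set of nodes into the tree a frame describes. The following is a deliberately simplified stand-in for the real JSON-LD framing algorithm (node identifiers and property names are invented):

```python
def simple_frame(nodes, frame_type):
    """Very simplified framing: select nodes of the requested @type and
    embed any node referenced by {"@id": ...} in place of the reference.
    (No cycle handling; real JSON-LD framing is far more involved.)"""
    by_id = {n["@id"]: n for n in nodes if "@id" in n}

    def embed(node):
        out = {}
        for k, v in node.items():
            if isinstance(v, dict) and set(v) == {"@id"} and v["@id"] in by_id:
                out[k] = embed(by_id[v["@id"]])
            else:
                out[k] = v
        return out

    return [embed(n) for n in nodes if n.get("@type") == frame_type]

flat = [
    {"@id": "_:obs1", "@type": "Observation", "result": {"@id": "_:r1"}},
    {"@id": "_:r1", "value": "11.2"},
]
framed = simple_frame(flat, "Observation")
```

The flat node list comes back as a single nested `Observation` with its result embedded, which is the kind of reshaping that makes framed output friendlier to plain-JSON consumers.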

>> - If you want ntriples, ttl or other RDF encoding one could extend
>> the processing pipeline methodology to convert the JSON-LD to the
>> requisite form as defined in the JSON-LD Processing Algorithms and
>> API
>> <http://www.w3.org/TR/json-ld-api/#rdf-serialization-deserialization-algorithms>
> 
> My local requirement is being able to produce RDF without a JSON-LD stack involved. There's nothing wrong about JSON-LD - but is it to be the only way?  I want it to work on larger-than-RAM data - AKA database loading.  CSV is streamable.

I don’t see this as a problem. As I’ve suggested elsewhere, there is no reason why CSV-LD processing couldn’t handle one row at a time to generate triples or quads, as each frame is essentially its own JSON-LD document. There has also been some discussion of streaming JSON-LD; the general problem with processing a whole JSON-LD document turns out to be not so much the in-memory requirements as the need to order results during intermediate operations, and that ordering is unnecessary for the Expansion/Flatten/toRdf processing steps, which is a big win.
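A hedged sketch of that row-at-a-time idea (the function names, mapping structure, and example URIs are invented for illustration, not taken from any CSV-LD spec): each CSV row is expanded independently and its triples emitted immediately, so memory use stays constant regardless of file size:

```python
import csv
import io
import re

def fill(template, row):
    """Substitute {column-name} placeholders with values from a CSV row."""
    return re.sub(r"\{([^}]+)\}", lambda m: row[m.group(1)], template)

def stream_triples(csv_file, subject_template, property_map):
    """Yield (subject, predicate, object) tuples one CSV row at a time,
    never holding more than one row in memory."""
    for row in csv.DictReader(csv_file):
        subject = fill(subject_template, row)
        for column, predicate in property_map.items():
            if row.get(column):
                yield (subject, predicate, row[column])

# Toy input standing in for a large weather-observation CSV.
data = io.StringIO("Date-time,Air temperature\n2013-12-13T08:00:00Z,11.2\n")
triples = list(stream_triples(
    data,
    "http://example.org/obs/{Date-time}",
    {"Air temperature": "http://example.org/def/airTemperature"},
))
```

Because the generator yields triples as each row is parsed, a database loader can consume them without ever materialising the whole document, which addresses the larger-than-RAM concern above.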

> To parse a JSON object, a parser needs to see the closing "}".  All but one of the regular JSON processors I use will scan to the end for the "}", so they read the whole object, and at the outer level that's the entire JSON document.
> 
> The exception is the Jena JSON SPARQL result processor.  It has its own JSON parser (c.f. XML and SAX) and, if it has seen the declarations of the result set, it will stream the rows.  These are two object members of the top-level JSON object.
> 
> There is nothing in JSON to require the declarations to come before the rows of the results so it has to fall back to reading in the whole result set before producing results for the application.  This is a big deal to some users.
> 
> If we can define the outcome of conversion, up to some level of complexity, in terms of RDF triples, then there can be different tools to get there and then CSV-LD can be one way of doing it.  To define the outcome by the algorithms of CSV/JSON-LD is a barrier to people not wanting that stack.  The algorithms of JSON-LD can be whole document processing.

Again, this can be short-circuited during CSV-LD processing to generate triples as each row is processed, so I don’t see this as a problem. Doing it only after converting a CSV with 10M rows into one complete JSON document would be an issue, but wanting that whole-document JSON representation in and of itself seems unlikely.

> This is something for the WG to decide soon.  I don't want to invest time and effort on spec'ing something that will be rejected on principle.  I realise that the WG as a group may wish to work on a single conversion approach to cover as many UCs as possible.  If it does decide that, then I'll work with whatever that spec proposes.

I’m in the same boat!

Gregg

> 	Andy
> 
> 
>> 
>> Jeremy
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Tandy, Jeremy [mailto:jeremy.tandy@metoffice.gov.uk]
>> Sent: 03 April 2014 13:14
>> To: Andy Seaborne; public-csv-wg@w3.org
>> Subject: RE: simple weather observation example illustrating complex column mappings (ACTION-11)
>> 
>> Is JSON-LD acceptable in place of a normal JSON encoding? Probably -
>> thanks to the "zero-edit" capability of JSON-LD you can make JSON-LD
>> look identical to JSON(sans-LD) ... even the @context reference can
>> be done in an HTTP header.
>> 
>> I hadn't intended to imply that the conversion was driven by OWL;
>> only that one can supplement these complex cases where you want to
>> annotate _every_ field in a column with the same information (e.g.
>> unit of measurement) by defining local object properties with the
>> necessary axioms.
>> 
>> I like your idea of the processing pipeline ... I'll modify the
>> example on GitHub to incorporate this string-formatting
>> pre-processing step.
>> 
>> I'll mark ACTION-11 as complete too.
>> 
>> Jeremy
>> 
>> -----Original Message-----
>> From: Andy Seaborne [mailto:andy@apache.org]
>> Sent: 03 April 2014 13:00
>> To: public-csv-wg@w3.org
>> Subject: Re: simple weather observation example illustrating complex column mappings (ACTION-11)
>> 
>> On 02/04/14 19:04, Tandy, Jeremy wrote:
>>> All,
>>> 
>>> (related action #11
>>> <https://www.w3.org/2013/csvw/track/actions/11>)
>>> 
>>> I've created an "Example" directory in the github repo
>>> <https://github.com/w3c/csvw/tree/gh-pages/examples>, within which
>>> I have placed the example requested by AndyS et al in today's
>>> teleconference:
>>> 
>>> simple-weather-observation
>>> <https://github.com/w3c/csvw/blob/gh-pages/examples/simple-weather-observation.md>
>>> 
>>> It provides:
>>> - CSV example
>>> - RDF encoding (in TTL)
>>> - JSON-LD encoding (assuming my manual conversion is accurate)
>>> - CSV-LD mapping frame (or at least my best guess)
>>> 
>>> In the mapping frame I couldn't figure out how to construct the @id
>>> for the weather observation instances as I wanted to use a
>>> simplified form of the ISO 8601 date-time syntax used in the
>>> "Date-time" column.
>>> 
>>> Would be happy for folks to correct/amend what I've done :)
>>> 
>>> AndyS / Greg - if this meets your need could you close the action?
>>> (I left it in "pending review" state)
>>> 
>>> Jeremy
>>> 
>> 
>> Jeremy - thank you.
>> 
>> It more than meets the action item for the RDF part at least.
>> 
>> A question it raises for the JSON(sans-LD) conversion is whether the
>> JSON-LD form is acceptable.  I have no insight there but it is
>> something to test before going along a particular spec'ing path.
>> 
>> I wasn't thinking that the conversion would necessarily be driven by
>> OWL, leaving that for tools beyond/better than the core spec.  It is
>> nice to be aware of the possibility.
>> 
>> If there are certain common conversions of ISO 8601 syntax for the
>> date-time, we can include those conversions.  This one drops certain
>> characters that are legal in URIs (":" and "-"; timezones other than Z?)
>> 
>> My feeling is that a real-world issue is that datetime strings are
>> all too often not valid in the first place (systematically or not)
>> and so the error handling is important.
>> 
>> As CSV is a text format, string processing can be done before CSV2RDF
>> conversion.  Clean-up is best done at that point as well.
>> 
>> Seems like a processing pipeline model is emerging.
>> 
>> Andy
>> 
>> 
>> 
> 
> 
Received on Thursday, 3 April 2014 17:23:16 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 20:21:39 UTC