- From: Niklas Lindström <lindstream@gmail.com>
- Date: Thu, 14 Jan 2010 16:31:43 +0100
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: Mike Bergman <mike@mkbergman.com>, Dave Reynolds <dave.e.reynolds@googlemail.com>, public-lod@w3.org, Mark Birbeck <mark.birbeck@webbackplane.com>, John Sheridan <John.Sheridan@nationalarchives.gsi.gov.uk>
Hi Jeni!

2010/1/7 Jeni Tennison <jeni@jenitennison.com>:
> Thanks for the pointer! The compact (natural) example that you give at [1]
> is pretty much what I'd like to see as the JSON generated by default by a
> linked data API. Have you used this in anger anywhere?

No and yes. :) That is, I haven't (yet) used this specific format, but I designed it based on previous practical experience of accessing RDF via (dynamic) code in various situations (including generating similar JSON from RDF, but without all of the Gluon features). So I feel confident it has practical value. I think it is sound to express (and thus abstract) usage in mappings like these, rather than cluttering the code with access details over and over. That is, if the data is expected to be fairly uniform (think "perceived vs. real dangers of dynamic typing").

> What we're trying to do in the linked-data-api project [2] is thrash out not
> only what that JSON looks like but also how it might be generated
> automatically by a server sitting on top of a triplestore. But you say that
> you're working on a profiling mechanism? Perhaps we can join forces to work
> out what that might look like?

Yes, I would love to join forces and contribute to something within a larger context. I think this is a very important topic, one which may simplify many existing practices (especially those hitting the complexity barriers of working directly with the full RDF model in instrumental code contexts). Consensus on syntax and wider adoption are crucial, so I want these things as standardized as possible.

Gluon has a "full" (raw) form, where special objects represent RDF data concepts (using "$uri", "$ref", "$value", "$datatype", "@<lang>" and "$list" as keys). But the profile mechanism is there to eliminate potentially all of these, requiring only prefix declarations (to make the profile itself more compact). The full form still uses qnames/CURIEs for property names, though.
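To make that concrete, here is a rough sketch (as a Python dict, mirroring the JSON) of what a full-form record might look like. The resource, property names and values are invented for illustration; only the special keys themselves come from the description above, so see the real examples at [1] and [2] for the actual format:

```python
# Hypothetical sketch of a Gluon "full" (raw) form record. The special
# keys ("$uri", "$ref", "$value", "$datatype", "@<lang>", "$list") are
# the ones mentioned above; the resource, CURIE property names and
# values are invented for illustration.
full_form = {
    "$uri": "http://example.org/things/1",                    # subject URI
    "dct:title": {"@en": "An example thing"},                 # localized literal
    "dct:created": {"$value": "2010-01-14",
                    "$datatype": "xsd:date"},                 # typed literal
    "rdfs:seeAlso": {"$ref": "http://example.org/things/2"},  # resource reference
    "ex:parts": {"$list": [                                   # an RDF collection
        {"$ref": "http://example.org/parts/a"},
    ]},
}
```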
The "compact" form also uses a "linked" map (JSON object) where the URIs are used as keys. And it supports giving a global "lang" and "base" to resolve localized literals and relative URIs, respectively. (Note that Gluon also always inlines bnode objects, since it, currently by opinionated design, does not support bnode IDs.)

What I did in the profile mechanism of Gluon follows your linked-data-api ideas. It is originally based on my Oort "RDF to Python object access" mechanism, where property URIs are bound to short names (ideally using the local part of the property URI to promote familiarity) along with "schema"-like settings such as one/many and localized or datatype/native. In Gluon, in order to make the JSON very tight, I added specific settings for interpreting plain string values as e.g. URIs, qnames, values with a given datatype, or localized language literals. All of these features are exemplified in the compact example at [1], which is the exact Gluon representation of the RDF (in N3) at [2].

I should mention that the Python "lab" implementation of Gluon is fully functional, capable of converting RDF data from and to this format, taking an optional profile to produce the compact form. (Note that it requires the currently unreleased RDFLib 2.5, i.e. from trunk [3].) So in essence this can work on top of a triplestore, but you need to limit the "amount of graph" to serialize somehow. ;) It works best when you store RDF in chunks of "reasonably" (file-sized) representations and convert those. In that case you only need to design a profile for the desired JSON form (covering the expected properties and classes).

I have thought some about generating profiles from vocabularies (inferring them from RDFS and OWL statements), and/or by examining the graph while serializing to "pack it on the fly". I have no code for that yet, though. There are, of course, further alternatives to ponder as well; e.g.
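As a rough illustration of the compact form and its profile, consider this sketch. The profile vocabulary used here ("prefixes", "define", "many", "localized", "uri") is invented for illustration and does not reproduce Gluon's actual profile syntax; only the overall shape (a global "lang" and "base", short property names, and a "linked" map keyed by URI) follows the description above. The real thing is in the example at [1]:

```python
# Hypothetical sketch of a compact-form document plus profile. The
# profile key names here are invented; the general shape ("lang",
# "base", short names, and the "linked" map keyed by URI) follows the
# description in the mail.
compact = {
    "profile": {
        "prefixes": {"dct": "http://purl.org/dc/terms/"},
        "define": {
            # short name -> property binding, with "schema"-like settings
            "title": {"curie": "dct:title", "localized": True},
            "related": {"curie": "dct:relation", "many": True, "uri": True},
        },
    },
    "lang": "en",                      # resolves localized literals
    "base": "http://example.org/",     # resolves relative URIs
    "linked": {
        "things/1": {                  # keys of "linked" are (relative) URIs
            "title": "An example thing",  # plain string read as an "en" literal
            "related": ["things/2"],      # plain strings read as URIs
        },
    },
}
```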
the merit of automatically turning:

    "rdfs:seeAlso": {"$ref": "http:..."}

into e.g.:

    "rdfs_seeAlso_ref": "http:..."

(I don't really like that, though; the abstraction starts to "bulge". I'd rather either be fully RDF-aware with a rich object model, or use a lowering translation "lens", such as Gluon profiles. Still, the "prefix as suffix" approach has merit, as seen e.g. in Python-based RDF wrappers such as Sparta [4].)

Furthermore, I think that mapping to JSON is not the only potential target in scope. Specifically, the design of these mapping/context/profile features is useful in many RDF-to-object mapping situations, and as potential input to the ideas of profiles (short term names) in the current work on RDFa 1.1 (which Mark knows much more about). I honestly don't think this would cost much more in terms of design than limiting the scope to JSON alone. But that is certainly up for debate.

I'd like to stress that ironing out this pattern may provide a *huge* win, since there are *many* mapping mechanisms (in Python, Java, Ruby etc.) for accessing RDF in programmatic contexts which are slightly similar, but which mostly share nothing formally (for instance, I've only seen localized access to language literals in some of these).

I look forward to more collaboration on these matters!

Best regards,
Niklas

[1] <http://code.google.com/p/oort/source/browse/trunk/lab/gluon/etc/examples/lcsh/sh95000541-compact.json>
[2] <http://code.google.com/p/oort/source/browse/trunk/lab/gluon/etc/examples/lcsh/sh95000541.n3>
[3] <http://code.google.com/p/rdflib/source/checkout>
[4] <http://github.com/mnot/sparta>
Received on Thursday, 14 January 2010 15:32:36 UTC