- From: Arnaud Le Hors <lehors@us.ibm.com>
- Date: Thu, 25 Apr 2013 09:11:02 -0700
- To: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
- Cc: Martin Nally <martin.nally@gmail.com>
- Message-ID: <OF3436AC56.2D55DBAA-ON88257B58.0057FC82-88257B58.0058E678@us.ibm.com>
My colleague Martin Nally sent a response to Markus but because he's not
subscribed to the list his message was put on hold for moderation. I
thought this would only take a day or two but it's not happened so I'm
forwarding his message to avoid any further delay. I suggest you copy him
in any response though.
Thanks.
--
Arnaud Le Hors - Software Standards Architect - IBM Software Group
----- Forwarded by Arnaud Le Hors/Cupertino/IBM on 04/25/2013 09:01 AM
-----
From: Martin Nally <martin.nally@gmail.com>
To: Arnaud Le Hors <lehors@us.ibm.com>
Date: 04/17/2013 12:21 PM
Subject: Fwd: RDF/JSON
Markus,
Thanks for your careful response to my code samples trying to explain why
RDF/JSON has been working better for us than JSON-LD. Let me first give
some background. The applications we have been writing have logic-tier
servers written in Python (and some Ruby) with user interfaces written in
JavaScript running in the browser on PCs and mobile devices. The JSON we
are discussing is the JSON that flows between the python/ruby and the
javascript. Both sides are RDF-aware, and many of our resources have
multiple subjects. The JSON forms the primary API of the servers, although
you can also ask for the same data in Turtle or RDFa (we use the RDFa
format for some use cases in our applications, but we are not currently
using Turtle). We do not have a very long or broad experience - we are a
prototyping team, not (yet) a product team, and we have built up no more
than a few thousand lines of code in these applications.
Our JSON was originally in JSON-LD format, in the sense that the JSON we
produced and consumed was valid JSON-LD, but we only produced and consumed
a very restricted subset of JSON-LD. For example, we did not support
contexts. What we used from JSON-LD was the basic organization that can
perhaps be summarized like this: [{'@id': S, P: O}]. I hope that is clear
- it is an array of "dictionaries" where one dictionary entry is '@id' for
the subject, and the other dictionary entries provide the predicates and
objects. The O is itself an array of dictionaries, with the only valid
keys in the dictionaries being '@id', '@value' and '@type'.
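
A minimal sketch of that restricted JSON-LD layout in Python follows. The
example.org URLs, the FOAF predicates, and the find_node helper are
hypothetical, chosen only for illustration.

```python
# Hypothetical data in the restricted JSON-LD layout [{'@id': S, P: O}].
FOAF = "http://xmlns.com/foaf/0.1/"

jsonld_doc = [
    {
        "@id": "http://example.org/alice",
        FOAF + "name": [{"@value": "Alice"}],
        FOAF + "knows": [{"@id": "http://example.org/bob"}],
    },
    {
        "@id": "http://example.org/bob",
        FOAF + "name": [{"@value": "Bob"}],
    },
]


def find_node(doc, subject):
    """Return the dictionary whose '@id' equals subject, or None.

    Hypothetical helper: with an array of dictionaries, locating a
    subject means scanning the array.
    """
    for node in doc:
        if node.get("@id") == subject:
            return node
    return None


bob = find_node(jsonld_doc, "http://example.org/bob")
bob_name = bob[FOAF + "name"][0]["@value"]
```

Note that reaching a given subject goes through a helper (or a loop),
because the subject is a value inside each dictionary, not a key.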
We spent a few months building code this way, and it worked OK. We also
stored this format in MongoDB and queried on it successfully. I cannot
remember exactly what triggered the decision, but about 2 months ago, I
decided to convert over to RDF/JSON format. Since RDF/JSON has almost no
options - in contrast with JSON-LD, which has many - you don't need me to
explain what we did much further, but for completeness I will say simply
that we moved to this data organization: {S: { P: O}}. The O in RDF/JSON
is very similar to the JSON-LD version of O we used before - it differs
only in the detail.
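
For comparison, here is a sketch of the {S: {P: O}} organization in
Python. The data is hypothetical; the 'type' and 'value' keys follow the
RDF/JSON object format.

```python
# Hypothetical data in the RDF/JSON layout {S: {P: O}}.
FOAF = "http://xmlns.com/foaf/0.1/"

rdfjson_doc = {
    "http://example.org/alice": {
        FOAF + "name": [{"type": "literal", "value": "Alice"}],
        FOAF + "knows": [{"type": "uri", "value": "http://example.org/bob"}],
    },
    "http://example.org/bob": {
        FOAF + "name": [{"type": "literal", "value": "Bob"}],
    },
}

# Subject and predicate are both dictionary keys, so plain indexing
# reaches the objects without a helper method.
bob_name = rdfjson_doc["http://example.org/bob"][FOAF + "name"][0]["value"]
```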
My experience with the port is that the complexity of our code came down
substantially. I would guess the code that manipulates RDF is now
generally only two-thirds to three-quarters of its previous size.
Even where the complexity is similar, we now have the advantage of
using simpler language constructs (primarily dictionary access like a[b]),
where before we used a helper method. Not all our helper methods are gone
- for example we still have a helper method whose meaning is "for a given
predicate and object, return all the subjects". It is possible that the
simplification we experienced was particular to our data. I think you are
correct in saying that if our representations were all single-subject,
then we would not have found RDF/JSON to be better - the JSON-LD data
organization would have been as good or better. Other than the fact that
our resources often have multiple subjects, I doubt there is anything very
special in what we are doing. I had hoped that my code samples might help
explain why our code got simpler, but I can see now that probably won't
work.
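
As a rough illustration of the kind of helper that remains, the sketch
below returns all subjects for a given predicate and object over
RDF/JSON data. The name subjects_for and the sample data are assumptions
made for this example, not the actual helper.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical multi-subject resource in RDF/JSON layout.
doc = {
    "http://example.org/alice": {
        FOAF + "knows": [{"type": "uri", "value": "http://example.org/bob"}],
    },
    "http://example.org/carol": {
        FOAF + "knows": [{"type": "uri", "value": "http://example.org/bob"}],
    },
    "http://example.org/bob": {
        FOAF + "name": [{"type": "literal", "value": "Bob"}],
    },
}


def subjects_for(doc, predicate, obj_value):
    """Return every subject that has obj_value as an object of predicate."""
    return [
        subject
        for subject, predicates in doc.items()
        if any(o.get("value") == obj_value
               for o in predicates.get(predicate, []))
    ]


who_knows_bob = subjects_for(doc, FOAF + "knows", "http://example.org/bob")
```

The forward direction (subject, then predicate) is plain dictionary
access like doc[s][p]; only the reverse lookup needs a function.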
Interestingly, RDF/JSON turned out not to be helpful at all in JSON
databases. Databases like to query on predicates, and JSON-LD's approach
of making the subject be the value of the '@id' predicate has worked much
better for us, so our database format is still a very restricted but valid
JSON-LD format. (We have also used triple stores - that is another
conversation.) We have gone back and forth on whether we prefer JSON-LD's
format for the 'O' part or RDF/JSON's version. We have seen advantages to
both. When we put our JSON into Lucene, JSON-LD's version of the O format
was more convenient, because it allowed us to easily tell Lucene to handle
URLs differently from regular strings. When we put the data into MongoDB,
we did not need this, and RDF/JSON's version of the O part looked better
because it did not require query writers to remember to use '@id' for
URL-valued properties and '@value' for everything else - you just always
use 'value'. This is particularly helpful if you don't know what type you
are querying on. Right now, we are using both 'O' formats, one for MongoDB
and the other for Lucene.
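
The difference between the two 'O' formats for a query writer can be
sketched in plain Python, without any database. The function names are
hypothetical and the matching logic is only an illustration of the
point above, not actual MongoDB or Lucene query code.

```python
def matches_jsonld(obj, wanted):
    """JSON-LD style object: URLs live under '@id', other values under '@value'.

    A caller who does not know the type has to check both keys.
    """
    return (("@id" in obj and obj["@id"] == wanted)
            or ("@value" in obj and obj["@value"] == wanted))


def matches_rdfjson(obj, wanted):
    """RDF/JSON style object: the value is always under 'value'."""
    return "value" in obj and obj["value"] == wanted


url = "http://example.org/bob"

jsonld_url = {"@id": url}
jsonld_literal = {"@value": "Bob"}
rdfjson_url = {"type": "uri", "value": url}
rdfjson_literal = {"type": "literal", "value": "Bob"}

jsonld_hits = [matches_jsonld(jsonld_url, url),
               matches_jsonld(jsonld_literal, "Bob")]
rdfjson_hits = [matches_rdfjson(rdfjson_url, url),
                matches_rdfjson(rdfjson_literal, "Bob")]
```

The separate '@id' key is what makes it easy to treat URLs differently
from strings when indexing, which is the trade-off described above.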
In addition to the code simplification we saw when we converted to
RDF/JSON for the API, we also saw a reduction in specification complexity
that may ultimately be more important. Remember that our JSON is our API
and so we need to document it. One approach is to document it ourselves
without reference to any external specification whatever. In that case,
documenting [{'@id': S, P: O}] is only slightly more complex than
documenting {S: { P: O}}. However, we would really rather not document
this ourselves - we would rather point to a specification. For {S: { P:
O}} we can easily point to the RDF/JSON spec - the whole spec almost fits
on a page, and since RDF/JSON really has no options, there is little to
say about our usage patterns of it. By contrast, JSON-LD is a complex
specification with many options. If we were to reference the JSON-LD spec
in our API documentation, not only would we be referencing a relatively
large and complex spec, we would then have to add yet more information to
document the very restricted subset we actually use. We get no value from
the parts of JSON-LD we do not use and we have no interest in allowing
clients to give us arbitrary JSON-LD, or in producing it, either now or in
the future. Referencing the JSON-LD spec in our API documentation is not
an attractive option.
I hope this helps explain our usage of RDF/JSON and JSON-LD and our
experiences with them.
Best wishes, Martin
Received on Thursday, 25 April 2013 16:11:39 UTC