Re: RDF/JSON from Martin Nally on 2013-04-29 (public-rdf-comments@w3.org from April 2013)

From: Martin Nally <martin.nally@gmail.com>
Date: Mon, 29 Apr 2013 11:28:36 -0400
To: Markus Lanthaler <markus.lanthaler@gmx.net>, public-rdf-comments@w3.org
Message-ID: <CAHJqhR6ot6vQHnOPwqCFWD+x4kMKs12qGiOq6tcuya6Uf6sNTA@mail.gmail.com>
I apologize if this is a duplicate. Gmail has recently introduced a new UI
for sending and replying that I'm still learning, and I'm not sure if the
previous response went to the mailing list or only to Markus.


On Sun, Apr 28, 2013 at 10:13 PM, Martin Nally <martin.nally@gmail.com> wrote:

> >> What you mean here is that many representations contain multiple
> subjects (which are also resources) right?
>
> Yes, correct.
>
>
> >> Which is almost the same as flattened JSON-LD [2] except that it isn't
> indexed by S.
>
> Yes, but being indexed by S is the key characteristic that made
> programming to the data simpler. [What we were actually using was something
> like "framed-expanded", because MongoDB will not store expanded - it
> breaks the outer array into multiple documents.]
>
>
> >> Quite frankly, this surprises me and makes me wonder why you use RDF at
> all? By defining this, you tightly couple the clients to the server which
> normally you try to prevent by all means. Given that you use RDF, a client
> shouldn't depend on the structure of your JSON data but on its semantics,
> on the raw data. I understand why you wanna do it (it is just more
> convenient to hard-code a client against a rigid structure) but it's really
> a bad practice. In that case you could just as well use a highly optimized
> proprietary JSON format (as most APIs do these days).
>
> In my opinion, the underlying RDF data model has helped us significantly.
> Using RDF has allowed us to implement several standard API patterns - e.g.
> collections and "data transfer objects" - in a way that is much simpler,
> more flexible and more regular than other approaches we've tried in the
> past. As far as format is concerned, we have simply found RDF/JSON to be
> the most useful one to code to. The important characteristic is the initial
> indexing by "S", which we have found the most useful for our data. I'm not
> sure why you think this is "bad practice" or "rigid structure" - perhaps
> you can elaborate? The way our python servers actually work makes it simple
> for us to accept and produce different RDF formats. The business logic in
> all our servers always consumes and produces python data structures whose
> organization is {S : {P : O}} (error results are currently an exception to
> this rule). If the client asks for HTML, we serialize the response as
> HTML/RDFa at the last moment before putting it on the wire, and if the
> client asks for JSON or RDF/JSON, we serialize the response by just dumping
> the data structures as JSON. It would only take me an hour or two to make
> all our servers optionally return some form of JSON-LD, but the only
> clients we have currently are ones we wrote ourselves, and they find
> RDF/JSON more convenient to consume. We will be happy to consider providing
> JSON-LD when we have a client that wants it. When that happens, it will be
> interesting to see if they will accept any JSON-LD - presumably because
> they have implemented the whole JSON-LD spec - or whether they will ask us
> for one particular JSON-LD format. Similarly, if we accept JSON-LD on
> input, we will have to decide ourselves what variants in the complete
> JSON-LD spec we need to implement. If we have to accept the whole JSON-LD
> spec on input, that will be rather costly unless we find a
> commercial-quality library that implements it for us (none identified yet).
> These questions don't really arise with RDF/JSON because of its simplicity,
> and in fact one of the attractive characteristics of working with RDF/JSON
> is that we have found that we really don't need any RDF programming
> libraries at all. What we currently use for both the python servers and the
> JavaScript clients are the standard JSON libraries, plus slightly less than
> 200 lines of simple helper methods that we wrote ourselves.
>
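To make the "just dump the data structures as JSON" point concrete, here is
a minimal Python sketch of a {S : {P : O}} structure and its serialization.
The example.org URIs and the sample data are made up for illustration; the
object dictionaries use the "type" and "value" keys from RDF/JSON. It is a
sketch under those assumptions, not the code from the servers described
above.

```python
import json

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical multi-subject representation, organized as {S: {P: [O, ...]}}.
graph = {
    "http://example.org/alice": {
        FOAF_NAME: [{"type": "literal", "value": "Alice"}],
        FOAF_KNOWS: [{"type": "uri", "value": "http://example.org/bob"}],
    },
    "http://example.org/bob": {
        FOAF_NAME: [{"type": "literal", "value": "Bob"}],
    },
}

# Plain dictionary access of the a[b] kind; no RDF library is involved.
name = graph["http://example.org/alice"][FOAF_NAME][0]["value"]

# Serializing for the wire is a single call to the standard JSON library.
wire = json.dumps(graph, sort_keys=True)
```

Parsing the string back with json.loads gives the same nested dictionaries,
so client and server can share one mental model of the data.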
> >> Does the statement "we would rather point to a specification" mean that
> you wanna propose to put RDF/JSON on the REC track?
>
> Unfortunately, I don't know enough about W3C process to give you an
> intelligent answer on that. I'll leave it to others.
>
> >> The important thing is that we have a format which is able to address
> all use cases.
>
> You and I will probably have to agree to disagree on this one. I have
> nothing against JSON-LD - I hope it succeeds and I hope it helps RDF gain a
> little more traction with some part of the web crowd. I'm pessimistic about
> this, but I'm hoping to be proven wrong. However, I simply can't imagine us
> ever converting back to JSON-LD for our uses. Even before we converted to
> RDF/JSON, JSON-LD only looked interesting for our purposes when we threw
> out all but one option and focused on "expanded" (or "framed-expanded")
> form. I am very skeptical of the argument that tools will make it all good.
> My view is that JSON became popular in the first place exactly because it
> was a natural serialization of the programming language data structures
> people were already using, especially in JavaScript, but also in other
> programming languages. What we have seen in our own work is that you really
> don't need programming libraries to deal with RDF in JavaScript and Python.
> If you simply put the data in standard dictionaries [JS objects] and arrays
> in a {S : {P : O}} organization, and serialize it with JSON, things all
> work out fine. With the programmers that I work with, it has been much
> easier to get them to understand and to see the value in RDF when it is
> presented in this very simple way. For one thing, they can easily compare
> the RDF structure with the structure they would have used to solve the same
> problem if left to their own devices. If I had had to try to get them to
> adopt an RDF programming library and persuade them of the merits of
> JSON-LD, I'm afraid I would have had a much harder time. It would be an
> interesting social science experiment to take a group of JavaScript
> programmers, teach half of them a JavaScript RDF programming library plus
> JSON-LD and teach the other half to put RDF directly into JavaScript
> objects and arrays in a {S : {P : O}} organization, and see which group
> does better. Is it possible that the approach of focusing on native data
> structures and their standard JSON serialization would get more traction
> with JavaScript programmers than the alternative? Do we have objective
> evidence of the appeal of JSON-LD with the web crowd? Maybe the W3C should
> do a scientific experiment or two?
>
>
> On Sat, Apr 27, 2013 at 12:03 PM, Markus Lanthaler <
> markus.lanthaler@gmx.net> wrote:
>
>> On 04/17/2013 12:21 PM, Martin Nally wrote:
>> > Markus,
>> >
>> > Thanks for your careful response to my code samples trying to explain
>> > why RDF/JSON has been working better for us than JSON-LD. Let me first
>> > give some background. The applications we have been writing have
>> > logic-tier servers written in python (and some ruby) with user-
>> > interfaces written in javascript running in the browser on PCs and
>> > mobile devices. The JSON we are discussing is the JSON that flows
>> > between the python/ruby and the javascript. Both sides are RDF-aware,
>> > and many of our resources have multiple subjects.
>>
>> What you mean here is that many representations contain multiple subjects
>> (which are also resources) right?
>>
>>
>> [...]
>> > Our JSON was originally in JSON-LD format, in the sense that the JSON
>> > we produced and consumed was valid JSON-LD, but we only produced and
>> > consumed a very restricted subset of JSON-LD. For example, we did not
>> > support contexts. What we used from JSON-LD was the basic organization
>> > that can perhaps be summarized like this: [{'@id': S, P: O}]. I
>> > hope that is clear - it is an array of "dictionaries" where one
>> > dictionary entry is '@id' for the subject, and the other dictionary
>> > entries provide the predicates and objects. The O is itself an array
>> > of dictionaries, with the only valid keys in the dictionaries being
>> > '@id', '@value' and '@type'.
>>
>> In other words, you just used expanded JSON-LD [1].
>>
>>
>> > We spent a few months building code this way, and it worked OK. We
>> > also stored this format in MongoDB and queried on it successfully. I
>> > cannot remember exactly what triggered the decision, but about 2
>> > months ago, I decided to convert over to RDF/JSON format. Since
>> > RDF/JSON has almost no options - in contrast with JSON-LD, which has
>> > many - you don't need me to explain what we did much further, but for
>> > completeness I will say simply that we moved to this data
>> > organization: {S: { P: O}}. The O in RDF/JSON is very similar to the
>> > JSON-LD version of O we used before - it differs only in the detail.
>>
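The move from [{'@id': S, P: O}] to {S: {P: O}} described above can be
sketched in a few lines of Python. This is an illustration only: it assumes
the restricted expanded JSON-LD subset mentioned in the thread (object
dictionaries with just '@id', '@value' and '@type'), assumes literal values
are already strings, and the function names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def _resource(iri):
    # Blank node identifiers start with "_:" in both formats.
    return {"type": "bnode" if iri.startswith("_:") else "uri", "value": iri}


def expanded_to_rdf_json(nodes):
    """Convert [{'@id': S, P: [O, ...]}, ...] into {S: {P: [O, ...]}}."""
    graph = {}
    for node in nodes:
        predicates = graph.setdefault(node["@id"], {})
        for key, objects in node.items():
            if key == "@id":
                continue
            if key == "@type":
                # A node-level @type is a list of IRIs; map it to rdf:type.
                predicates.setdefault(RDF_TYPE, []).extend(
                    _resource(iri) for iri in objects
                )
                continue
            for obj in objects:
                if "@id" in obj:
                    converted = _resource(obj["@id"])
                else:
                    converted = {"type": "literal", "value": obj["@value"]}
                    if "@type" in obj:
                        converted["datatype"] = obj["@type"]
                predicates.setdefault(key, []).append(converted)
    return graph
```

The O part changes only in detail: '@id' and '@value' both become 'value',
and a 'type' key records which of the two it was.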
>> Which is almost the same as flattened JSON-LD [2] except that it isn't
>> indexed by S.
>> As I wrote in my previous mail it would be trivial to use data indexing to
>> get almost the same structure in most cases.
>>
>>
>> > My experience with the port is that the complexity of our code came
>> > down substantially. I would guess the code that manipulates RDF is now
>> > in general maybe only two thirds or three-quarters of what it had
>> > previously been. Even where the complexity is similar, we now have the
>> > advantage of using simpler language constructs (primarily dictionary
>> > access like a[b]), where before we used a helper method. Not all our
>> > helper methods are gone - for example we still have a helper method
>> > whose meaning is "for a given predicate and object, return all the
>> > subjects". It is possible that the simplification we experienced was
>> > particular to our data.
>>
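A helper of the kind mentioned above ("for a given predicate and object,
return all the subjects") might look roughly like this over a {S: {P: O}}
dictionary. The name subjects_for and the sample data are hypothetical; the
actual helper methods are not shown in this thread.

```python
def subjects_for(graph, predicate, value):
    """Return every subject that has `predicate` with an object of `value`."""
    return [
        subject
        for subject, predicates in graph.items()
        if any(obj.get("value") == value for obj in predicates.get(predicate, []))
    ]


# Hypothetical two-subject representation.
sample = {
    "http://example.org/a": {
        "http://example.org/owner": [
            {"type": "uri", "value": "http://example.org/martin"}
        ]
    },
    "http://example.org/b": {
        "http://example.org/owner": [
            {"type": "uri", "value": "http://example.org/markus"}
        ]
    },
}
owned = subjects_for(sample, "http://example.org/owner", "http://example.org/martin")
```

The lookup in this direction still needs a scan, which is why it stays a
helper, while subject-first access is plain a[b] indexing.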
>> Is that a result of the data structure being indexed by S?
>>
>> I can't really see any other substantial difference except the case when
>> you don't care whether a value is a literal or an IRI (as you mention
>> further down in your mail).
>>
>>
>> > I think you are correct in saying that if our
>> > representations were all single-subject, then we would not have found
>> > RDF/JSON to be better - the JSON-LD data organization would have been
>> > as good or better.
>>
>> I didn't say that. I was saying that if you have a single subject as the
>> top-most node in your document you will be able to get almost the same
>> structure in JSON-LD using data indexing:
>>
>> {
>>   "@id": "top most node",
>>   "property-with-an-@index-container": {
>>     "S1": { "@id": S1, P: O },
>>     ...
>>     "Sn": { "@id": Sn, P: O }
>>   }
>> }
>>
>>
>> > Other than the fact that our resources often have
>> > multiple subjects, I doubt there is anything very special in what we
>> > are doing. I had hoped that my code samples might help explain why our
>> > code got simpler, but I can see now that probably won't work.
>> >
>> > Interestingly, RDF/JSON turned out not to be helpful at all in JSON
>> > databases. Databases like to query on predicates, and JSON-LD's
>> > approach of making the subject be the value of the '@id' predicate has
>> > worked much better for us, so our database format is still a very
>> > restricted but valid JSON-LD format. (We have also used triple stores
>> > - that is another conversation.)
>>
>> Yeah.. *querying* also becomes simpler in JavaScript. The downside is that
>> you have to query unless you use data indexing.
>>
>>
>> > We have gone back and forth on
>> > whether we prefer JSON-LD's format for the 'O' part or RDF/JSON's
>> > version. We have seen advantages to both. When we put our JSON into
>> > Lucene, JSON-LD's version of the O format was more convenient, because
>> > it allowed us to easily tell Lucene to handle URLs differently from
>> > regular strings. When we put the data into MongoDB, we did not need
>> > this, and RDF/JSON's version of the O part looked better because it
>> > did not require query writers to remember to use '@id' for URL-valued
>> > properties and '@value' for everything else - you just always use
>> > 'value'. This is particularly helpful if you don't know what type you
>> > are querying on. Right now, we are using both 'O' formats, one for
>> > MongoDB and the other for Lucene.
>>
>> You can get around this by using a context and type-coercing the
>> properties.. but if you don't care it is still simple enough to provide a
>> simple helper method in MongoDB which just looks for both keys (there will
>> always just be one of them).
>>
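The "look for both keys" idea can be sketched in plain Python over the
JSON-LD style O dictionaries. This is not a MongoDB API; the function names
are hypothetical and the sketch assumes each object carries exactly one of
'@id' or '@value'.

```python
def object_value(obj):
    """Return the IRI or the literal value, whichever key is present."""
    if "@id" in obj:
        return obj["@id"]
    return obj["@value"]


def has_value(node, predicate, value):
    """True if `node` has `predicate` with an object equal to `value`."""
    return any(object_value(obj) == value for obj in node.get(predicate, []))


# Hypothetical node in the [{'@id': S, P: O}] organization.
node = {
    "@id": "http://example.org/a",
    "http://example.org/title": [{"@value": "Report"}],
    "http://example.org/author": [{"@id": "http://example.org/martin"}],
}
```

With a helper like this the caller no longer has to know in advance whether
a property is URL-valued or literal-valued.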
>>
>> > In addition to the code simplification we saw when we converted to
>> > RDF/JSON for the API, we also saw a reduction in specification
>> > complexity that may ultimately be more important. Remember that our
>> > JSON is our API and so we need to document it. One approach is to
>> > document it ourselves without reference to any external specification
>> > whatever. In that case, documenting [{'@id': S, P: O}] is only
>> > slightly more complex than documenting {S: { P: O}}. However, we would
>> > really rather not document this ourselves - we would rather point to a
>> > specification. For {S: { P: O}} we can easily point to the RDF/JSON
>> > spec - the whole spec almost fits on a page, and since RDF/JSON really
>> > has no options, there is little to say about our usage patterns of it.
>> > By contrast, JSON-LD is a complex specification with many options. If
>> > we were to reference the JSON-LD spec in our API documentation, not
>> > only would we be referencing a relatively large and complex spec, we
>> > would then have to add yet more information to document the very
>> > restricted subset we actually use. We get no value from the parts of
>> > JSON-LD we do not use and we have no interest in allowing clients to
>> > give us arbitrary JSON-LD, or in producing it, either now or in the
>> > future. Referencing the JSON-LD spec in our API documentation is not
>> > an attractive option.
>>
>> Quite frankly, this surprises me and makes me wonder why you use RDF at
>> all? By defining this, you tightly couple the clients to the server which
>> normally you try to prevent by all means. Given that you use RDF, a client
>> shouldn't depend on the structure of your JSON data but on its semantics,
>> on the raw data. I understand why you wanna do it (it is just more
>> convenient to hard-code a client against a rigid structure) but it's
>> really a bad practice. In that case you could just as well use a highly
>> optimized proprietary JSON format (as most APIs do these days).
>>
>> I also don't really buy your arguments about documenting your "usage
>> patterns". Unless I missed something this is flattened/expanded form.
>> Since JSON-LD's media type has a profile parameter [3] you can even make
>> this visible on the HTTP level.
>>
>> I acknowledge that RDF/JSON is much simpler because it has much fewer
>> features but is that really an advantage in this case?
>>
>> Does the statement "we would rather point to a specification" mean that
>> you wanna propose to put RDF/JSON on the REC track?
>>
>>
>> > I hope this helps explain our usage of RDF/JSON and JSON-LD and our
>> > experiences with them.
>>
>> Yes, it definitely helped me to understand some of the challenges you have
>> to deal with. But nevertheless, it didn't convince me that there's a need
>> for a second format. I would much rather hope that all RDF/Linked Data in
>> JSON converges to JSON-LD. It has a higher initial cost but having a
>> single format for which tools and libraries are being developed will pay
>> off in the long term.
>>
>> None of the things you described is a fundamental problem IMO. I certainly
>> don't wanna belittle the challenges you had to deal with and also
>> understand that for certain, very specific use cases RDF/JSON is a better
>> fit. You will find a slightly better solution for almost every use case
>> that is specific enough. The important thing is that we have a format
>> which is able to address all use cases. It is also critical that it feels
>> "native" for Web developers - RDF/JSON certainly does not.
>>
>> Having a single format obviously means that it can't be the most efficient
>> for all scenarios. I'm convinced that tooling will be built to address
>> those "shortcomings" in the future. Framing [4] is already a first step in
>> that direction. Let's try to not divide the community by having two
>> competing standards.. we are still developers.. so having to write a few
>> lines of code more in certain scenarios isn't that much of a problem. I'm
>> sure libraries will emerge that do that for you very soon.
>>
>>
>> Cheers,
>> Markus
>>
>>
>> [1] http://www.w3.org/TR/json-ld/#expanded-document-form
>> [2] http://www.w3.org/TR/json-ld/#flattened-document-form
>> [3] http://www.w3.org/TR/json-ld/#iana-considerations
>>
>>
>>
>> --
>> Markus Lanthaler
>> @markuslanthaler
>>
>>
>
Received on Monday, 29 April 2013 15:58:46 UTC