Re: Thoughts on framing, normalization, CURIEs

So... I am one of those "RDF People"; I cannot deny that :-) Nevertheless, here are my somewhat random thoughts (now that, I believe, I understand what the various features below are meant to do...).

1. This has been said before ad nauseam: the, or maybe I should say "a", JSON-LD document should be simple for readers if we want to get it accepted. The spec text on the framing and normalization algorithms is very complicated (I readily acknowledge that I did not go and check the details of the algorithms). If the impression is that "Linked Data = JSON-LD = complex algorithms", then we are shooting ourselves in the foot. Those things, if retained, should be moved into a separate document. I know, this is an editorial issue, but an important one...

2. Re CURIEs: I think they are overcomplicated in the current document and can be reduced to a much simpler microsyntax, i.e., simple string concatenation. (I wish it were always that simple elsewhere, but the original CURIE work had to operate in an XML environment. That is not the case here.) We can downplay the feature, let people use it that way, and forget about it.
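
   Just to illustrate what I mean by "simple string concatenation" (a toy sketch, not the spec's algorithm; the context and terms are invented for the example):

       # Expand "prefix:suffix" keys by plain string concatenation against a
       # context that maps prefixes (and full terms) to IRIs.
       context = {
           "foaf": "http://xmlns.com/foaf/0.1/",        # prefix
           "name": "http://xmlns.com/foaf/0.1/name"     # full term
       }

       def expand(key, context):
           if key in context:                 # a full term match wins
               return context[key]
           if ":" in key:
               prefix, suffix = key.split(":", 1)
               if prefix in context:
                   return context[prefix] + suffix      # plain concatenation
           return key                         # absolute IRIs etc. pass through

       assert expand("name", context) == "http://xmlns.com/foaf/0.1/name"
       assert expand("foaf:homepage", context) == "http://xmlns.com/foaf/0.1/homepage"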

3. Re Framing: I think I understand what is to be achieved. But my question remains: how many users would really, really use this? No, this is not the "Hardly anyone would use this in the RDF world, so let's cut it" reaction; I genuinely try to understand why it is so necessary that a _recommendation_ has to go through this. Indeed:

   - if a user uses JSON-LD without caring about RDF, that user would, most probably, use a format that is close to the framed version anyway, i.e., one {} or an array thereof at the top, essentially describing a forest or a single tree as the main structure, with some extra links among the different branches (see the sketch right after this list);
   - with RDF data... I do not have exact statistics here, but most of the serialized RDF graphs/files out there are pretty much forests, too; at least that is my impression for the overwhelming majority. That means various RDF environments (e.g., the Turtle serializer of RDFLib) already produce something that is pretty close to a frame. If they adopt JSON-LD, that would be the same.
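
To make the shape in the first bullet concrete, this is the kind of tree-structured document a JSON developer would publish and consume directly (the vocabulary and values are invented for illustration); to my understanding, it is essentially what a framed document looks like:

    {
      "@context": { "foaf": "http://xmlns.com/foaf/0.1/" },
      "foaf:name": "Alice",
      "foaf:knows": {
        "foaf:name": "Bob",
        "foaf:knows": { "foaf:name": "Carol" }
      }
    }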

In view of that: is it really something that ought to be _standardized_? Is there an interoperability issue related to frames that requires a standard? Or, instead, is it something that smart implementations can do with any algorithm they see fit for their own purposes? I could imagine working out the algorithm and putting it out as a separate WG Note, for example, to help implementers if they need it, but I do not see the value of a _standard algorithm_. After all, a standard is all about interoperability...

4. Re the RDF world and RDF/JSON: I have the impression that, taking the initial part of normalization, it would be very easy to define a transformation of JSON-LD into some sort of N-triples-like format (for lack of a better name, let me refer to this as J-triples). All contexts are expanded, all datatypes and @iri-s are expanded, the result is an array of subject-key-value structures, etc. It is fairly obvious, and it is, in fact, the first step of the normalization algorithm, if I am not mistaken. Just as N-triples have proven to be very useful in the RDF world (unexpectedly so; they were not meant to be a separate serialization...) as a very simple format to dump and exchange data, J-triples might be useful in their own right (at least for RDF people...). I think having that spelled out, with an algorithm that would be much simpler, may be worthwhile.
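
Purely as an illustration of what such a J-triples dump could look like (this is my own sketch, not anything in the current spec; the key names and IRIs are invented for the example): every statement fully expanded, no context, no nesting:

    [
      { "subject": "http://example.org/people/alice",
        "property": "http://xmlns.com/foaf/0.1/name",
        "value": "Alice" },
      { "subject": "http://example.org/people/alice",
        "property": "http://xmlns.com/foaf/0.1/knows",
        "value": { "@iri": "http://example.org/people/bob" } },
      { "subject": "http://example.org/people/bob",
        "property": "http://xmlns.com/foaf/0.1/name",
        "value": "Bob" }
    ]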

5. Re Normalization: clearly this is an issue of interoperability. Also, it is clearly needed for specific applications that require, e.g., signatures. Without reducing the importance of this, I am not sure where else a full normalization would be useful, though, i.e., whether there is any urgency to standardize it right away rather than leaving it for a second phase.
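
To spell out why the signature use case needs a canonical form (a toy sketch only, using plain JSON and key ordering; the real normalization algorithm obviously has to cope with far more than that):

    import hashlib
    import json

    # The same data, keys in a different order.
    doc_a = {"name": "Alice", "homepage": "http://example.org/alice"}
    doc_b = {"homepage": "http://example.org/alice", "name": "Alice"}

    # Hashing the raw serializations gives two different digests...
    naive_a = hashlib.sha256(json.dumps(doc_a).encode()).hexdigest()
    naive_b = hashlib.sha256(json.dumps(doc_b).encode()).hexdigest()
    assert naive_a != naive_b

    # ...whereas hashing an agreed-upon canonical form (here, simply sorted
    # keys) gives the same digest, which is what a signature needs.
    canon_a = hashlib.sha256(json.dumps(doc_a, sort_keys=True).encode()).hexdigest()
    canon_b = hashlib.sha256(json.dumps(doc_b, sort_keys=True).encode()).hexdigest()
    assert canon_a == canon_b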

However... by separating, at least on the surface, from the RDF world, I am a bit afraid that JSON-LD would try to reinvent the wheel, or copy-paste-reformulate existing algorithms. I presume (I seem to recognize elements of it) that the paper of Jeremy Carroll et al. is taken into account here as a starting point, and I note that the RDF WG has lately raised the issue of possibly standardizing a normalization format (an issue raised by Jeremy, b.t.w., so it may simply take his algorithm, finalize it, and that is it). Is it really a good idea for the JSON-LD group to go its own way here? The only differences I can see are that the starting position would be J-triples instead of N-triples, and that the text in the JSON-LD document avoids the term 'blank node' for something that behaves exactly the same way as far as the algorithm is concerned... Otherwise the issues are exactly the same! Better to work with those guys, have a common approach for N-triples and J-triples, and that is it...
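
A trivial illustration of that shared problem (the triple is made up): the same one-statement graph written with two different, equally arbitrary labels for the nameless node. Byte-wise the documents differ even though the graphs are isomorphic, and assigning canonical labels to such nodes is exactly the hard part both algorithms have to solve, whatever those nodes are called:

    # Same graph, two arbitrary labels for the nameless node.
    doc_a = '_:a <http://xmlns.com/foaf/0.1/name> "Alice" .'
    doc_b = '_:b <http://xmlns.com/foaf/0.1/name> "Alice" .'

    # A naive byte-level comparison (or hash) treats them as different documents.
    assert doc_a != doc_b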

Ok, shoot:-)

Ivan

On Aug 31, 2011, at 03:46, Manu Sporny wrote:

> A conversation that Dave Longley, Gregg Kellogg and I just had on the JSON-LD IRC channel. Covers some things we've been discussing on here.
> 
> [21:11] <manu> I definitely don't like this "no framing", "no normalization" direction...
> [21:11] <manu> or splitting the spec into basic and advanced functionality conformance levels.
> [21:11] <manu> I'm also very concerned about the removal of CURIEs...
> [21:11] <manu> framing is something that you hit immediately when attempting to work with this data...
> [21:12] <gkellogg> I'm not happy about the push towards removing CURIEs, but I think we should look for ways to simplify the spec.
> [21:12] <dlongley> the same graph can be represented in many different ways in JSON-LD
> [21:12] <manu> Not dealing with normalization is a mistake that RDF serializations have been making for years...
> [21:12] <dlongley> the incoming RDF/JSON people might be unaware of this issue
> [21:12] <dlongley> or unaware of how JSON people want to work with the data
> [21:12] <dlongley> (as structures they form and work with naturally in JSON)
> [21:12] <dlongley> as opposed to graph APIs/triples/whatever/etc.
> [21:13] <manu> I agree that we should try and simplify the spec - but that's always true :) - Not many people want to try to make it more complicated :P
> [21:13] <dlongley> if you don't know what the data looks like, you can't work with it naturally in JSON.
> [21:13] <manu> yes, exactly.
> [21:13] <gkellogg> I do worry about TL;DR with all the stuff that's in there.
> [21:13] <dlongley> there are two ways to make the data look a particular way
> [21:13] <dlongley> 1. normalization
> [21:13] <dlongley> 2. framing
> [21:13] <manu> I don't think the people that are asking for framing to be removed understand why it is there in the first place.
> [21:14] <manu> gkellogg: Well, keep in mind that this is a spec for /implementers/ - an introduction and lessons should be elsewhere.
> [21:15] <gkellogg> It's really pretty similar to the Haml support I added to my RDFa writer; it's necessary to get something out that jQuery/CSS can work against. Framing's the equivalent for JSON-LD, but we never dealt with serialization issues in RDFa.
> [21:15] <manu> That is, I don't think it's bad for specs to be very thorough in explaining concepts and how stuff is intended to work.
> [21:15] <dlongley> we don't want to fall into the same trap we did a little while back
> [21:15] <dlongley> with people confusing publishing JSON-LD documents and understanding how to do that easily
> [21:15] <dlongley> with having to read the spec to write processor implementations
> [21:15] <manu> right.
> [21:16] <dlongley> this sounds like a repeat of the failed "JSON-LD Basic" spec vs. "JSON-LD Advanced" is what i'm saying.
> [21:16] <gkellogg> Is your worry about separating Framing & Normalization into separate specs that they just won't be implemented?
> [21:16] <manu> JSON-LD is /easy to publish/ - just slap a @context at the top of your object and you're good.
> [21:17] <manu> JSON-LD is /difficult to implement/ - the normalization algorithm, especially... but yes, gkellogg - if it is moved into another spec, it will be confusing how the two are related and it will more than likely not be implemented, imo.
> [21:17] <manu> I think we can have two conformance levels
> [21:17] <manu> 1) The API, 2) conversion to RDF.
> [21:17] <manu> People can implement #1 or #2 or both.
> [21:18] <manu> I don't think we should cut conformance across the API, though - compact/expand in basic and frame/normalize/triples in advanced.
> [21:18] <gkellogg> 90% of the people are going to read the spec to know how to publish. Those details aren't important.
> [21:18] <dlongley> then wouldn't they just skip the details they weren't interested in?
> [21:18] <manu> I disagree - I don't think anybody but really early adopters are going to read the spec.
> [21:18] <gkellogg> HTML5 also considered (and maybe does) generating different specs for publishers and processors
> [21:18] <dlongley> just stop reading when you get far enough?
> [21:18] <manu> that is - and we made this mistake with RDFa - there need to be clear examples and tutorials on the JSON-LD website.
> [21:19] <manu> gkellogg: HTML5 does have a "reference manual" and "the spec"
> [21:19] <manu> but I think we need something simpler for JSON-LD.
> [21:19] <dlongley> it seems to me like we're worried about people reading the spec that likely won't ever read it or need to
> [21:19] <manu> tutorials just don't fit nicely into the W3C spec model... they're unwieldy.
> [21:19] <dlongley> i don't know why we're worried about that group of people.
> [21:20] <manu> I think we should be worried about that group of people - and publish tutorials and examples for them on the JSON-LD website.
> [21:20] <manu> Kinda like: "Learn JSON-LD in Five easy steps"
> [21:20] <dlongley> let me clarify..
> [21:20] <dlongley> i don't think we should be worried about those people in how we write the spec.
> [21:20] <dlongley> they won't be reading it anyway.
> [21:20] <manu> right
> [21:21] <gkellogg> Well, it seems like we have an opportunity to get the RDF WG to adopt JSON-LD as a REC. To what degree do we feel the need to cater to pure RDF concerns?
> [21:21] <dlongley> it seems like we just need to produce something that lets them represent RDF in JSON
> [21:21] <gkellogg> Or, are we better staying away from the "smell" of RDF?
> [21:21] <gkellogg> cygri would seem to prefer RDF/JSON, which is probably better for his use case
> [21:22] <dlongley> if they can represent everything they need to ... then it seems like we will have done a lot to provide for them
> [21:22] <manu> I think there is plenty that we can do to work with the RDF WG to ensure that their use cases are supported.
> [21:22] <manu> but what we need to make sure we do is not allow pre-conceived notions of what RDF/JSON syntaxes should look like affect JSON-LD negatively.
> [21:22] <dlongley> but just because RDF people won't use framing/ever need to understand what it is, i don't think that that should be a reason to cut it from the spec ...
> [21:23] <gkellogg> spec complexity can be a barrier, and I can see that there could be a big kerfuffle over normalization.
> [21:23] <manu> The concern that I have w/ Richard's push for RDF/JSON is that it is a fairly easy transform from normalized to RDF/JSON form.
> [21:23] <dlongley> i'm worried about comments that suggest this line of thinking: "Hardly anyone would use this in RDF world, so let's cut it."
> [21:23] <gkellogg> I tend to agree with you about framing, and I don't find it too complex an issue.
> [21:24] <manu> Right, the goal here isn't to support every RDF use case
> [21:24] <gkellogg> dlongley: absolutely
> [21:24] <dlongley> and *maybe*, if it makes sense to do so, we could just change normalized form to use a map instead of an array as an output.
> [21:24] <manu> (even though I think we do)
> [21:24] <dlongley> the issue with that is that it no longer looks like the rest of JSON-LD
> [21:24] <manu> yes, but by doing that - we totally screw with the RDF conversion algorithm.
> [21:24] <dlongley> and it's so darn easy to generate.
> [21:24] <manu> (in a non-positive way)
> [21:25] <dlongley> i guess what i'm saying is this: some RDF folks have joined in on the discussion now ... and I think that's a good thing
> [21:25] <dlongley> and we should work to try and cover their use cases.
> [21:25] <dlongley> however, i don't think it's a good thing to have them starting to request cutting features if they aren't coming from the JSON world.
> [21:26] <manu> especially if they haven't built a system using JSON-LD yet.
> [21:26] <gkellogg> The requests to cut CURIEs come from many places, I think we should call them something else.
> [21:26] <manu> or grok why we have those features in there in the first place.
> [21:27] <dlongley> gkellogg: like "prefixes"
> [21:27] * manu wonders if we could change CURIE syntax to use "+" -> foaf+name
> [21:27] <gkellogg> We should invite some RDF folks to the telecon where we can discuss
> [21:27] <manu> gkellogg: good idea
> [21:27] * gkellogg interesting
> [21:28] <manu> The only reason we'd do that is to placate people that have a strong emotional reaction to CURIEs.
> [21:28] <dlongley> that sort of thing always bothers me ...
> [21:29] <dlongley> (changing things in no real substantive way)
> [21:29] <manu> I mean, we've considered dropping CURIEs many times at Digital Bazaar... and every time we come to the same conclusion: "The number of terms that we're going to be using in these digital contracts is going to explode when 3rd parties start extending the digital contracts to suit their needs."
> [21:29] <gkellogg> Really, in JSON-LD they are prefixes; the term is used as a prefix for expansion using a ":" separator. It falls out of the term stuff pretty easily.
> [21:30] <gkellogg> Maybe need to show an example @context exploded based on actual vocabulary use.
> [21:30] <manu> "keys in @context can be used as terms or prefixes."
> [21:30] <manu> I'd be fine with that...
> [21:30] <dlongley> totally fine with that here.
> [21:30] <gkellogg> We could do an @context example from the schema.org vocab
> [21:30] <gkellogg> +1
> [21:31] <manu> alright, well - you guys ok with me copy/pasting this conversation to the mailing list for further discussion there?
> [21:31] <gkellogg> Well, it is a public forum, so why not?
> [21:32] <dlongley> sure
> 
> -- manu
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: Building a Better World with Web Payments
> http://manu.sporny.org/2011/better-world/
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
