- From: Markus Lanthaler <markus.lanthaler@gmx.net>
- Date: Fri, 4 Jan 2013 16:35:21 +0100
- To: <public-linked-json@w3.org>
On Friday, January 04, 2013 1:24 AM, Gregg Kellogg wrote:

> I've spent some time going over the updated algorithms and updating my
> parser. I have to say that I'm distressed at how much the core
> algorithms have changed. In some cases (e.g., Inverse Context) it seems
> that the motivation was performance, at the expense of being able to
> walk through the algorithms; I really think this is the wrong idea, and
> it pretty much requires that an implementation exactly implement the
> algorithms to have a chance at conforming. This is also quite
> destabilizing, as we are trying to get to LC soon. IMO, the purpose of
> algorithm definitions in W3C specs is to allow an implementor to
> unambiguously understand the intent of the algorithms, rather than the
> specific way in which they must be implemented.

I assume you are mostly talking about compaction here, because the other
algorithms haven't changed much (apart from the re-labeling of blank nodes
in expansion). We all agreed that compaction was already too complex before
the introduction of property generators and the new container types. I
looked at various ways to simplify it, and the inverse context approach was
by far the simplest and most performant way to implement it. It was also
the only scalable solution I could come up with.

I do agree that the pure algorithms may not be enough to easily understand
what's going on - we should add some more high-level prose; I just haven't
had the time yet. So let me try to explain it in this email and see whether
it is understandable when explained at a higher level.

The inverse context allows you to more or less directly access the
best-matching term for an IRI-value pair (where the value is optional). All
terms are put into a multi-dimensional array (or nested object structure,
if you prefer) whose structure looks as follows:

  IRI   Container     Type/Lang?   Value       PropGen?   Term(s)
  ---------------------------------------------------------------
        @null         @null        @null       propGens   term(s)
  IRI   @set          @type        type IRI    term       term
        @list         @language    lang. tag
        @annotation                @language
        @language
        term (shortest term to use for IRIs without value)

What's worth noting is that the inverse context is sorted, e.g., by IRI
(longest first). So, if you need to create a compact IRI, you simply loop
over the IRIs; at the first match you take invCtx[IRI][term] and you are
done.

If you need to compact an IRI-value pair, e.g.,
IRI: { "@value": "hi", "@language": "en" }, you first look at
invCtx[IRI][@language]... to see if there's a language container, then at
invCtx[IRI][@set][@language][en][propGens], as property generators and sets
are preferred. If there are no potential property generators, you look at
invCtx[IRI][@set][@language][en][term], then you try
invCtx[IRI][@null][@language]... (@null stands for no container), etc.

If you have the inverse context in front of you, it is trivial to see which
term is chosen. You just need to know the fallback rules, e.g., for
containers: @language -> @set -> @null. If there weren't property
generators, the algorithms would be a whole lot simpler as you wouldn't
need to collect candidates.
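To make this a bit more concrete, here is a rough sketch in Python of what
such a nested inverse context and the container fallback could look like.
The IRI, the term names and the key labels ("term", "propGens", etc.) are
made up for illustration, so this is not the normative structure from the
spec:

    # Illustrative fragment of an inverse context for a hypothetical context
    # that maps "label" to a plain term and "name" to a language container.
    # The nesting follows the levels sketched above:
    # IRI -> container -> type/lang -> value -> propGens/term
    inverse_context = {
        "http://example.com/label": {
            "term": "label",               # shortest term for the bare IRI
            "@language": {                 # term with @container: @language
                "@language": {"en": {"term": "name"}}
            },
            "@null": {                     # no container
                "@language": {"en": {"term": "label_en"}},
                "@null": {"@null": {"term": "label"}}  # plain, untyped values
            }
        }
    }

    def pick_term(inv, iri, language):
        """Walk the container fallback @language -> @set -> @null and prefer
        property generators over plain terms. Simplified: the real algorithm
        collects propGen candidates and only uses one after checking for
        duplicates."""
        entry = inv.get(iri)
        if entry is None:
            return None
        for container in ("@language", "@set", "@null"):
            candidates = entry.get(container, {}).get("@language", {}).get(language)
            if not candidates:
                continue
            if candidates.get("propGens"):   # property generators are preferred
                return candidates["propGens"][0]
            if candidates.get("term"):
                return candidates["term"]
        return entry.get("term")             # fall back to the shortest term

    print(pick_term(inverse_context, "http://example.com/label", "en"))  # "name"

Since the inverse context is built once per active context, each lookup is
then just a handful of dictionary accesses, which is where the performance
and scalability benefits come from.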
> Looking at the record [1], the group decided to adopt the algorithm I
> had specified (with some modifications) as an update to the current
> compaction algorithm, however the new compaction algorithm bears little
> resemblance to the old algorithm.

If you look more closely at the algorithms, you will see that they do
exactly that:

- when expanding, unlabeled nodes are automatically labeled with blank node
  identifiers (this is also the reason why we need to re-label all blank
  nodes in expansion, as decided in [1]);
- when compacting, all potential property generators are evaluated. The
  algorithms check whether duplicates can be found for *all* IRIs using
  deep comparison; if so, the duplicates are eliminated and the property
  generator term is used (see the sketch below).
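To illustrate that second point, here is a rough sketch of the duplicate
check, again in Python and again with hypothetical IRIs and a made-up
helper name rather than the spec's wording:

    # Rough sketch, not the normative algorithm: a property generator term
    # is only used if the same value is present under *all* IRIs it maps
    # to; the duplicates are then removed and replaced by the single term.
    def try_property_generator(node, term, iri_list, value):
        """Return True and strip the duplicates if `value` occurs (deep
        comparison via ==) under every IRI in `iri_list`."""
        for iri in iri_list:
            if value not in node.get(iri, []):  # dict/list == is a deep comparison
                return False                    # one IRI lacks the value, no match
        for iri in iri_list:                    # duplicates found for all IRIs
            node[iri].remove(value)
            if not node[iri]:
                del node[iri]
        node.setdefault(term, []).append(value)
        return True

    # Hypothetical example: "name" is a property generator for both IRIs.
    node = {
        "http://example.com/label": [{"@value": "hi", "@language": "en"}],
        "http://example.com/title": [{"@value": "hi", "@language": "en"}],
    }
    try_property_generator(node, "name", list(node),
                           {"@value": "hi", "@language": "en"})
    print(node)   # {'name': [{'@value': 'hi', '@language': 'en'}]}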
> Looking at IRI compaction, the sense of the group definitely seemed to
> be to stick as close as we can to the current algorithms [2]. However,
> the IRI compaction algorithm is also substantially different than
> before.

That was just a proposal without a resolution. I announced in the
2012-12-11 telecon [2] (not transcribed, 1:27:00 onwards in the audio file)
that I would introduce the notion of an inverse context if I was going to
update the algorithms.

> My current implementation sticks closely to the original algorithms,
> with relatively minor updates to Term Rank, Expansion and Compaction to
> support language maps, annotations and property generators. I haven't
> completed my property generator compaction implementation, but I'm
> confident that the modification I specified relative to the previous
> algorithm can work (with some minor tweaks, no doubt).

Do you pass the whole test suite? There were a lot of minor things (mostly
corner cases) that weren't handled properly in the old algorithms. I added
a number of tests but I'm not sure I captured them all.

> I recommend that we go back to the previous algorithms and look for the
> minimal changes that are necessary to specify the desired behavior. I
> can certainly do this based on the work I've done to bring my
> implementation up-to-date.
>
> Modifications to those algorithms should try to make the meaning of
> them clear when reading, rather than relying on emergent behavior. For
> example, knowing what term to choose when compacting should be
> something you can determine by walking through the IRI Compaction and
> Term Rank algorithms given specific input.

And if you have an inverse context, you can pick it from a "table". I think
we all agree that the previous algorithms weren't simple at all. It's
debatable whether the new ones are simpler - but at least I tried hard to
simplify them.

> Lastly, I'm concerned that the tests depend too much on specifics of
> blank node naming. I can see why this is desirable, but it again limits
> implementations.

Well, that's only true if you want to do a simple deep comparison, and in
that case I can't see how else it could be solved. A more sophisticated
test suite could ignore the blank node identifiers completely and check for
graph equivalence - I doubt that would be simpler.

> One way to do this would be to transform the resulting JSON-LD into RDF
> and use well-understood graph isomorphism.

Nothing prevents you from doing that. But since our target group are JSON
developers who are not (at least not directly) interested in RDF, I think
that would be a step in the wrong direction.

> To allow us to continue to use JSON object equivalence as the result
> test, we could add a "preserve blank node identifiers" flag to be used
> for the bulk of operations that involve expansion. It's only in the
> case that blank nodes are renamed that we run into problems, and some
> graph isomorphism might be necessary.

We discussed such a flag when we discussed the renaming of blank nodes in
expansion but decided not to do it.

I completely understand your concerns and the fact that you have already
invested a lot of time in the old algorithms, but I think reverting the
algorithms at this point is not a productive way forward. Both specs still
need quite a bit of editorial work, and the test suite should be extended
with more corner cases. We should invest our time in doing that instead of
continuing to discuss the algorithms - especially considering that there
are no technical reasons to do so. Simplicity/complexity is - at least to a
certain degree - always subjective.

[1] http://json-ld.org/minutes/2012-12-18/#resolution-1
[2] http://json-ld.org/minutes/2012-12-11/

Cheers,
Markus

--
Markus Lanthaler
@markuslanthaler
Received on Friday, 4 January 2013 15:35:52 UTC