From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Mon, 11 Feb 2013 13:53:02 +0100
To: "'JSON-LD CG'" <public-linked-json@w3.org>
Hi folks,

As promised, I reviewed the changes in alternate2.html. I was happy to see that overall there were just very subtle differences to the algorithms in index.html. Therefore, some of the comments in this review are really nitpicks, but I think we are at a stage where we should pay attention to those little details.

It's quite a long review with a lot of details, so let me try to briefly summarize the main points. The expansion part of the algorithms is in very good shape, and I think the algorithms are quite understandable for someone not that familiar with JSON-LD. Basically the algorithms are the same; the only real difference is that keyword aliases are treated specially by alternate2.html, whereas in index.html they are treated just like every other term mapping. Some steps were moved from one algorithm to another, but the result is the same. I'm not convinced that splitting the Context Processing algorithm into two (Context Processing and Create Term Definition Subalgorithm) is really such a good idea - especially considering that the Create Term Definition Subalgorithm alone is just as long as the complete Context Processing algorithm in index.html. The cost of having two algorithms is that, when reading them, you have to jump between the two (probably first reading the intro of the second and forgetting in the meantime where you've been before).

The compaction algorithms look very similar at first sight, but the devil lies in the details. I've found a number of bugs for which I added tests to the test suite, and I saw that Dave already fixed at least one of them in his implementation. Most problems have to do with property generators. They are solvable, but I don't know how much sense it makes to do so instead of just sticking to the algorithms in index.html and improving their description. They are not so different after all. Alternatively, we could also try to get rid of the Inverse Context altogether, as I proposed on the mailing list some time ago, see:
http://lists.w3.org/Archives/Public/public-linked-json/2013Jan/0047.html

What follows is a detailed review of the changes in alternate2.html. I hope you have a chance to at least skim over it so that we can discuss how to proceed in tomorrow's telecon. Alternatively, I would suggest we discuss this next week and try to resolve the remaining open issues tomorrow. Please note that the algorithm reviews are in a slightly different order than in the spec. The reason is that I reviewed them in the order they are used, i.e., how they depend on each other.

2. Features
===========================

Considering the discussion on the mailing list, I would propose to keep using FOAF in these examples.

2.1 Expansion
---------------------------

1st paragraph: Emphasize the removal of the context, not switching between contexts.

Under Example 2: "The difference is in their context information" - that's wrong, the contexts are the same. I would thus propose to replace the paragraph with the intro from the corresponding paragraph in index.html.

2.3 Flattening
---------------------------

I would like to mention that all blank nodes will be labeled. This is an integral part of flattening. It makes the examples slightly more difficult but certainly helps to understand the algorithms later on.

5. Algorithms
===========================

5.2 Remote Context Resolution
---------------------------

I really like the idea in general but would change it slightly. Instead of pre-processing the whole document, I would call the algorithm as the first step whenever a local context is being processed. So it would be a sub-algorithm of context processing. It should also be described more formally and include error handling, mention that other data in the remote context document is ignored, etc. Slightly adapting the steps under Context Processing 2.2 in index.html will do the job.
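Roughly something like the following (a quick Python sketch just to illustrate the idea, not proposed spec text - the fetching mechanics and error types are placeholders of mine):

    import json
    from urllib.request import urlopen

    def resolve_remote_context(iri, active_remotes):
        # Sketch: dereference a remote context as a sub-step of context
        # processing instead of pre-processing the whole document.
        # active_remotes tracks the IRIs currently being resolved so
        # that recursive inclusions raise an error instead of looping.
        if iri in active_remotes:
            raise ValueError('recursive context inclusion: ' + iri)
        active_remotes.append(iri)
        with urlopen(iri) as response:
            document = json.load(response)
        # All other data in the remote context document is ignored.
        if not isinstance(document, dict) or '@context' not in document:
            raise ValueError('invalid remote context: ' + iri)
        return document['@context']

Context processing would then call this whenever it encounters a string where a local context is expected.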
5.3 Context Processing
---------------------------

Steps 3.4.4 and 3.5.4 are unclear. Instead of removing the keys, they are set to true. This is confusing until you read the Create Term Definition Subalgorithm.

5.4 Create Term Definition Subalgorithm
---------------------------

I'm not sure that it's a good idea to separate these steps from Context Processing. I also find the introduction quite confusing. Is it talking about the term (the left-hand side) or its IRI mapping? What if both sides contain compact IRIs? Step 3 seems to suggest that it is the left-hand side, but it is not clear how to proceed after that step.

Step 5 suggests that keyword aliases are managed separately. Why then is just the alias removed and not a previous IRI mapping?
Step 8.1: Remove @preserve.
Step 8.2: vocabRelative must be set to true; the result might be a keyword, but it won't be handled properly as a keyword alias as in 8.1.
Step 8.4 is empty.
Step 11, really? Including the type mapping etc. What's the point?
Step 12.2.2.2: Property generators should just expand to IRIs, not @type; this should also be checked.
Step 12.3: Why the special case for @type? The vocabRelative flag is missing again.
Step 14: Where does the prefix mapping come from? The active context or the local context?

The order of the algorithms is a bit confusing IMHO. It clearly makes sense to put context processing first, but then the IRI expansion algorithm should directly follow it. Alternatively the order of index.html could be used, i.e., the order of their dependency: expansion -> context processing -> IRI expansion.

5.6 IRI Expansion
---------------------------

The defined map may be more efficient, but I believe it's easier to understand how cycles are detected if you pass an array of the terms currently being processed, e.g., [A, B, C]; if C then requires B, you can simply detect the recursion because B is already in the array (a small sketch follows at the end of this section). Don't know, maybe it's also just a personal preference. I don't have a strong opinion about this but wanted to mention it.

Problem description: The algorithm doesn't care if the value is already an absolute IRI; it tries nevertheless. The given value might be an absolute IRI, but we don't know whether it is at this point.

General solution: I would shorten it, maybe just keeping the first paragraph. The rest is just as complicated to read as the algorithm itself.

Algorithm:
Step 1: Keywords should also be returned as-is.
Steps 2 & 3: This should be made a bit clearer. It's quite difficult to understand what's going on. The point is that terms/prefixes in the local context that haven't been processed yet are processed first.
Step 4: No need to say "explicitly ignore value", this is confusing. Just return null.
Step 6 onwards: isAbsoluteIri is confusing; the value could be a blank node identifier as well.
Step 7: Why not just return the mapping (given vocabRelative is true)? The whole branching around isAbsoluteIri is really confusing. Maybe it's simpler to use the algorithm of index.html and invoke the Create Term Definition algorithm there instead of recursing into itself (even though I find the latter simpler, since the algorithm is then self-contained).
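To illustrate the array-based cycle detection (a toy Python sketch with a deliberately simplified term model - terms map to absolute IRIs or to compact IRIs whose prefix may itself be a not-yet-defined term; none of these names come from the spec):

    def create_term_definition(local_context, term, active_context, path=()):
        # path is the chain of terms currently being processed; a term
        # reappearing in it means a cycle like A -> B -> C -> B.
        if term in path:
            raise ValueError('cyclic IRI mapping: '
                             + ' -> '.join(path + (term,)))
        value = local_context[term]
        prefix, sep, suffix = value.partition(':')
        if prefix in local_context and prefix not in active_context:
            # Define the prefix first, extending the chain.
            create_term_definition(local_context, prefix, active_context,
                                   path + (term,))
            value = active_context[prefix] + suffix
        active_context[term] = value

    # create_term_definition({'A': 'B:x', 'B': 'A:y'}, 'A', {})
    # raises: ValueError: cyclic IRI mapping: A -> B -> A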
5.5 Expansion
---------------------------

Step 3.1: The @context member is not removed.
Step 3.2: This wouldn't be needed if @graph were handled accordingly under 3.4.3 (see index.html 4.3.2.8), which would also simplify 3.4.6, 3.9, and 4.1.
Step 3.4.2: ... or a blank node identifier.
Steps 3.4.3.x: IMHO it would be much simpler to expand the values directly here (just as in steps 4.3.2.x in index.html) instead of having to expand the property again in value expansion to find out that it is @id and then expand using the IRI expansion algorithm. All keywords would then be completely processed after this step.
Step 3.4.4.2: "and value language value in value" is confusing.
Step 3.4.6: Might need some clarification. It might be easier to put it under 3.4.3.
Step 3.4.11: Also this step would be simpler if everything keyword-related were done under 3.4.3.
Step 3.5.3: Already verified in 3.4.3.5 - but @type must be a string and not an array.
Step 3.9: Could be simplified if 3.2 is changed (see index.html 4.6).
The insideList flag can be eliminated by looking instead at the active property's container mapping, or by checking whether active property equals @list.

5.7 Value Expansion
---------------------------

As already said above, I think it would be simpler to handle @id and @type expansion directly in the expansion algorithm. Fewer jumps, fewer branches, shorter algorithm(s) - not a deal breaker though.

Step 5: Not needed, it's done again in step 9.
Step 10: The default language is missing.
Step 12 is missing: return result.

5.8 Label Blank Nodes Subalgorithm
---------------------------

The isId flag isn't needed; @type is already relabeled in IRI expansion. When will step 4 ever be reached (and do something other than reassigning the same bnodeId it already has)?

5.9 Generate Blank Node Identifier
---------------------------

Proposal: Get rid of the configurable prefix, we never change it anyway. Just hardcode _:t or, even better because it's easier to remember, _:b. Briefly describe what the identifier map is used for, i.e., keeping track of blank node reassignments.
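For what it's worth, a minimal Python sketch of what I have in mind (the function and variable names are mine, not the spec's):

    def blank_node_id_generator():
        # Hardcoded _:b prefix; the identifier map just keeps track of
        # which old blank node identifiers were relabeled to which new
        # ones, so the same input always yields the same output.
        counter = 0
        identifier_map = {}
        def generate(old_identifier=None):
            nonlocal counter
            if old_identifier in identifier_map:
                return identifier_map[old_identifier]
            new_identifier = '_:b' + str(counter)
            counter += 1
            if old_identifier is not None:
                identifier_map[old_identifier] = new_identifier
            return new_identifier
        return generate

    # generate = blank_node_id_generator()
    # generate('_:foo')  returns '_:b0'
    # generate('_:foo')  returns '_:b0' again (same relabeling)
    # generate()         returns '_:b1' (fresh identifier)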
5.10 Compaction Algorithm
---------------------------

I didn't mention this for the other algorithms, but I think it would be simpler to reorder the branches so that the longest comes last. In this case, e.g., first process scalars, then arrays, and finally objects (instead of array, object, scalar, which results in a lonely one-liner at the end). Another advantage is that the steps in the object-processing branch would move up a level, since the if-block is not needed anymore; there won't be any other case, as the two previous branches have already returned (see steps 2-3 in index.html).

The algorithm needs to be passed the inverse context, otherwise it can't pass it to the subalgorithms (or they have to re-create it constantly from the active context).

Step 1.3: The compactArrays option is not used.
Step 2.2: I don't think we need to explain that a shallow copy is created, and we certainly don't need to distinguish between array values and non-array values; the point is that each value is copied but the references contained in that copy remain the same references.
Step 2.4: Say "keys from shallow" and mention that this has to be done because keys can be removed from shallow in the following loop.
Step 2.5.3.1: The IRI compaction algorithm has no documentRelative flag; it is also not needed, as vocabRelative can be used to determine whether the value is being used in @type (true) or @id (false). All IRIs are document-relative.
Step 2.5.3.2.2: Drop documentRelative.
Step 2.5.5.1: Is activeProperty the same as "active property" or not? It's confusing to have two variables with the same name.
Step 2.5.5.1 (and in general): Passing parent to the IRI compaction algorithm, which then passes it to term selection, just to eliminate *some* of the potential property generators (because not all IRIs are in shallow) doesn't make sense to me: it doesn't optimize much but makes the algorithms much harder to understand. You can easily do that check in the Find and Remove Property Generator Duplicates Subalgorithm, and I think that's the place where it belongs.
Step 2.5.5 and substeps: There's a bug that's a bit difficult to explain; please see compact-0046, which I just added.
Step 2.5.6.2: Active property could be an array of potential property generators - how do you choose the container then? Actually I went on and saw that property generators are not handled correctly at all. I added another test, compact-0047, which shows the problem. Given this bug, the rest of the algorithm works under the wrong assumptions, so it doesn't make much sense to review the remaining steps.

5.12 Inverse Context Creation Subalgorithm
---------------------------

Good idea to sort the terms before processing them! Overall, the algorithm is four times as long as the one in index.html. While I think we should clarify the one in index.html, I don't think it's necessary to blow it up that much.

Overall: I find it confusing to "initialize [variable] to the value associated with the [key] key in [map]". What you do here is store a reference to that member; you don't assign its value to [variable].

Step 3.3.3: It would be more efficient to do this outside of the loop.
Step 3.3.4: No need to initialize the defaultLanguage key here, this is done again in 3.3.6.3.1.
Step 3.3.6: I would find it easier to understand if the checks were "If there's a type mapping", "Otherwise, if there's a language mapping", and finally just "Otherwise". This would also eliminate some nesting. There's a bug, since the algorithm doesn't distinguish between a term with no language (and no type) mapping and a term with a language mapping set to null. I've added compact-0048; see the sketch right below.
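To make the distinction concrete, a small Python sketch of how the language entries would have to be created (assuming, purely for illustration, that a term definition only carries an @language key when a language mapping was actually given):

    def add_language_entry(language_map, term_definition, term):
        # A language mapping explicitly set to null files the term
        # under @null; no language (and no type) mapping at all files
        # it under @none. Conflating the two causes compact-0048 to fail.
        if '@language' in term_definition:
            language = term_definition['@language']
            key = '@null' if language is None else language
        else:
            key = '@none'
        language_map.setdefault(key, term)

    # language_map = {}
    # add_language_entry(language_map, {'@language': None}, 'noLang')
    # add_language_entry(language_map, {}, 'anyLang')
    # language_map is now {'@null': 'noLang', '@none': 'anyLang'}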
5.11 IRI Compaction Algorithm
---------------------------

As already said for the Compaction algorithm, I would suggest dropping the "parent" parameter. Instead, I would pass the inverse context explicitly. Depending on the outcome of ISSUE-204, the base IRI is also required.

Step 5.4: This is related to the bug described above: numbers, e.g., have no language mapping, which is different from strings with a language that equals null.
Step 5.4.4 and substeps: At least @id objects are not handled properly; I've added compact-0049 for this bug. I won't review the rest of these sub-steps but continue directly with compact IRIs.
Steps 7.4 and 7.5: Why introduce the variable isUsableCurie here?

I can't see how the algorithm works if there are potential property generators that turn out not to match because the data doesn't match. I've added compact-0050 as a minimal test. The point is that the fallback terms are never returned. As I understand the algorithm, in this case only "gen" would be returned. Depending on the outcome of ISSUE-204, compaction to relative IRIs is missing.

5.13 Term Selection Subalgorithm
---------------------------

Drop the parent parameter and the corresponding "optimization".

Sometimes @null is used, sometimes @none (also in the other algorithms). I find this confusing, as it is very difficult to remember what's in use where.

Step 3: Missing space: "underwhich".
Step 4: Typo in typeOrLangageValue. In step 3 typeOrLanguageValue is modified; in step 4 a new variable "preferred" is then introduced. Why not reuse typeOrLanguageValue? Why not do those steps directly in IRI compaction? Same question for step 1.

Overall, I'm not sure the introduction of so many variables with very similar names is helpful (e.g., typeOrLanguage, typeOrLanguage map, typeOrLanguageValue map).

Only one property generator is ever selected, but it is not checked whether the values really match, i.e., whether the property generator is applicable. There's also no way to invoke the algorithm again saying to ignore that specific property generator in the second run. So the algorithm doesn't really work.

5.14 Value Compaction
---------------------------

Substeps of 1: The whole preserveIndex stuff is quite complicated. Why not just create a copy of value and then modify that copy, as the algorithm in index.html does?
Step 2: What if there's also @index? Why expand the value? Why IRI-expand an object? It's already expanded, we are compacting here. I think it should mean expanding the active property. Typo: "passing passing".

PROPOSAL: Keep the algorithm in index.html, which is half as long. Modify it to work on a copy of value if necessary, or construct a copy step-by-step (this should require just minimal changes).

5.15 Find and Remove Property Generator Duplicates Subalgorithm
---------------------------

In contrast to the algorithm in index.html, this algorithm accepts just a single property generator. So they do something different.

The algorithm in index.html does the following:
- Check for *every potential* property generator whether all of its IRIs are included as properties in element.
- If so, check whether they all contain the passed value.
- If so, remove those duplicate values (and potentially the whole property, if no other value is left).
- Return the first matching property generator for which all duplicates have been found.
- If no matching property generator was found, return the fallback term, compact IRI, or IRI.

The algorithm in alternate2.html does the following:
- It checks, for a *single* property generator's IRIs, whether element contains the duplicates.
- If a duplicate is found, it is deleted straight away (without first verifying that all duplicates are really there) - and this algorithm doesn't seem to work on a copy of element, so data is potentially lost.
- It doesn't return anything.

Since the algorithm is just invoked in the compaction algorithm, the remaining work must be done there - but it isn't. Here's what's done there: "If there is a term definition for activeProperty in active context that is a property generator, then invoke the Find and Remove Property Generator Duplicates algorithm, passing active context, shallow for element, expanded property, expanded item for value, and activeProperty."

So I cannot see how this would possibly work. I would thus propose to keep the algorithm of index.html but include steps 2.3.1.x from alternate2.html to describe in detail what's a duplicate and what's not. A sketch of the index.html behavior follows below.
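To make that behavior concrete, here is a minimal Python sketch (the shape of the inputs is mine: candidates maps each potential property generator term to its IRIs, element maps expanded IRIs to arrays of values; deep equality of expanded values is glossed over with plain ==):

    def find_and_remove_pg_duplicates(candidates, element, value, fallback):
        for term, iris in candidates.items():
            # A property generator only matches if all of its IRIs are
            # present in element and each of them contains the value.
            if all(iri in element and value in element[iri] for iri in iris):
                for iri in iris:
                    element[iri].remove(value)  # remove the duplicate
                    if not element[iri]:        # and the whole property
                        del element[iri]        # if no value is left
                return term  # first matching property generator wins
        # No property generator matched all duplicates: fall back to
        # the term, compact IRI, or IRI.
        return fallback

    # element = {'http://ex/a': [1], 'http://ex/b': [1, 2]}
    # find_and_remove_pg_duplicates({'gen': ['http://ex/a', 'http://ex/b']},
    #                               element, 1, 'fallback')
    # returns 'gen'; element is now {'http://ex/b': [2]}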
The remaining algorithms haven't been changed.

--
Markus Lanthaler
@markuslanthaler

Received on Monday, 11 February 2013 12:53:32 UTC