RE: Review of alternate2.html from Markus Lanthaler on 2013-02-12 (public-linked-json@w3.org from February 2013)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Tue, 12 Feb 2013 13:21:24 +0100
To: <public-linked-json@w3.org>
Message-ID: <010501ce091b$7a2a69d0$6e7f3d70$@lanthaler@gmx.net>
Instead of creating two threads, I try to reply to both of you in one mail.

>>> 2.1 Expansion
>>> ---------------------------
>>> 1st paragraph: Emphasize removal of context, not switching between
contexts
>>
>> Maybe. I like how the main purpose of removing contexts is explained
>> now. JSON-LD is often really, IMO, about being able to express the same
>> data using different contexts, so it makes sense to talk about
>> everything in that ... context.
> 
> Hmm; the purpose of expansion is clearly to remove a context. Alternative
> contexts only come in when re-compacting.

That's exactly what I meant.


>>> Under Example2: "The difference is in their context information" -
that's
>>> wrong, the contexts are the same.
>>> I would thus propose to replace the paragraph with the intro from the
>>> corresponding paragraph in index.html
>>
>> Are you sure? The two contexts look different to me, but we may be
>> talking about something else. The context in Example 1 has the term
>> "name", whilst Example 2's context does not.

You are right, "name" is missing. But it's very difficult to see because
other things are highlighted and even if "name" would be there, it wouldn't
make a difference. 


>>> 2.3 Flattening
>>> ---------------------------
>>> I would like to mention that all blank nodes will be labeled. This is an
>>> integral part of flattening. It makes the examples slightly more
difficult
>>> but certainly helps to understand the algorithms later on.
>>
>> This was removed because blank nodes and blank node label is a fairly
>> complex issue, IMO. I didn't think it was appropriate to be in the
>> introduction. Perhaps it needs more highlighting later on?
> 
> All blank nodes are already relabeled in expansion, AKAIKR, so I'm not
sure
> why this stands out as being any different.

Unlabeled blank nodes are not labeled in expansion. Only labeled blank nodes
get relabeled. So I think it's important to highlight that straight in the
introduction. I think in the API spec we can work under the assumption that
the data model is clear.


>>> 5.2 Remote Context Resolution
>>> ---------------------------
>>> I really like the idea in general but would change it slightly. Instead
of
>>> pre-processing the whole document I would call the algorithm as first
step
>>> whenever a local context is being processed. So it would be a
sub-algorithm
>>> of context processing. It should also be described more formally and
include
>>> error handling, mention that other data in the remote context document
is
>>> ignored, etc. Slightly adapting the steps under Context Processing 2.2
in
>>> index.html will do the job.
>>
>> Well, one of the reasons for getting it all out of the way to begin
>> with, is to avoid introducing a lot of asynchronous complexity with the
>> other algorithms. If someone were implementing a new processor in an
>> asynchronous environment, and they tried to follow the algorithms, they
>> might find control flow to be a pain. Hopefully they'd notice that if
>> they got remote context resolution out of the way first their work would
>> be easier, but why not give them a headstart in solving that problem for
>> them? I do think both approaches should be mentioned either way.
> 
> For my part,  do synchronous processing anyway, and HTTP caching can
eliminate
> most sources of latency; I find it better to keep it in the cor Expansion
> algorithm, which works well in my case. I'm agnostic about separating the
> algorithm out, but it sounds like an implementation-specific optimization.

The API call itself is asynchronous, whether the algorithms it calls are
doesn't really matter. In some languages you would spawn a thread, in others
you rely on an event-loop, ... So I agree with Gregg that this is, IMO,
clearly an implementation-specific optimization. Whether you have one async
call before expansion or a number of them during expansion doesn't really
matter. Furthermore, the current pre-processing step directly modifies the
input document which you tried to avoid in all other operations. Passing
around a map of external contexts just makes it more complex to explain.


>>> 5.3 Context Processing
>>> ---------------------------
>>> Steps 3.4.4 and 3.5.4 are unclear. Instead of removing the keys, they
are
>>> set to true. This is confusing till you read the Create Term Definition
>>> Subalgorithm.
>>
>> I don't know that it's confusing to mark those keys as having been
>> defined (their entries in the defined map are set to true). I will say
>> that the reason why those keys aren't removed is because, in this
>> rewrite, I tried to take the approach of not modifying inputs as well as
>> not having to say "copy the input" everywhere. This means it takes an
>> implementor less mental effort to have to tease out where they can
>> optimize away copying and where they can't.

.. and yet you modify your input document when you dereference external
contexts. As already said above I believe it's much simpler to pre-process
each local context and then pass the pre-processed result (i.e. a modified
copy) to the context processing algorithm. You can then modify it as you
like which would simplify the algorithm.



>>> 5.4 Create Term Definition Subalgorithm
>>> ---------------------------
>>> I'm not sure that it's a good idea to separate these steps from Context
>>> Processing. I also find the introduction quite confusing. Is it talking
>>> about the term (the left hand side) or its IRI mapping?
>>
>> Is the confusion in the choice of the word "term"? I have found some of
>> the terminology to be confusing myself; we create "term definitions" for
>> keys that are CURIEs ... are these "terms" or not? The terminology in
>> the spec indicates that a term is any "short word" defined in a context
>> that may be expanded to an IRI. Perhaps we should be saying "keys" (in
>> the local context) before terms are defined? I think this is a general
>> problem we have in the specification; I know it was there when I was
>> looking over the old spec during the rewrite. I think we should get some
>> consensus on our terminology for this moving forward.
> 
> If a CURIE is on the LHS, it is a term, IMO.

+1, everything on the LHS is a term, it's a (more or less) opaque string.
The only time you really look into the string at LHS is when @id is missing
in its definition to check if you can expand it nevertheless to an IRI.


>>> Step 3 seems to suggest that it is the left hand side
>>> but it is not clear how to proceed after that step.
>>
>> This may just arise from the general confusion w/"term". All you do is
>> invoke the create term definition algorithm recursively to define the
>> dependency (as step 3 states), and then you move onto step 4. There's
>> nothing special going on there.

And that's exactly what confuses me. Just by looking at the LHS you don't
know if it's a dependency. It's a dependency if a compact IRI is used in @id
*or* if @id is missing and the LHS is a compact IRI.


> The algorithm also causes curies to inherit the term definitions of the
> prefix, which isn't called out in the Syntax spec, AFAIKR.

That's another concern I mentioned further down (step 11). The problem is
illustrated in expand-0050.


>>> Step 5 suggests that keyword aliases are managed separately. Why is then
>>> just the alias removed and not a previous IRI mapping?
>>
>> Because the IRI mapping will be overwritten when you define the term.
>> I'm not sure if this needs to be made more clear with a note?

That's exactly the point. Just the IRI mapping is overwritten. The algorithm
might return in step 8.3 never without resetting the type, language, or
container mapping.


>>> Step 8.1 Remove @preserve
>>
>> Yes, this is for supporting the framing algorithm and it should be
>> removed at this time.
>>
>>> Step 8.2 vocabRelative must be set to true; the result might be a
keyword,
>>> but it won't be put handled properly as keyword alias as in 8.1
>>
>> I thought you are not allowed to use keyword aliases in the context. Has
>> this changed?

There are two issues: One is that vocabRelative must be set to true because
all the IRIs in the context are vocabRelative. The other one is that the
result of the IRI expansion operation might be a keyword but the description
of the algorithm doesn't handle it correctly. Your implementation does it -
and it also supports keyword aliases (expand-0051).

Regarding keyword aliasing: we agreed that you cannot use keyword aliases as
keys of term definitions, they have to be @id/@type/@language/@container.
You can, however, use keyword aliases as values. Just as you can use other
terms or compact IRIs. I think the whole separation of keyword aliases from
term definitions causes more confusion than it helps. The algorithms
shouldn't care (in most cases) and thus, shouldn't separate those mappings
IMO.


>>> Step 11, really? Including type mapping etc. What's the point?
>>
>> Inheritance from dependencies. You may add/override a container, type,
>> or language mapping. For example, if you have a term that is a CURIE
>> that defines only a container, language, or type mapping, you want to
>> inherit the IRI mapping from the prefix's definition. It sounds like we
>> should add several test cases to the test-suite for this.

No, you don't inherit anything. Quote from the syntax spec:

  "Duplicate context terms are overridden using a last-defined-wins
mechanism"

The only thing that's special when you use a compact IRI at the LHS is that
you might omit the IRI mapping since it can be calculated automatically.


>>> Step 12.2.2.2 Property generators should just expand to IRIs, not @type,
>>> this should also be checked.
>>
>> This has been fixed ("@type" is now disallowed as a property generator
>> IRI mapping).
> 
> Good.

Why a special case for @type? @graph is disallowed as well, just as dlfkj if
it doesn't expand to an IRI.


>>> Step 12.3 Why special case for @type? vocabRelative flag missing again
>>
>> The special case has been removed, it was an artifact from a previous
>> algorithm. The vocabRelative flag is missing for the same reason as
>> above, no keyword aliases are permitted in the context.

Talked about keywords above. What about vocabRelative IRIs? See expand-0052.


>>> 5.6 IRI Expansion
>>> ---------------------------
>>> The defined map may be more efficient but I believe it's easier to
>>> understand how cycles are detected if you pass an array of the currently
>>> being processed terms, e.g., [ A, B, C ] if C then requires B you can
simply
>>> detect the recursion if it's already in the algorithm. Don't know, maybe
>>> it's also just a personal preference. I don't have a strong opinion
about
>>> this but wanted to mention it.
>>
>> The map was also used because of the way the algorithm was rewritten --
>> it stores a tri-state not defined/defining/defined, not just
>> defining/not defined. This is used to avoid infinite recursion and is
>> simpler to understand, IMO. I thought showing that there were three
>> states was more clear than just having an array. Clearly we differ here;
>> I don't really mind which way it is explained in the algorithm so long
>> as the consensus is that it is easiest to understand (or maybe it just
>> doesn't matter that much).
> 
> I sort of liked this too.

As said, I don't have a strong opinion about this. If both of you agree its
simpler, let's keep it.


>>> Step 4: no need to say "explicitly ignore value", this is confusing.
Just
>>> return null.
>>
>> I disagree. I wanted to know why null was being returned and I wanted to
>> know it right there in the algorithm without having to go find out what
>> was going on via another document or statement somewhere. I think it's
>> important to have notes within these algorithms about what the special
>> meaning of things, like a "null mapping", are.

OK, but then the note should be understandable. I wouldn't understand it. We
could change it something like the following sentence (maybe someone has an
idea how to make it a bit less clumsy).

"If value has a null mapping in active context return null as it has been
explicitly marked to be ignored."


>>> The whole branching around isAbsoluteIri is really confusing.
>>> Maybe it's simpler to use the algorithm index.html and invoke the Create
>>> Term Def. algorithm there instead of recursing into itself (even if I
find
>>> that simpler since the algorithm is self-contained).
>>
>> I definitely prefer keeping the algorithm self-contained. Maybe we can
>> clean up the branching in another way. I would prefer that we don't
>> repeat ourselves too, if we can avoid it. I'm ok with the general idea:
>> get the value associated with the active context, if it's a keyword or
>> null return, if you got something else don't look for a CURIE, otherwise
>> you got nothing, so you have to check for a CURIE. Once you've dealt
>> with the possible CURIE, do blank node renaming or apply vocab or base.
>> I think that flow is easy to follow and it's what we should go with, we
>> might just need to clean up the algorithm text a bit.

+1


> Another thing, it should be clear what the document base is when running
> this. When expanding a context IRI, it should be the referencing context.

Yes, we should pass it as a parameter to the algorithm. And probably at the
parameter everywhere else as well. Finally we should then add one
API-specific section where we describe how the different algorithms are
invoked by the API calls (and how they pass their parameters to them).


>>> 5.5 Expansion
>>> ---------------------------
>>> Step 3.1: The @context member is not removed
>>
>> Yeah, I see that now. Perhaps we want to instead skip that key further
>> down (rather than removing it, to keep with the notion of not having to
>> copy inputs).

Or we create an array of the keys, sort it and then iterate over that array
as we do in the compaction algorithm.


>>> Step 3.2: This wouldn't be needed if @graph would be handled accordingly
>>> under 3.4.3 (see index.html 4.3.2.8) and would simplify 3.4.6, 3.9, and
4.1
>>
>> We should look into this. From what I recall, the algorithm in
>> index.html was buggy. One of the test cases I added might be related to
>> this as well (it didn't work w/your playground). There may be another
>> that is required to further expose the problem.

Did you create test cases for the bugs you've discovered?
I was too lazy to add a GitHub web hook for my playground so it might not be
up to date. I'll check the tests you added later. Which one are you
referring to here?


>>> Step 3.4.2: ... or a bnode ID
>>
>> Yes, just another case related to bnode IDs not being IRIs. It would be
>> nice to have a catch-all identifier name that includes bnode IDs. Either
>> that or we can use "if the expanded property is a string that contains a
>> colon", but I find that somewhat ugly and doesn't do as good of a job
>> explaining what's going on.

That's all we really do. We check for a colon. We should say it because
saying "if it's an absolute IRI" is a bit a stretch if we never validate
IRIs. 


> Yes, let's mint a "term" for this.

Would be fine with that as well. But what do we call it? A string containing
a colon? :-P



>>> Steps 3.4.3.x: IMHO it would be much simpler to expand the values
directly
>>> here (just as in steps 4.3.2.x in index.html) instead of having to
expand
>>> the property again in value expansion to find out that it is @id and
then
>>> expand using the IRI expansion algorithm. All keywords would then be
>>> completely processed after this step.
>>
>> The split was made to separate validation from the general processing
>> and to avoid repeating the same steps for array processing of @type,
>> etc. In this way @type isn't handled differently from @list or @graph,
>> both of which recurse. These array-handling steps are not present in the
>> index.html version (so it doesn't work if you implement it according to
>> the text because IRI expansion doesn't handle arrays). You can either
>> recurse to handle arrays or you can repeat yourself; I tried to keep
>> things consistent with recursing for @type ... which also meant it
>> seemed cleaner to keep validation consistent and separate from general
>> processing. I think this is all a matter of style. Of course, if we can
>> simplify it all without repeating ourselves somewhere I'm all for that.
>>
>> It may be that it's just always a tradeoff with where we repeat
>> ourselves. In that case, I'd prefer to just recurse through the
>> expansion algorithm to let it handle arrays vs. other values and let all
>> values run through "value expansion". The difference in approach may
>> again arise from the somewhat amorphous nature of our terminology (eg:
>> "value").

I just find it easier to do it in one place. When you read the algorithms
and come to a point where it says "If expanded property is @type..." it
seems natural to handle it completely there so that I can then forget about
that case. Otherwise I have to find my way through the algorithm to see how
it's really handled. That's the reason why in index.html all keyword
processing is consolidated in one place (steps 4.3.2.x). After that, you
don't have to think about keywords anymore.


>>> Step 3.4.4.2: "and value language value in value" is confusing
>>
>> Agree. We should comb over the spec and read it without special markup
>> like italics, etc. -- for similar issues.

I would also argue that it would be much simpler to recognize variables by
giving them names such as languageValue and formatting them in monospace
(not orange though) instead of italics.


>>> 5.10 Compaction Algorithm
>>> ---------------------------

[...]

>>> Step 2.2: I don't think we need to explain a shallow copy is created and
>>> certainly we don't need to distinguish between array values and
non-array
>>> values, the point is that each value is copied but the references
contained
>>> in that copy remain the same references.
>>
>> The reason for the distinction was to highlight that the shallow copy is
>> actually two-levels deep. If we can clean this up without losing what's
>> going on, that would be great.

Does it really have to be two levels deep? I don't think so. You just need
the copy because you remove property or property values (references in case
of objects) from shallow. So you create a copy of the array but not of
element it contains.


>>> Step 2.4: Say keys from shallow and mention that this has to be done
because
>>> keys can be removed from shallow in the following loop
>>
>> I think it says from "element", not from "shallow".

I think I wasn't clear. I meant it would be clearer if we would say we
"initialize keys to an array containing all of the keys in *shallow*,
ordered lexicographically" and mention that we need to do this because keys
from shallow can be removed within the loop - that's the reason why you
create the array containing all of them but have the check in 2.5.1.


>>> Step 2.5.5.1 (and in general): Passing parent to the IRI compaction algo
>>> which then passes it to term selection just to eliminate *some* of the
>>> potential property generators because not all IRIs are in shallow
doesn't
>>> make sense to me since it doesn't optimize much but makes the algorithms
>>> much harder to understand. You can easily do that check in the Find and
>>> Remove Property Generator Duplicates Subalgorithm and I think that's the
>>> place where it belongs.
>>
>> I think it makes more sense to consolidate how term selection happens.
>> It's easier, IMO, to understand that you pass in some parameters to a
>> term selection algorithm -- which then gives you back the term you are
>> supposed to use. I think that's better than it maybe returning you a
>> list of terms to try out yourself.

But you don't check for property generator duplicates. So how do you know if
it's the right property generator? Just because the properties exist doesn't
mean the values exist as well.


>>> 5.12 Inverse Context Creation Subalgorithm
>>> ---------------------------
>>> Overall: I find it confusing to "initialize [variable] to the value
>>> associated with the [key] key in [map]. What you do here is to store a
>>> reference to that member, you don't assign its value to [variable].
>>
>> I'm not sure what language we want to use to clarify what's going on
>> here. We could easily be more verbose to hide what is really an
>> implementation detail, but it is also just used to help people
>> understand the steps by breaking them down a bit. I'm not sure what we
>> want to do with these situations in general.

This is definitely not an implementation detail because the algorithm
wouldn't work at all if it's not a reference.


>>> There's a bug since the algorithm doesn't distinguish between a term
with no
>>> language (and no type) mapping and a term with a language mapping set to
>>> null. I've added compact-0048.
>>
>> I think it does make that distinction, but I think it makes in an
>> opposite direction from the test case. I'm not sure that test-case has
>> the desired output. It seems like we'd prefer to select the null
>> language term over the no-language term because it is more specific.

Since it's a number it doesn't have a language. If it would be a string, the
other term would have to be selected. I would say that if I have something
with "@language: null" my intent is to return plain strings.


>>> 5.13 Term Selection Subalgorithm
>>> ---------------------------
>>> Drop parent parameter and the corresponding "optimization"
>>>
>>> Sometimes @null is used, sometimes @none (also in the other algorithms).
I
>>> find this confusing as it is very difficult to remember what's in use
where.
>>
>> I thought it was very important to distinguish between the two. Saying
>> something is null in JSON-LD explicitly means you don't want to use it;
>> whereas "@none" is used when you haven't said anything at all.

I don't think that's the case here. If I said nothing, the key will be
"@null". If I said it is null, the value of the according will be "@null".


>>> 5.14 Value Compaction
>>> ---------------------------
>>> Substeps of 1: The whole preserveIndex stuff is quite complicated. Why
not
>>> just create a copy of value and then modify that copy as the algorithm
in
>>> index.html does?
>>
>> I didn't think it was that complicated. I'll have to compare the two
>> algorithms again.
>>
>>> Step 2: What if there's also @index? Why expand the value? Why IRI
expand an
>>> object? It's already expanded, we are compacting here. I think it should
>>> mean expand active property. Typo "passing passing"
>>>
>>> PROPOSAL: Keep the algorithm in index.html which is half as long. Modify
it
>>> to work on a copy of value if necessary or construct a copy step-by-step
>>> (this should require just minimal changes).
>>
>> I think the case where @index is in a subject reference may not be
>> handled. Sounds like we need a test-case for that. Also, the algorithm
>> in index.html doesn't seem to handle compacting the IRI in that case.

But the compaction algorithm recurses into that object and compact the value
@id (the place where you just validate its value).




--
Markus Lanthaler
@markuslanthaler
Received on Tuesday, 12 February 2013 12:21:55 UTC