Re: Review of alternate2.html from Dave Longley on 2013-02-12 (public-linked-json@w3.org from February 2013)

From: Dave Longley <dlongley@digitalbazaar.com>
Date: Tue, 12 Feb 2013 16:20:49 -0500
CC: Linked JSON <public-linked-json@w3.org>
Message-ID: <511AB231.5090004@digitalbazaar.com>
I removed some of the resolved issues/thread text in my reply below. I 
made spec changes to incorporate most of Markus' feedback.

On 02/12/2013 07:21 AM, Markus Lanthaler wrote:
>>>
>>> Are you sure? The two contexts look different to me, but we may be
>>> talking about something else. The context in Example 1 has the term
>>> "name", whilst Example 2's context does not.
> You are right, "name" is missing. But it's very difficult to see because
> other things are highlighted and even if "name" would be there, it wouldn't
> make a difference.

I think the context should be different in that example. We should 
probably use different terms. I think that would be a better 
illustration of what's different there -- and would emphasize different 
contexts rather than just the use of an array vs. no array. I still 
think it's important to note changes to make data more regular, but that 
is secondary to context differences.

> Unlabeled blank nodes are not labeled in expansion. Only labeled blank nodes
> get relabeled. So I think it's important to highlight that straight in the
> introduction. I think in the API spec we can work under the assumption that
> the data model is clear.

IMO, blank node relabeling in the introduction is a little too advanced. 
It needs to be mentioned, but I think that should happen elsewhere.

> -- snip re: remote context handling --
> The API call itself is asynchronous, whether the algorithms it calls are
> doesn't really matter. In some languages you would spawn a thread, in others
> you rely on an event-loop, ... So I agree with Gregg that this is, IMO,
> clearly an implementation-specific optimization. Whether you have one async
> call before expansion or a number of them during expansion doesn't really
> matter. Furthermore, the current pre-processing step directly modifies the
> input document which you tried to avoid in all other operations. Passing
> around a map of external contexts just makes it more complex to explain.

I disagree that the algorithms it calls being async/not async doesn't 
matter. I prefer a separation of concerns and isolation of problems for 
each algorithm as much as possible. I don't want to be thinking about 
other problems while dealing with a particular algorithm, I'd rather be 
more focused.

That a pre-processing step requires the input document to be modified 
isn't the point regarding minimizing copying inputs. IMO, each 
algorithm, taken on its own, should be minimized in this respect as much 
as possible. It's ok if that doesn't work for some of them but does for 
others. What I wanted to avoid was having it be unclear whether or not a 
developer really needed to be copying data. If the spec still does that 
we should rectify it.

In short, I think isolating remote context handling is better. I don't 
think passing around a map in that algorithm makes things more difficult 
to understand, I think it is worse to be vague about how to deal with 
circular dependencies and context reuse because we are trying to avoid 
being too verbose.

All that being said, so long as we mention that you might want to do 
remote context processing first if you're in an asynchronous environment 
I could settle on mixing it into the expansion algorithm. But, really, I 
do think we should prefer separating concerns (all the async vs. sync 
stuff aside).

>> If a CURIE is on the LHS, it is a term, IMO.
> +1, everything on the LHS is a term, it's a (more or less) opaque string.
> The only time you really look into the string at LHS is when @id is missing
> in its definition to check if you can expand it nevertheless to an IRI.

Regarding "term definition inheritance", I've corrected this in the 
spec. It doesn't happen any more unless a term has the form of a CURIE 
and omits the IRI mapping (and only then is the IRI mapping inherited + 
the suffix).

>
> Did you create test cases for the bugs you've discovered?

Not all of them. Mostly because I need to get the API error test cases 
working with my processor. I think that's where I may have a test case 
or two to add.

>>>
>>> The split was made to separate validation from the general processing
>>> and to avoid repeating the same steps for array processing of @type,
>>> etc. In this way @type isn't handled differently from @list or @graph,
>>> both of which recurse. These array-handling steps are not present in the
>>> index.html version (so it doesn't work if you implement it according to
>>> the text because IRI expansion doesn't handle arrays). You can either
>>> recurse to handle arrays or you can repeat yourself; I tried to keep
>>> things consistent with recursing for @type ... which also meant it
>>> seemed cleaner to keep validation consistent and separate from general
>>> processing. I think this is all a matter of style. Of course, if we can
>>> simplify it all without repeating ourselves somewhere I'm all for that.
>>>
>>> It may be that it's just always a tradeoff with where we repeat
>>> ourselves. In that case, I'd prefer to just recurse through the
>>> expansion algorithm to let it handle arrays vs. other values and let all
>>> values run through "value expansion". The difference in approach may
>>> again arise from the somewhat amorphous nature of our terminology (eg:
>>> "value").
> I just find it easier to do it in one place. When you read the algorithms
> and come to a point where it says "If expanded property is @type..." it
> seems natural to handle it completely there so that I can then forget about
> that case. Otherwise I have to find my way through the algorithm to see how
> it's really handled. That's the reason why in index.html all keyword
> processing is consolidated in one place (steps 4.3.2.x). After that, you
> don't have to think about keywords anymore.

I don't really mind if we move this around. We'll need to keep in mind 
that it will likely generate steps where we have to repeat ourselves, 
however, to do array processing, etc.

>
> I would also argue that it would be much simpler to recognize variables by
> giving them names such as languageValue and formatting them in monospace
> (not orange though) instead of italics.

+1 to changing the formatting.

> -- snip re: compaction shallow copying for property generators --
> Does it really have to be two levels deep? I don't think so. You just need
> the copy because you remove property or property values (references in case
> of objects) from shallow. So you create a copy of the array but not of
> element it contains.

This is what I meant by two-levels deep. You don't just create a shallow 
copy of the object; you must shallow-copy each one of its keys' arrays.

>>> I think the case where @index is in a subject reference may not be
>>> handled. Sounds like we need a test-case for that. Also, the algorithm
>>> in index.html doesn't seem to handle compacting the IRI in that case.
> But the compaction algorithm recurses into that object and compact the value
> @id (the place where you just validate its value).

This should be an easy fix, just can't get to it at the moment.

-- 
Dave Longley
CTO
Digital Bazaar, Inc.
Received on Tuesday, 12 February 2013 21:21:04 UTC