Framing explanation and framing return values

During the most recent telecon we briefly discussed changing the framing 
API so that it no longer returns NULL. The reason for doing this seemed 
to be a general feeling that when NULL is returned it indicates an 
"error" and errors should be indicated through exceptions instead. It 
also wasn't very clear to those who haven't yet worked directly with 
JSON-LD framing what was really being discussed and what the potential 
issues were. So I decided that I'd send an email out explaining the 
current state of framing in a little more detail and then talk about the 
"NULL vs {} issue" from the telecon. Perhaps we can also integrate some 
of the language here into the spec's explanation on framing. If you have 
already worked extensively with frames, feel free to skip to the bottom 
of this email to the telecon issue discussion.

JSON-LD Framing

There is often more than one way to represent the same directed graph in 
JSON-LD. The subjects in the graph might be arranged in a flat 
structure, much like the output of the JSON-LD normalization algorithm. 
Alternatively, the subjects might be expressed in a way that is more 
natural to many JSON developers, as leaves in a tree. However, there are 
many different trees that could be constructed to represent the same 
directed graph. JSON-LD framing allows JSON developers to work more 
naturally with directed graphs by structuring them in a way that they 
specify.

A JSON-LD frame can be thought of both as a scaffold and as a filtering 
mechanism. When a JSON-LD frame is applied to a JSON-LD document, the 
resulting output is the content of the JSON-LD document that passed the 
frame's filters structured in a way that mirrors the way the filters are 
structured in the frame.

A frame can filter content in two ways: strict-typing and duck-typing. 
A frame that specifies a strict-type filter will only allow subjects 
from the JSON-LD document that have a @type that matches the filter into 
the output. A frame that does not specify a strict-type filter will 
allow any subject that matches the duck-type specified by the filter 
into the output. For instance:

A frame that uses strict-typing:

{"@type": "http://example.com/my-type"}

This frame will match the first subject found in a JSON-LD document that 
has the @type "http://example.com/my-type". Note that "the first" is 
determined by JSON-LD normalization order. To match all subjects with 
that @type, this frame would be used:

[{"@type": "http://example.com/my-type"}]

A frame that uses duck-typing:

{"http://example.com/my-property": {}}

This frame will match the first subject found in a JSON-LD document that 
has at least the property "http://example.com/my-property".
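
The two kinds of filter can be pictured with a rough sketch. This is not the actual framing algorithm; the function name `matches` and the sample subjects are made up for illustration, and @type values are assumed to be full IRIs rather than terms from a @context.

```python
def matches(subject, frame):
    """Return True if `subject` passes the top-level filter of `frame`."""
    if "@type" in frame:
        # Strict-typing: the subject must carry the requested @type.
        have = subject.get("@type", [])
        if not isinstance(have, list):
            have = [have]
        return frame["@type"] in have
    # Duck-typing: the subject needs at least every property named in the frame.
    return all(prop in subject for prop in frame if not prop.startswith("@"))


# Hypothetical sample data, not taken from any real document.
subjects = [
    {"@subject": "http://example.com/subject1",
     "@type": "http://example.com/my-type"},
    {"@subject": "http://example.com/subject2",
     "http://example.com/my-property": "value"},
]

strict = {"@type": "http://example.com/my-type"}
duck = {"http://example.com/my-property": {}}

strict_hits = [s["@subject"] for s in subjects if matches(s, strict)]
duck_hits = [s["@subject"] for s in subjects if matches(s, duck)]
```

Here the strict-type frame selects only subject1 and the duck-type frame selects only subject2.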

Frames may also include @contexts:

{
  "@context": { "mytype": "http://example.com/my-type" },
  "@type": "mytype"
}

When a frame includes a @context, that same @context will be applied to 
the output.

Now, which subjects will pass through a filter also depends on where in 
the frame structure the filter occurs. For instance, if we look at the 
duck-typing example from above, there are actually two filters being 
used. The first filter works on the JSON-LD document to find a subject 
with the property "http://example.com/my-property". But the second 
filter is the empty {}. This filter will cause only the first object for 
that property to be present in the output. If that filter were instead 
an array [], then all objects for that property would be present in the 
output:

{"http://example.com/my-property": []}

Furthermore, each filter, by default, will "embed" subjects in the 
output. This is how a tree structure gets specified and built. For 
instance, if the JSON-LD document that the above frame was applied to 
was this:

[{
  "@subject": "http://example.com/subject1",
  "http://example.com/my-property": {"@iri": "http://example.com/subject2"}
},
{
  "@subject": "http://example.com/subject2",
  "http://example.com/foo": "42"
}]

Then the output would be this:

{
  "@subject": "http://example.com/subject1",
  "http://example.com/my-property": [{
    "@subject": "http://example.com/subject2",
    "http://example.com/foo": "42"
  }]
}

Take note that the value of the "http://example.com/my-property" key is 
still an array. If an array is specified in a frame for a property other 
than @type, then that property's value will always be an array, even if 
the output has only 0 or 1 matching values. If an array is specified for the 
@type property, then a subject that contains any of the types in the 
array will be considered a match for the filter.
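
The two array rules above can be sketched as follows. Both helper names (`shape_values`, `type_matches`) are hypothetical and exist only for this illustration; they are not part of any JSON-LD API.

```python
def shape_values(frame_value, values):
    """Shape a property's matched values the way the frame asks for them."""
    if isinstance(frame_value, list):
        # Array in the frame: always an array in the output,
        # even when there are 0 or 1 values.
        return list(values)
    # Object in the frame: only the first value, or NULL (None) if none matched.
    return values[0] if values else None


def type_matches(subject_types, frame_type):
    """An array of types in the frame matches a subject having ANY of them."""
    wanted = frame_type if isinstance(frame_type, list) else [frame_type]
    return any(t in subject_types for t in wanted)


one_value_as_array = shape_values([], ["a"])
no_values_as_array = shape_values([], [])
first_value_only = shape_values({}, ["a", "b"])
either_type = type_matches(
    ["http://example.com/B"],
    ["http://example.com/A", "http://example.com/B"],
)
```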

Hopefully from these examples, one can extrapolate how complex tree 
structures can be specified via framing. There are some more details and 
options involved in framing that I'll mention:

From the last example you can see that the "http://example.com/foo" 
property was pulled in for the embedded subject even though it wasn't 
specified in the frame filter. By default, any properties that are not 
explicitly mentioned in the frame are included in the output, so long as 
the subject itself matches the strict-type or duck-type specified. 
However, this behavior can be modified by using a frame keyword 
@explicit. If a frame filter has "@explicit" set to true, then when that 
filter is applied, the output will only include those properties that 
are explicitly mentioned.
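
A minimal sketch of the @explicit behavior, under one assumption that the text above does not settle: keys beginning with "@" (such as @subject and @type) are kept regardless. The function name `apply_explicit` and the "bar" property are made up for the example.

```python
def apply_explicit(subject, frame):
    """Drop properties the frame does not mention when @explicit is true."""
    if not frame.get("@explicit", False):
        # Default behavior: unmentioned properties pass through untouched.
        return dict(subject)
    return {
        key: value
        for key, value in subject.items()
        # Assumption: keyword keys (starting with "@") are always kept.
        if key.startswith("@") or key in frame
    }


subject = {
    "@subject": "http://example.com/subject2",
    "http://example.com/foo": "42",
    "http://example.com/bar": "unmentioned",  # hypothetical extra property
}

default_output = apply_explicit(subject, {"http://example.com/foo": {}})
explicit_output = apply_explicit(
    subject, {"@explicit": True, "http://example.com/foo": {}}
)
```

With the default frame the "bar" property is carried along; with @explicit set to true it is left out.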

A related behavior worth noting occurs when a strict-type filter also 
specifies other properties. In this case, a subject that matches the 
strict-type will be present in the output, but any of those properties 
that the subject lacks will be set to NULL. This is done so that a 
developer needs to only check a property for NULL, which is believed to 
be fairly natural in JSON, rather than checking it for existence. This 
relates to the issue discussed on the telecon and I will come back to it 
later.

If returning NULL for missing properties is not the desired behavior, then 
the value that is returned for missing properties can be modified using the 
frame keywords @default and @omitDefault. The @default keyword may be 
set in a frame filter to a value to return instead of NULL whenever a 
property is missing. The @omitDefault keyword, when set to true, will 
cause a missing property to be left out of the output entirely.
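
The handling of missing properties might be sketched like this. It assumes that @default and @omitDefault are placed inside the filter object of the property they apply to, which is one reading of the description above and not something confirmed here; `fill_missing` and the property names are hypothetical.

```python
def fill_missing(subject, frame):
    """Fill in properties that the frame names but the subject lacks."""
    out = dict(subject)
    for prop, prop_filter in frame.items():
        if prop.startswith("@") or prop in out:
            continue  # keyword, or the subject already has this property
        if isinstance(prop_filter, list):
            out[prop] = []  # array filters always yield an array
            continue
        options = prop_filter if isinstance(prop_filter, dict) else {}
        if options.get("@omitDefault", False):
            continue  # leave the property out of the output
        out[prop] = options.get("@default")  # None (NULL) when no @default


    return out


subject = {"@subject": "http://example.com/subject1",
           "@type": "http://example.com/my-type"}
frame = {
    "@type": "http://example.com/my-type",
    "http://example.com/a": {},
    "http://example.com/b": {"@default": "n/a"},
    "http://example.com/c": {"@omitDefault": True},
}

filled = fill_missing(subject, frame)
```

In this example "a" comes back as NULL, "b" comes back as "n/a", and "c" does not appear at all.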

The last option in framing involves the keyword @embed. As I mentioned 
earlier, by default, subjects will be embedded according to frame filter 
structure. To change this behavior on a per-filter basis, you set the 
@embed property to false in a frame filter. This will cause only the 
@iri of a subject to be used as the object value of a property rather 
than the full subject and all of its properties. There is also a 
restriction in the current framing algorithm that requires that subjects 
be embedded at most once in an output document, so it is sometimes 
necessary to set @embed to false for complicated structures that reference 
the same subject in multiple places in the tree.
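
The effect of @embed on a single property value can be sketched as below. The helper `embed_value` and the lookup table `subjects_by_id` are hypothetical, and the sketch makes no attempt to enforce the embed-at-most-once restriction.

```python
def embed_value(value, subjects_by_id, embed=True):
    """Resolve an {"@iri": ...} reference according to the @embed setting."""
    if not (isinstance(value, dict) and "@iri" in value):
        return value  # plain literal: nothing to embed
    target = subjects_by_id.get(value["@iri"])
    if embed and target is not None:
        return dict(target)  # full subject with all of its properties
    return {"@iri": value["@iri"]}  # reference only


subjects_by_id = {
    "http://example.com/subject2": {
        "@subject": "http://example.com/subject2",
        "http://example.com/foo": "42",
    }
}
ref = {"@iri": "http://example.com/subject2"}

embedded = embed_value(ref, subjects_by_id)
reference_only = embed_value(ref, subjects_by_id, embed=False)
```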

There may be a keyword added in the future called @sort. This would be 
used to sort the objects of a property (when it has more than one). It 
would specify the property of the objects (if they are subjects) to sort 
according to and the sort order (ascending or descending). This relates 
to providing JSON developers a consistent sort order for working with 
data that isn't a @list.

Hopefully this explanation sheds some light on how framing works and 
what to expect when crafting a frame to structure your data.

---

So, getting back to the telecon issue.

As mentioned before, when a property does not exist in a subject that 
matches a frame filter, that property, by default, is set to a value of 
NULL in the output. Similarly, if a frame filter of {} is specified for 
a property, as opposed to [], and no value matches that property, then 
it will also be set to NULL in the output. This holds true for the 
"top-level" of an output tree as well as any of its branches. This means 
that if an object (as opposed to an array) frame were applied to a JSON-LD 
document, and none of the subjects matched the "top-level" filter in the 
frame, the output would be NULL.
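
To make the top-level case concrete, here is a small sketch of the current behavior as described above. It assumes a non-empty frame with a strict-type filter and subjects that are already in normalization order; `frame_top_level` and `types_of` are made-up names.

```python
def types_of(subject):
    """Return the subject's @type values as a list."""
    value = subject.get("@type", [])
    return value if isinstance(value, list) else [value]


def frame_top_level(frame, subjects):
    """Object frame: first match or None. Array frame: list of all matches."""
    is_array = isinstance(frame, list)
    top_filter = frame[0] if is_array else frame
    hits = [s for s in subjects if top_filter["@type"] in types_of(s)]
    if is_array:
        return hits  # possibly an empty array
    return hits[0] if hits else None  # NULL when nothing matched


subjects = [{"@subject": "http://example.com/subject1",
             "@type": "http://example.com/other-type"}]

object_result = frame_top_level({"@type": "http://example.com/my-type"}, subjects)
array_result = frame_top_level([{"@type": "http://example.com/my-type"}], subjects)
```

With no matching subject, the object frame produces NULL while the array frame produces an empty array.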

It was suggested on the call that we change the output of a "top-level" 
match of none from NULL to {}. Without considering anything other than 
top-level matches, I don't think that there's any issue with this. 
However, when you consider that NULL is returned for non-top-level 
matches (property matches), then it seems to me that we're being 
inconsistent (which isn't necessarily a bad thing). Furthermore, if we 
wanted to be consistent, we should also set properties with no matches 
to {} -- but this is problematic as it would seem to potentially 
conflict with properties that have specific ranges. For instance, a 
property may be only a string or only an integer, and here we've gone 
and set it to an object. Setting it to NULL instead, IMO, seems to avoid 
this strangeness.
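
One way to picture the range concern from the consumer's side: code that expects a property to be a string or NULL keeps working when the value is NULL, but trips over an unexpected object. The function `shout` and the property name are hypothetical, chosen only to illustrate the point.

```python
NAME = "http://example.com/name"  # hypothetical string-valued property


def shout(framed, prop):
    """Upper-case a string property, treating NULL as 'no value'."""
    value = framed[prop]
    if value is None:
        return ""
    return value.upper()  # assumes the property's range is a string


with_value = shout({NAME: "dave"}, NAME)
with_null = shout({NAME: None}, NAME)

try:
    shout({NAME: {}}, NAME)
    empty_object_failed = False
except AttributeError:
    # A dict has no .upper(); the consumer now needs an extra type check.
    empty_object_failed = True
```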

For those who were in support of using {} at the top-level rather than 
NULL, do you still have the same opinion now that you (perhaps) have a 
more in-depth view of JSON-LD framing? What do you think of the 
non-top-level cases?

To be clear, I'm not necessarily opposed to changing the framing API to 
return {} rather than NULL, but I want to make sure that we're making an 
informed decision about it; I felt that it was more natural to work with 
NULL under the circumstances but I may not be in the majority.

-Dave

-- 
Dave Longley
CTO
Digital Bazaar, Inc.

Received on Wednesday, 24 August 2011 04:03:41 UTC