second review of json-ld

This is a partial follow-up review of json-ld. Here I'm reviewing:

    JSON-LD 1.0
    A JSON-based Serialization for Linked Data
    [prepared as] W3C Working Draft 04 April 2013
    https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html

Summary:  more of the same - mostly editorial - a few issues that will 
hopefully be simple to review.   I'm not quite done, but may have to 
stop for a day or two, so I'm sending this along now.


Details:

    Simply speaking, a context is used to map terms, to IRIs.


s/terms,/terms/


    and types that do not match a term or are neither a compact IRI nor


s/or are neither/and are neither/


    If multiple embedded JSON-LD documents are extracted as RDF, the
    result is the RDF merge of the extracted datasets.


Alas, there is no defined way to merge RDF datasets.

The problem is that sometimes it's obvious that the merge of
     <g> { <a> <b> 1 }
and
     <g> { <a> <b> 2 }
is
     <g> { <a> <b> 1,2 }
and sometimes it's obvious the two can't be merged because they 
contradict each other.

See: http://www.w3.org/2011/rdf-wg/track/issues/17
RESOLVED: close issue-17
-- there is no general purpose way to merge datasets; it can only be 
done with external knowledge.

Proposed solution is to define it here, something like:  If multiple 
embedded JSON-LD documents are extracted as RDF, the result is a dataset 
formed by merging all the graphs that have the same name (and thus 
making a single named graph per graph name) and all the default graphs 
(to make one resulting default graph).
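
As a rough illustration of the proposed definition, here is a minimal Python sketch. It rests on assumptions that are not in the spec: a dataset is modeled as a dict from graph name to a set of triples, None stands for the default graph, and the name merge_datasets is made up for this example. Blank node relabeling between documents is ignored.

```python
from collections import defaultdict


def merge_datasets(datasets):
    """Union the triples of same-named graphs across several datasets.

    Each dataset is a dict: graph name -> set of (s, p, o) triples.
    The key None stands for the default graph (an assumption of this sketch).
    """
    merged = defaultdict(set)
    for dataset in datasets:
        for graph_name, triples in dataset.items():
            # One resulting graph per graph name, one resulting default graph.
            merged[graph_name] |= set(triples)
    return dict(merged)


d1 = {"g": {("a", "b", 1)}}
d2 = {"g": {("a", "b", 2)}, None: {("x", "y", "z")}}
result = merge_datasets([d1, d2])
```

With the two example graphs named g from above, the merged graph g holds both triples; whether that is the right answer for contradictory graphs is exactly the external-knowledge question from issue-17.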

    Figure 1: An illustration of JSON-LD's data model.


Broken image link.

More importantly, the diagram is both misleading and wrong.   It's 
misleading in that each of the nodes is shown as being in exactly one 
graph; nodes are actually allowed to be in multiple graphs, and nearly 
always are.   It's wrong in that it shows two arcs that aren't in any 
graph, when actually every arc has to be in one or more graphs.

I haven't managed to produce a good drawing of this.   Sometimes I think 
of it as color-coding arcs, like this:

http://www.w3.org/Consortium/Offices/Presentations/RDFTutorial/figures/AnimMerge8.png

and sometimes I think of it as layers:

http://www.flickr.com/photos/danbri/3472944745/
http://farm4.static.flickr.com/3613/3384528143_8304792836_b.jpg

although I imagine the layers closer together, like transparent sheets of 
plastic, each with writing on them.

    Whenever possible, the graph name
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-graph-name>
    /SHOULD/ be an IRI
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-iri>.

s/possible/practical/      (I think)

    At Risk

I'm a little lost in the AT RISK features.   Can we do it like this: 
http://www.w3.org/TR/2009/CR-owl2-syntax-20090611/#atRisk1  ?   So each 
at-risk feature is identified separately from where it occurs in the 
specs, on a wiki page (rdf-wg/wiki/JSON-LD_Features_at_Risk or 
something).   And each time it comes up in the specs, that is 
referenced, along with a clear explanation for people who've never heard 
of this little feature of the W3C process.

    Within the JSON-LD syntax these edge labels are called properties.

Actually, you use the term somewhat inconsistently -- sometimes you call 
those labels "property names" and sometimes you call them "property 
labels".    I'm not sure this is worth fixing -- I'm probably being 
overly pedantic to mention it -- but in RDF they'd be considered 
property names.  The property itself is the thing denoted by the IRI.  I 
think in general it's fine to call these things "properties" (and skip 
over the detail that they are property names), but maybe in the formal 
model it's better to be precise.

    Issue 217 <https://github.com/json-ld/json-ld.org/issues/217>

    In contrast to the RDF data model as defined in [RDF11-CONCEPTS
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#bib-RDF11-CONCEPTS>],
    JSON-LD allows blank nodes as property labels and graph names. Thus,
    some data that is valid JSON-LD cannot be converted to RDF. This
    feature may be removed in the future.

This notion appears a few other times.  As I mention in my review of 
json-ld-api, I think we should say it *can* be converted, it just 
requires Skolemizing.

Also, the At Risk phrasing should be more clear about what the change 
might be.   Something like:  "Based on implementor feedback, the Working 
Group may decide to prohibit the use of blank nodes as property labels 
and graph names."
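
To make the Skolemizing point concrete, here is a minimal Python sketch. The assumptions are mine: quads are plain tuples of strings, any string starting with "_:" is treated as a blank node identifier (so a literal that happens to start that way would be misread), and the example.org authority is a placeholder. The /.well-known/genid/ path follows the Skolem IRI convention suggested in RDF 1.1 Concepts.

```python
import uuid


def skolemize(quads, base="http://example.org/.well-known/genid/"):
    """Replace blank node identifiers with freshly minted Skolem IRIs.

    quads: iterable of (subject, predicate, object, graph_name) tuples.
    The same blank node identifier always maps to the same IRI.
    """
    mapping = {}

    def replace(term):
        # Simplification: anything that looks like "_:x" is a blank node.
        if isinstance(term, str) and term.startswith("_:"):
            if term not in mapping:
                mapping[term] = base + uuid.uuid4().hex
            return mapping[term]
        return term

    return [tuple(replace(term) for term in quad) for quad in quads]


quads = [
    ("http://example.org/s", "_:p", "http://example.org/o", "_:g"),
    ("_:g", "http://example.org/created", "2013-03-29", None),
]
skolemized = skolemize(quads)
```

After this step the blank node used as a property and the one used as a graph name are both IRIs, so the result fits the RDF data model.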


    A JSON-LD document
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-json-ld-document>
    /MUST/ be a single node object
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-node-object>
    or a JSON array
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-array>
    containing a set of one or more node objects
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-node-object>
    at the top level.

How about:   ... or a JSON array 
<https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-array> 
whose elements are each node objects 
<https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-node-object>.


          B.1 Terms

    A term is a short-hand string
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-string>
    that expands to an IRI
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-iri>
    or a blank node identifier
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-blank-node-identifier>.

    A term
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-term>
    /MUST NOT/ equal any of the JSON-LD keywords
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-keyword>.

    To avoid forward-compatibility issues, a term
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-term>
    /SHOULD NOT/ start with an |@| character as future versions of
    JSON-LD may introduce additional keywords
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-keyword>.
    Furthermore, the term /MUST NOT/ be an empty string
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-string>
    (|""|) as not all programming languages are able to handle empty
    property names.

This whole section concerns me.   Can a term contain a colon? Can it be 
a plain colon?   Can it be an apostrophe?   Can it be a string of 2^32 
ASCII NUL characters?   I rather doubt every implementation will allow 
all of these, but some might, so there could be interoperability 
problems.    And there should be tests in the test suite of all the 
weird ones (but maybe there already are).
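
A minimal sketch of how such test inputs might be generated, in Python. The list of odd terms, the placeholder vocabulary IRI and the helper name make_test_document are all assumptions for illustration; a short run of NUL characters stands in for the 2^32 case.

```python
import json

# Hypothetical edge-case terms; none of these come from the test suite.
EDGE_CASE_TERMS = [":", "a:b", "'", "\u0000" * 16, " ", "@future"]


def make_test_document(term):
    """Build a small JSON-LD document that defines and uses one term."""
    return {
        # Placeholder IRI, only here so the term has something to expand to.
        "@context": {term: "http://example.org/vocab#p"},
        term: "value",
    }


documents = [json.dumps(make_test_document(t)) for t in EDGE_CASE_TERMS]
```

Each serialized document could then be fed to the implementations under test to see whether they agree on the expansion.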

    A JSON object
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-json-object>
    is a node object
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-node-object>
    if it exists outside of a JSON-LD context
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-context>
    and:

      * it does not contain the |@value|, |@list|, or |@set| keywords, and
      * it is not the top-most JSON object
        <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-json-object>
        in the JSON-LD document consisting of no other members than
        |@graph| and |@context|.

Ah, I've seen this text before.  :-)    Maybe you've replied on that 
already.    Short version: it'd help to give a name to those things 
mentioned in that last bullet point, at least.  Maybe call them "binder 
objects" or "envelope objects" or something like that.     Actually, I 
think they should have their own section in the Advanced Topics.   (And 
I've already said I don't think they should use the @graph keyword, but 
I gather you decided against me on that.    I'll go check old emails 
later, I hope.)

    the keys of the different node objects
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-node-object>
    are merged to create the properties of the resulting node
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-node>.

maybe s/are merged/need to be merged/ ?

    Keys in a node object
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-node-object>
    that are not keywords
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-keyword>
    /MAY/ expand to an absolute IRI
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-absolute-iri>
    using the active context
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-active-context>.

That use of "MAY" technically means that implementations have the option 
of expanding them or not, right?  Maybe something more like: "Each key 
can be classified as one of: (1) a keyword, (2) a keyword alias, (3) an 
absolute IRI, (4) a relative IRI, convertible to an absolute IRI using 
the active base, (5) a term which expands to an absolute IRI according 
to the active context, (6) a term which does not expand to an 
absolute IRI, or (7) a string which does not conform to the term syntax.
Keys of type (6) and (7) are ignored."

Actually, writing that makes clear my concern about terms above. How can 
you tell a term from a relative IRI?   Isn't "foo" both? I'd suggest 
that in json-ld relative IRIs be required to contain a "/" character 
and terms be limited to C identifier syntax.

Also, class (6) keys might well be due to a typo -- is it okay to issue 
warnings on class (6) and class (7) keys, instead of just ignoring them?
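
The classification above, combined with the suggested syntax restrictions, could be sketched roughly as follows in Python. Several things here are assumptions: the keyword list is illustrative and should be checked against the draft, the context is modeled as a flat dict of strings, compact IRIs are not treated separately (a key like "a:b" is taken as an absolute IRI because it has a scheme-like prefix), and the function names are made up.

```python
import re

# Assumed keyword list for illustration; check it against the draft.
KEYWORDS = {"@context", "@id", "@type", "@value", "@language",
            "@container", "@list", "@set", "@graph", "@index"}

# Terms restricted to C identifier syntax, as suggested above.
TERM_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# An absolute IRI starts with a scheme followed by a colon.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def classify_key(key, context):
    """Return the class number (1-7) of a node object key."""
    if key in KEYWORDS:
        return 1
    target = context.get(key)
    if isinstance(target, str) and target in KEYWORDS:
        return 2  # keyword alias
    if TERM_RE.match(key):
        if isinstance(target, str) and SCHEME_RE.match(target):
            return 5  # term that expands to an absolute IRI
        return 6      # term with no usable mapping, possibly a typo
    if SCHEME_RE.match(key):
        return 3
    if "/" in key:
        return 4      # relative IRI, required here to contain "/"
    return 7


def key_warnings(node, context):
    """List a warning for every key in classes (6) and (7)."""
    return [f"ignoring key {key!r} (class {classify_key(key, context)})"
            for key in node
            if classify_key(key, context) in (6, 7)]


context = {"name": "http://xmlns.com/foaf/0.1/name", "id": "@id"}
node = {"id": "http://example.org/me", "name": "Sandro",
        "nmae": "typo", "foo bar": 1}
warnings = key_warnings(node, context)
```

With the "/" and C identifier restrictions in place, "foo" can only be a term, so the ambiguity goes away; without them the order of the checks would decide the outcome.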

    The value associated with the |@type| key /MUST/ be a term
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-term>,
    a compact IRI
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-compact-iri>,
    an absolute IRI
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-absolute-iri>,
    a relative IRI
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-relative-iri>,
    or null
    <https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html#dfn-null>.

What does it mean for a @type to be null?   I don't see anything in the 
spec about this case.

    /This section is non-normative./

It seems like there are too many of these....   I think.  How can most 
of the document be non-normative?   For example, how am I supposed to 
know what to do with @index?   If I'm writing a generic JSON-LD display 
tool, do I have to convert it to RDF first?    If not, I'm going to have 
to know what I'm supposed to do with @index.

    Summarized these differences mean that JSON-LD is capable of
    serializing any RDF graph or dataset and most, but not all, JSON-LD
    documents can be transformed to RDF.

Yeah, I guess every RDF graph can be converted to JSON-LD with explicit 
use of the rdf:first and rdf:rest properties.   Ugly, but technically 
correct.

And (again), I'd suggest that every JSON-LD document can be transformed 
to RDF, but with a few losses in the process -- you may need to 
Skolemize, you lose @index information, and any other "ignored" bits.
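
For the rdf:first and rdf:rest point, here is a minimal Python sketch of what the explicit encoding looks like as JSON-LD node objects. The function name and the "_:b0", "_:b1", ... blank node identifiers are hypothetical, and the items are assumed to already be valid JSON-LD values.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_to_explicit_nodes(items):
    """Encode a list as node objects linked by rdf:first and rdf:rest.

    Returns (head, nodes): head is a reference to the first list cell
    (or to rdf:nil for an empty list), nodes are the list cells.
    """
    nil = {"@id": RDF + "nil"}
    if not items:
        return nil, []
    nodes = []
    for i, item in enumerate(items):
        # The last cell points to rdf:nil, every other cell to the next one.
        rest = {"@id": f"_:b{i + 1}"} if i + 1 < len(items) else nil
        nodes.append({
            "@id": f"_:b{i}",
            RDF + "first": item,
            RDF + "rest": rest,
        })
    return {"@id": "_:b0"}, nodes


head, nodes = list_to_explicit_nodes(["a", "b"])
```

This is the ugly-but-correct form; it also covers list structures that the @list keyword cannot express.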

     -- Sandro

Received on Friday, 29 March 2013 15:25:00 UTC