- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 29 Mar 2013 11:24:39 -0400
- To: W3C RDF WG <public-rdf-wg@w3.org>
- Message-ID: <5155B237.8000407@w3.org>
This is a partial follow-up review of json-ld. Here I'm reviewing:

JSON-LD 1.0
A JSON-based Serialization for Linked Data
[prepared as] W3C Working Draft 04 April 2013
https://dvcs.w3.org/hg/json-ld/raw-file/a1bc3776ed3a/spec/WD/json-ld-syntax/20130404/index.html

Summary: more of the same - mostly editorial - a few issues that will hopefully be simple to review. I'm not quite done, but may have to stop for a day or two, so I'm sending this along now.

Details:

> Simply speaking, a context is used to map terms, to IRIs.

s/terms,/terms/

> and types that do not match a term or are neither a compact IRI nor

s/or are neither/and are neither/

> If multiple embedded JSON-LD documents are extracted as RDF, the result is the RDF merge of the extracted datasets.

Alas, there is no defined way to merge RDF datasets. The problem is that sometimes it's obvious that the merge of <g> { <a> <b> 1 } and <g> { <a> <b> 2 } is <g> { <a> <b> 1,2 } and sometimes it's obvious the two can't be merged because they contradict each other. See: http://www.w3.org/2011/rdf-wg/track/issues/17

> RESOLVED: close issue-17 <http://www.w3.org/2011/rdf-wg/track/issues/17> -- there is no general purpose way to merge datasets; it can only be done with external knowledge.

Proposed solution is to define it here, something like: If multiple embedded JSON-LD documents are extracted as RDF, the result is a dataset formed by merging all the graphs that have the same name (and thus making a single named graph per graph name) and all the default graphs (to make one resulting default graph).

> Figure 1: An illustration of JSON-LD's data model.

Broken image link. More importantly, the diagram is both misleading and wrong. It's misleading in that each of the nodes is shown as being in exactly one graph; nodes are actually allowed to be in multiple graphs, and nearly always are.
It's wrong in that it shows two arcs that aren't in any graph, when actually every arc has to be in one or more graphs.

I haven't managed to produce a good drawing of this. Sometimes I think of it as color-coding arcs, like this:
http://www.w3.org/Consortium/Offices/Presentations/RDFTutorial/figures/AnimMerge8.png
and sometimes I think of it as layers:
http://www.flickr.com/photos/danbri/3472944745/
http://farm4.static.flickr.com/3613/3384528143_8304792836_b.jpg
although I imagine the layers closer together, like transparent sheets of plastic, each with writing on them.

> Whenever possible, the graph name /SHOULD/ be an IRI.

s/possible/practical/ (I think)

> At Risk

I'm a little lost in the AT RISK features. Can we do it like this: http://www.w3.org/TR/2009/CR-owl2-syntax-20090611/#atRisk1 ? So each at-risk feature is identified separately from where it occurs in the specs, on a wiki page (rdf-wg/wiki/JSON-LD_Features_at_Risk or something). And each time it comes up in the specs, that is referenced, along with a clear explanation for people who've never heard of this little feature of the W3C process.

> Within the JSON-LD syntax these edge labels are called properties.

Actually, you use the term somewhat inconsistently -- sometimes you call those labels "property names" and sometimes you call them "property labels". I'm not sure this is worth fixing -- I'm probably being overly pedantic to mention it -- but in RDF they'd be considered property names. The property itself is the thing denoted by the IRI. I think in general it's fine to call these things "properties" (and skip over the detail that they are property names), but maybe in the formal model it's better to be precise.
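Going back to the dataset merge proposed earlier in this review (one named graph per graph name, plus one default graph), here is a minimal sketch of what that could look like. It assumes each extracted dataset is a Python dict mapping a graph name to a set of triples, with None standing for the default graph. That representation and the name merge_datasets are assumptions made for illustration only, not anything from the spec.

```python
from collections import defaultdict


def merge_datasets(datasets):
    """Union the graphs that share a name; None keys the default graph.

    Hypothetical helper: 'datasets' is an iterable of dicts mapping
    graph name -> set of (subject, predicate, object) tuples.
    """
    merged = defaultdict(set)
    for dataset in datasets:
        for graph_name, triples in dataset.items():
            # Same graph name -> same resulting graph, triples unioned.
            merged[graph_name] |= set(triples)
    return dict(merged)


# The <g> { <a> <b> 1 } and <g> { <a> <b> 2 } case from above.
first = {"g": {("a", "b", 1)}}
second = {"g": {("a", "b", 2)}, None: {("x", "y", "z")}}
merged = merge_datasets([first, second])
```

This always takes the "<g> { <a> <b> 1,2 }" reading and never detects contradictions, which is exactly the part that needs external knowledge. It also does not rename blank nodes apart, which a real merge would have to do.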
> Issue 217 <https://github.com/json-ld/json-ld.org/issues/217>
> In contrast to the RDF data model as defined in [RDF11-CONCEPTS], JSON-LD allows blank nodes as property labels and graph names. Thus, some data that is valid JSON-LD cannot be converted to RDF. This feature may be removed in the future.

This notion appears a few other times. As I mention in my review of json-ld-api, I think we should say it *can* be converted, it just requires Skolemizing. Also, the At Risk phrasing should be more clear about what the change might be. Something like: "Based on implementor feedback, the Working Group may decide to prohibit the use of blank nodes as property labels and graph names."

> A JSON-LD document /MUST/ be a single node object or a JSON array containing a set of one or more node objects at the top level.

How about: ... or a JSON array whose elements are each node objects.
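To make the Skolemizing point above concrete, here is a rough sketch. It assumes quads are plain (subject, predicate, object, graph) tuples of strings, that blank node identifiers start with "_:", and that a graph value of None means the default graph. The example.org authority and the function name are hypothetical; the ".well-known/genid/" path follows the convention suggested in RDF Concepts.

```python
import uuid

# Hypothetical authority; a real system would use a domain it controls.
SKOLEM_BASE = "https://example.org/.well-known/genid/"


def skolemize(quads, base=SKOLEM_BASE):
    """Replace blank nodes used as property labels or graph names with fresh IRIs.

    Blank nodes in subject or object position are left alone, since RDF
    already allows them there.
    """
    mapping = {}

    def fix(term):
        if isinstance(term, str) and term.startswith("_:"):
            # Reuse the same Skolem IRI for repeated uses of one blank node.
            if term not in mapping:
                mapping[term] = base + uuid.uuid4().hex
            return mapping[term]
        return term

    return [(s, fix(p), o, fix(g)) for s, p, o, g in quads]


quads = [
    ("_:s", "_:p", "v", "_:g"),
    ("http://example.org/s", "_:p", "w", None),
]
skolemized = skolemize(quads)
```

Nothing is lost except the fact that those positions were originally blank nodes, which is why I'd rather the spec say the data can be converted.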
> B.1 Terms
> A term is a short-hand string that expands to an IRI or a blank node identifier. A term /MUST NOT/ equal any of the JSON-LD keywords. To avoid forward-compatibility issues, a term /SHOULD NOT/ start with an |@| character as future versions of JSON-LD may introduce additional keywords. Furthermore, the term /MUST NOT/ be an empty string (|""|) as not all programming languages are able to handle empty property names.

This whole section concerns me. Can a term contain a colon? Can it be a plain colon? Can it be an apostrophe? Can it be a string of 2^32 ASCII NUL characters? I rather doubt every implementation will allow all of these, but some might, so there could be interoperability problems. And there should be tests in the test suite of all the weird ones (but maybe there already are).
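The three constraints quoted from B.1 are easy to check mechanically; the trouble is everything they leave open. A minimal sketch, assuming a keyword set that lists only the keywords named in this review (the real list in the spec is longer) and a hypothetical check_term helper:

```python
# Only the keywords mentioned in this review; NOT the full list from the spec.
KEYWORDS_NAMED_HERE = {
    "@context", "@graph", "@index", "@list", "@set", "@type", "@value",
}


def check_term(term, keywords=KEYWORDS_NAMED_HERE):
    """Return (errors, warnings) for the B.1 constraints quoted above."""
    errors, warnings = [], []
    if term == "":
        errors.append("term MUST NOT be an empty string")
    if term in keywords:
        errors.append("term MUST NOT equal a keyword")
    elif term.startswith("@"):
        warnings.append("term SHOULD NOT start with '@'")
    return errors, warnings


# The odd cases asked about above (a short run of NULs stands in for 2^32).
ODD_TERMS = ["a:b", ":", "'", "\x00" * 4]
odd_results = {term: check_term(term) for term in ODD_TERMS}
```

Every one of the odd terms comes back with no errors and no warnings, so as written the section permits all of them.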
> A JSON object is a node object if it exists outside of a JSON-LD context and:
>
>   * it does not contain the |@value|, |@list|, or |@set| keywords, and
>   * it is not the top-most JSON object in the JSON-LD document consisting of no other members than |@graph| and |@context|.

Ah, I've seen this text before. :-) Maybe you've replied on that already. Short version: it'd help to give a name to those things mentioned in that last bullet point, at least. Maybe call them "binder objects" or "envelope objects" or something like that. Actually, I think they should have their own section in the Advanced Topics. (And I've already said I don't think they should use the @graph keyword, but I gather you decided against me on that. I'll go check old emails later, I hope.)

> the keys of the different node objects are merged to create the properties of the resulting node.

maybe s/are merged/need to be merged/ ?
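For what it's worth, here is how I read the node object definition quoted above, as a sketch. The function names are hypothetical, keyword aliases are ignored, and is_envelope_object encodes one interpretation of "no other members than @graph and @context": @graph must be present and @context is optional. The text does not actually settle that, which is part of why the thing deserves a name.

```python
def is_envelope_object(obj):
    """One reading of the last bullet: has @graph, and nothing besides @context."""
    return "@graph" in obj and set(obj) <= {"@graph", "@context"}


def is_node_object(obj, is_top_level=False, in_context=False):
    """Apply the quoted definition to a parsed JSON value.

    'is_top_level' and 'in_context' must be supplied by the caller,
    since a JSON object does not know where it sits in the document.
    """
    if not isinstance(obj, dict) or in_context:
        return False
    if any(key in obj for key in ("@value", "@list", "@set")):
        return False
    if is_top_level and is_envelope_object(obj):
        return False
    return True
```

Note that the same {"@graph": [...]} object counts as a node object when nested but not when it is top-most, so the answer depends on position as well as content.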
> Keys in a node object that are not keywords /MAY/ expand to an absolute IRI using the active context.

That use of "MAY" technically means that implementations have the option of expanding them or not, right? Maybe something more like: "Each key can be classified as one of: (1) a keyword, (2) a keyword alias, (3) an absolute IRI, (4) a relative IRI, convertible to an absolute IRI using the active base, (5) a term which expands to an absolute IRI according to the active context, (6) a term which does not expand to an absolute IRI, or (7) a string which does not conform to the term syntax. Keys of type (6) and (7) are ignored."

Actually, writing that makes clear my concern about terms above. How can you tell a term from a relative IRI? Isn't "foo" both? I'd suggest that in json-ld relative IRIs be required to contain a "/" character and terms be limited to c-identifier syntax.

Also, class (6) keys might well be due to a typo -- is it okay to issue warnings on class (6) and class (7) keys, instead of just ignoring them?
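The seven-way classification above, together with the suggested rule for telling terms from relative IRIs, could be sketched like this. It is only an illustration under several assumptions: the context is simplified to a flat dict from term to string, the keyword set lists only keywords named in this review, anything starting with a scheme-like prefix is treated as an absolute IRI (so compact IRIs, which are not in my list, are not distinguished), and all the names are hypothetical.

```python
import re
from enum import Enum


class KeyClass(Enum):
    KEYWORD = 1
    KEYWORD_ALIAS = 2
    ABSOLUTE_IRI = 3
    RELATIVE_IRI = 4
    TERM_EXPANDING = 5
    TERM_NOT_EXPANDING = 6
    NOT_A_TERM = 7


# Only the keywords mentioned in this review; NOT the full list from the spec.
KEYWORDS = {"@context", "@graph", "@index", "@list", "@set", "@type", "@value"}
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def classify_key(key, context):
    """Classify a node-object key; 'context' maps term -> IRI or keyword string."""
    target = context.get(key)
    if key in KEYWORDS:
        return KeyClass.KEYWORD
    if isinstance(target, str) and target in KEYWORDS:
        return KeyClass.KEYWORD_ALIAS
    if SCHEME_PREFIX.match(key):
        return KeyClass.ABSOLUTE_IRI
    if "/" in key:  # suggested rule: relative IRIs must contain "/"
        return KeyClass.RELATIVE_IRI
    if C_IDENTIFIER.match(key):  # suggested rule: terms are C identifiers
        if isinstance(target, str):
            return KeyClass.TERM_EXPANDING
        return KeyClass.TERM_NOT_EXPANDING
    return KeyClass.NOT_A_TERM


def keys_to_warn_about(node, context):
    """Keys in classes (6) and (7), which may well be typos."""
    ignored = (KeyClass.TERM_NOT_EXPANDING, KeyClass.NOT_A_TERM)
    return [key for key in node if classify_key(key, context) in ignored]


context = {"name": "http://xmlns.com/foaf/0.1/name", "type": "@type"}
node = {"@id": "people/sandro", "type": "Person", "name": "Sandro", "nmae": "oops"}
warnings = keys_to_warn_about(node, context)
```

With those two syntactic rules in place, "foo" is unambiguously a term and "foo/bar" unambiguously a relative IRI, and the misspelled "nmae" key gets reported instead of silently dropped.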
> The value associated with the |@type| key /MUST/ be a term, a compact IRI, an absolute IRI, a relative IRI, or null.

What does it mean for a @type to be null? I don't see anything in the spec about this case.

> /This section is non-normative./

It seems like there are too many of these.... I think. How can most of the document be non-normative? For example, how am I supposed to know what to do with @index? If I'm writing a generic JSON-LD display tool, do I have to convert it to RDF first? If not, I'm going to have to know what I'm supposed to do with @index.

> Summarized these differences mean that JSON-LD is capable of serializing any RDF graph or dataset and most, but not all, JSON-LD documents can be transformed to RDF.

Yeah, I guess every RDF graph can be converted to JSON-LD with explicit use of the rdf:first and rdf:rest properties. Ugly, but technically correct. And (again), I'd suggest that every JSON-LD document can be transformed to RDF, but with a few losses in the process -- you may need to Skolemize, you lose @index information, and any other "ignored" bits.

    -- Sandro
Received on Friday, 29 March 2013 15:25:00 UTC