review of json-ld-syntax from Sandro Hawke on 2013-03-05 (public-rdf-wg@w3.org from March 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 05 Mar 2013 00:23:20 -0500
To: W3C RDF WG <public-rdf-wg@w3.org>
Message-ID: <51358148.1060507@w3.org>
This is my review of json-ld-syntax, as promised in the last meeting.


Summary: The document is in pretty good shape, and I think the
underlying design is very good. Below, I suggest a few million
editorial changes, a handful of which I think really need to be
addressed before publication (and are marked MEDIUM or SERIOUS). I
also raise a handful of concerns about the design, but I think they
can probably all be dealt with in a few minutes of conversation. I think
everything not marked MEDIUM or SERIOUS is fairly trivial.


I reviewed the latest editor's draft:
https://dvcs.w3.org/hg/json-ld/raw-file/e582aaa9ee43/spec/latest/json-ld-syntax/index.html

I did not read the json-ld-api. I did play around with the json-ld
"playground" site after I was into the appendices. I haven't reviewed
Appendix B yet; I'll try to get to that soon, but it's going to take
more brain cells than I have left tonight.

Without further ado...

 > In an attempt to harmonize the representation of Linked Data in JSON

My first comment turns out to be, I think, the most utterly trivial. Sorry.

My sense is that one "harmonizes" the elements in a set (by modifying
them to make them more similar or related in some way); I don't know
what it means to harmonize a single item like this.

 > ; mixing both Linked Data and non-Linked Data in a single document.

The clause after a semicolon should be a complete sentence.
Change to a comma or rephrase.

 > the name IRIs, when dereferenced, provide more information about the name

I think they provide information about the named thing. I don't really 
like this paraphrasing of the LD principles, and I don't think it's 
helpful to the document here. I'd suggest providing some references instead.

 > Since JSON-LD is 100% compatible with JSON the large number

comma needed after "JSON"

 > Additionally to all the features JSON provides,

How about: "In addition to ..."

 > the ability to express the language associated with a string

? maybe add "natural"

add comma at the end of the item

 > weights, and distances,

MEDIUM

Really? I pretty much never see people doing that with datatypes.

 > Software developers that

s/that/who/ on each line

 > This specification does not describe the programming interfaces for
the JSON-LD Syntax. The specification that describes the programming
interfaces for JSON-LD documents is the JSON-LD Application
Programming Interface [JSON-LD-API].

How about: A companion document, The JSON-LD Application Programming
Interface [JSON-LD-API], specifies how to work with JSON-LD
at a higher level: it provides a standard library
interface for common JSON-LD operations. Although that
document is not required for understanding and working with
JSON-LD, for some readers it will be a better starting
point.

 > A number of design goals were established before the creation of
this markup language:

I don't think the history matters.

How about: JSON-LD satisfies the following design goals:

 > language. We should focus on simplicity when possible.

I don't think that's what you mean. I think you mean simplicity is
paramount.

How about: to the language, so sometimes we do not achieve Zero Edits.

 > A character is represented as a single character string.

Hard to parse.

How about: A character is represented using a string of length one.

 > and that leading zeros are not allowed.
^^^^ omit "that"

 > Used to specify the native language

s/native/natural (human)/

 > For the avoidance of doubt, all keys, keywords, and values in JSON-LD 
are case-sensitive.

Awkward phrase.

s/For the avoidance of doubt, all/All/

 > Conformance

SERIOUS

It's somewhat odd that all one needs for conformance is appendix B.
So what are the other normative parts of this document for...?

I think there may be a notion of a conformant JSON-LD generator or
parser here, too -- one that follows the rules of the rest of this
spec. That should be stated here.

 > different concepts instead of terms such as "name", "homepage", etc.

I think, in this case, the word "terms" should NOT be linked to
#dfn-term because you DON'T mean "term" in the JSON-LD sense, here.
This is supposed to be the pre-JSON-LD counter-example.

 > a context is used to map terms, i.e., properties with associated 
values, to IRIs.

Uh, that doesn't match the definition in #dfn-term. Is a term really
a property with its associated value? I don't think so.

How about: s/i.e., properties with associated values/such as the keys in 
an object structure/

 > Expanded term definitions may be defined using absolute or compact 
IRIs as keys, which is mainly used to associate type or language 
information with an absolute or compact IRI.

This is the first sentence in the document where I have no idea what
it means, because it uses concepts not introduced yet. Maybe this can
be dropped? Or maybe I'll just have to get it on the second pass.

Later -- Yeah, I'd just drop that sentence, I think.

 > This information gives the data global context and allows developers 
to re-use each other's data without having to agree to how their data 
will interoperate on a site-by-site basis.

I find the re-use of the word "context" awkward here.

How about: This information allows developers to re-use each other's 
data without having to agree to how their data will interoperate on a 
site-by-site basis.

 > External JSON-LD context documents may contain extra information 
located outside of the @context key,

That makes me wonder if it can be HTML, to be more readable. There would 
have to be some standard way to find the @context json in the HTML....

Later - I see it can't. Okay, con-neg works, too.

 > EXAMPLE 5

after this example I was expecting the next example to use a Link header 
(what turns out to be EXAMPLE 29). Maybe mention it here?

 > EXAMPLE 6 -- In the example above, the key http://schema.org/name is 
interpreted as an absolute IRI because it contains a colon (:) and the 
"http" prefix does not exist in the context.

Now would be a perfect place to have a relative IRI example. You've
just talked about there being absolute and relative IRIs, and given an
example only of absolute ones.

 > JSON keys that do not expand to an absolute IRI are ignored, or 
removed in some cases, by the [JSON-LD-API]. However, JSON keys that do 
not include a mapping in the context are still considered valid 
expressions in JSON-LD documents—the keys just don't expand to 
unambiguous identifiers.

This is kind of weird. It doesn't tell me what I'm supposed to do; it
just confuses me.

I guess it means they're like comments, and to be ignored?

This is where we need a clear notion of a processor that reads JSON-LD 
and extracts all the triples and quads from it, it seems to me.
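
One possible reading of that paragraph, sketched in Python: a consumer expands each key through the context and drops any key that does not end up as an absolute IRI, much like a comment. The helper name expand_keys, the crude absolute-IRI test, and the sample data are assumptions for illustration; none of them come from the spec or the JSON-LD API.

```python
def expand_keys(node, context):
    """Return a copy of node whose keys are absolute IRIs.

    Hypothetical helper: keys are looked up in the context; keys that are
    neither mapped nor already absolute IRIs are dropped, like comments.
    """
    expanded = {}
    for key, value in node.items():
        iri = context.get(key, key)
        # Crude absolute-IRI test, good enough for this sketch only.
        if "://" in iri:
            expanded[iri] = value
    return expanded


context = {"name": "http://schema.org/name"}
node = {"name": "Manu Sporny", "databaseId": "23987"}
expanded_node = expand_keys(node, context)
print(expanded_node)
```

Under this reading, databaseId simply disappears from the output, which is the behaviour the spec would need to state explicitly.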

 > EXAMPLE 8

It's confusing to have @type here. Maybe stick to just showing
@vocab, and not also introducing something we haven't seen yet.

Later -- I see @type is never defined at all. Sigh. I guess it's
considered an API thing.

 > An IRI is generated when a JSON object is used in the value position 
and contains an @id keyword:

This is the first place you use the word "generated" and it's not at
all clear what it means. If we were talking about mapping to RDF it
would make sense.

 > To be able to externally reference nodes in a graph, it is important 
that each node has an unambiguous identifier. IRIs are a fundamental 
concept of Linked Data, and nodes should have a de-referenceable 
identifier used to name and locate them. For nodes to be truly linked, 
de-referencing the identifier should result in a representation of that 
node. Associating an IRI with a node tells an application that it can 
fetch the resource associated with the IRI and get back a description of 
the node.

I'm not a fan of this paragraph. Can we just delete it?


 > A node is identified using the @id keyword:

Maybe clarify that @id is overloaded, and it means something different
used like this than used as either a key or a value in a context?

It'd be a little more clear if EXAMPLE 11 didn't use @id in all three
different ways. How about taking the context out of the example, and
just having something like:


{
"@id": "http://manu.sporny.org/#me",
"http://schema.org/name": "Manu Sporny"
}

(or some other example where an @id is more appropriate)

 > end of Section 5

As I come to Section 6 being marked normative, I see Section 5 was
neither informative nor normative.

 > A document on the Web that defines one or more IRIs for use as 
properties in Linked Data is called a vocabulary.

Don't conflate documents with vocabularies, please.

See:
https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#vocabularies

I would just drop that whole paragraph. It's motivational, not spec
text. And they're wonderfully motivated in the next paragraph anyway.

 > 6.1 Compact IRIs

 > Prefixes are expanded when the form of the value is a compact IRI 
represented as a prefix:suffix combination, and the prefix matches a 
term defined within the active context

 > Terms are interpreted as compact IRIs if they contain at least one 
colon and the first colon is not followed by two slashes (//, as in 
http://example.com)

These sentences contradict each other. Do slashes prevent recognizing
things as compact IRIs or not? I'd suggest not -- that's just extra
code that won't be helpful, IMHO. (TEST CASE?)
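
The difference between the two readings can be made concrete with a small Python sketch. The function name expand_compact_iri, its slashes_block_expansion flag, and the deliberately odd context that defines "http" as a prefix are all assumptions made up for this illustration.

```python
def expand_compact_iri(value, context, slashes_block_expansion):
    """Expand prefix:suffix using context, under one of two readings.

    Hypothetical sketch. If slashes_block_expansion is True, a value whose
    first colon is followed by "//" is never treated as a compact IRI.
    """
    prefix, colon, suffix = value.partition(":")
    if not colon:
        return value  # no colon, so not a compact IRI
    if slashes_block_expansion and suffix.startswith("//"):
        return value  # second quoted sentence: leave it as an absolute IRI
    if prefix in context:
        return context[prefix] + suffix  # first quoted sentence
    return value


# An (unwise) context that defines "http" as a prefix exposes the difference.
context = {"http": "http://example.org/vocab#"}
value = "http://example.com"
with_check = expand_compact_iri(value, context, slashes_block_expansion=True)
without_check = expand_compact_iri(value, context, slashes_block_expansion=False)
print(with_check)
print(without_check)
```

The two results differ only when a context defines a prefix that collides with a scheme name, which is the kind of input a test case would need to cover.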

 > EXAMPLE 17

"foaf": "http://xmlns.com/foaf/0.1/",
"foaf:homepage": { "@type": "@id" },

Would that work if the order were reversed? I guess so, since JSON
doesn't preserve order. Maybe clarify that, and maybe put them in
the other order in the example. (TEST CASE?)

Later -- Oh, I see this is covered well in section 6.9. Maybe near
Example 17 say this is covered in more detail in section 6.9....?
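
A quick way to see why the order cannot matter to a generic JSON consumer, sketched in Python with the standard json module. The two documents are just the lines above wrapped in braces, once in each order.

```python
import json

forward = json.loads("""
{
  "foaf": "http://xmlns.com/foaf/0.1/",
  "foaf:homepage": { "@type": "@id" }
}
""")
reverse = json.loads("""
{
  "foaf:homepage": { "@type": "@id" },
  "foaf": "http://xmlns.com/foaf/0.1/"
}
""")

# Python dicts compare equal regardless of member order, so code that
# works on the parsed object cannot tell the two documents apart.
same = forward == reverse
print(same)
```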

 > 6.3 Type Coercion

MEDIUM

Okay, this overloading of @ keywords goes too far with @vocab serving
a completely different purpose (from normal @vocab) in this
situation. That's just silly.

Maybe we could at least have a table showing how the meanings
differ in different places in the structure.

 > EXAMPLE 22

I've read this about 6 times and I can't make sense of it. That is, I
think the example makes perfect sense, but the paragraph after it,
explaining it, does not. When you say "not a prefix:suffix construct"
maybe you mean "not a string"?

 > Duplicate context terms are overridden using a last-defined-wins 
mechanism.

SERIOUS

That means you can't use natural JSON parsing, doesn't it? If I read
EXAMPLE 24 with a JSON parser into a nested object, then I don't know
the order of the @context blocks.
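
The general concern, that a generic parser does not hand over what a last-defined-wins rule needs, can be illustrated in Python. This sketch assumes the competing definitions show up as repeated members of one object; that is an assumption for illustration only and not a claim about what EXAMPLE 24 actually contains. The IRIs are made up.

```python
import json

text = (
    '{"@context": {"name": "http://example.com/a#name"},'
    ' "@context": {"name": "http://example.com/b#name"}}'
)

# Default parsing silently keeps only the last duplicate member.
plain = json.loads(text)


def reject_duplicates(pairs):
    """object_pairs_hook that raises when a member name is repeated."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError("duplicate key: " + key)
        seen[key] = value
    return seen


try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True

print(plain["@context"]["name"], duplicate_detected)
```

Python's parser happens to keep the last member, but code working on the resulting dict never learns that an earlier definition existed, and other parsers are free to behave differently.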

 > Note that this is rarely a good authoring practice

That doesn't go far enough. You could allow nesting to make Example
24 work, but I don't think it's okay to use order-of-statements.

 > It is a best practice to put the context definition at the top of the 
JSON-LD document.

MEDIUM

I don't agree. You're telling me I'm going against best practice to
build an object in memory and let my JSON serializer turn it into
JSON.

 > The @context subtree within that object is added to the top-level 
JSON object of the referencing document.

What if there's more than one @context subtree? Do you mean the merge
of all the @context subtrees? [TEST CASE]

 > end of 6.5

Thinking about this, I'd rather like .well-known/host-context.jsonld
as another place I can look. So if I'm trying to get RDF triples, and
I just get application/json, and there's no Link Header, I can look
for a host-context file. I dunno -- maybe everyone can set a Link
header easily enough.


 > For instance, in the example below the databaseId member would be 
ignored by a JSON-LD processor.

MEDIUM

This speaks to conformance. "JSON-LD processor" (maybe "consumer")
needs to be defined in the Conformance clause, and s/should not/MUST
not/ (with maybe some more rewriting).

 > This method can be accomplished by using the following markup pattern:

"markup"? JSON isn't markup, as I understand the word. Can you just drop 
the word from the sentence?

(glancing at appendix B for something)
 > To avoid forward-compatibility issues, a term should not start with 
an @ character

MEDIUM

Why only SHOULD NOT? Why not MUST NOT? The damage if they do is 
considerable.

Also, you kind of need to say what processors MUST do if they see a
keyword term they don't know -- ie one from the future. The options
are: ignore (if you can figure out what/how much to ignore); or halt;
or issue a warning to the user.
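
The three options can be sketched as a single policy switch. This is a hypothetical Python illustration: the function check_keywords, the policy names, and the partial keyword list are assumptions, not anything the spec defines.

```python
import warnings

# Partial keyword list for illustration; the spec's Appendix B is the authority.
KNOWN_KEYWORDS = {"@context", "@id", "@type", "@graph", "@container", "@vocab"}


def check_keywords(node, policy):
    """Apply policy ("ignore", "halt" or "warn") to unknown @-keys.

    Returns a copy of node without the unknown @-keys.
    """
    result = {}
    for key, value in node.items():
        if key.startswith("@") and key not in KNOWN_KEYWORDS:
            if policy == "halt":
                raise ValueError("unknown keyword: " + key)
            if policy == "warn":
                warnings.warn("ignoring unknown keyword: " + key)
            continue  # "ignore" and "warn" both drop the key
        result[key] = value
    return result


node = {"@id": "http://example.com/x", "@future": 1, "name": "n"}
ignored = check_keywords(node, "ignore")
print(ignored)
```

Note that "ignore" here drops only the key and its value; deciding how much to ignore is exactly the part the spec would have to pin down.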

 > NOTE: The use of @container in the body of a JSON-LD document has no 
meaning

That doesn't seem worth saying here. I assume it's ruled out in Appendix B.

 > 6.11 Embedding

Odd section. It seems to have forgotten this was introduced as a
graph syntax. The main thing to highlight is that this is syntactic
sugar; sometimes it's nice to syntactically embed the node in one of
the places that had a link to it.

 > Example 46

SERIOUS

I suspect the first row of the table is wrong. I would think only the
triples inside the value associated with the @graph key would go
inside the graph. Please clarify which it is, and correct the table
if necessary.

 > Example 47, 48

MEDIUM

It seems very confusing to use @graph for this. Can't you find a more
direct way to do this?

It seemed from stuff earlier (around Example 22) that in Example 48
you wouldn't need to repeat the @context, because it occurred earlier.
But maybe that example-22 stuff was wrong, and what was really meant
there was "closer to the root of the JSON object tree". No, that
can't be right, either. I cannot see any sensible rules for which
contexts are in effect at any point in the json tree.

How about this as a hack that's more elegant:

[
{ "@context": ...
},
{
"@id": "http://manu.sporny.org/i/public",
"@type": "foaf:Person",
"name": "Manu Sporny",
"knows": "http://greggkellogg.net/foaf#me"
},
{
"@id": "http://greggkellogg.net/foaf#me",
"@type": "foaf:Person",
"name": "Gregg Kellogg",
"knows": "http://manu.sporny.org/i/public"
}
]

... with a rule that an object that has JUST a @context key, and no
other keys, is actually omitted from arrays. That seems like a
cleaner hack than using the @graph keyword. Keep @graph for when
people really want named graphs.
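
A minimal Python sketch of that rule, assuming each "@context" value is a plain object (not a string or an array) and that later context-only objects simply add to earlier ones. The helper name split_context and the sample context contents are invented for illustration; the context in the example above is elided and is not being reconstructed here.

```python
def split_context(items):
    """Separate context-only objects from node objects in a top-level array.

    Hypothetical sketch of the proposed rule: an object whose only key is
    "@context" contributes its context and is otherwise omitted.
    """
    context = {}
    nodes = []
    for item in items:
        if set(item) == {"@context"}:
            context.update(item["@context"])
        else:
            nodes.append(item)
    return context, nodes


document = [
    {"@context": {"name": "http://xmlns.com/foaf/0.1/name"}},  # assumed contents
    {"@id": "http://manu.sporny.org/i/public", "name": "Manu Sporny"},
    {"@id": "http://greggkellogg.net/foaf#me", "name": "Gregg Kellogg"},
]
context, nodes = split_context(document)
print(len(nodes), sorted(context))
```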

 > 6.13 Identifying Blank Nodes

This is okay, but it would be pretty easy and much more in keeping
with the style of the document to avoid mentioning RDF, even here.

Something like:

For some topologies of the graph of nodes being expressed in
JSON-LD, such as topologies with loops, embedding alone cannot be
used, and @id must be used to connect the nodes. In some cases,
one may not want to name nodes with IRIs. In these situations,
one can use "blank node identifiers", which look like IRIs but
with _ (underscore) as the scheme name. For example:

{
"@id": "_:n1",
"name": "Secret Agent 1",
"knows": {
"name": "Secret Agent 2",
"knows": { "@id": "_:n1" }
}
}

In this case, we do not want to assign IRIs to the two people, but
we want to express that they know each other. We can say SA1
knows SA2 using embedding, but to say SA2 knows SA1 we need to use
a blank node identifier.
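
To check that the structure above really does express the loop, here is a rough Python sketch that walks the nested object and lists the edges it describes. The function collect_edges and the "_:autoN" labels it gives to nodes without an @id are assumptions of this sketch, not anything from the spec.

```python
def collect_edges(node, edges, counter):
    """Append (subject, property, object) tuples for node and its children.

    Nodes without "@id" get a fresh "_:autoN" label; that labelling scheme
    is an assumption of this sketch. Returns the subject label of node.
    """
    subject = node.get("@id")
    if subject is None:
        counter[0] += 1
        subject = "_:auto%d" % counter[0]
    for key, value in node.items():
        if key == "@id":
            continue
        if isinstance(value, dict):
            # Embedded node or reference: recurse, then link to its label.
            edges.append((subject, key, collect_edges(value, edges, counter)))
        else:
            edges.append((subject, key, value))
    return subject


doc = {
    "@id": "_:n1",
    "name": "Secret Agent 1",
    "knows": {
        "name": "Secret Agent 2",
        "knows": {"@id": "_:n1"},
    },
}
edges = []
collect_edges(doc, edges, [0])
for edge in edges:
    print(edge)
```

Both "knows" edges appear in the result: one comes from embedding, the other from the blank node identifier.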


 > Every statement in the context having a keyword as the key (as in { 
"@type": ... }) will be ignored when being processed.

I think you mean this only for keywords that are known to be
meaningless when used as keys in a @context. I think it would be
better to make this an error. But the bigger question is about
forward compatibility -- MUST processors ignore all keyword keys in
contexts? (Are any allowed, with meaning? I don't see any.)

 > 6.15 and 6.16

These should probably be marked non-normative. There's nothing here I
need to know to work with JSON-LD (although it's very cool and all).

 > 6.17 Data Indexing

Not sure how I feel about this. It's kind of weird, but pretty
harmless, I guess.

I'm not sure it would work, but an alternative design would be to have
a particular property be @index'd. So instead of:
"@container": "@index"
in the context we'd say
"@index": "lang"
and then the stuff in green would be equivalent to:

"post": [
{
"lang": "en",
"@id": "http://example.com/posts/1/en",
"body": "World commodities were up today with heavy trading of crude oil...",
"words": 1539
},
{
"lang": "de",
"@id": "http://example.com/posts/1/de",
"body": "Die Werte an Warenbörsen stiegen im Sog eines starken Handels von Rohöl...",
"words": 1204
}
]

I think that would provide the same functionality, but without these
keys that aren't really in the data. It would let you cleverly
generate JSON-LD like this from plain triples, if given the right
context. (You'd have to have triples with the same S and P, where
each O differs in the value of a DataProperty, as in this example.)
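
The equivalence can be sketched in Python, assuming the indexed form maps keys such as "en" and "de" directly to the post objects. The helper index_map_to_list is hypothetical, and the body strings are left out to keep the sketch short.

```python
def index_map_to_list(index_map, index_property):
    """Turn {"en": {...}, "de": {...}} into a list of objects that each
    carry the former key as an ordinary property (hypothetical helper)."""
    result = []
    for key, value in index_map.items():
        item = {index_property: key}
        item.update(value)
        result.append(item)
    return result


indexed_posts = {
    "en": {"@id": "http://example.com/posts/1/en", "words": 1539},
    "de": {"@id": "http://example.com/posts/1/de", "words": 1204},
}
posts = index_map_to_list(indexed_posts, "lang")
print(posts)
```

Going the other way (grouping a list by its "lang" value) is just as mechanical, which is what would make generating this form from plain triples feasible.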

 > A. Data Model

What happens if the same @graph @id is used in two places? Are the
graphs merged, or what? Shouldn't the spec say? Or is that left to
the API document as well? (It's a lot more than an API.) (In TriG
they are merged.)
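
For comparison, merge-by-name behaviour is easy to state precisely. A minimal Python sketch, with graphs represented as (name, triples) pairs and made-up example IRIs; the representation and the function name merge_named_graphs are assumptions for illustration.

```python
def merge_named_graphs(graphs):
    """Union the triples of graphs that share a name.

    graphs is a list of (graph_name, triples) pairs; illustrative only.
    """
    merged = {}
    for name, triples in graphs:
        merged.setdefault(name, set()).update(triples)
    return merged


graphs = [
    ("http://example.com/g1", [("s1", "p", "o1")]),
    ("http://example.com/g2", [("s2", "p", "o2")]),
    ("http://example.com/g1", [("s3", "p", "o3"), ("s1", "p", "o1")]),
]
merged = merge_named_graphs(graphs)
print(sorted(merged["http://example.com/g1"]))
```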

In general, I found Appendix A very confusing, and I'm thoroughly
familiar with the RDF data model. This does not bode well for JSON
folks. Do they need to understand this section, or can it be marked
non-normative?

 > Whenever possible, an edge should be labeled with an IRI.

As far as I can tell, from reading the spec up to this point, if it
doesn't have an IRI, it's ignored -- and thus not part of the data
model. Several times you say terms that don't map to IRIs are ignored.

 > This section is normative; This section is non-normative

SERIOUS

These labels seem to be applied inconsistently.

 > The JSON-LD Algorithms and API specification [JSON-LD-API] defines 
the conversion rules between JSON's native data types and RDF's 
counterparts to allow full round-tripping.

SERIOUS EDITORIAL

I really don't like the mapping-to-RDF being left to another, later
spec. I can live with it just being shown in the examples, except for
not knowing what happens with numbers. From the playground I see
integers end up as xsd:integer and otherwise they are xsd:double,
which is simple enough, but should really be said in this document, or
at least shown in an example.
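
The rule as observed in the playground fits in a few lines. This Python sketch classifies by the type the standard json module returns, which is an assumption of the sketch: Python parses 5 as an int and 5.0 as a float, and other implementations may draw that line differently. The function name xsd_datatype_for is made up.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def xsd_datatype_for(value):
    """Datatype IRI for a parsed JSON number, following the playground
    behaviour described above (integers vs. everything else)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so rule it out first.
        raise TypeError("booleans are not JSON numbers")
    if isinstance(value, int):
        return XSD + "integer"
    if isinstance(value, float):
        return XSD + "double"
    raise TypeError("not a JSON number: %r" % (value,))


integer_type = xsd_datatype_for(json.loads("1539"))
double_type = xsd_datatype_for(json.loads("15.39"))
print(integer_type)
print(double_type)
```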

(I see a bug in the playground. If you use too large an integer, it
converts the lexrep to being in scientific notation.)

 > In JSON-LD lists are part of the data model whereas in RDF they are 
part of a vocabulary, namely [RDF-SCHEMA].

Doesn't JSON-LD also have sets? As I read the spec, it seemed like
@container: @set had some semantics, in addition to being a directive
to keep singletons in arrays. A set-valued property is somewhat
different from a repeated property.

 > The JSON-LD context has direct equivalents for the Turtle @prefix 
declaration:

True, but that doesn't seem to be what the examples are showing. I'd
just drop that line.

 > Appendix B

Not really reviewed at this time.

 > E. IANA Considerations
 > This section is non-normative.

SERIOUS

Actually, I think this section is Normative, like the profile stuff.

 > will be submitted to the Internet Engineering Steering Group if this 
specification becomes a W3C Recommendation.

MEDIUM

Actually it goes at Last Call, as per
http://www.w3.org/2002/06/registering-mediatype

 > To request or specify Expanded JSON-LD document form, the IRI 
http://www.w3.org/ns/json-ld#expanded SHOULD be used.

SERIOUS

I can't figure out who the SHOULD applies to. Do you mean:

if you want the expanded form, you SHOULD ask for it with this profile

(which I think would be silly) or do you mean:

if you receive a request that includes this profile parameter, you 
SHOULD return expanded form

? I guess the latter, but that's not what it says. I would think you'd 
use normal media-type rules here -- if you can't provide it in expanded 
form, then you're not providing it, and fall back to another media type.

 > Published specification: The JSON-LD specification.

This should be plain text, and the URL should be updated. I guess it 
will be http://www.w3.org/TR/json-ld-syntax

 > Fragment identifiers used with application/ld+json resources may 
identify a node in a JSON-LD graph expressed in the resource. This 
idiom, which is also used in RDF [RDF-CONCEPTS], gives a simple way to 
"mint" new, document-local IRIs to label nodes and therefore contributes 
considerably to the expressive power of JSON-LD.

MEDIUM

I have no idea what this text is trying to say. For my best guess, 
please replace it with:

Fragment identifiers used with application/ld+json are treated as
in other RDF syntaxes, as per RDF Concepts (link to
http://www.w3.org/TR/rdf11-concepts/#section-fragID) [RDF-CONCEPTS]

 > References

Some of them are out of date, like TURTLE-TR. Also, the reference style 
isn't correct -- it only has the dated links.

---

That's it. I'll try to get to Appendix B before the meeting, but I 
wanted to send this early enough that it can be read & digested before 
Wednesday's meeting.

Keep up the great work, guys. I only point out all these places for 
improvement because I think this is so important and want it to have the 
best chance it can.

-- Sandro
Received on Tuesday, 5 March 2013 05:23:34 UTC