RE: More review of JSON-LD syntax from Markus Lanthaler on 2013-03-14 (public-rdf-wg@w3.org from March 2013)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Thu, 14 Mar 2013 11:25:00 +0100
To: "'W3C RDF WG'" <public-rdf-wg@w3.org>
Cc: "'Charles Greer'" <cgreer@marklogic.com>
Message-ID: <008601ce209e$31401d10$93c05730$@lanthaler@gmx.net>
On Wednesday, March 13, 2013 7:38 PM, Charles Greer wrote:

> Overall:
> 
> The document presents the syntax in a reasonably clear way.  The one
> exception to this is the intersection of terms, absolute IRIs, compact
> IRIs and relative IRIs.

Yes, that's tricky to explain properly. Before I got your review, I had
already clarified the grammar section, which now clearly states where terms,
absolute IRIs, relative IRIs, and compact IRIs are allowed.


> Until I came to flattening, I thought that JSON-LD was subject to a
> lot of the same problems as RDF/XML.  My concern had to do with
> manipulating structures as JSON - if there are a lot of ways to
> represent something, then one gets into a lot of issues with finding
> data within the structure.  Flattening seems to get rid of most of
> those concerns - it should probably be foregrounded as a good
> canonical representation if you can go that far.

I completely agree. I've added [1] a section on flattened document form,
similar to the ones on expanded and compacted document form: [2].
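To make the benefit concrete: once a document is flattened, every node sits at the top level and embedded nodes are replaced by {"@id": ...} references, so finding data no longer depends on how deeply it was nested. The Python below is a minimal sketch of that idea only, not the normative Flattening Algorithm of the JSON-LD API spec. It assumes input whose keys are already full IRIs, labels blank nodes with hypothetical _:bN identifiers, and skips named graphs, list handling details, and sorting.

```python
from itertools import count
from typing import Any, Dict, List


def flatten(root: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Sketch: collect every node object of a nested document into a flat list."""
    node_map: Dict[str, Dict[str, Any]] = {}
    blank_ids = count()

    def visit(node: Dict[str, Any]) -> str:
        # Nodes without an @id get a generated (hypothetical) blank node label.
        node_id = node.get("@id") or f"_:b{next(blank_ids)}"
        flat = node_map.setdefault(node_id, {"@id": node_id})
        for key, value in node.items():
            if key == "@id":
                continue
            values = value if isinstance(value, list) else [value]
            out = flat.setdefault(key, [])
            for v in values:
                if isinstance(v, dict) and "@value" not in v and "@list" not in v:
                    # Embedded node object: store it separately, keep a reference.
                    out.append({"@id": visit(v)})
                else:
                    out.append(v)
        return node_id

    visit(root)
    # Drop entries that consist of nothing but an @id.
    return [n for n in node_map.values() if len(n) > 1]
```

Applied to a person who embeds the description of a friend, the result is two top-level node objects, the first pointing at the second by IRI.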


> Otherwise this review is mainly editorial nits:
> 
> The Nits:
> 
> "a way to disambiguate the keys used between multiple JSON documents
> by mapping them to IRIs via a context,"
> 
> "keys used between" sounds awkward to me (conflates identify with
> reference); how about "shared among"?

Thanks, fixed in [1].


> 1.1  I think this characterization of JSON-LD is incorrect: "a
> serialization of Linked Data in JSON."   From what I'm reading, JSON-
> LD is a method for encoding linked data within JSON documents and
> generating RDF from them.  While it's possible to create JSON-LD
> documents that are serializations of linked data, the focus of this
> document presents JSON-LD as a superset of RDF.  Many things about
> JSON-LD rely on document scope, and a JSON-LD document can contain much more
> than just the RDF within.  You've probably gone over this point many
> times before, but JSON-LD seems to be much more about authoring or
> incrementally creating Linked-Data-ready JSON than it is about writing
> out Linked Data as JSON.

Not sure what to do with this. Do you have something concrete in mind I
could use instead?


> 2. Design Goals Expressiveness:  Repetitive use of 'to be able to
> express.'  You'll want to reword one of those.  My sense is that
> syntax expresses a graph, but graphs don't express a data model.

Changed [1] to: 

"The syntax must be able to serialize directed graphs. This ensures that
almost every real world data model can be expressed."


> Zero-edits You have a missing reference "(see )."

Fixed.


> 5. Basic concepts A note on 'serialization' above -- dereferencing
> contexts make JSON-LD really different from other serializations of
> RDF.  Perhaps that's why you've shied away from the term "RDF."  Maybe
> only documents that are fully expanded/dereferenced actually conform
> to RDF.  It means that without the ability to dereference a context,
> the JSON-LD document has different data in it than it would were the
> context fully realized.

Obviously, if the context changes, the data changes as well. I wouldn't go
as far as saying that only expanded JSON-LD conforms to RDF. The situation
is similar to RDFa, which has some predefined prefixes [3].


> 5.2 I find the introduction of relative IRIs disorienting here.  It's
> taken up later in the document, but not completely; this paragraph has
> the only mention of "base IRI" in the document, and the reference to
> 'directory path' seems to just muddy the issue further. In general
> the interaction between relative IRIs and other terms seems to be a
> difficult part of this document to understand.  As an example, it
> would seem that using @vocab would rid a document of relative IRIs --
> you might want to state that explicitly as a #5 at the end of this
> section "unmatched terms are relative IRIs"

I removed the "directory path" fragment [1] and there's also a new example
showing how a relative IRI might be used. The grammar section makes it clear
where relative IRIs can be used. Furthermore, there's now a section, Base
IRI [4], which references RFC 3986 and explains the @base keyword, and a
section, Default Vocabulary [5], which explains @vocab.

Does this address your concerns?
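For readers following the relative IRI discussion, here is a rough Python sketch of the two mechanisms side by side: an @id value that is a relative IRI is resolved against the base IRI, while a property key that isn't defined as a term can be turned into an IRI by prefixing it with the @vocab IRI. It relies on urllib.parse.urljoin from the standard library for RFC 3986 style reference resolution; the function names are hypothetical, and compact IRIs, keywords, and null values are ignored.

```python
from typing import Dict, Optional
from urllib.parse import urljoin


def expand_node_id(value: str, base: str) -> str:
    """Resolve an @id value, which may be a relative IRI, against the base IRI."""
    return urljoin(base, value)


def expand_key(key: str, context: Dict[str, str]) -> Optional[str]:
    """Map a property key to an IRI via a term definition or @vocab."""
    if key in context:
        return context[key]
    if ":" in key:
        # Treated as an absolute IRI (compact IRIs are not handled here).
        return key
    vocab = context.get("@vocab")
    if vocab is not None:
        # With a default vocabulary, bare keys become vocab IRI + key.
        return vocab + key
    # Not mapped to an IRI at all.
    return None
```

The two are independent: @vocab only affects property keys here, while relative @id values still depend on the base IRI.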


> 6 Advanced Concepts
> 
> On Compact IRIs, it surprises me that this is part of the normative
> section.  I can see why it is, but nonetheless it might be useful to
> point out why a separate syntax is part of this document, as opposed
> to an updated version of CURIE.  (Please disregard this comment if I'm
> being silly).

Simply put, JSON-LD places no restrictions on prefixes except that, by
definition, a prefix cannot contain a colon (terms can contain colons, but
such terms will never be selected as prefixes because they won't match
anything).


> If a prefix:suffix pattern is not matched in the context, is it a
> relative IRI? (in 6.3 this is prohibited - we have a hole)

No, an absolute IRI -- that's also what the current text says btw. :-)
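The rule can be sketched in Python as follows. This is a simplified illustration, not the IRI Expansion algorithm from the API spec: an exact term match wins, a value is split on its first colon (which is why a prefix can never contain one), a known prefix yields prefix IRI + suffix, and an unmatched prefix:suffix value is kept as an absolute IRI. The function name and the example context are hypothetical.

```python
from typing import Dict


def expand_iri(value: str, context: Dict[str, str]) -> str:
    """Simplified expansion of a term, compact IRI, or absolute IRI."""
    if value in context:
        # Exact term match (terms may even contain colons).
        return context[value]
    if ":" in value:
        # Split on the first colon, so a prefix never contains a colon.
        prefix, suffix = value.split(":", 1)
        if prefix in context and not suffix.startswith("//"):
            return context[prefix] + suffix
        # Unmatched prefix (or "scheme://..."): treat as an absolute IRI.
        return value
    # No colon and no matching term: left as-is (relative IRIs not handled).
    return value
```

With a context mapping foaf to http://xmlns.com/foaf/0.1/, "foaf:knows" expands to the FOAF IRI, whereas "dc:title" is returned unchanged and treated as an absolute IRI.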


> 6.2 "native JSON type such as number, true, or false." Shouldn't this
> read "number or boolean"? true and false aren't types but values.

In JSON there's no boolean type, just the two values true and false. Don't
ask me why. We just reused the language used in RFC 4627.


> "A value type specifies the unit of measurement"  This wording seems
> wrong.  A date isn't a unit of measurement but it's still a range.  I
> can't think of a better way of putting this though.  Also, I've never
> thought of 'meters' as a value type.  I'd use a decimal-typed number
> to represent meters.  Something is wrong with this notion.

Changed [1] to:

"A value type specifies the data type of a particular value, such as an
integer, a floating point number, or a date."


> 6.3 You mention correctly that the homepage property is ordered in
> example 21.  It reads strangely because there's no mention yet in the
> doc about how to order items.  Just parenthetically mentioning @list
> would help:
> 
> " property which explicitly represents an ordered list (with the
> " @container key)"

Good spot. I removed @list from this example [1]. As you noted, it is
introduced in detail later.


> 6.4 "last-defined-wins mechanism."  This looks more like a "most
> recently defined" mechanism, because of nested scopes.  I could be
> misinterpreting "last-defined-wins" though.

I, as a non-native speaker, can't really see a difference. It's not the
temporally last definition (which "most recently" would suggest to me) but
the "closest" one when you look from the current element towards the tree's
root.
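One way to picture "closest wins" is a lookup that walks the chain of contexts from the current element back towards the root and stops at the first definition it finds. The Python below is a hypothetical sketch of that lookup only; real context processing also merges contexts, handles @context arrays, and lets a null context reset everything, none of which is modeled here.

```python
from typing import Dict, List, Optional


def resolve_term(term: str, contexts: List[Dict[str, str]]) -> Optional[str]:
    """Return the IRI for term from the closest context that defines it.

    contexts is ordered from the document root down to the current element.
    """
    for context in reversed(contexts):
        if term in context:
            return context[term]
    return None
```

If the root context maps name to one IRI and a nested object's context maps it to another, an element inside that nested object gets the nested definition; elements outside it still see the root one.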


> 6.5  application/ld+json is introduced in a slightly jarring way.
> Moreover, there's a MUST stipulation attached to its usage, but later
> in the document its usage is MAY identify a node.  I'm just confused
> by this paragraph.

You are referring to this sentence:

"Please note that JSON-LD documents served with the application/ld+json
media type MUST have all context information, including references to
external contexts, within the body of the document. Contexts linked via a
http://www.w3.org/ns/json-ld#context HTTP Link Header MUST be ignored for
such documents."

I don't understand what you mean by "later in the document its usage is MAY
identify a node". The intention of this paragraph is to say that, if a
document is served as application/ld+json, the context must be referenced
from within the document and not via an HTTP Link header. In other words, if
you want to use the Link header, you must serve the document as
application/json.
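In code, the rule amounts to roughly the following Python sketch. It assumes the Link header has already been parsed and that the URL of a link with relation http://www.w3.org/ns/json-ld#context is passed in; the function name is hypothetical, and other JSON-based media types are not considered.

```python
from typing import Optional


def context_url_from_link(content_type: str, link_context: Optional[str]) -> Optional[str]:
    """Decide whether a context linked via the HTTP Link header may be used."""
    # Strip parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/ld+json":
        # All context information must be in the body; the Link header is ignored.
        return None
    if media_type == "application/json":
        # Plain JSON may get its context from the Link header.
        return link_context
    return None
```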


> Does use of @language in the context mean that it will be applied to
> ALL strings in the document?  It looks like yes.  I'd put a big
> warning on this; it's risky to assume.

Yes. I've added [1] the following sentence:

"The default language applies to all string values that are not type
coerced."
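A rough Python sketch of how a default language would be applied while expanding a single value: strings get the default @language unless their term is type coerced (or sets its own language), and non-string values are never tagged. The function name and the example context are hypothetical, and only a few cases are covered.

```python
from typing import Any, Dict


def expand_value(term: str, value: Any, context: Dict[str, Any]) -> Dict[str, Any]:
    """Expand one value, applying the context's default language to plain strings."""
    definition = context.get(term, {})
    if isinstance(definition, str):
        definition = {"@id": definition}
    coerced = definition.get("@type")
    if coerced == "@id":
        return {"@id": value}
    if coerced is not None:
        # Type-coerced values are never language tagged.
        return {"@value": value, "@type": coerced}
    if isinstance(value, str):
        # A term-level @language (even null) overrides the default language.
        language = definition.get("@language", context.get("@language"))
        if language is not None:
            return {"@value": value, "@language": language}
    return {"@value": value}
```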


> 6.6 Example 29 provides a method for identifying languages within key
> names.  I see why this works, but you might consider removing it to
> encourage more uniform language-tagging practice.  In other words, I'd
> prefer to see just "occupation" as a key with the @container method.
> I'm uncomfortable with so many ways to handle language tags, even
> though what you've got is internally consistent.

This is how most multi-lingual JSON is currently expressed. The advantage of
doing it this way is that you can access the desired language directly
(doc.occupation.en for the English string) instead of having to filter the
occupation array. It's always a tradeoff, but we believe that the API is
powerful enough to deal with this. If you prefer to have just "occupation"
as the key, simply expand the document and re-compact it with a context that
doesn't use the container.

See http://bit.ly/ZAKzLm for a live example.
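To illustrate the access-pattern argument in Python (the data is made up, mirroring the occupation example): with a language map the English string is a plain dictionary lookup, while without the container the same data roughly becomes an array of value objects that has to be filtered. The helper name is hypothetical.

```python
from typing import Any, Optional

# Compacted with "occupation": {"@container": "@language"} in the context.
with_map = {"occupation": {"en": "Ninja", "ja": "忍者"}}

# Roughly the same data compacted with a context that doesn't use the container.
without_map = {
    "occupation": [
        {"@value": "Ninja", "@language": "en"},
        {"@value": "忍者", "@language": "ja"},
    ]
}


def occupation_in(doc: dict, lang: str) -> Optional[Any]:
    occupation = doc["occupation"]
    if isinstance(occupation, dict):
        # Language map: direct lookup, like doc.occupation.en in JavaScript.
        return occupation.get(lang)
    # No container: scan the array for a matching @language.
    for item in occupation:
        if item.get("@language") == lang:
            return item["@value"]
    return None
```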


> Note -- "Language associations can only be applied to plain literal
> strings. Typed values or values that are subject to 6.3 Type Coercion
> cannot be language tagged."  Does this mean that these invalid
> language keys are ignored or raise an error?

Clarified [1] as follows:

"Language associations are only applied to plain strings. Typed values or
values that are subject to 6.5 Type Coercion are not language tagged."


> 6.14 Expanded Document Form and 6.15 compact form.  in api doc these
> are non normative.  Perhaps you don't mean that the API doc defines
> them, just refers to them?

The text says:

"The JSON-LD Processing Algorithms and API specification [JSON-LD-API]
defines a method for expanding"

The API spec defines (normatively) the algorithms to expand/compact
documents. The results of those algorithms are documents in
expanded/compacted document form.


> Appendix A I don't think a JSON-LD document serializes a collection of
> graphs.  Maybe you can define a subset of JSON-LD that does, however.

Well, it serializes an RDF Dataset, which is defined as "a collection of RDF
graphs" in RDF Concepts. Why do you think it doesn't?


> Restrictions on JSON-LD that make it serialized RDF might also help
> with document identity/signing (no references to external contexts, no
> blank node identifiers as graph names)
> 
> Just for my own edification, why MUST NOT? "A JSON-LD graph must not
> contain unconnected nodes, i.e., nodes which are not connected by an
> edge to any other node."

That was added due to feedback from the RDF WG (sorry, can't remember who
exactly it was). RDF doesn't allow free-floating nodes; they don't express
anything (except that the node exists, which isn't really informative given
the open-world assumption), so we added a MUST NOT to the data model. In
fact, free-floating nodes are dropped during processing.
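A minimal Python sketch of what "dropped during processing" amounts to, assuming a flat list of expanded node objects: a node is kept if it has at least one property (or @type) or is referenced by another node, and everything else is discarded. This is an illustration, not the processing algorithm from the API spec; the function name is hypothetical.

```python
from typing import Any, Dict, List


def drop_free_floating(nodes: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove nodes that have no edges at all, neither outgoing nor incoming."""
    referenced = set()
    for node in nodes:
        for key, values in node.items():
            items = values if isinstance(values, list) else [values]
            if key == "@type":
                # @type values point at class nodes, which counts as an edge.
                referenced.update(items)
                continue
            if key.startswith("@"):
                continue  # other keywords such as @id are not node references
            for v in items:
                if isinstance(v, dict) and "@id" in v:
                    referenced.add(v["@id"])
    # More keys than just @id means the node has an outgoing edge.
    return [n for n in nodes if len(n) > 1 or n.get("@id") in referenced]
```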


> "A blank node is a node... neither, nor, or."  There's some unclear
> parallelism among these prepositions.

Changed it [1] to "neither ..., nor ..., nor ..." -- not sure, however,
whether that's correct. Native speakers?


> In Issue 217 box, please remove 'controversial' in favor of a less
> controversial word.

:-) Changed to [1]:

"Thus, some data that is valid JSON-LD cannot be converted to RDF. This
feature may be removed in the future."


> "JSON-LD documents may contain data that cannot be represented by the
> data model defined above. Unless otherwise specified, such data is
> ignored when a JSON-LD document is being processed. This means, e.g.,
> that properties which are not mapped to an IRI or blank node will be
> ignored."  This statement seems to allow for nodes without edges, but
> I guess the point is you won't know they're nodes in that case?

This statement means that you can put data in your JSON-LD document that
can't be represented in the data model defined above, for example edges
(properties) that are just strings, i.e., not mapped to an IRI. Such data is
ignored during processing and thus, e.g., dropped in expansion. Simply put,
your documents are valid even if some things aren't mapped to IRIs; those
things will just be ignored when the document is interpreted as JSON-LD.
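A simplified Python sketch of that behavior during expansion: keys that map to an IRI through the context (or already look like absolute IRIs) are kept, keywords pass through, and every other key is silently dropped. The helper name is hypothetical, and compact IRIs, @vocab, and value expansion are deliberately left out.

```python
from typing import Any, Dict


def expand_keys(node: Dict[str, Any], context: Dict[str, str]) -> Dict[str, Any]:
    """Keep only keys that can be mapped to an IRI; drop the rest."""
    expanded: Dict[str, Any] = {}
    for key, value in node.items():
        if key.startswith("@"):
            expanded[key] = value  # keywords such as @id are kept as-is
        elif key in context:
            expanded[context[key]] = value  # term defined in the context
        elif ":" in key:
            expanded[key] = value  # treated as an absolute IRI
        # Otherwise the key is not mapped to an IRI, so it is ignored (dropped).
    return expanded
```

A node carrying an extra "internal_note" key that no context defines therefore keeps its @id and mapped properties, while the note simply disappears.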


> Appendix B
> 
> "All keys which are not IRIs, compact IRIs, terms valid in the active
> context, or one of the following keywords must be ignored when
> processed:"  This points to some problem with the concept of a relative
> IRI again.

Not sure what this has to do with relative IRIs!?


> I don't understand B.4.  Like Sandro I feel that there's something
> amiss with data indexing.  It looks suspiciously like @rdf:resource.

B.4. is describing Index Maps. The feature is there to allow developers to
structure (or re-structure) the data in a way that makes it easier to work
with. This was a compromise because indexing using arbitrary properties (as
Sandro suggested) was considered too complex (at least for JSON-LD 1.0). You
can put whatever you want in the index; it doesn't matter. You could, e.g.,
put the node's IRI in the index and thus create a map that lets you
efficiently access the various nodes.
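For example, in Python, with each node's IRI used as the index (the data and helper names are hypothetical): building the map gives direct access by IRI, and turning it back into an array roughly mirrors what expansion does with an index map, moving each key into an @index member of the corresponding node.

```python
from typing import Any, Dict, List


def build_index_map(nodes: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index nodes by their @id so each can be looked up directly."""
    return {node["@id"]: node for node in nodes}


def index_map_to_array(index_map: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Roughly what expansion does: each key becomes the node's @index."""
    return [{**node, "@index": key} for key, node in index_map.items()]


posts = [
    {"@id": "http://example.com/posts/1", "title": "First post"},
    {"@id": "http://example.com/posts/2", "title": "Second post"},
]
by_iri = build_index_map(posts)
print(by_iri["http://example.com/posts/2"]["title"])  # prints: Second post
```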
 

> I really appreciate the effort put into 'flattened view' and think it
> should be foregrounded in the main body of the document.  It's even
> more important than compaction I think.

Does the section I added do it justice?


> B6 - must a list + set contain objects of all the same type?  You
> might want to be explicit about an error if so.

I've improved that part already. The text now says: ... "or an array of zero
or more of the above possibilities"


> I appreciate all of the examples in Appendix D a lot.
> 
> That wraps up what I've to say overall.  It was a pleasure to review
> this document.


Thank you very much for your feedback,
Markus


[1] https://github.com/json-ld/json-ld.org/commit/0111ecb395d23b50c4ab413d099fbb4949d0c7a5
[2] http://json-ld.org/spec/latest/json-ld-syntax/#flattened-document-form
[3] http://www.w3.org/2011/rdfa-context/rdfa-1.1
[4] http://json-ld.org/spec/latest/json-ld-syntax/#base-iri
[5] http://json-ld.org/spec/latest/json-ld-syntax/#default-vocabulary



--
Markus Lanthaler
@markuslanthaler
Received on Thursday, 14 March 2013 10:25:40 UTC