RE: review of json-ld-syntax from Markus Lanthaler on 2013-03-05 (public-rdf-wg@w3.org from March 2013)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Tue, 5 Mar 2013 09:19:20 +0100
To: "'W3C RDF WG'" <public-rdf-wg@w3.org>
Message-ID: <007b01ce197a$236eede0$6a4cc9a0$@lanthaler@gmx.net>
Thanks a lot for the review Sandro. I don't have time to go over it at the
moment. I've created ISSUE-224 to keep track of it:

 https://github.com/json-ld/json-ld.org/issues/224


Cheers,
Markus


--
Markus Lanthaler
@markuslanthaler




> -----Original Message-----
> From: Sandro Hawke [mailto:sandro@w3.org]
> Sent: Tuesday, March 05, 2013 6:23 AM
> To: W3C RDF WG
> Subject: review of json-ld-syntax
> 
> This is my review of json-ld-syntax, as promised in the last meeting.
> 
> 
> Summary: The document is in pretty good shape, and I think the
> underlying design is very good. Below, I suggest a few million
> editorial changes, a handful of which I think really need to be
> addressed before publication (and are marked MEDIUM or SERIOUS). I
> also raise a handful of concerns about the design, but I think they
> can probably all be dealt with in a few minutes of conversation. I
> think
> everything not marked MEDIUM or SERIOUS is fairly trivial.
> 
> 
> I reviewed the latest editor's draft:
> https://dvcs.w3.org/hg/json-ld/raw-file/e582aaa9ee43/spec/latest/json-
> ld-syntax/index.html
> 
> I did not read the json-ld-api. I did play around with the json-ld
> "playground" site after I was into the appendiced. I haven't reviewed
> Appendix B yet; I'll try to get to that soon, but it's going to take
> more brain
> cells than I have left tonight.
> 
> Without further ado...
> 
>  > In an attempt to harmonize the representation of Linked Data in JSON
> 
> My first comment turns out to be, I think, the most utterly trivial.
> Sorry.
> 
> My sense is that one "harmonizes" the elements in a set (by modifying
> them to make them more similar or related in some way); I don't know
> what it means to harmonize a single item like this.
> 
>  > ; mixing both Linked Data and non-Linked Data in a single document.
> 
> The clause after a semicolon should be a complete sentence.
> Change to a comma or rephrase.
> 
>  > the name IRIs, when dereferenced, provide more information about the
> name
> 
> I think they provide information about the named thing. I don't really
> like this paraphrasing of the LD principles, and I don't think it's
> helpful to the document here. I'd suggest providing some references
> instead.
> 
>  > Since JSON-LD is 100% compatible with JSON the large number
> 
> comma needed after "JSON"
> 
>  > Additionally to all the features JSON provides,
> 
> How about: "In addition to ..."
> 
>  > the ability to express the language associated with a string
> 
> ? maybe add "natural"
> 
> add comma at the end of the item
> 
>  > weights, and distances,
> 
> MEDIUM
> 
> Really? I pretty much never see people doing that with datatypes.
> 
>  > Software developers that
> 
> s/that/who/ on each line
> 
>  > This specification does not describe the programming interfaces for
> the JSON-LD Syntax. The specification that describes the programming
> interfaces for JSON-LD documents is the JSON-LD Application
> Programming Interface [JSON-LD-API].
> 
> How about: A companion document, The JSON-LD Application Programming
> Interface [JSON-LD-API], specifies how to work with JSON-LD
> at a higher level: it provides a standard library
> interface for common JSON-LD operations. Although that
> document is not required for understanding and working with
> JSON-LD, for some readers it will be a better starting
> point.
> 
>  > A number of design goals were established before the creation of
> this markup language:
> 
> I don't think the history matters.
> 
> How about: JSON-LD satisfies the following design goals:
> 
>  > language. We should focus on simplicity when possible.
> 
> I don't think that's what you mean. I think you mean simplicity is
> paramount.
> 
> How about: to the language, so sometimes we do not achieve Zero Edits.
> 
>  > A character is represented as a single character string.
> 
> Hard to parse.
> 
> How about: A character is represented using a string of length one.
> 
>  > and that leading zeros are not allowed.
> ^^^^ omit "that"
> 
>  > Used to specify the native language
> 
> s/native/natural (human)/
> 
>  > For the avoidance of doubt, all keys, keywords, and values in JSON-
> LD
> are case-sensitive.
> 
> Awkward phrase.
> 
> s/For the avoidance of doubt, all/All/
> 
>  > Conformance
> 
> SERIOUS
> 
> It's somewhat odd that all one needs for conformance is appendix B.
> So what are the other normative parts of this document for...?
> 
> I think there may be a notion of a conformant JSON-LD generator or
> parser here, too -- one that follows the rules of the rest of this
> spec. That should be stated here.
> 
>  > different concepts instead of terms such as "name", "homepage", etc.
> 
> I think, in this case, the word "terms" should NOT be linked to
> #dfn-term because you DON'T mean "term" in the JSON-LD sense, here.
> This is supposed to be the pre-JSON-LD counter-example.
> 
>  > a context is used to map terms, i.e., properties with associated
> values, to IRIs.
> 
> Uh, that doesn't match the definition in #dfn-term. Is a term really
> a property with its associated value? I don't think so.
> 
> How about: s/i.e., properties with associated values/such as the keys
> in
> an object structure/
> 
>  > Expanded term definitions may be defined using absolute or compact
> IRIs as keys, which is mainly used to associate type or language
> information with an absolute or compact IRI.
> 
> This is the first sentence in the document where I have no idea what
> it means, because it uses concepts not introduced yet. Maybe this can
> be dropped? Or maybe I'll just have to get it on the second pass.
> 
> Later -- Yeah, I'd just drop that sentence, I think.
> 
>  > This information gives the data global context and allows developers
> to re-use each other's data without having to agree to how their data
> will interoperate on a site-by-site basis.
> 
> I find the re-use of the word "context" awkward here.
> 
> How about: This information allows developers to re-use each other's
> data without having to agree to how their data will interoperate on a
> site-by-site basis.
> 
>  > External JSON-LD context documents may contain extra information
> located outside of the @context key,
> 
> That makes me wonder if it can be HTML, to be more readable. There
> would
> have to be some standard way to find the @context json in the HTML....
> 
> Later - I see it can't. Okay, con-neg works, too.
> 
>  > EXAMPLE 5
> 
> after this example I was expecting the next example to use a Link
> header
> (what turns out to be EXAMPLE 29). Maybe mention it here?
> 
>  > EXAMPLE 6 -- In the example above, the key http://schema.org/name is
> interpreted as an absolute IRI because it contains a colon (:) and the
> "http" prefix does not exist in the context.
> 
> Now would be a perfect place to have a relative IRI example. You've
> just talked about there being absolute and relative IRIs, and given an
> example only of absolute ones.
> 
>  > JSON keys that do not expand to an absolute IRI are ignored, or
> removed in some cases, by the [JSON-LD-API]. However, JSON keys that do
> not include a mapping in the context are still considered valid
> expressions in JSON-LD documents—the keys just don't expand to
> unambiguous identifiers.
> 
> This is kind of weird. It doesn't tell me what I'm supposed to do; it
> just confuses me.
> 
> I guess it means they're like comments, and to be ignored?
> 
> This is where we need a clear notion of a processor that reads JSON-LD
> and extracts all the triples and quads from it, it seems to me.
> 
>  > EXAMPLE 8
> 
> It's confusing to have @type here. Maybe stick to just showing
> @vocab, and not also introducing something we haven't seen yet.
> 
> Later -- I see @type is never defined at all. Sigh. I guess it's
> consider an API thing.
> 
>  > An IRI is generated when a JSON object is used in the value position
> and contains an @id keyword:
> 
> This is the first place you use the word "generated" and it's not at
> all clear what it means. If we were talking about mapping to RDF it
> would make sense.
> 
>  > To be able to externally reference nodes in a graph, it is important
> that each node has an unambiguous identifier. IRIs are a fundamental
> concept of Linked Data, and nodes should have a de-referenceable
> identifier used to name and locate them. For nodes to be truly linked,
> de-referencing the identifier should result in a representation of that
> node. Associating an IRI with a node tells an application that it can
> fetch the resource associated with the IRI and get back a description
> of
> the node.
> 
> I'm not a fan of this paragraph. Can we just delete it?
> 
> 
>  > A node is identified using the @id keyword:
> 
> Maybe clarify that @id is overloaded, and it means something different
> used like this than used as either a key or a value in a context?
> 
> It'd be a little more clear if EXAMPLE 11 didn't use @id in all three
> different ways. How about taking the context out of the example, and
> just having something like:
> 
> 
> {
> "@id": "http://manu.sporny.org/#me"
> "http://schema.org/name": "Manu Sporny",
> }
> 
> (or some other example where an @id is more appropriate)
> 
>  > end of Section 5
> 
> As I come to Section 6 being marked normative, I see Section 5 was
> neither informative nor normative.
> 
>  > A document on the Web that defines one or more IRIs for use as
> properties in Linked Data is called a vocabulary.
> 
> Don't conflate documents with vocabularies, please.
> 
> See:
> https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-
> concepts/index.html#vocabularies
> 
> I would just drop that whole paragraph. It's motivational, not spec
> text. And they're wonderfully motivated in the next paragraph anyway.
> 
>  > 6.1 Compact IRIs
> 
>  > Prefixes are expanded when the form of the value is a compact IRI
> represented as a prefix:suffix combination, and the prefix matches a
> term defined within the active context
> 
>  > Terms are interpreted as compact IRIs if they contain at least one
> colon and the first colon is not followed by two slashes (//, as in
> http://example.com)
> 
> These sentences contradict each other. Do slashes prevent recognizing
> things as compact IRIs or not? I'd suggest not -- that's just extra
> code that wont be helpful, IMHO. (TEST CASE?)
> 
>  > EXAMPLE 17
> 
> "foaf": "http://xmlns.com/foaf/0.1/",
> "foaf:homepage": { "@type": "@id" },
> 
> Would that work if the order was reverse? I guess so, since JSON
> doesn't preserve order. Maybe clarify that, and maybe put them in
> the other order in the example. (TEST CASE?)
> 
> Later -- Oh, I see this is covered well in section 6.9. Maybe near
> Example 17 say this is covered in more detail in section 6.9....?
> 
>  > 6.3 Type Coercion
> 
> MEDIUM
> 
> Okay, this overloading of @ keywords goes too far with @vocab serving
> a completely different purpose (from normal @vocab) in this
> situation. That's just silly.
> 
> Maybe we could at least have a table showing what how the meanings
> differ in different places in the structure.
> 
>  > EXAMPLE 22
> 
> I've read this about 6 times and I can't make sense of it. That is, I
> think the example makes perfect sense, but the paragraph after it,
> explaining it, does not. When you say "not a prefix:suffix construct"
> maybe you mean "not a string"?
> 
>  > Duplicate context terms are overridden using a last-defined-wins
> mechanism.
> 
> SERIOUS
> 
> That means you can't use natural JSON parsing, doesn't it? If I read
> EXAMPLE 24 with a JSON parser into a nested object, then I don't know
> the order of the @context blocks.
> 
>  > Note that this is rarely a good authoring practice
> 
> That doesn't go far enough. You could allow nesting to make Example
> 24 work, but I don't think it's okay to use order-of-statements.
> 
>  > It is a best practice to put the context definition at the top of
> the
> JSON-LD document.
> 
> MEDIUM
> 
> I don't agree. You're telling me I'm going against best practice to
> build and object in memory and let my JSON serializer turn it into
> JSON.
> 
>  > The @context subtree within that object is added to the top-level
> JSON object of the referencing document.
> 
> What if there's more than one @context subtree? Do you mean the merge
> of all the @context subtrees? [TEST CASE]
> 
>  > end of 6.5
> 
> Thinking about this, I'd rather like .well-known/host-context.jsonld
> as another place I can look. So if I'm trying to get RDF triples, and
> I just get application/json, and there's no Link Header, I can look
> for a host-context file. I dunno -- maybe everyone can set a Link
> header easily enough.
> 
> 
>  > For instance, in the example below the databaseId member would be
> ignored by a JSON-LD processor.
> 
> MEDIUM
> 
> This speaks to conformance. "JSON-LD processor" (maybe "consumer")
> needs to be defined in the Conformance clause, and s/should not/MUST
> not/ (with maybe some more rewriting).
> 
>  > This method can be accomplished by using the following markup
> pattern:
> 
> "markup"? JSON isn't markup, as I understand the word. Can you just
> drop
> the word from the sentence?
> 
> (glancing at appendix B for something)
>  > To avoid forward-compatibility issues, a term should not start with
> an @ character
> 
> MEDIUM
> 
> Why only SHOULD NOT? Why not MUST NOT? The damage if they do is
> considerable.
> 
> Also, you kind of need to say what processors MUST do if they see a
> keyword term they don't know -- ie one from the future. The options
> are: ignore (if you can figure out what/how much to ignore); or halt;
> or issue a warning to the user.
> 
>  > NOTE: The use of @container in the body of a JSON-LD document has no
> meaning
> 
> That doesn't seem worth saying here. I assume it's ruled out in
> Appendix B.
> 
>  > 6.11 Embedding
> 
> Odd section. It seems to have forgotten this was introduced as a
> graph syntax. The main thing to highlight is that this is syntactic
> sugar; sometimes it's nice to syntactically embed the node in one of
> the places that had a link to it.
> 
>  > Example 46
> 
> SERIOUS
> 
> I suspect the first row of the table is wrong. I would think only the
> triples inside the value associated with the @graph key would go
> inside the graph. Please clarify which it is, and correct the table
> if necessary.
> 
>  > Example 47, 48
> 
> MEDIUM
> 
> It seems very confusing to use @graph for this. Can't you find a more
> direct way to do this?
> 
> It seemed from stuff earlier (around Example 22) that in Example 48
> you wouldn't need to repeat the @context, because it occured earlier.
> But maybe that example-22 stuff was wrong, and what was really meant
> there was "closer to the root of the JSON object tree". No, that
> can't be right, either. I cannot see any sensible rules for which
> contexts are in effect at any point in the json tree.
> 
> How about this as a hack that's more elegant:
> 
> [
> { "@context": ...
> }
> {
> "@id": "http://manu.sporny.org/i/public",
> "@type": "foaf:Person",
> "name": "Manu Sporny",
> "knows": "http://greggkellogg.net/foaf#me"
> },
> {
> "@id": "http://greggkellogg.net/foaf#me",
> "@type": "foaf:Person",
> "name": "Gregg Kellogg",
> "knows": "http://manu.sporny.org/i/public"
> }
> ]
> 
> ... with a rule that an object that has JUST a @context key, and no
> other keys, is actually omitted from arrays. That seems like a
> cleaner hack than using the @graph keyword. Keep @graph for when
> people really want named graphs.
> 
>  > 6.13 Identifying Blank Nodes
> 
> This is okay, but it would be pretty easy and much more in keeping
> with the style of the document to avoid mentioning RDF, even here.
> 
> Something like:
> 
> For some topologies of the graph of nodes being expressed in
> JSON-LD, such as topologies with loops, embedding along cannot be
> used, and @id must be used to connect the nodes. In some cases,
> one may not want to name nodes with IRIs. In these situations,
> one can use "blank node identifiers", which look like IRIs but
> with _ (underscore) as the scheme name. For example:
> 
> { @id: _:n1,
> name: Secret Agent 1
> knows:
> { name: Secret Agent 2
> knows: { @id: _:n1 }
> }
> }
> 
> In this case, we do not want to assign IRIs to the two people, but
> we want to express that they know each other. We can say SA1
> knows SA2 using embedding, but to say SA2 knows SA1 we need to use
> a blank node identifier.
> 
> 
>  > Every statement in the context having a keyword as the key (as in {
> "@type": ... }) will be ignored when being processed.
> 
> I think you mean this only for keywords that are known to be
> meaningless when used as keys in a @context. I think it would be
> better to make this an error. But the bigger question is about
> forward compatibility -- MUST processors ignore all keyword keys in
> contexts? (Are any allowed, with meaning? I don't see any.)
> 
>  > 6.15 and 6.16
> 
> These should probably be marked non-normative. There's nothing here I
> need to know to work with JSON-LD (although it's very cool and all).
> 
>  > 6.17 Data Indexing
> 
> Not sure how I feel about this. It's kind of weird, but pretty
> harmless, I guess.
> 
> I'm not sure it would work, but an alternative design would be to have
> a particular property be @index'd. So instead of:
> "@container": "@index"
> in the context we'd say
> "@index": "lang"
> and then the stuff in green would be equivalent to:
> 
> "post": [
> {
> "lang": "en",
> "@id": "http://example.com/posts/1/en",
> "body": "World commodities were up today with heavy trading of crude
> oil...",
> "words": 1539
> },
> { lang: "de",
> "@id": "http://example.com/posts/1/de",
> "body": "Die Werte an Warenbörsen stiegen im Sog eines starken Handels
> von Rohöl...",
> "words": 1204
> }
> ]
> 
> I think that would provide the same functionality, but without these
> keys that aren't really in the data. It would let you cleverly
> generate JSON-LD like this from plain triples, if given the right
> context. (You'd have to have triples with the same S and P, where
> each O differs in the value of a DataProperty, as in this example.)
> 
>  > A. Data Model
> 
> What happens if the same @graph @id is used in two places? are the
> graphs merged, or what? Shouldnt the spec say? Or is that left to
> the API document as well? (it's a lot more than an API.) (in TriG
> they are merged)
> 
> In general, I found Appendix A very confusing, and I'm thoroughly
> familiar with the RDF data model. This does not bode well for JSON
> folks. Do they need to understand this section, or can it be marked
> non-normative?
> 
>  > Whenever possible, an edge should be labeled with an IRI.
> 
> As far as I can tell, from reading the spec up to this point, if it
> doesn't have an IRI, it's ignored -- and thus not part of the data
> model. Several times you say terms that dont map to IRIs are ignored.
> 
>  > This section is normative; This section is non-normative
> 
> SERIOUS
> 
> These labels seem to be applied inconsistently.
> 
>  > The JSON-LD Algorithms and API specification [JSON-LD-API] defines
> the conversion rules between JSON's native data types and RDF's
> counterparts to allow full round-tripping.
> 
> SERIOUS EDITORIAL
> 
> I really don't like the mapping-to-RDF being left to another, later
> spec. I can live with it just being shown in the examples, except for
> not knowing what happens with numbers. From the playground I see
> integers end up as xsd:integer and otherwise they are xsd:double,
> which is simple enough, but should really be said in this document, or
> at least shown in an example.
> 
> (I see a bug in the playground. If you use too large an integer, it
> converts the lexrep to being in scientific notation.)
> 
>  > In JSON-LD lists are part of the data model whereas in RDF they are
> part of a vocabulary, namely [RDF-SCHEMA].
> 
> Doesn't JSON-LD also have sets? As I read the spec, it seemed like
> @collection: @set had some semantics, in addition to being a directive
> to keep singletons in arrays. A set-valued property is somewhat
> different from a repeated property.
> 
>  > The JSON-LD context has direct equivalents for the Turtle @prefix
> declaration:
> 
> True, but that doesn't seem to be what the examples are showing. I'd
> just drop that line.
> 
>  > Appendix B
> 
> Not really reviewed at this time.
> 
>  > E. IANA Considerations
>  > This section is non-normative.
> 
> SERIOUS
> 
> Actually, I think this section is Normative, like the profile stuff.
> 
>  > will be submitted to the Internet Engineering Steering Group if this
> specification becomes a W3C Recommendation.
> 
> MEDIUM
> 
> Actually it goes at Last Call, as per
> http://www.w3.org/2002/06/registering-mediatype
> 
>  > To request or specify Expanded JSON-LD document form, the IRI
> http://www.w3.org/ns/json-ld#expanded SHOULD be used.
> 
> SERIOUS
> 
> I can't figure out who the SHOULD applies to. Do you mean:
> 
> if you want the expanded form, you SHOULD ask for it with this profile
> 
> (which I think would be silly) or do you mean:
> 
> if you receive a request that includes this profile parameter, you
> SHOULD return expanded form
> 
> ? I guess the latter, but that's not what it says. I would think you'd
> use normal media-type rules here -- if you can't provide it in expanded
> form, then you're not providing it, and fallback to another media type.
> 
>  > Published specification: The JSON-LD specification.
> 
> This should be plain text, and the URL should be updated. I guess it
> will be http://www.w3.org/TR/json-ld-syntax
> 
>  > Fragment identifiers used with application/ld+json resources may
> identify a node in a JSON-LD graph expressed in the resource. This
> idiom, which is also used in RDF [RDF-CONCEPTS], gives a simple way to
> "mint" new, document-local IRIs to label nodes and therefore
> contributes
> considerably to the expressive power of JSON-LD.
> 
> MEDIUM
> 
> I have no idea what this text is trying to say. For my best guess,
> please replace it with:
> 
> Fragment identifiers used with application/ld+json are treated as
> in other RDF syntaxes, as per RDF Concepts (link to
> http://www.w3.org/TR/rdf11-concepts/#section-fragID) [RDF-CONCEPTS]
> 
>  > References
> 
> Some of them are out of date, like TURTLE-TR. Also, the reference style
> isn't correct -- it only has the dated links.
> 
> ---
> 
> That's it. I'll try to get to Appendix B. before the meeting, but I
> wanted to send this early enough that it can be read & digested before
> Wednesday's meeting.
> 
> Keep up the great work, guys. I only point out all these places for
> improvement because I think this is so important and want it to have
> the
> best chance it can.
> 
> -- Sandro
Received on Tuesday, 5 March 2013 08:19:52 UTC