RE: review of json-ld-syntax from Markus Lanthaler on 2013-03-07 (public-rdf-wg@w3.org from March 2013)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Thu, 7 Mar 2013 01:18:28 +0100
To: "'Sandro Hawke'" <sandro@w3.org>, "'W3C RDF WG'" <public-rdf-wg@w3.org>
Message-ID: <01b301ce1ac9$4ba5f880$e2f1e980$@lanthaler@gmx.net>
> This is my review of json-ld-syntax, as promised in the last meeting.

Thanks again for the very detailed review, Sandro. I'm going to fix most of
the things directly. I will also send a second mail containing the things
that, in my opinion, need to be discussed further. I hope this makes
discussion easier, as it keeps the second email considerably shorter.

You can find the diff here:
https://github.com/json-ld/json-ld.org/commit/c0386d4cad176cb20a76f9f0bc4ebd869f1b8194


There's one feature missing from the current syntax spec, @reverse. I'll
notify you when I've added that section so that you can review that part as
well.


More comments inline.



>  > In an attempt to harmonize the representation of Linked Data in JSON
> 
> My first comment turns out to be, I think, the most utterly trivial.
> Sorry.
> 
> My sense is that one "harmonizes" the elements in a set (by modifying
> them to make them more similar or related in some way); I don't know
> what it means to harmonize a single item like this.

Changed to "In an attempt to standardize.."


>  > ; mixing both Linked Data and non-Linked Data in a single document.
> 
> The clause after a semicolon should be a complete sentence.
> Change to a comma or rephrase.

Changed to a comma


>  > the name IRIs, when dereferenced, provide more information about the
> name
> 
> I think they provide information about the named thing. I don't really
> like this paraphrasing of the LD principles, and I don't think it's
> helpful to the document here. I'd suggest providing some references
> instead.

Changed to "about the thing".. we should discuss this further.


 
>  > Since JSON-LD is 100% compatible with JSON the large number
> 
> comma needed after "JSON"

Done

>  > Additionally to all the features JSON provides,
> 
> How about: "In addition to ..."

Done


>  > the ability to express the language associated with a string
> 
> ? maybe add "natural"

Changed to "the ability to annotate strings with their language"


> add comma at the end of the item
> 
>  > weights, and distances,
> 
> MEDIUM
> 
> Really? I pretty much never see people doing that with datatypes.

Reduced to just dates and times


>  > Software developers that
> 
> s/that/who/ on each line

Fixed


>  > This specification does not describe the programming interfaces for
> the JSON-LD Syntax. The specification that describes the programming
> interfaces for JSON-LD documents is the JSON-LD Application
> Programming Interface [JSON-LD-API].
> 
> How about: A companion document, The JSON-LD Application Programming
> Interface [JSON-LD-API], specifies how to work with JSON-LD
> at a higher level: it provides a standard library
> interface for common JSON-LD operations. Although that
> document is not required for understanding and working with
> JSON-LD, for some readers it will be a better starting
> point.

Fixed, slightly reworded.


>  > A character is represented as a single character string.
> 
> Hard to parse.
> 
> How about: A character is represented using a string of length one.

Removed, it's not important and it's basically just briefly explaining JSON
which is referenced


>  > and that leading zeros are not allowed.
> ^^^^ omit "that"

Done


>  > Used to specify the native language
> 
> s/native/natural (human)/

Done


>  > For the avoidance of doubt, all keys, keywords, and values in
> JSON-LD are case-sensitive.
> 
> Awkward phrase.
> 
> s/For the avoidance of doubt, all/All/

Done


>  > different concepts instead of terms such as "name", "homepage", etc.
> 
> I think, in this case, the word "terms" should NOT be linked to
> #dfn-term because you DON'T mean "term" in the JSON-LD sense, here.
> This is supposed to be the pre-JSON-LD counter-example.

Good spot. I changed it to tokens to make it crystal clear.

 
>  > a context is used to map terms, i.e., properties with associated
> values, to IRIs.
> 
> Uh, that doesn't match the definition in #dfn-term. Is a term really
> a property with its associated value? I don't think so.
>
> How about: s/i.e., properties with associated values/such as the keys
> in an object structure/

You are right. I removed it for the time being. We need to explain where
terms can be used in more detail here. 


>  > Expanded term definitions may be defined using absolute or compact
> IRIs as keys, which is mainly used to associate type or language
> information with an absolute or compact IRI.
> 
> This is the first sentence in the document where I have no idea what
> it means, because it uses concepts not introduced yet. Maybe this can
> be dropped? Or maybe I'll just have to get it on the second pass.
> 
> Later -- Yeah, I'd just drop that sentence, I think.

Dropped it.



>  > This information gives the data global context and allows developers
> to re-use each other's data without having to agree to how their data
> will interoperate on a site-by-site basis.
> 
> I find the re-use of the word "context" awkward here.
> 
> How about: This information allows developers to re-use each other's
> data without having to agree to how their data will interoperate on a
> site-by-site basis.

Replaced


>  > External JSON-LD context documents may contain extra information
> located outside of the @context key,
> 
> That makes me wonder if it can be HTML, to be more readable. There
> would have to be some standard way to find the @context json in the
> HTML....
> 
> Later - I see it can't. Okay, con-neg works, too.

Do we need to change something here?


>  > EXAMPLE 6 -- In the example above, the key http://schema.org/name is
> interpreted as an absolute IRI because it contains a colon (:) and the
> "http" prefix does not exist in the context.
> 
> Now would be a perfect place to have a relative IRI example. You've
> just talked about there being absolute and relative IRIs, and given an
> example only of absolute ones.

Agree.. will add something using @id and a relative IRI.


 
>  > EXAMPLE 8
> 
> It's confusing to have @type here. Maybe stick to just showing
> @vocab, and not also introducing something we haven't seen yet.
>
> Later -- I see @type is never defined at all. Sigh. I guess it's
> considered an API thing.

This is now example 16, @type has been introduced at that point.


>  > An IRI is generated when a JSON object is used in the value position
> and contains an @id keyword:
> 
> This is the first place you use the word "generated" and it's not at
> all clear what it means. If we were talking about mapping to RDF it
> would make sense.

Changed to "A string is interpreted as IRI when it is the value of an @id
member".


>  > A node is identified using the @id keyword:
> 
> Maybe clarify that @id is overloaded, and it means something different
> used like this than used as either a key or a value in a context?

I simplified the example so that @id is used only once.


> It'd be a little more clear if EXAMPLE 11 didn't use @id in all three
> different ways.

Fixed (supposing you meant example 10)


> As I come to Section 6 being marked normative, I see Section 5 was
> neither informative nor normative.

We rely on ReSpec.. not sure what's going wrong here. I will look into it.


>  > A document on the Web that defines one or more IRIs for use as
> properties in Linked Data is called a vocabulary.
> 
> Don't conflate documents with vocabularies, please.
> 
> See:
> https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#vocabularies
> 
> I would just drop that whole paragraph. It's motivational, not spec
> text. And they're wonderfully motivated in the next paragraph anyway.

Dropped it


>  > 6.1 Compact IRIs
> 
>  > Prefixes are expanded when the form of the value is a compact IRI
> represented as a prefix:suffix combination, and the prefix matches a
> term defined within the active context
> 
>  > Terms are interpreted as compact IRIs if they contain at least one
> colon and the first colon is not followed by two slashes (//, as in
> http://example.com)
> 
> These sentences contradict each other. Do slashes prevent recognizing
> things as compact IRIs or not? I'd suggest not -- that's just extra
> code that won't be helpful, IMHO. (TEST CASE?)

That was a safety measure that someone (Manu?) proposed because it is also
in RDFa. I wouldn't be opposed to dropping it. It is really just extra code.
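
To make the rule under discussion concrete, here is a minimal Python sketch
of compact IRI expansion including the "//" check. The function name is
hypothetical, and the context is assumed to be a plain prefix-to-IRI mapping
(no expanded term definitions); it is not the algorithm text from the spec.

```python
def expand_compact_iri(value, context):
    """Expand a prefix:suffix value against a prefix-to-IRI mapping (sketch)."""
    if ":" not in value:
        return value  # plain term or relative IRI; not handled in this sketch
    prefix, suffix = value.split(":", 1)
    # The safety check in question: "scheme://..." is never a compact IRI,
    # even if a term with the same name as the scheme is defined.
    if suffix.startswith("//"):
        return value
    if prefix in context:
        return context[prefix] + suffix
    return value  # unknown prefix: leave it as an absolute IRI


context = {"foaf": "http://xmlns.com/foaf/0.1/"}
print(expand_compact_iri("foaf:homepage", context))
print(expand_compact_iri("http://example.com/", context))
```

In this sketch the check only changes the result when a context defines a
term such as "http"; without such a term, dropping it makes no difference.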


>  > EXAMPLE 17
> 
> "foaf": "http://xmlns.com/foaf/0.1/",
> "foaf:homepage": { "@type": "@id" },
> 
> Would that work if the order was reverse? I guess so, since JSON
> doesn't preserve order. Maybe clarify that, and maybe put them in
> the other order in the example. (TEST CASE?)
>
> Later -- Oh, I see this is covered well in section 6.9. Maybe near
> Example 17 say this is covered in more detail in section 6.9....?

Yes, it would work. I don't think we should put it in the example in reverse
order though. That's just confusing for readers. Do you think it really adds
any value to mention this here as well?


>  > EXAMPLE 22
> 
> I've read this about 6 times and I can't make sense of it. That is, I
> think the example makes perfect sense, but the paragraph after it,
> explaining it, does not. When you say "not a prefix:suffix construct"
> maybe you mean "not a string"?

Changed it to

"In this case the @id definition in the term definition is optional. If it
does exist, the compact IRI or IRI representing the term will always be
expanded to IRI defined by the @id key—regardless of whether a prefix is
defined or not."

Does this clarify it?


>  > Duplicate context terms are overridden using a last-defined-wins
> mechanism.
> 
> SERIOUS
> 
> That means you can't use natural JSON parsing, doesn't it? If I read
> EXAMPLE 24 with a JSON parser into a nested object, then I don't know
> the order of the @context blocks.
> 
>  > Note that this is rarely a good authoring practice
> 
> That doesn't go far enough. You could allow nesting to make Example
> 24 work, but I don't think it's okay to use order-of-statements.

That's exactly what happens here. Copying the example and increasing the
indentation makes it clear I think:

{
  "@context":
  {
    "name": "http://example.com/person#name",
    "details": "http://example.com/person#details"
  },
  "name": "Markus Lanthaler",
  ...
  "details":
       {
         "@context":
           {
             "name": "http://example.com/organization#name"
           },
         "name": "Graz University of Technology"
       }
}



>  > It is a best practice to put the context definition at the top of
> the
> JSON-LD document.
> 
> MEDIUM
> 
> I don't agree. You're telling me I'm going against best practice to
> build an object in memory and let my JSON serializer turn it into
> JSON.

Well.. you could say the same about whitespace. If you completely remove it
(which most serializers would do by default) you get documents that you
can't read. I think recommending it in a note is ok. It makes documents much
easier to read for humans.


>  > The @context subtree within that object is added to the top-level
> JSON object of the referencing document.
> 
> What if there's more than one @context subtree? Do you mean the merge
> of all the @context subtrees? [TEST CASE]

Theoretically there could be multiple context subtrees because JSON doesn't
prohibit (yet) multiple members with the same key. However, if you parse it
you will typically only get one (the last one) back. Does this make sense?
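
As an illustration of "you will typically only get the last one back", here
is a small Python sketch. The document is made up for this example, and
other JSON parsers may behave differently; Python's json module documents
that it keeps the last value for a repeated member name.

```python
import json

# Hypothetical document with a repeated "@context" member.
DOC = """{
  "@context": {"name": "http://example.com/person#name"},
  "@context": {"name": "http://example.com/organization#name"},
  "name": "Markus Lanthaler"
}"""

# Default parsing: the second "@context" silently replaces the first.
parsed = json.loads(DOC)
print(parsed["@context"]["name"])


def count_members(text, key):
    """Count how often `key` occurs as a member of the top-level object."""
    # object_pairs_hook sees every (key, value) pair before any is dropped.
    pairs = json.loads(text, object_pairs_hook=list)
    return sum(1 for k, _ in pairs if k == key)


print(count_members(DOC, "@context"))
```

A processor that wants to reject such documents could use a hook like this
to detect the repetition before the data is turned into a dictionary.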


>  > end of 6.5
> 
> Thinking about this, I'd rather like .well-known/host-context.jsonld
> as another place I can look. So if I'm trying to get RDF triples, and
> I just get application/json, and there's no Link Header, I can look
> for a host-context file. I dunno -- maybe everyone can set a Link
> header easily enough.

I'm strictly against such an idea. Not only is it a Web anti-pattern, but it
might also cause severe problems for people using, e.g., shared hosting.


>  > For instance, in the example below the databaseId member would be
> ignored by a JSON-LD processor.
> 
> MEDIUM
> 
> This speaks to conformance. "JSON-LD processor" (maybe "consumer")
> needs to be defined in the Conformance clause, and s/should not/MUST
> not/ (with maybe some more rewriting).

Changed it to "For instance, in the example below the databaseId member
would not expand to an IRI."


>  > This method can be accomplished by using the following markup
> pattern:
> 
> "markup"? JSON isn't markup, as I understand the word. Can you just
> drop
> the word from the sentence?

Yes, actually the whole section (Property Generators) was dropped. We don't
support them anymore.


>  > NOTE: The use of @container in the body of a JSON-LD document has no
> meaning
> 
> That doesn't seem worth saying here. I assume it's ruled out in
> Appendix B.

Removed it


>  > Example 46
> 
> SERIOUS
> 
> I suspect the first row of the table is wrong. I would think only the
> triples inside the value associated with the @graph key would go
> inside the graph. Please clarify which it is, and correct the table
> if necessary.

Good spot! Fixed


>  > Example 47, 48
> 
> MEDIUM
> 
> It seems very confusing to use @graph for this. Can't you find a more
> direct way to do this?

We did consider using @set for this at some point but I think the cleanest
approach is to use @graph.


> It seemed from stuff earlier (around Example 22) that in Example 48
> you wouldn't need to repeat the @context, because it occurred earlier.
> But maybe that example-22 stuff was wrong, and what was really meant
> there was "closer to the root of the JSON object tree". No, that
> can't be right, either. I cannot see any sensible rules for which
> contexts are in effect at any point in the json tree.

Yes, you misunderstood that example. The rules are simple. As soon as you
enter an object you check if it has an @context member. If it does, you use
that context for this object and all its children. 
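
The scoping rule can be sketched in a few lines of Python. This is a
simplification under stated assumptions: contexts are inline objects that
map terms directly to IRI strings (no remote contexts, context arrays, or
expanded term definitions), and resolve_terms is a hypothetical helper, not
an API method.

```python
def resolve_terms(node, active=None):
    """Replace each key with its IRI using the context in effect there."""
    active = dict(active or {})  # copy, so siblings are not affected
    if isinstance(node, dict):
        # A local @context applies to this object and all of its children;
        # terms it redefines override the inherited definitions.
        active.update(node.get("@context", {}))
        out = {}
        for key, value in node.items():
            if key == "@context":
                continue
            out[active.get(key, key)] = resolve_terms(value, active)
        return out
    if isinstance(node, list):
        return [resolve_terms(item, active) for item in node]
    return node


doc = {
    "@context": {
        "name": "http://example.com/person#name",
        "details": "http://example.com/person#details",
    },
    "name": "Markus Lanthaler",
    "details": {
        "@context": {"name": "http://example.com/organization#name"},
        "name": "Graz University of Technology",
    },
}
result = resolve_terms(doc)
print(result)
```

Because Python dictionaries, like parsed JSON objects, are looked up by key,
the result does not depend on where the @context member appears within an
object.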


> How about this as a hack that's more elegant:
> 
> [
> { "@context": ...
> },
> {
> "@id": "http://manu.sporny.org/i/public",
> "@type": "foaf:Person",
> "name": "Manu Sporny",
> "knows": "http://greggkellogg.net/foaf#me"
> },
> {
> "@id": "http://greggkellogg.net/foaf#me",
> "@type": "foaf:Person",
> "name": "Gregg Kellogg",
> "knows": "http://manu.sporny.org/i/public"
> }
> ]
> 
> ... with a rule that an object that has JUST a @context key, and no
> other keys, is actually omitted from arrays. That seems like a
> cleaner hack than using the @graph keyword. Keep @graph for when
> people really want named graphs.

I don't think that's a cleaner solution. But that's a personal opinion. 


>  > 6.13 Identifying Blank Nodes
> 
> This is okay, but it would be pretty easy and much more in keeping
> with the style of the document to avoid mentioning RDF, even here.
> 
> Something like:
> 
> For some topologies of the graph of nodes being expressed in
> JSON-LD, such as topologies with loops, embedding alone cannot be
> used, and @id must be used to connect the nodes. In some cases,
> one may not want to name nodes with IRIs. In these situations,
> one can use "blank node identifiers", which look like IRIs but
> with _ (underscore) as the scheme name. For example:
> 
> { "@id": "_:n1",
>   "name": "Secret Agent 1",
>   "knows":
>     { "name": "Secret Agent 2",
>       "knows": { "@id": "_:n1" }
>     }
> }
>
> In this case, we do not want to assign IRIs to the two people, but
> we want to express that they know each other. We can say SA1
> knows SA2 using embedding, but to say SA2 knows SA1 we need to use
> a blank node identifier.

Will update it tomorrow. It's already late...


>  > Every statement in the context having a keyword as the key (as in {
> "@type": ... }) will be ignored when being processed.
> 
> I think you mean this only for keywords that are known to be
> meaningless when used as keys in a @context. I think it would be
> better to make this an error. But the bigger question is about
> forward compatibility -- MUST processors ignore all keyword keys in
> contexts? (Are any allowed, with meaning? I don't see any.)

You are completely right. We forgot to update this sentence. We do indeed
throw an error in such a case. I removed this statement. 


>  > 6.15 and 6.16
> 
> These should probably be marked non-normative. There's nothing here I
> need to know to work with JSON-LD (although it's very cool and all).

Right, will fix it.


>  > 6.17 Data Indexing
> 
> Not sure how I feel about this. It's kind of weird, but pretty
> harmless, I guess.
> 
> I'm not sure it would work, but an alternative design would be to have
> a particular property be @index'd. So instead of:
> "@container": "@index"
> in the context we'd say
> "@index": "lang"
> and then the stuff in green would be equivalent to:
> 
> "post": [
> {
> "lang": "en",
> "@id": "http://example.com/posts/1/en",
> "body": "World commodities were up today with heavy trading of crude
> oil...",
> "words": 1539
> },
> { "lang": "de",
> "@id": "http://example.com/posts/1/de",
> "body": "Die Werte an Warenbörsen stiegen im Sog eines starken Handels
> von Rohöl...",
> "words": 1204
> }
> ]
> 
> I think that would provide the same functionality, but without these
> keys that aren't really in the data. It would let you cleverly
> generate JSON-LD like this from plain triples, if given the right
> context. (You'd have to have triples with the same S and P, where
> each O differs in the value of a DataProperty, as in this example.)

This is part of a feature request from the Drupal community. They had the
explicit requirement that this data doesn't round-trip to RDF.


>  > A. Data Model
> 
> What happens if the same @graph @id is used in two places? are the
> graphs merged, or what? Shouldn't the spec say? Or is that left to
> the API document as well? (it's a lot more than an API.) (in TriG
> they are merged)

It's the same as with two nodes using the same @id, they are merged.
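
A rough Python sketch of what "merged" means for nodes that share an @id.
It assumes every node object carries an @id and collects property values
into arrays; merge_nodes is a hypothetical helper and not the algorithm from
the API document.

```python
def merge_nodes(nodes):
    """Merge node objects that share the same @id (simplified sketch)."""
    merged = {}
    for node in nodes:
        target = merged.setdefault(node["@id"], {"@id": node["@id"]})
        for key, value in node.items():
            if key == "@id":
                continue
            values = value if isinstance(value, list) else [value]
            bucket = target.setdefault(key, [])
            for item in values:
                if item not in bucket:  # skip values already present
                    bucket.append(item)
    return list(merged.values())


nodes = [
    {"@id": "http://example.com/a", "name": "Alice"},
    {"@id": "http://example.com/a", "knows": "http://example.com/b"},
]
print(merge_nodes(nodes))
```

For two @graph entries with the same @id, the same idea would apply to the
nodes contained in each graph.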


> In general, I found Appendix A very confusing, and I'm thoroughly
> familiar with the RDF data model. This does not bode well for JSON
> folks. Do they need to understand this section, or can it be marked
> non-normative?

We did include that section because the RDF WG asked us to do it :-) I would
be fine with removing it and moving the diagram to the introduction (a
simpler version of it).


>  > Whenever possible, an edge should be labeled with an IRI.
> 
> As far as I can tell, from reading the spec up to this point, if it
> doesn't have an IRI, it's ignored -- and thus not part of the data
> model. Several times you say terms that don't map to IRIs are ignored.

It could also be labeled with a bnode ID.


>  > This section is normative; This section is non-normative
> 
> SERIOUS
> 
> These labels seem to be applied inconsistently.

Yeah, seems we are having problems with ReSpec. Will fix it.


> (I see a bug in the playground. If you use too large an integer, it
> converts the lexrep to being in scientific notation.)

Unfortunately, JSON doesn't specify the value space for numbers, so it's
somewhat undefined what "too large" is. The problem is that your browser
represents integers that are too large as floats.
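
The effect can be reproduced outside the browser. The sketch below uses
Python's float, which is an IEEE 754 double like a JavaScript number; the
helper name and the sample values are made up for illustration, and the
point at which a language switches to scientific notation when printing
differs between implementations.

```python
# Python's float is an IEEE 754 double, the same format JavaScript uses
# for all of its numbers, so it can stand in for the browser here.
MAX_SAFE = 2 ** 53  # above this, not every integer has an exact double


def survives_double(n):
    """Return True if integer n is unchanged by a round trip through a double."""
    return int(float(n)) == n


small = 1539          # fits easily
big = MAX_SAFE + 1    # silently rounded to a neighbouring value
huge = 10 ** 21       # exact as a double, but printed with an exponent

print(survives_double(small))
print(survives_double(big))
print(repr(float(huge)))
```

So an integer may either lose precision or keep its value but change its
lexical form once it has been converted to a double.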


>  > In JSON-LD lists are part of the data model whereas in RDF they are
> part of a vocabulary, namely [RDF-SCHEMA].
> 
> Doesn't JSON-LD also have sets? As I read the spec, it seemed like
> @collection: @set had some semantics, in addition to being a directive
> to keep singletons in arrays. A set-valued property is somewhat
> different from a repeated property.

No, it's exactly the same. It's just syntactic sugar plus the directive you
mentioned.

 
>  > The JSON-LD context has direct equivalents for the Turtle @prefix
> declaration:
> 
> True, but that doesn't seem to be what the examples are showing. I'd
> just drop that line.

Isn't it? The context contains a term "foaf" which is used as prefix.


>  > Appendix B
> 
> Not really reviewed at this time.
> 
>  > E. IANA Considerations
>  > This section is non-normative.
> 
> SERIOUS
> 
> Actually, I think this section is Normative, like the profile stuff.

Fixed


>  > will be submitted to the Internet Engineering Steering Group if this
> specification becomes a W3C Recommendation.
> 
> MEDIUM
> 
> Actually it goes at Last Call, as per
> http://www.w3.org/2002/06/registering-mediatype

Fixed

 
>  > To request or specify Expanded JSON-LD document form, the IRI
> http://www.w3.org/ns/json-ld#expanded SHOULD be used.
> 
> SERIOUS
> 
> I can't figure out who the SHOULD applies to. Do you mean:
> 
> if you want the expanded form, you SHOULD ask for it with this profile
> 
> (which I think would be silly) or do you mean:
> 
> if you receive a request that includes this profile parameter, you
> SHOULD return expanded form

I don't understand the difference in these two interpretations. Is there
any?


> ? I guess the latter, but that's not what it says. I would think you'd
> use normal media-type rules here -- if you can't provide it in expanded
> form, then you're not providing it, and fallback to another media type.

It's just a media type parameter.. it's safe to ignore them if they can't be
fulfilled.


>  > Published specification: The JSON-LD specification.
> 
> This should be plain text, and the URL should be updated. I guess it
> will be http://www.w3.org/TR/json-ld-syntax

Updated it to http://www.w3.org/TR/json-ld
We decided some time ago to change the shortname. And you proposed the same
somewhere else.


>  > Fragment identifiers used with application/ld+json resources may
> identify a node in a JSON-LD graph expressed in the resource. This
> idiom, which is also used in RDF [RDF-CONCEPTS], gives a simple way to
> "mint" new, document-local IRIs to label nodes and therefore
> contributes
> considerably to the expressive power of JSON-LD.
> 
> MEDIUM
> 
> I have no idea what this text is trying to say. For my best guess,
> please replace it with:
> 
> Fragment identifiers used with application/ld+json are treated as
> in other RDF syntaxes, as per RDF Concepts (link to
> http://www.w3.org/TR/rdf11-concepts/#section-fragID) [RDF-CONCEPTS]

Fixed


>  > References
> 
> Some of them are out of date, like TURTLE-TR. Also, the reference style
> isn't correct -- it only has the dated links.

Will fix that later.



Cheers,
Markus



--
Markus Lanthaler
@markuslanthaler
Received on Thursday, 7 March 2013 00:19:01 UTC