- From: Sandro Hawke <sandro@w3.org>
- Date: Tue, 05 Mar 2013 00:23:20 -0500
- To: W3C RDF WG <public-rdf-wg@w3.org>
This is my review of json-ld-syntax, as promised in the last meeting.

Summary: The document is in pretty good shape, and I think the underlying design is very good. Below, I suggest a few million editorial changes, a handful of which I think really need to be addressed before publication (and are marked MEDIUM or SERIOUS). I also raise a handful of concerns about the design, but I think they can probably all be dealt with in a few minutes of conversation. I think everything not marked MEDIUM or SERIOUS is fairly trivial.

I reviewed the latest editor's draft:
https://dvcs.w3.org/hg/json-ld/raw-file/e582aaa9ee43/spec/latest/json-ld-syntax/index.html

I did not read the json-ld-api. I did play around with the json-ld "playground" site after I was into the appendices.

I haven't reviewed Appendix B yet; I'll try to get to that soon, but it's going to take more brain cells than I have left tonight.

Without further ado...

> In an attempt to harmonize the representation of Linked Data in JSON

My first comment turns out to be, I think, the most utterly trivial. Sorry. My sense is that one "harmonizes" the elements in a set (by modifying them to make them more similar or related in some way); I don't know what it means to harmonize a single item like this.

> ; mixing both Linked Data and non-Linked Data in a single document.

The clause after a semicolon should be a complete sentence. Change to a comma or rephrase.

> the name IRIs, when dereferenced, provide more information about the name

I think they provide information about the named thing. I don't really like this paraphrasing of the LD principles, and I don't think it's helpful to the document here. I'd suggest providing some references instead.

> Since JSON-LD is 100% compatible with JSON the large number

comma needed after "JSON"

> Additionally to all the features JSON provides,

How about: "In addition to ..."

> the ability to express the language associated with a string ?
maybe add "natural"

add comma at the end of the item

> weights, and distances,

MEDIUM Really? I pretty much never see people doing that with datatypes.

> Software developers that

s/that/who/ on each line

> This specification does not describe the programming interfaces for the JSON-LD Syntax. The specification that describes the programming interfaces for JSON-LD documents is the JSON-LD Application Programming Interface [JSON-LD-API].

How about:

A companion document, The JSON-LD Application Programming Interface [JSON-LD-API], specifies how to work with JSON-LD at a higher level: it provides a standard library interface for common JSON-LD operations. Although that document is not required for understanding and working with JSON-LD, for some readers it will be a better starting point.

> A number of design goals were established before the creation of this markup language:

I don't think the history matters. How about: JSON-LD satisfies the following design goals:

> language. We should focus on simplicity when possible.

I don't think that's what you mean. I think you mean simplicity is paramount. How about: to the language, so sometimes we do not achieve Zero Edits.

> A character is represented as a single character string.

Hard to parse. How about: A character is represented using a string of length one.

> and that leading zeros are not allowed.
      ^^^^ omit "that"

> Used to specify the native language

s/native/natural (human)/

> For the avoidance of doubt, all keys, keywords, and values in JSON-LD are case-sensitive.

Awkward phrase. s/For the avoidance of doubt, all/All/

> Conformance

SERIOUS It's somewhat odd that all one needs for conformance is appendix B. So what are the other normative parts of this document for...? I think there may be a notion of a conformant JSON-LD generator or parser here, too -- one that follows the rules of the rest of this spec. That should be stated here.

> different concepts instead of terms such as "name", "homepage", etc.
I think, in this case, the word "terms" should NOT be linked to #dfn-term because you DON'T mean "term" in the JSON-LD sense, here. This is supposed to be the pre-JSON-LD counter-example.

> a context is used to map terms, i.e., properties with associated values, to IRIs.

Uh, that doesn't match the definition in #dfn-term. Is a term really a property with its associated value? I don't think so. How about: s/i.e., properties with associated values/such as the keys in an object structure/

> Expanded term definitions may be defined using absolute or compact IRIs as keys, which is mainly used to associate type or language information with an absolute or compact IRI.

This is the first sentence in the document where I have no idea what it means, because it uses concepts not introduced yet. Maybe this can be dropped? Or maybe I'll just have to get it on the second pass.

Later -- Yeah, I'd just drop that sentence, I think.

> This information gives the data global context and allows developers to re-use each other's data without having to agree to how their data will interoperate on a site-by-site basis.

I find the re-use of the word "context" awkward here. How about: This information allows developers to re-use each other's data without having to agree to how their data will interoperate on a site-by-site basis.

> External JSON-LD context documents may contain extra information located outside of the @context key,

That makes me wonder if it can be HTML, to be more readable. There would have to be some standard way to find the @context json in the HTML....

Later - I see it can't. Okay, con-neg works, too.

> EXAMPLE 5

after this example I was expecting the next example to use a Link header (what turns out to be EXAMPLE 29). Maybe mention it here?

> EXAMPLE 6 -- In the example above, the key http://schema.org/name is interpreted as an absolute IRI because it contains a colon (:) and the "http" prefix does not exist in the context.
Now would be a perfect place to have a relative IRI example. You've just talked about there being absolute and relative IRIs, and given an example only of absolute ones.

> JSON keys that do not expand to an absolute IRI are ignored, or removed in some cases, by the [JSON-LD-API]. However, JSON keys that do not include a mapping in the context are still considered valid expressions in JSON-LD documents -- the keys just don't expand to unambiguous identifiers.

This is kind of weird. It doesn't tell me what I'm supposed to do; it just confuses me. I guess it means they're like comments, and to be ignored? This is where we need a clear notion of a processor that reads JSON-LD and extracts all the triples and quads from it, it seems to me.

> EXAMPLE 8

It's confusing to have @type here. Maybe stick to just showing @vocab, and not also introducing something we haven't seen yet.

Later -- I see @type is never defined at all. Sigh. I guess it's considered an API thing.

> An IRI is generated when a JSON object is used in the value position and contains an @id keyword:

This is the first place you use the word "generated" and it's not at all clear what it means. If we were talking about mapping to RDF it would make sense.

> To be able to externally reference nodes in a graph, it is important that each node has an unambiguous identifier. IRIs are a fundamental concept of Linked Data, and nodes should have a de-referenceable identifier used to name and locate them. For nodes to be truly linked, de-referencing the identifier should result in a representation of that node. Associating an IRI with a node tells an application that it can fetch the resource associated with the IRI and get back a description of the node.

I'm not a fan of this paragraph. Can we just delete it?

> A node is identified using the @id keyword:

Maybe clarify that @id is overloaded, and it means something different used like this than used as either a key or a value in a context?
It'd be a little more clear if EXAMPLE 11 didn't use @id in all three different ways. How about taking the context out of the example, and just having something like:

    {
      "@id": "http://manu.sporny.org/#me",
      "http://schema.org/name": "Manu Sporny"
    }

(or some other example where an @id is more appropriate)

> end of Section 5

As I come to Section 6 being marked normative, I see Section 5 was neither informative nor normative.

> A document on the Web that defines one or more IRIs for use as properties in Linked Data is called a vocabulary.

Don't conflate documents with vocabularies, please. See:
https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#vocabularies

I would just drop that whole paragraph. It's motivational, not spec text. And they're wonderfully motivated in the next paragraph anyway.

> 6.1 Compact IRIs
> Prefixes are expanded when the form of the value is a compact IRI represented as a prefix:suffix combination, and the prefix matches a term defined within the active context
> Terms are interpreted as compact IRIs if they contain at least one colon and the first colon is not followed by two slashes (//, as in http://example.com)

These sentences contradict each other. Do slashes prevent recognizing things as compact IRIs or not? I'd suggest not -- that's just extra code that won't be helpful, IMHO. (TEST CASE?)

> EXAMPLE 17

    "foaf": "http://xmlns.com/foaf/0.1/",
    "foaf:homepage": { "@type": "@id" },

Would that work if the order was reversed? I guess so, since JSON doesn't preserve order. Maybe clarify that, and maybe put them in the other order in the example. (TEST CASE?)

Later -- Oh, I see this is covered well in section 6.9. Maybe near Example 17 say this is covered in more detail in section 6.9....?

> 6.3 Type Coercion

MEDIUM Okay, this overloading of @ keywords goes too far with @vocab serving a completely different purpose (from normal @vocab) in this situation. That's just silly.
Maybe we could at least have a table showing how the meanings differ in different places in the structure.

> EXAMPLE 22

I've read this about 6 times and I can't make sense of it. That is, I think the example makes perfect sense, but the paragraph after it, explaining it, does not. When you say "not a prefix:suffix construct" maybe you mean "not a string"?

> Duplicate context terms are overridden using a last-defined-wins mechanism.

SERIOUS That means you can't use natural JSON parsing, doesn't it? If I read EXAMPLE 24 with a JSON parser into a nested object, then I don't know the order of the @context blocks.

> Note that this is rarely a good authoring practice

That doesn't go far enough. You could allow nesting to make Example 24 work, but I don't think it's okay to use order-of-statements.

> It is a best practice to put the context definition at the top of the JSON-LD document.

MEDIUM I don't agree. You're telling me I'm going against best practice to build an object in memory and let my JSON serializer turn it into JSON.

> The @context subtree within that object is added to the top-level JSON object of the referencing document.

What if there's more than one @context subtree? Do you mean the merge of all the @context subtrees? [TEST CASE]

> end of 6.5

Thinking about this, I'd rather like .well-known/host-context.jsonld as another place I can look. So if I'm trying to get RDF triples, and I just get application/json, and there's no Link Header, I can look for a host-context file. I dunno -- maybe everyone can set a Link header easily enough.

> For instance, in the example below the databaseId member would be ignored by a JSON-LD processor.

MEDIUM This speaks to conformance. "JSON-LD processor" (maybe "consumer") needs to be defined in the Conformance clause, and s/should not/MUST not/ (with maybe some more rewriting).

> This method can be accomplished by using the following markup pattern:

"markup"? JSON isn't markup, as I understand the word.
Can you just drop the word from the sentence?

(glancing at appendix B for something)

> To avoid forward-compatibility issues, a term should not start with an @ character

MEDIUM Why only SHOULD NOT? Why not MUST NOT? The damage if they do is considerable. Also, you kind of need to say what processors MUST do if they see a keyword term they don't know -- i.e., one from the future. The options are: ignore (if you can figure out what/how much to ignore); or halt; or issue a warning to the user.

> NOTE: The use of @container in the body of a JSON-LD document has no meaning

That doesn't seem worth saying here. I assume it's ruled out in Appendix B.

> 6.11 Embedding

Odd section. It seems to have forgotten this was introduced as a graph syntax. The main thing to highlight is that this is syntactic sugar; sometimes it's nice to syntactically embed the node in one of the places that had a link to it.

> Example 46

SERIOUS I suspect the first row of the table is wrong. I would think only the triples inside the value associated with the @graph key would go inside the graph. Please clarify which it is, and correct the table if necessary.

> Example 47, 48

MEDIUM It seems very confusing to use @graph for this. Can't you find a more direct way to do this?

It seemed from stuff earlier (around Example 22) that in Example 48 you wouldn't need to repeat the @context, because it occurred earlier. But maybe that example-22 stuff was wrong, and what was really meant there was "closer to the root of the JSON object tree". No, that can't be right, either. I cannot see any sensible rules for which contexts are in effect at any point in the json tree.

How about this as a hack that's more elegant:

    [
      { "@context": ... },
      {
        "@id": "http://manu.sporny.org/i/public",
        "@type": "foaf:Person",
        "name": "Manu Sporny",
        "knows": "http://greggkellogg.net/foaf#me"
      },
      {
        "@id": "http://greggkellogg.net/foaf#me",
        "@type": "foaf:Person",
        "name": "Gregg Kellogg",
        "knows": "http://manu.sporny.org/i/public"
      }
    ]

...
with a rule that an object that has JUST a @context key, and no other keys, is actually omitted from arrays. That seems like a cleaner hack than using the @graph keyword. Keep @graph for when people really want named graphs.

> 6.13 Identifying Blank Nodes

This is okay, but it would be pretty easy and much more in keeping with the style of the document to avoid mentioning RDF, even here. Something like:

For some topologies of the graph of nodes being expressed in JSON-LD, such as topologies with loops, embedding alone cannot be used, and @id must be used to connect the nodes. In some cases, one may not want to name nodes with IRIs. In these situations, one can use "blank node identifiers", which look like IRIs but with _ (underscore) as the scheme name. For example:

    {
      "@id": "_:n1",
      "name": "Secret Agent 1",
      "knows": {
        "name": "Secret Agent 2",
        "knows": { "@id": "_:n1" }
      }
    }

In this case, we do not want to assign IRIs to the two people, but we want to express that they know each other. We can say SA1 knows SA2 using embedding, but to say SA2 knows SA1 we need to use a blank node identifier.

> Every statement in the context having a keyword as the key (as in { "@type": ... }) will be ignored when being processed.

I think you mean this only for keywords that are known to be meaningless when used as keys in a @context. I think it would be better to make this an error. But the bigger question is about forward compatibility -- MUST processors ignore all keyword keys in contexts? (Are any allowed, with meaning? I don't see any.)

> 6.15 and 6.16

These should probably be marked non-normative. There's nothing here I need to know to work with JSON-LD (although it's very cool and all).

> 6.17 Data Indexing

Not sure how I feel about this. It's kind of weird, but pretty harmless, I guess. I'm not sure it would work, but an alternative design would be to have a particular property be @index'd.
So instead of:

    "@container": "@index"

in the context we'd say

    "@index": "lang"

and then the stuff in green would be equivalent to:

    "post": [
      {
        "lang": "en",
        "@id": "http://example.com/posts/1/en",
        "body": "World commodities were up today with heavy trading of crude oil...",
        "words": 1539
      },
      {
        "lang": "de",
        "@id": "http://example.com/posts/1/de",
        "body": "Die Werte an Warenbörsen stiegen im Sog eines starken Handels von Rohöl...",
        "words": 1204
      }
    ]

I think that would provide the same functionality, but without these keys that aren't really in the data. It would let you cleverly generate JSON-LD like this from plain triples, if given the right context. (You'd have to have triples with the same S and P, where each O differs in the value of a DataProperty, as in this example.)

> A. Data Model

What happens if the same @graph @id is used in two places? Are the graphs merged, or what? Shouldn't the spec say? Or is that left to the API document as well? (it's a lot more than an API.) (in TriG they are merged)

In general, I found Appendix A very confusing, and I'm thoroughly familiar with the RDF data model. This does not bode well for JSON folks. Do they need to understand this section, or can it be marked non-normative?

> Whenever possible, an edge should be labeled with an IRI.

As far as I can tell, from reading the spec up to this point, if it doesn't have an IRI, it's ignored -- and thus not part of the data model. Several times you say terms that don't map to IRIs are ignored.

> This section is normative; This section is non-normative

SERIOUS These labels seem to be applied inconsistently.

> The JSON-LD Algorithms and API specification [JSON-LD-API] defines the conversion rules between JSON's native data types and RDF's counterparts to allow full round-tripping.

SERIOUS EDITORIAL I really don't like the mapping-to-RDF being left to another, later spec. I can live with it just being shown in the examples, except for not knowing what happens with numbers.
From the playground I see integers end up as xsd:integer and otherwise they are xsd:double, which is simple enough, but should really be said in this document, or at least shown in an example. (I see a bug in the playground. If you use too large an integer, it converts the lexrep to being in scientific notation.)

> In JSON-LD lists are part of the data model whereas in RDF they are part of a vocabulary, namely [RDF-SCHEMA].

Doesn't JSON-LD also have sets? As I read the spec, it seemed like @collection: @set had some semantics, in addition to being a directive to keep singletons in arrays. A set-valued property is somewhat different from a repeated property.

> The JSON-LD context has direct equivalents for the Turtle @prefix declaration:

True, but that doesn't seem to be what the examples are showing. I'd just drop that line.

> Appendix B

Not really reviewed at this time.

> E. IANA Considerations
> This section is non-normative.

SERIOUS Actually, I think this section is Normative, like the profile stuff.

> will be submitted to the Internet Engineering Steering Group if this specification becomes a W3C Recommendation.

MEDIUM Actually it goes at Last Call, as per http://www.w3.org/2002/06/registering-mediatype

> To request or specify Expanded JSON-LD document form, the IRI http://www.w3.org/ns/json-ld#expanded SHOULD be used.

SERIOUS I can't figure out who the SHOULD applies to. Do you mean:

    if you want the expanded form, you SHOULD ask for it with this profile (which I think would be silly)

or do you mean:

    if you receive a request that includes this profile parameter, you SHOULD return expanded form

? I guess the latter, but that's not what it says. I would think you'd use normal media-type rules here -- if you can't provide it in expanded form, then you're not providing it, and fall back to another media type.

> Published specification: The JSON-LD specification.

This should be plain text, and the URL should be updated.
I guess it will be http://www.w3.org/TR/json-ld-syntax

> Fragment identifiers used with application/ld+json resources may identify a node in a JSON-LD graph expressed in the resource. This idiom, which is also used in RDF [RDF-CONCEPTS], gives a simple way to "mint" new, document-local IRIs to label nodes and therefore contributes considerably to the expressive power of JSON-LD.

MEDIUM I have no idea what this text is trying to say. For my best guess, please replace it with:

Fragment identifiers used with application/ld+json are treated as in other RDF syntaxes, as per RDF Concepts (link to http://www.w3.org/TR/rdf11-concepts/#section-fragID) [RDF-CONCEPTS]

> References

Some of them are out of date, like TURTLE-TR. Also, the reference style isn't correct -- it only has the dated links.

---

That's it. I'll try to get to Appendix B before the meeting, but I wanted to send this early enough that it can be read & digested before Wednesday's meeting.

Keep up the great work, guys. I only point out all these places for improvement because I think this is so important and want it to have the best chance it can.

-- Sandro
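
For the comment above about numbers, here is a minimal Python sketch of the mapping as it appeared from the playground: integers end up as xsd:integer and every other number as xsd:double. This reflects only that observation, not text from either spec. The names `XSD` and `observed_xsd_datatype` are made up for illustration, and booleans and other JSON types are deliberately left out because the review says nothing about them.

```python
import json

# Standard XML Schema namespace; the constant name is just for this sketch.
XSD = "http://www.w3.org/2001/XMLSchema#"


def observed_xsd_datatype(value):
    """Return the XSD datatype IRI for a parsed JSON number.

    Hypothetical helper: it encodes only the behaviour observed in the
    playground (integer -> xsd:integer, any other number -> xsd:double).
    """
    # bool is a subclass of int in Python, so reject it explicitly;
    # the observation covers numbers only.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("this sketch only covers JSON numbers")
    if isinstance(value, int):
        return XSD + "integer"
    return XSD + "double"


if __name__ == "__main__":
    # Python's json module parses "42" as int and "3.14" / "1e3" as float.
    for text in ("42", "3.14", "1e3"):
        print(text, "->", observed_xsd_datatype(json.loads(text)))
```

Something this small, shown as an example in the syntax document, would be enough to settle what happens with numbers without pulling in the whole API spec.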
Received on Tuesday, 5 March 2013 05:23:34 UTC