- From: Damian Gessler <dgessler@iplantcollaborative.org>
- Date: Thu, 16 May 2013 17:10:44 -0600
- To: public-rdf-comments@w3.org
This discussion is long, but hopefully offers constructive comment for RDF/JSON. It is submitted as an email per directions at https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-json/index.html.

The model proposed here addresses untyped literals, typed literals, resources (URIs and bnodes), QNames (including reserved prefixes, user-defined prefixes, and a default namespace), preservation of XML encoding information, type declarations, comments, short-circuit parsing, and both aggregate and disbursed subject blocks. It does so with a "natural" reading of the resultant JSON that yields similarities to both N3 and RDF/XML. It is designed to be informationally lossless with respect to both RDF and RDF/XML, and can be used either as a pure RDF serialization independent of RDF/XML, or as a streaming transliteration on the large extant repository of legacy RDF/XML documents on the Web.

We begin simply and pedagogically, but things will speed up:

1. We ask rhetorically what we are trying to achieve with RDF/JSON. We begin with an immediate and simple JSON serialization for RDF: a serialization that preserves the core and fundamental data model of RDF (the S,P,O triple) while adding little else; viz:

    [ [ "S", "P", "O" ], [ "S", "P", "O" ], ... ]

where S is the Subject, P is the Predicate (or Property), and O is the Object. This simple serialization can be expanded to support literal datatypes in a number of ways; e.g.:

    [ [ "S", "P", "L" ], [ "S", "P", { "L" : "D" } ], [ "S", "P", { "R" : {} } ], ... ]

for RDF Objects L (Literal) (and datatype D) and R (Resource) (URI or bnode). There are also other minor variants and syntaxes that could differentiate between untyped literals, typed literals, and resources. We will reject this serialization per se; but it is important to offer it as a "null model" because that forces us to be explicit as to why another serialization with necessarily overloaded semantics is preferable.
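As an illustration only, the "null model" can be sketched in a few lines of Python. The tagged tuples used to represent RDF Objects and the function names encode_object and to_null_model are assumptions made for this sketch; they are not part of RDF/JSON or of any library.

```python
import json

def encode_object(obj):
    """Encode one RDF Object for the null model.

    obj is a hypothetical tagged tuple (an assumption of this sketch):
      ("literal", value, datatype_or_None) or ("resource", uri_or_bnode)
    """
    if obj[0] == "literal":
        _, value, datatype = obj
        # Untyped literal -> "L"; typed literal -> { "L" : "D" }
        return value if datatype is None else {value: datatype}
    if obj[0] == "resource":
        # Resource (URI or bnode) -> { "R" : {} }
        return {obj[1]: {}}
    raise ValueError(f"unknown object kind: {obj[0]!r}")

def to_null_model(triples):
    """Serialize (subject, predicate, object) tuples as a JSON array of arrays."""
    return json.dumps([[s, p, encode_object(o)] for s, p, o in triples])

triples = [
    ("http://example.org/about", "http://purl.org/dc/terms/title",
     ("literal", "Anna's Homepage", None)),
    ("http://example.org/about", "http://purl.org/dc/terms/title",
     ("literal", "Annas hjemmeside", "http://www.w3.org/2001/XMLSchema#string")),
    ("http://example.org/about", "http://xmlns.com/foaf/0.1/homepage",
     ("resource", "http://example.org/anna")),
]
print(to_null_model(triples))
```

Nothing beyond the triple itself is modeled here, which is the point of the null model: any richer structure has to justify itself against this baseline.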
Clearly, by not stopping at this immediate and natural JSON serialization of triples, the vision of RDF/JSON must be either implicitly, or explicitly, something other than just serializing RDF into JSON. By presenting a data model of:

    { "S" : { "P" : [ "O", ... ] } }

RDF/JSON shows that it prioritizes a subject-oriented data structure of the underlying RDF data model in achieving its JSON serialization. This elegant, natural, data model has similarities to the use and adoption of N3 over N-Triples.

2. We note that the goal of RDF/JSON cannot be interpreted as translating legacy JSON -> RDF. This is because the semantics of any arbitrary, legacy, JSON document do not map to the semantics of RDF/JSON. For example, JSON arrays do not map to RDF List constructs--and indeed, nor should they, for an array is not a list (though in many cases it can be interpreted as such). Also, RDF/JSON introduces reserved keys ("type", "value", "lang", "datatype") that have implied semantics on the resultant de-serialized data models that are not recognized as such in JSON. This is not to say that one could not read legacy JSON, build an in-memory data model, and output RDF/JSON; it is to say that such an operation (arbitrary, legacy JSON -> RDF -> RDF/JSON) is outside both the goals and spec of RDF/JSON. For JSON -> RDF, see JSON-LD [1].

Thus the perspective of RDF/JSON is focused on RDF -> JSON, while leveraging some of the JSON data modeling constructs.

The W3C recommended serialization for RDF is RDF/XML [2]. There is a large legacy presence of RDF/XML documents on the Web, especially for OWL. Thus a desirable characteristic of a JSON serialization would be the informationally lossless transformation of RDF/XML -> JSON. This becomes a key guide for the following discussion.

While RDF/JSON can position itself as solely an RDF serialization independent of others, distinct, and separate from RDF/XML, this is perhaps a missed opportunity.
Alternatively, RDF/JSON could position itself as an RDF -> JSON serialization that builds upon, and is receptive to, informationally lossless transliterations of the already-recommended W3C serialization for RDF: RDF/XML. The motivation is that such an approach builds a suite of complementary W3C technologies, including various serializations, rather than merely a collection of competing formats. Of course, RDF/JSON should also be able to stand separate and independent of RDF/XML, such that one could go RDF -> RDF/JSON -> RDF without any serialization through RDF/XML. Thus we seek both worlds.

Currently, RDF/JSON is not informationally lossless with respect to RDF/XML; we note a number of difficulties:

2a. QNames. RDF/JSON does not support QNames [3]. This presumably could be addressed by adding semantics on how to serialize prefixes. If RDF/JSON chooses not to support QNames then it can still be said to be informationally lossless with respect to RDF, but it cannot be said to be informationally lossless with respect to RDF/XML. This would seem to be an undesirable and unnecessary limitation.

2b. Serializing. RDF/JSON binds all of a Subject's predicates, and all and each of those Predicates' Objects into a single, compound JSON object. Yet RDF/XML does not require that all statements about a Subject be together or in any one place in the document, and RDF does not require this generically for serialization. Thus RDF/JSON cannot be implemented as a streaming syntactical re-serializer directly on RDF/XML: RDF/JSON must have knowledge of the entire RDF data model, such as to know all of a Subject's predicates and their objects, before it can serialize even the first subject. This is somewhat unfortunate, since we would like a serialization spec to be independent of implementation algorithms, be they streaming or "DOM"-based.
RDF/JSON's requirement that "S" be unique (for each unique Subject) is forced upon it by JSON's requirement that all keys in a JSON object be unique (but see below). 2c. Parsing. RDF/JSON imposes a data model outside of RDF proper, which limits the utility of the serialization. But it is fair to say it also enhances the utility of the serialization: there is a trade-off. The elegance and "naturalness" of RDF/JSON's { "S" : { "P" : [ "O" ] } } model necessarily clusters statements about Subjects, while disbursing statements about Predicates and Objects throughout the document. I call this the "phone book" problem, where the chosen serialization of the producer limits the utility available to the consumer, even though the consumer "has all the data." In the "old days," phone books were distributed as serialized name:number pairs, sorted by name, printed on paper. The sorting produced essentially an array, such that one could use an approximate binary search to find a name amongst a million entries in a matter of seconds. The data producer (the phone company) gave the consumer both name and number, and at some level did not care whether the consumer was interested in the name, number, or both. But the serialization essentially forced the consumer to accept name:number ordered-pairs; the sorting and serialization on name biased against number:name utility. A separate serialization (called a reverse-lookup) was needed if one had a number and wanted to find its associated name. These books were usually hard to find. What is relevant here is not the old days of phone books, but to note that RDF has no such restriction. RDF does not bias Subjects over Objects, or Objects over Predicates, etc. One of the benefits of the RDF/JSON modeling is that once one is done processing a Subject, one is guaranteed that no more syntactic statements about the Subject (as a Subject, and as identified lexically by its key [i.e., not addressing the semantics of owl:sameAs]) shall be made. 
Thus unlike RDF/XML, a streaming parser can be implemented for RDF/JSON such that further processing of a document stream can be abandoned prior to the entire document being processed. I call this "short-circuit" parsing. But this comes at the cost that the RDF/JSON model limits the utility of the data when not consumed as intended, and in this case the "intent" is set not by the producer, but by RDF/JSON itself. One could say that RDF/JSON benefits the parser at the expense of the serializer. 2d. RDF/JSON has no mechanism to retain comments ex situ of RDF (e.g., RDF/XML XML comments [<!-- -->]). This is made difficult due to JSON's lack of support for embedded comments. The proposal below addresses the above issues while keeping very much in the flavor of RDF/JSON's { "S" : { "P" : [ "O" ] } } model. It is informationally lossless with respect to both RDF and RDF/XML (supports QNames and comments); it supports streaming serialization (e.g., as a syntactical transliterator on streaming RDF/XML); and it supports streaming parsing of its own serialization. The proposal is quite simple and contains two "forms": Form 1. Guarantee that all statements about a Subject are localized in the document, thus supporting short-circuit parsing. Short-circuit guarantees are "communicated" to the parser by virtue of an opening JSON object. A parser is guaranteed that all keys of a JSON object are unique, thus when it "sees" a JSON object, it "knows" that all statements about the key are localized to the JSON object. Form 1 is very similar in structure to RDF/JSON. 1a. Simple, untyped literals: { "S" : { "P" : "L" } } Examples: 1a.i { "http://example.org/about" : { "http://purl.org/dc/terms/title" : "Anna's Homepage" } } 1a.ii { "http://example.org/about" : { "http://purl.org/dc/terms/title" : [ "Anna's Homepage", "Annas hjemmeside" ], "http://anotherUniqueProperty/p" : "L" ... } } JSON array [] constructs are required for the Object only as needed. 
This differs from RDF/JSON, which requires Object array constructs even in cases of there being only a single Object. JSON imposes no unique value restriction for array elements.

Example 1a.i shows that simple statements are "simply" serialized. The examples below will show that more complex statements are built from the application of simple rules. Example 1a.ii shows JSON arrays as RDF Objects to package multiple property instances and values.

1b. Typed Literals. We note from RDF/XML that datatypes on literals are attributes on the Predicates (not on the literals themselves). In a similar manner, typed literals do not have a language, per se [4]: a language qualifier is on the Predicate. Thus we here make a simple extension that allows us to replace the literal "L" with a JSON object {} to capture arbitrary RDF/XML attribute data, with special semantics for "rdf:value"; i.e.:

1b. Typed literals:

    { "S" : { "P" : { "rdf:value" : "L", "rdf:datatype" : "D", ... } } }

Example:

    { "http://example.org/about" : {
        "http://purl.org/dc/terms/title" : {
          "rdf:value" : "Annas hjemmeside",
          "rdf:datatype" : "http://www.w3.org/2001/XMLSchema#string",
          "xml:lang" : "da"
        }
      }
    }

Here, rdf:value is akin to RDF/JSON "value." It and it alone is NOT an attribute on the Predicate (it is the "text content" of the equivalent XML element), but all other key:value pairs are interpreted as Predicate attributes. rdf:datatype is akin to RDF/JSON's "datatype," but there is no need to introduce a new and reserved key word: the RDF/XML attribute assumes the role immediately.

This simple form--that RDF Objects are JSON Objects with a syntactical placement of RDF/XML attributes--yields an immediate and consistent extension for Objects as resources (URIs and bnodes):

1c. Objects as resources (URIs and bnodes):

    { "S" : { "P" : { "rdf:resource" : "O", ... } } }

Compound example:

    { "http://example.org/about" : {
        "http://purl.org/dc/terms/title" : [
          "Anna's Homepage",
          { "rdf:value" : "Annas hjemmeside",
            "rdf:datatype" : "http://www.w3.org/2001/XMLSchema#string",
            "xml:lang" : "da" }
        ],
        "http://xmlns.com/foaf/0.1/homepage" : { "rdf:resource" : "http://example.org/anna" },
        "http://purl.org/dc/terms/creator" : "_:anna"
      }
    }

At first it may not seem that the above proposal differs much in substance from RDF/JSON, but it does in a number of ways. It retains the essence of the { "S" : { "P" : "O" } } model, but simplifies the serialization for simple cases, and aligns more complex cases with a transliteration of RDF/XML attributes. This requires no actual knowledge of RDF as a re-serializer.

The model also lends itself "naturally" to QName support [3], thus becoming closer to being informationally lossless with respect to RDF/XML. We support QNames by noting the "xmlns" attribute on the rdf:RDF "Subject"; viz.:

    { "rdf:RDF" : {
        "xmlns:rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "xmlns:xsd" : "http://www.w3.org/2001/XMLSchema#",
        "xmlns:" : "http://example.org/",
        "xmlns:dc" : "http://purl.org/dc/terms/",
        "xmlns:foaf" : "http://xmlns.com/foaf/0.1/"
      },
      ":about" : { ... }
    }

We bootstrap the definition of the rdf: namespace within the rdf:RDF construct. We make the implicit assumption that the token "rdf:RDF" can never itself be the valid Subject of a user-defined payload--a topic we discuss further in section 4 below.

We can achieve a slight clean-up in presentation by recognizing "xmlns" as a keyword, but we do this only as "syntactical sugar" on the underlying model of XML attributes on Subject entries; e.g.:

    { "xmlns" : {
        "" : "http://example.org/",
        "dc" : "http://purl.org/dc/terms/",
        "foaf" : "http://xmlns.com/foaf/0.1/"
      },
      ":about" : { ... }
    }

RDF requires that all Subjects are resources: either URIs or bnodes.
Resources can be lexically written in four variants:

    Absolute URIs; e.g., http://example.org/about, urn:example:about
    QName with prefix (namespace); e.g., dc:title
    QName with reserved underscore (_) for bnode; e.g., _:anna
    QName with user-defined default namespace; e.g., ":myTerm"

Notably, RDF does not allow relative URIs for Subjects or Predicates [5]. Thus "a", "5", "a/b/c", are all valid (relative) URIs, but are lexically illegal as RDF Subjects. Thus we note that lexically, all valid Subjects and Predicates necessarily always contain a colon (:). Thus we can unambiguously allow the keyword "xmlns" (or "@xmlns") to appear in the "S" place and overload it with special meaning as a document directive.

In a similar manner we can use "?xml" to preserve record of the XML document encoding that may appear on the first line of an RDF/XML document. In so doing we are not stating that 'this' document has the encoding; we are stating that this document, if transliterated from, or to, XML, has the encoding:

    { "?xml" : { "version" : "1.0", "encoding" : "UTF-8" },
      "xmlns" : {
        "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "xsd" : "http://www.w3.org/2001/XMLSchema#",
        "" : "http://example.org/",
        "dc" : "http://purl.org/dc/terms/",
        "foaf" : "http://xmlns.com/foaf/0.1/"
      },
      ":about" : {
        "dc:title" : [
          "Anna's Homepage",
          { "rdf:value" : "Annas hjemmeside", "rdf:datatype" : "xsd:string", "xml:lang" : "da" }
        ],
        "foaf:homepage" : { "rdf:resource" : ":anna" },
        "dc:creator" : "_:anna"
      },
      "_:anna" : {
        "foaf:name" : "Anna",
        "foaf:homepage" : { "rdf:resource" : "http://example.org/anna" }
      }
    }

Note in the above the use of (source) doc encoding, prefixes, default namespace, QNames, absolute URIs, bnodes, untyped literals, and typed literals. This could have been serialized from an RDF data model, or transliterated syntactically from RDF/XML.
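A minimal sketch of how a consumer might apply the lexical rules above: keys without a colon are treated as document directives, "_:" terms are left as bnodes, and prefixed or default-namespace QNames are expanded against the "xmlns" table. The function names is_directive and expand are hypothetical, and treating any unrecognized prefix as the scheme of an absolute URI is an assumption of this sketch, not something the proposal specifies.

```python
def is_directive(key):
    """Keys such as "xmlns", "?xml", or "//" contain no colon, so they can
    never be valid Subjects or Predicates and are read as directives."""
    return ":" not in key

def expand(term, prefixes):
    """Expand a QName against a prefix table taken from the "xmlns" entry.

    Assumption of this sketch: a term whose prefix is not in the table is
    taken to be an absolute URI (http:, urn:, ...) and returned unchanged.
    """
    prefix, sep, local = term.partition(":")
    if not sep:
        raise ValueError(f"not a valid Subject or Predicate: {term!r}")
    if prefix == "_":
        return term                      # bnode, e.g. _:anna
    if prefix in prefixes:
        return prefixes[prefix] + local  # dc:title, or :about via the "" entry
    return term                          # absolute URI

prefixes = {
    "": "http://example.org/",
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
for term in (":about", "dc:title", "_:anna", "http://example.org/anna"):
    print(term, "->", expand(term, prefixes))
```

One consequence of this design is that a user-defined prefix named like a URI scheme (for example "http") would shadow absolute URIs; a real specification would need to rule that out or define precedence.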
Our rules are still simple and consistent: almost the same as RDF/JSON, with the extension that object "metadata" is analogous to RDF/XML attributes and bundled inside a JSON object using existing rdf: namespace predicates.

Form 2. Support the disbursement of statements throughout a document, for example as applicable when stream transliterating RDF/XML -> JSON. This currently cannot be done in RDF/JSON, but is quite simple to do:

    [ { "?xml" : { "version" : "1.0", "encoding" : "UTF-8" } },
      { "xmlns" : {
          "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
          "xsd" : "http://www.w3.org/2001/XMLSchema#",
          "" : "http://example.org/",
          "dc" : "http://purl.org/dc/terms/",
          "foaf" : "http://xmlns.com/foaf/0.1/"
        }
      },
      { ":about" : { "dc:title" : "Anna's Homepage", "dc:creator" : "_:anna" } },
      { "_:anna" : {
          "foaf:name" : "Anna",
          "foaf:homepage" : { "rdf:resource" : "http://example.org/anna" }
        }
      },
      { ":about" : {
          "dc:title" : { "rdf:value" : "Annas hjemmeside", "rdf:datatype" : "xsd:string", "xml:lang" : "da" },
          "foaf:homepage" : { "rdf:resource" : ":anna" }
        }
      }
    ]

(Note the repetition of :about). All the previous rules apply. We simply note that { "S" : { "P" : "O" } } used in the earlier examples was just a simplification of a larger, more encompassing model: [ { "S" : { "P" : "O" } }, { "S" : { "P" : "O" } }, ... ]. This reads "naturally:" an array of JSON objects, each making statements about an RDF Subject, with no restriction that successive Subjects be unique (because each is enclosed in its own {} construct).

The embracing opening and closing JSON array [] construct (Form 2) "communicates" the chosen serialization to the parser that it may NOT now assume that all statements about a given Subject are known, until it processes through the End-Of-File. If the serializer chooses to group all statements for all subjects (Form 1), then it can easily do this too by not using the opening JSON array [] construct and building JSON objects per the earlier examples above.
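To make the relationship between the two forms concrete, here is a rough sketch of a reader that accepts either one and yields raw (subject, predicate, object) statements. It loads the whole document with json.loads, so it is not a streaming parser; it only illustrates that Form 1 is the single-object case of the same model. The function names form_of and statements are hypothetical, and objects are passed through uninterpreted.

```python
import json

def form_of(text):
    """Report the form from the first non-whitespace character."""
    first = text.lstrip()[:1]
    if first == "{":
        return 1  # all statements about a Subject are localized
    if first == "[":
        return 2  # statements about a Subject may recur until end of input
    raise ValueError("document must start with '{' or '['")

def statements(text):
    """Yield (subject, predicate, object) from a Form 1 or Form 2 document.

    Sketch only: keys without a colon (directives such as "xmlns", "?xml",
    "//") are skipped, and object values are yielded as raw JSON values.
    """
    doc = json.loads(text)
    blocks = doc if isinstance(doc, list) else [doc]
    for block in blocks:
        for subject, predicates in block.items():
            if ":" not in subject or not isinstance(predicates, dict):
                continue
            for predicate, obj in predicates.items():
                if ":" not in predicate:
                    continue
                # A JSON array packages several Objects for one Predicate.
                for o in (obj if isinstance(obj, list) else [obj]):
                    yield subject, predicate, o

form2 = """[
  { "xmlns" : { "" : "http://example.org/", "dc" : "http://purl.org/dc/terms/" } },
  { ":about" : { "dc:title" : "Anna's Homepage" } },
  { ":about" : { "dc:title" : { "rdf:value" : "Annas hjemmeside", "xml:lang" : "da" } } }
]"""
for s, p, o in statements(form2):
    print(s, p, o)
```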
Thus the "spec" does not bias towards parsers or serializers (it lets the producer decide). The spec supports short-circuiting for both streaming serializers and streaming parsers: just write/read the first non-whitespace character as a '[' or '{' and proceed accordingly.

3. RDF/XML has short-hand notation for rdf:type statements that allows concise "declarations" at the beginning of a document. These declarations can aid parsers. For example, OWL models can be aided by knowing if a property is an owl:ObjectProperty or an owl:DatatypeProperty when it is first *used* (i.e., when it first occurs as a resource in a statement). Because the serialization of RDF does not place restrictions on the ordering within a document of resource definitions and type statements, a predicate's use may precede its declaration and definition (if any).

The RDF/XML "declaration" short-hand looks like this:

    <owl:Class rdf:about="http://mySite.org/MyClass"/>
    <mySite:MyClass rdf:about="http://mySite.org/MyThing"/>
    <owl:DatatypeProperty rdf:about="http://mySite.org/myDatatypeProperty"/>
    <owl:DatatypeProperty rdf:about="http://mySite.org/myOtherDatatypeProperty"/>
    ....

and is semantically equivalent to more verbose rdf:type statements about each of the resources.

Now note that the { "S" : { "P" : "O" } } construct leaves two other constructs undefined; namely:

    { "S" : "T" }  and  { "S" : [ "T", ... ] }

where "T" is some text (a string). Thus we can define the use of these constructs to support concise rdf:type declarations in a manner similar to RDF/XML:

    { "owl:Class" : "mySite:MyClass",
      "mySite:MyClass" : "mySite:MyThing",
      "owl:DatatypeProperty" : [ "mySite:myDatatypeProperty", "mySite:myOtherDatatypeProperty" ]
      ...
    }

The meaning of the above is that the JSON objects (or array elements) are each rdf:type of the JSON subject. There is no ambiguity in how to interpret the above because none of the constructs are of the form "S" : { ... }. This aligns nicely with RDF/XML declarations.
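The declaration short-hand can be unfolded mechanically into explicit rdf:type statements. The following is a small sketch under the reading given above (the key is the type, and each string value is a resource of that type); the function name type_declarations is hypothetical, and keys without a colon are assumed to be directives and skipped.

```python
def type_declarations(block):
    """Yield (resource, "rdf:type", type) for { "T" : "R" } and
    { "T" : [ "R", ... ] } entries of one JSON object.

    Entries of the form "S" : { ... } are ordinary subject blocks and are
    ignored here; so are directive keys, which contain no colon.
    """
    for key, value in block.items():
        if ":" not in key:
            continue
        if isinstance(value, str):
            yield value, "rdf:type", key
        elif isinstance(value, list):
            for member in value:
                if isinstance(member, str):
                    yield member, "rdf:type", key

block = {
    "owl:Class": "mySite:MyClass",
    "mySite:MyClass": "mySite:MyThing",
    "owl:DatatypeProperty": [
        "mySite:myDatatypeProperty",
        "mySite:myOtherDatatypeProperty",
    ],
    ":about": {"dc:title": "Anna's Homepage"},  # a subject block, not a declaration
}
for triple in type_declarations(block):
    print(triple)
```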
Full example is below in 4.

4. Semantic serialization and parsing. RDF/JSON is presumably a sole RDF -> JSON serialization. It need know nothing about RDF/XML (though clearly here I advocate changing that to a tighter linkage to informationally lossless transliteration of RDF/XML). But it seems that the more that RDF/JSON differentiates itself as something more than "one more ad hoc way of representing RDF in JSON" (of which there are many such competing proposals), the more it could position itself as an important and distinct addition to the W3C toolbox. One way to do this is to more tightly embrace RDF as the underlying W3C Semantic Web technology and then use knowledge of those semantics to improve the serialization; i.e., RDF/JSON would be a "smart," semantically-aware JSON serialization of W3C Semantic Web technologies.

We immediately distinguish here between "semantic serialization and parsing" and "inference." Various implicit forms of semantic parsing are already done by many parsers and interpreters--for example, a scripting language interpreter may assume from 'var x = 1' that x is an integer variable, even though it has not been declared as being of that type. The goal of semantic serialization and parsing is to improve and effect the serialization and parsing while neither adding nor removing any knowledge.

For example, with semantic parsing this:

    { "owl:ObjectProperty" : ":myProperty",
      ":mySubject" : { ":myProperty" : { "rdf:resource" : "http://example.org/anna" } }
    }

is equivalent to, and could be replaced by, this:

    { "owl:ObjectProperty" : ":myProperty",
      ":mySubject" : { ":myProperty" : "http://example.org/anna" }
    }

Because :myProperty is declared to be an owl:ObjectProperty, the token "http://example.org/anna" is necessarily a resource, not a literal.

The line between semantic serialization and parsing and inference is subtle.
The former is concerned with preservation of explicit statements of knowledge (or their absence) while using ex situ knowledge in a manner that improves the serialization or parsing; the latter is concerned with making statements explicit that may otherwise be necessarily-true yet only implicit (not stated). Our focus is on the former. (If a serialization is missing statements, we want to preserve that absence, since the action of serialization should maintain input->output data integrity [for example, cases of purposely "broken" data models for the purpose of testing]). A side-effect of the above is that in order to support streaming parsers, the order of statements in the document can be important (e.g., in the above example, if the declaration of myProperty occurred after its assignment, then the value "http://example.org/anna" would be considered a string literal, not a resource). This can be an issue, because RDF -> RDF/XML serializers may not give users control of the ordering of statements, nor even guarantee deterministic representations on successive invocations, thus RDF -> RDF/XML -> RDF/JSON -> RDF could fail to be informationally lossless. There are ways to address this, but at a minimum semantic serialization and parsing should be carefully weighed. If we accept due diligence on a dependency of statement ordering in the document, then we can outline at least four ways to support semantic serialization and parsing: 1. Recognize "rdf:RDF", "xmlns", etc. when they appear in the Subject position as document directives, not user-defined Subjects (see above). 2. Predefine the xmlns namespaces rdf, rdfs, xsd, and owl (require no explicit assignments). 3. Recognize the semantics of rdf:type, rdfs:range, rdfs:domain, rdfs:subClassOf, rdfs:subPropertyOf, etc.: the RDF Object of those predicates must be a resource (cannot be a literal). An exception and special semantics apply when the object is an XSD datatype (e.g., "rdfs:range xsd:integer"). 4. 
Allow the preservation of ex situ RDF comments with the keyword "comment" (or "@comment" or "//" or "#"). For example, if transliterating into RDF/XML, then the comments would be re-serialized as XML comments (<!-- -->). But if translating into N3, then the comments would be re-serialized as # comments.

Example:

    { "?xml" : { "version" : "1.0", "encoding" : "UTF-8" },
      "xmlns" : {
        "" : "http://example.org/",
        "dc" : "http://purl.org/dc/terms/",
        "foaf" : "http://xmlns.com/foaf/0.1/",
        "mySite" : "http://mySite.org/myTerms/"
      },
      "//" : "This is a comment",
      "rdf:Property" : [ "dc:title", "dc:creator" ],
      "owl:DatatypeProperty" : "mySite:aDatatypeProperty",
      "owl:ObjectProperty" : "mySite:hasHomepage",
      "owl:Class" : [ "mySite:myClass", "mySite:anotherClass" ],
      "mySite:aDatatypeProperty" : { "rdfs:range" : "xsd:string" },
      "mySite:anObjectProperty" : { "rdfs:range" : "mySite:myClass" },
      "mySite:anotherObjectProperty" : {
        "rdfs:subPropertyOf" : "mySite:anObjectProperty",
        "rdfs:domain" : "mySite:myClass"
      },
      ":about" : {
        "dc:title" : "Anna's Homepage",
        "dc:creator" : "_:anna",
        "mySite:hasHomepage" : "http://example.org/anna",
        "rdfs:comment" : [
          "This comment is an explicit property of the subject :about",
          "So is this one"
        ],
        "//" : [
          "This is not a property of the subject.",
          "It is equivalent to two XML comments <!-- --> within the :about element block when re-serialized as RDF/XML"
        ]
      }
    }

I believe the above will allow the informationally lossless transliteration of thousands (millions?) of extant RDF/XML documents into RDF/JSON--though a more thorough analysis is first warranted. The mere proliferation of said documents conforming to RDF/JSON should aid in its adoption. And of course, de novo RDF -> RDF/JSON is also satisfied.

Summary:

There are many candidates for serializing RDF as JSON. If we want anything more than the null model of an array of triples, then we should identify the goals and prioritize the trade-offs. The proposal here attempts the following goals:

1. RDF/JSON should enable RDF -> JSON serialization independent of any other RDF serialization (specifically, one should be able to go directly from an RDF data model into RDF/JSON without any intervening serialization).

2. RDF/JSON should be able to be implemented as a streaming re-serializer on legacy RDF/XML without the need for building a complete, in-memory RDF data model. The special attention to RDF/XML is because it is already the W3C recommended serialization for RDF.

3. RDF/JSON should allow the enablement of short-circuit parsing, if the provider chooses to serialize content so as to support it.

4. RDF/JSON should be informationally lossless with respect to both RDF and to transliterations of RDF/XML.

5. RDF/JSON should reflect a "natural" JSON representation: simple things should be "simply serialized" and complex things should be built from simple things. If one knows JSON, but doesn't really know RDF, then one should feel comfortable that JSON constructs are being used in intuitive, "natural" ways without the need for syntactic convolutions.

6. As a proposed W3C recommendation, RDF/JSON should leverage RDF, RDFS, XSD, and OWL semantics when it can do so either without compromise to the above goals, or with clear and prioritized compromise (for example, identifying cases where reliance on statement ordering is acceptable).

Damian Gessler

References:

[1] http://www.w3.org/TR/json-ld
[2] http://www.w3.org/TR/#tr_RDF
[3] http://www.w3.org/2001/tag/doc/qnameids, http://www.w3.org/TR/xml-names
[4] http://www.w3.org/TR/REC-rdf-syntax/#section-Syntax-languages
[5] http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-URI-reference
Received on Friday, 17 May 2013 10:18:28 UTC