Re: Public feedback on RDF/JSON: Proposal to align w/ W3C RDF/XML

Hi Damian,

Thank you for your understanding.  Please feel free to use this list to record your thoughts for any future working group.

Regards,
Dave
--
http://about.me/david_wood



On May 17, 2013, at 12:49, Damian Gessler <dgessler@iplantcollaborative.org> wrote:

> Hi Peter,
> 
> Thank you for your comments. The questions you raise are eminently answerable, but given David Wood's email (please see my response to him), perhaps any discussion here is not relevant for this mailing list.
> 
> Best,
> Damian.
> 
> 
> On 5/17/13 5:27 AM, Peter Ansell wrote:
>> 
>> 
>> On 17 May 2013 09:10, Damian Gessler <dgessler@iplantcollaborative.org
>> <mailto:dgessler@iplantcollaborative.org>> wrote:
>> 
>>    This discussion is long, but hopefully offers constructive
>>    comment for RDF/JSON. It is submitted as an email per directions at
>>    https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-json/index.html.
>> 
>>    The model proposed here addresses untyped literals, typed literals,
>>    resources (URIs and bnodes), QNames (including reserved prefixes,
>>    user-defined prefixes, and a default namespace), preservation of XML
>>    encoding information, type declarations, comments, short-circuit
>>    parsing, and both aggregate and dispersed subject blocks. It does so
>>    with a "natural" reading of the resultant JSON that yields
>>    similarities to both N3 and RDF/XML. It is designed to be
>>    informationally lossless with respect to both RDF and RDF/XML, and
>>    can be used either as a pure RDF serialization independent of
>>    RDF/XML, or as a streaming transliteration on the large extant
>>    repository of legacy RDF/XML documents on the Web.
>> 
>>    We begin simply and pedagogically, but things will speed up:
>> 
>>    1. We ask rhetorically what we are trying to achieve with RDF/JSON.
>>    We begin with an immediate and simple JSON serialization for RDF: a
>>    serialization that preserves the core and fundamental data model of
>>    RDF (the S,P,O triple) while adding little else; viz:
>> 
>>    [
>>       [ "S", "P", "O" ],
>>       [ "S", "P", "O" ],
>>       ...
>>    ]
>> 
>>    Where S is the Subject, P is the Predicate (or Property), and O is
>>    the Object. This simple serialization can be expanded to support
>>    literal datatypes in a number of ways; e.g.:
>> 
>>    [
>>       [ "S", "P", "L" ],
>>       [ "S", "P", { "L" : "D" } ],
>>       [ "S", "P", { "R" : {} } ],
>>       ...
>>    ]
>> 
>>    for RDF Objects L (Literal) (and datatype D) and R (Resource) (URI
>>    or bnode). There are also other minor variants and syntaxes that
>>    could differentiate between untyped literals, typed literals, and
>>    resources.
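
As a rough illustration of this null model, the sketch below round-trips triples through a JSON array of three-element arrays using Python's standard json module. The function names are hypothetical, and every object is treated as a plain string, so literals and resources are not distinguished here.

```python
import json

def triples_to_null_model(triples):
    """Serialize (subject, predicate, object) tuples as a JSON array of arrays."""
    return json.dumps([list(t) for t in triples])

def null_model_to_triples(text):
    """Parse the array-of-arrays form back into a list of 3-tuples."""
    return [tuple(row) for row in json.loads(text)]

triples = [
    ("http://example.org/about", "http://purl.org/dc/terms/title", "Anna's Homepage"),
    ("http://example.org/about", "http://purl.org/dc/terms/creator", "_:anna"),
]

encoded = triples_to_null_model(triples)
decoded = null_model_to_triples(encoded)
```

Because each row stands alone, a producer can emit rows in any order and without knowing the rest of the graph, which is the property the later sections trade away.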
>> 
>>    We will reject this serialization per se; but it is important to
>>    offer it as a "null model" because that forces us to be explicit as
>>    to why another serialization with necessarily overloaded semantics
>>    is preferable.
>> 
>>    Clearly, by not stopping at this immediate and natural JSON
>>    serialization of triples, the vision of RDF/JSON must be either
>>    implicitly, or explicitly, something other than just serializing RDF
>>    into JSON.
>> 
>>    By presenting a data model of:
>> 
>>    { "S" : { "P" : [ "O", ... ] } }
>> 
>>    RDF/JSON shows that it prioritizes a subject-oriented data structure
>>    of the underlying RDF data model in achieving its JSON
>>    serialization. This elegant, natural, data model has similarities to
>>    the use and adoption of N3 over N-Triples.
>> 
>> 
>>    2. We note that the goal of RDF/JSON cannot be interpreted as
>>    translating legacy JSON -> RDF. This is because the semantics of any
>>    arbitrary, legacy, JSON document do not map to the semantics of
>>    RDF/JSON. For example, JSON arrays do not map to RDF List
>>    constructs--and indeed, nor should they, for an array is not a list
>>    (though in many cases it can be interpreted as such). Also, RDF/JSON
>>    introduces reserved keys ("type", "value", "lang", "datatype") that
>>    have implied semantics on the resultant de-serialized data models
>>    that are not recognized as such in JSON. This is not to say that one
>>    could not read legacy JSON, build an in-memory data model, and
>>    output RDF/JSON; it is to say that such an operation (arbitrary,
>>    legacy JSON -> RDF -> RDF/JSON) is outside both the goals and spec
>>    of RDF/JSON. For JSON -> RDF, see JSON-LD [1].
>> 
>>    Thus the perspective of RDF/JSON is focused on RDF -> JSON, while
>>    leveraging some of the JSON data modeling constructs. The W3C
>>    recommended serialization for RDF is RDF/XML [2]. There is a large
>>    legacy presence of RDF/XML documents on the Web, especially for OWL.
>>    Thus a desirable characteristic of a JSON serialization would be the
>>    informationally lossless transformation of RDF/XML -> JSON. This
>>    becomes a key guide for the following discussion. While RDF/JSON can
>>    position itself as solely a RDF serialization independent of others,
>>    distinct, and separate from RDF/XML, this is perhaps a missed
>>    opportunity.
>> 
>>    Alternatively, RDF/JSON could position itself as an RDF -> JSON
>>    serialization that builds upon, and is receptive to, informationally
>>    lossless transliterations of the already-recommended W3C
>>    serialization for RDF: RDF/XML. The motivation is that such an
>>    approach builds a suite of complementary W3C technologies, including
>>    various serializations, rather than merely a collection of
>>    competing formats. Of course, RDF/JSON should also be able to stand
>>    separate and independent of RDF/XML, such that one could go RDF ->
>>    RDF/JSON -> RDF without any serialization through RDF/XML. Thus we
>>    seek both worlds.
>> 
>>    Currently, RDF/JSON is not informationally lossless with respect to
>>    RDF/XML; we note a number of difficulties:
>> 
>>    2a. QNames. RDF/JSON does not support QNames [3]. This presumably
>>    could be addressed by adding semantics on how to serialize prefixes.
>>    If RDF/JSON chooses not to support QNames then it can be still said
>>    to be informationally lossless with respect to RDF, but it cannot be
>>    said to be informationally lossless with respect to RDF/XML. This
>>    would seem to be an undesirable and unnecessary limitation.
>> 
>>    2b. Serializing. RDF/JSON binds all of a Subject's predicates, and
>>    all and each of those Predicates' Objects into a single, compound
>>    JSON object. Yet RDF/XML does not require that all statements about
>>    a Subject be together or in any one place in the document, and RDF
>>    does not require this generically for serialization. Thus RDF/JSON
>>    cannot be implemented as a streaming syntactical re-serializer
>>    directly on RDF/XML: RDF/JSON must have knowledge of the entire RDF
>>    data model, such as to know all of a Subject's predicates and their
>>    objects, before it can serialize even the first subject. This is
>>    somewhat unfortunate, since we would like a serialization spec to be
>>    independent of implementation algorithms, be they streaming or
>>    "DOM"-based. RDF/JSON's requirement that "S" be unique (for each
>>    unique Subject) is forced upon it by JSON's requirement that all
>>    keys in a JSON object be unique (but see below).
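
The buffering requirement described in 2b can be sketched as follows. This is a simplified illustration rather than the RDF/JSON specification: objects are plain strings instead of RDF/JSON's typed value objects, and group_by_subject is a hypothetical helper. The point is only that nothing can be emitted until every triple has been seen, because a later triple may add to an earlier subject.

```python
from collections import defaultdict

def group_by_subject(triples):
    """Build a { S : { P : [ O, ... ] } } structure from an iterable of triples.

    The whole input must be consumed before the result is complete, since
    statements about one subject may be scattered through the input.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    # Convert the nested defaultdicts to plain dicts for serialization.
    return {s: dict(preds) for s, preds in grouped.items()}

scattered = [
    ("S1", "P", "a"),
    ("S2", "P", "b"),
    ("S1", "P", "c"),  # a second statement about S1 arrives after S2
]
document = group_by_subject(scattered)
```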
>> 
>>    2c. Parsing. RDF/JSON imposes a data model outside of RDF proper,
>>    which limits the utility of the serialization. But it is fair to say
>>    it also enhances the utility of the serialization: there is a
>>    trade-off. The elegance and "naturalness" of RDF/JSON's { "S" : {
>>    "P" : [ "O" ] } } model necessarily clusters statements about
>>    Subjects, while dispersing statements about Predicates and Objects
>>    throughout the document. I call this the "phone book" problem, where
>>    the chosen serialization of the producer limits the utility
>>    available to the consumer, even though the consumer "has all the
>>    data." In the "old days," phone books were distributed as serialized
>>    name:number pairs, sorted by name, printed on paper. The sorting
>>    produced essentially an array, such that one could use an
>>    approximate binary search to find a name amongst a million entries
>>    in a matter of seconds. The data producer (the phone company) gave
>>    the consumer both name and number, and at some level did not care
>>    whether the consumer was interested in the name, number, or both.
>>    But the serialization essentially forced the consumer to accept
>>    name:number ordered-pairs; the sorting and serialization on name
>>    biased against number:name utility. A separate serialization (called
>>    a reverse-lookup) was needed if one had a number and wanted to find
>>    its associated name. These books were usually hard to find. What is
>>    relevant here is not the old days of phone books, but to note that
>>    RDF has no such restriction. RDF does not bias Subjects over
>>    Objects, or Objects over Predicates, etc. One of the benefits of the
>>    RDF/JSON modeling is that once one is done processing a Subject, one
>>    is guaranteed that no more syntactic statements about the Subject
>>    (as a Subject, and as identified lexically by its key [i.e., not
>>    addressing the semantics of owl:sameAs]) shall be made. Thus unlike
>>    RDF/XML, a streaming parser can be implemented for RDF/JSON such
>>    that further processing of a document stream can be abandoned prior
>>    to the entire document being processed. I call this "short-circuit"
>>    parsing. But this comes at the cost that the RDF/JSON model limits
>>    the utility of the data when not consumed as intended, and in this
>>    case the "intent" is set not by the producer, but by RDF/JSON
>>    itself. One could say that RDF/JSON benefits the parser at the
>>    expense of the serializer.
>> 
>>    2d. RDF/JSON has no mechanism to retain comments ex situ of RDF
>>    (e.g., RDF/XML XML comments [<!-- -->]). This is made difficult due
>>    to JSON's lack of support for embedded comments.
>> 
>> 
>>    The proposal below addresses the above issues while keeping very
>>    much in the flavor of RDF/JSON's { "S" : { "P" : [ "O" ] } } model.
>>    It is informationally lossless with respect to both RDF and RDF/XML
>>    (supports QNames and comments); it supports streaming serialization
>>    (e.g., as a syntactical transliterator on streaming RDF/XML); and it
>>    supports streaming parsing of its own serialization.
>> 
>>    The proposal is quite simple and contains two "forms":
>> 
>>    Form 1. Guarantee that all statements about a Subject are localized
>>    in the document, thus supporting short-circuit parsing.
>>    Short-circuit guarantees are "communicated" to the parser by virtue
>>    of an opening JSON object. A parser is guaranteed that all keys of a
>>    JSON object are unique, thus when it "sees" a JSON object, it
>>    "knows" that all statements about the key are localized to the JSON
>>    object.
>> 
>>    Form 1 is very similar in structure to RDF/JSON.
>> 
>>    1a. Simple, untyped literals:
>> 
>>    {
>>       "S" : { "P" : "L" }
>>    }
>> 
>>    Examples:
>> 
>>    1a.i
>>    {
>>       "http://example.org/about" :
>>         { "http://purl.org/dc/terms/title" : "Anna's Homepage" }
>>    }
>> 
>>    1a.ii
>>    {
>>       "http://example.org/about" : {
>>         "http://purl.org/dc/terms/title" : [ "Anna's Homepage", "Annas hjemmeside" ],
>>         "http://anotherUniqueProperty/p" : "L"
>>         ...
>>       }
>>    }
>> 
>>    JSON array [] constructs are required for the Object only as needed.
>>    This differs from RDF/JSON which requires Object array constructs
>>    even in cases of there being only a single Object. JSON imposes no
>>    unique value restriction for array elements.
>> 
>>    Example 1a.i shows that simple statements are "simply" serialized.
>>    The examples below will show that more complex statements are built
>>    from the application of simple rules.
>> 
>>    Example 1a.ii shows JSON arrays as RDF Objects to package multiple
>>    property instances and values.
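
The "arrays only as needed" rule of Form 1a can be sketched like this, assuming all objects are untyped literals given as plain strings. The helper name to_form1 is hypothetical and is not part of any specification.

```python
def to_form1(triples):
    """Build a Form 1 style document: a scalar for a single object,
    a JSON array only when a predicate has more than one object."""
    doc = {}
    for s, p, o in triples:
        preds = doc.setdefault(s, {})
        if p not in preds:
            preds[p] = o                   # first value: keep it as a scalar
        elif isinstance(preds[p], list):
            preds[p].append(o)             # already an array: extend it
        else:
            preds[p] = [preds[p], o]       # second value: promote to an array
    return doc

form1 = to_form1([
    ("http://example.org/about", "http://purl.org/dc/terms/title", "Anna's Homepage"),
    ("http://example.org/about", "http://purl.org/dc/terms/title", "Annas hjemmeside"),
    ("http://example.org/about", "http://anotherUniqueProperty/p", "L"),
])
```

A consumer then has to accept either a scalar or an array in the object position, which is the cost of the more compact simple case.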
>> 
>>    1b. Typed Literals. We note from RDF/XML that datatypes on literals
>>    are attributes on the Predicates (not on the literals themselves).
>>    In a similar manner, typed literals do not have a language, per se
>>    [4]: a language qualifier is on the Predicate. Thus we here make a
>>    simple extension that allows us to replace the literal "L" with a
>>    JSON object {} to capture arbitrary RDF/XML attribute data, with
>>    special semantics for "rdf:value"; i.e.:
>> 
>>    1b. Typed literals:
>> 
>>    {
>>       "S" : { "P" : {
>>         "rdf:value" : "L",
>>         "rdf:datatype" : "D",
>>          ...
>>         }
>>       }
>>    }
>> 
>>    Example:
>> 
>>    {
>>       "http://example.org/about" : {
>>         "http://purl.org/dc/terms/title" : {
>>           "rdf:value" : "Annas hjemmeside",
>>           "rdf:datatype" : "http://www.w3.org/2001/XMLSchema#string",
>>           "xml:lang" : "da"
>>           }
>>         }
>>    }
>> 
>>    Here, rdf:value is akin to RDF/JSON "value." It and it alone is NOT
>>    an attribute on the Predicate (it is the "text content" of the
>>    equivalent XML element), but all other key:value pairs are
>>    interpreted as Predicate attributes. rdf:datatype is akin to
>>    RDF/JSON's "datatype," but there is no need to introduce a new and
>>    reserved key word: the RDF/XML attribute assumes the role immediately.
>> 
>>    This simple form--that RDF Objects are JSON Objects with a
>>    syntactical placement of RDF/XML attributes--yields an immediate and
>>    consistent extension for Objects as resources (URIs and bnodes):
>> 
>>    1c. Objects as resources (URIs and bnodes):
>> 
>>    {
>>       "S" : { "P" :
>>         {
>>           "rdf:resource" : "O",
>>           ...
>>           }
>>         }
>>    }
>> 
>>    Compound example:
>> 
>>    {
>>       "http://example.org/about" : {
>> 
>>         "http://purl.org/dc/terms/title" : [
>> 
>>           "Anna's Homepage",
>> 
>>           {
>>             "rdf:value" : "Annas hjemmeside",
>>             "rdf:datatype" : "http://www.w3.org/2001/XMLSchema#string",
>>             "xml:lang" : "da"
>>           } ],
>> 
>>           "http://xmlns.com/foaf/0.1/homepage" : { "rdf:resource" : "http://example.org/anna" },
>> 
>>           "http://purl.org/dc/terms/creator" : "_:anna"
>> 
>>         }
>>    }
>> 
>>    At first it may not seem that the above proposal differs much in
>>    substance from RDF/JSON, but it does in a number of ways. It retains
>>    the essence of { "S" : { "P" : "O" } } model, but simplifies the
>>    serialization for simple cases, and aligns more complex cases with a
>>    transliteration of RDF/XML attributes. This requires no actual
>>    knowledge of RDF as a re-serializer.
>> 
>>    The model also lends itself "naturally" to QName support [3], thus
>>    becoming closer to being informationally lossless with respect to
>>    RDF/XML. We support QNames by noting the "xmlns" attribute on the
>>    rdf:RDF "Subject"; viz.:
>> 
>>    {
>> 
>>       "rdf:RDF" : {
>> 
>>           "xmlns:rdf"  : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
>>           "xmlns:xsd"  : "http://www.w3.org/2001/XMLSchema#",
>> 
>>           "xmlns:"     : "http://example.org/",
>>           "xmlns:dc"   : "http://purl.org/dc/terms/",
>>           "xmlns:foaf" : "http://xmlns.com/foaf/0.1/"
>>       },
>> 
>>       ":about" : {
>>         ...
>>       }
>>    }
>> 
>>    We bootstrap the definition of the rdf: namespace within the rdf:RDF
>>    construct. We make the implicit assumption that the token "rdf:RDF"
>>    can never itself be the valid Subject of a user-defined payload--a
>>    topic we discuss further in section 4. below.
>> 
>>    We can achieve a slight clean-up in presentation by recognizing
>>    "xmlns" as a keyword, but we do this only as "syntactical sugar" on
>>    the underlying model of XML attributes on Subject entries; e.g.:
>> 
>>    {
>>         "xmlns" : {
>>           ""     : "http://example.org/",
>>           "dc"   : "http://purl.org/dc/terms/",
>>           "foaf" : "http://xmlns.com/foaf/0.1/"
>>         },
>> 
>>       ":about" : {
>>         ...
>>       }
>>    }
>> 
>>    RDF requires that all Subjects are resources: either URIs or bnodes.
>>    Resources can be lexically written in four variants:
>> 
>>    Absolute URIs; e.g., http://example.org/about, urn:example:about
>>    QName with prefix (namespace); e.g., dc:title
>>    QName with reserved underscore (_) for bnode; e.g., _:anna
>>    QName with user-defined default namespace; e.g., ":myTerm"
>> 
>>    Notably, RDF does not allow relative URIs for Subjects or Predicates
>>    [5]. Thus "a", "5", "a/b/c", are all valid (relative) URIs, but are
>>    lexically illegal as RDF Subjects. Thus we note that lexically, all
>>    valid Subjects and Predicates necessarily always contain a colon
>>    (:). Thus we can unambiguously allow the keyword "xmlns" (or
>>    "@xmlns") to appear in the "S" place and overload it with special
>>    meaning as a document directive. In a similar manner we can use
>>    "?xml" to preserve record of the XML document encoding that may
>>    appear on the first line of an RDF/XML document. In so doing we are
>>    not stating that 'this' document has the encoding; we are stating
>>    that this document, if transliterated from, or to, XML, has the
>>    encoding:
>> 
>>    {
>> 
>>         "?xml" : {
>>           "version" : "1.0",
>>           "encoding" : "UTF-8"
>>         },
>> 
>>         "xmlns" : {
>>           "rdf"  : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
>>           "xsd"  : "http://www.w3.org/2001/XMLSchema#",
>>           ""     : "http://example.org/",
>>           "dc"   : "http://purl.org/dc/terms/",
>>           "foaf" : "http://xmlns.com/foaf/0.1/"
>>         },
>> 
>>       ":about" : {
>> 
>>         "dc:title" : [
>>           "Anna's Homepage",
>>           {
>>             "rdf:value" : "Annas hjemmeside",
>>             "rdf:datatype" : "xsd:string",
>>             "xml:lang" : "da"
>>           } ],
>> 
>>           "foaf:homepage" : { "rdf:resource" : ":anna" },
>> 
>>           "dc:creator" : "_:anna"
>> 
>>         },
>> 
>>       "_:anna" : {
>>         "foaf:name" : "Anna",
>>         "foaf:homepage" : { "rdf:resource" : "http://example.org/anna" }
>>         }
>>    }
>> 
>>    Note in the above the use of (source) doc encoding, prefixes,
>>    default namespace, QNames, absolute URIs, bnodes, untyped literals,
>>    and typed literals. This could have been serialized from an RDF data
>>    model, or transliterated syntactically from RDF/XML. Our rules are
>>    still simple and consistent: almost the same as RDF/JSON, with the
>>    extension that object "metadata" is analogous to RDF/XML attributes
>>    and bundled inside a JSON object using existing rdf: namespace
>>    predicates.
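
One way a consumer might resolve the QNames used above is sketched here. The function expand_qname is hypothetical; it assumes the "xmlns" block has already been read into a dict, treats "_:" as the reserved bnode prefix, and leaves anything with an unknown prefix (such as an absolute URI) unchanged.

```python
def expand_qname(term, xmlns):
    """Expand 'prefix:local' to a full URI using the xmlns prefix map."""
    if term.startswith("_:"):
        return term  # bnode label: nothing to expand
    prefix, sep, local = term.partition(":")
    # 'http://...' also splits on ':'; the '//' check keeps absolute URIs
    # intact even if someone defines a prefix named 'http'.
    if sep and prefix in xmlns and not local.startswith("//"):
        return xmlns[prefix] + local
    return term

xmlns = {
    "": "http://example.org/",
    "dc": "http://purl.org/dc/terms/",
}

expanded = [expand_qname(t, xmlns) for t in (":about", "dc:title", "_:anna")]
```

An undefined prefix and an absolute URI scheme are lexically indistinguishable in this sketch, so a stricter implementation would probably want to report unknown prefixes rather than pass them through.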
>> 
>> 
>>    Form 2. Support the dispersal of statements throughout a
>>    document, for example as applicable when stream transliterating
>>    RDF/XML -> JSON. This currently cannot be done in RDF/JSON, but is
>>    quite simple to do:
>> 
>>    [
>> 
>>       { "?xml" : {
>>           "version" : "1.0",
>>           "encoding" : "UTF-8"
>>         }
>>       },
>> 
>>       { "xmlns" : {
>>           "rdf"  : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
>>           "xsd"  : "http://www.w3.org/2001/XMLSchema#",
>>           ""     : "http://example.org/",
>>           "dc"   : "http://purl.org/dc/terms/",
>>           "foaf" : "http://xmlns.com/foaf/0.1/"
>>         }
>>       },
>> 
>>       { ":about" :
>>         {
>>           "dc:title" : "Anna's Homepage",
>>           "dc:creator" : "_:anna"
>>         }
>>       },
>> 
>>       { "_:anna" : {
>>         "foaf:name" : "Anna",
>>         "foaf:homepage" : { "rdf:resource" : "http://example.org/anna" }
>>         }
>>       },
>> 
>>       { ":about" : {
>>         "dc:title" : {
>>             "rdf:value" : "Annas hjemmeside",
>>             "rdf:datatype" : "xsd:string",
>>             "xml:lang" : "da"
>>           },
>>         "foaf:homepage" : { "rdf:resource" : ":anna" }
>>         }
>>       }
>> 
>>    ]
>> 
>>    (Note the repetition of :about). All the previous rules apply. We
>>    simply note that { "S" : { "P" : "O" } } used in the earlier
>>    examples was just a simplification of a larger, more encompassing
>>    model: [ { "S" : { "P" : "O" } }, { "S" : { "P" : "O" } }, ... ].
>>    This reads "naturally:" an array of JSON objects, each making
>>    statements about an RDF Subject, with no restriction that successive
>>    Subjects be unique (because each is enclosed in its own {}
>>    construct). The embracing opening and closing JSON array []
>>    construct (Form 2) "communicates" the chosen serialization to the
>>    parser that it may NOT now assume that all statements about a given
>>    Subject are known, until it processes through the End-Of-File. If
>>    the serializer chooses to group all statements for all subjects
>>    (Form 1), then it can easily do this too by not using the opening
>>    JSON array [] construct and building JSON objects per the earlier
>>    examples above. Thus the "spec" does not bias towards parsers or
>>    serializers (it lets the producer decide). The spec supports
>>    short-circuiting for both streaming serializers and streaming
>>    parsers: just write/read the first non-whitespace character as a '['
>>    or '{' and proceed accordingly.
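
The first-character dispatch between the two forms might look like the sketch below. The names detect_form and subject_blocks are hypothetical. For brevity, subject_blocks loads the whole document with json.loads, so it illustrates the data model only and is not itself a streaming parser.

```python
import json

def detect_form(text):
    """Return 1 for a Form 1 document (top-level object), 2 for Form 2 (array)."""
    first = text.lstrip()[:1]
    if first == "{":
        return 1
    if first == "[":
        return 2
    raise ValueError("document must start with '{' or '['")

def subject_blocks(text):
    """Yield (key, block) pairs in document order from either form.

    In Form 2 the same subject key may be yielded more than once.
    """
    doc = json.loads(text)
    blocks = [doc] if isinstance(doc, dict) else doc
    for block in blocks:
        for key, value in block.items():
            yield key, value

form2 = '[ {":about": {"dc:title": "A"}}, {":about": {"dc:title": "B"}} ]'
pairs = list(subject_blocks(form2))
```

A consumer that sees form 1 may stop reading once the subject it wants has been processed; with form 2 it has to read to the end of the array.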
>> 
>> 
>>    3. RDF/XML has short-hand notation for rdf:type statements that
>>    allows concise "declarations" at the beginning of a document. These
>>    declarations can aid parsers. For example, OWL models can be aided
>>    by knowing if a property is an owl:ObjectProperty or an
>>    owl:DatatypeProperty when it is first *used* (i.e., when it first
>>    occurs as a resource in a statement). Because the serialization of
>>    RDF does not place restrictions on the ordering within a document of
>>    resource definitions and type statements, a predicate's use may
>>    precede its declaration and definition (if any). The RDF/XML
>>    "declaration" short-hand looks like this:
>> 
>>    <owl:Class rdf:about="http://mySite.org/MyClass"/>
>>    <mySite:MyClass rdf:about="http://mySite.org/MyThing"/>
>>    <owl:DatatypeProperty rdf:about="http://mySite.org/myDatatypeProperty"/>
>>    <owl:DatatypeProperty rdf:about="http://mySite.org/myOtherDatatypeProperty"/>
>>    ....
>> 
>>    and is semantically equivalent to more verbose rdf:type statements
>>    about each of the resources.
>> 
>>    Now note that the { "S" : { "P" : "O" } } construct leaves two other
>>    constructs undefined; namely:
>> 
>>       { "S" : "T" } and
>>       { "S" : [ "T", ... ] }
>> 
>>       where "T" is some text (a string).
>> 
>>    Thus we can define the use of these constructs to support concise
>>    rdf:type declarations in a manner similar to RDF/XML:
>> 
>>    {
>>       "owl:Class" : "mySite:MyClass",
>>       "mySite:MyClass" : "mySite:MyThing",
>>       "owl:DatatypeProperty" : [ "mySite:myDatatypeProperty",
>>                                  "mySite:myOtherDatatypeProperty" ]
>>       ...
>>    }
>> 
>>    The meaning of the above is that each JSON value (or array
>>    element) has, as its rdf:type, the resource named by its JSON key. There is no
>>    ambiguity in how to interpret the above because none of the
>>    constructs are of the form "S" : { ... }. This aligns nicely with
>>    RDF/XML declarations. Full example is below in 4.
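
Read this way, the shorthand expands mechanically into rdf:type triples. The sketch below assumes the document has already been parsed into a Python dict. The helper expand_type_declarations is hypothetical, and it skips keys without a colon (such as "xmlns" or "//") on the assumption that those are document directives rather than classes.

```python
RDF_TYPE = "rdf:type"

def expand_type_declarations(doc):
    """Turn { "T" : "S" } and { "T" : [ "S1", "S2" ] } entries into
    (S, rdf:type, T) triples. Entries whose value is a JSON object are
    ordinary subject blocks and are ignored here."""
    triples = []
    for key, value in doc.items():
        if ":" not in key:
            continue  # directive such as "xmlns" or "//", not a class name
        if isinstance(value, str):
            triples.append((value, RDF_TYPE, key))
        elif isinstance(value, list):
            triples.extend((member, RDF_TYPE, key) for member in value)
    return triples

declarations = expand_type_declarations({
    "//": "This is a comment",
    "owl:Class": "mySite:MyClass",
    "owl:DatatypeProperty": ["mySite:p1", "mySite:p2"],
    ":about": {"dc:title": "Anna's Homepage"},
})
```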
>> 
>> 
>>    4. Semantic serialization and parsing. RDF/JSON is presumably a sole
>>    RDF -> JSON serialization. It need know nothing about RDF/XML
>>    (though clearly here I advocate changing that to a tighter linkage
>>    to informationally lossless transliteration of RDF/XML). But it
>>    seems that the more that RDF/JSON differentiates itself as something
>>    more than "one more ad hoc way of representing RDF in JSON" (of
>>    which there are many such competing proposals), the more it could
>>    position itself as an important and distinct addition to the W3C
>>    toolbox.
>> 
>>    One way to do this is to more tightly embrace RDF as the underlying
>>    W3C Semantic Web technology and then use knowledge of those
>>    semantics to improve the serialization; i.e., RDF/JSON would be a
>>    "smart," semantically-aware JSON serialization of W3C Semantic Web
>>    technologies.
>> 
>>    We immediately distinguish here between "semantic serialization and
>>    parsing" and "inference." Various implicit forms of semantic parsing
>>    are already done by many parsers and interpreters--for example, a
>>    scripting language interpreter may assume from 'var x = 1' that x is
>>    an integer variable, even though it has not been declared as being
>>    of that type. The goal of semantic serialization and parsing is to
>>    improve and effect the serialization and parsing while neither
>>    adding nor removing any new knowledge. For example, with semantic
>>    parsing this:
>> 
>>    {
>> 
>>       "owl:ObjectProperty" : ":myProperty",
>> 
>>       ":mySubject" : {
>>         ":myProperty" : {
>>           "rdf:resource" : "http://example.org/anna"
>>         }
>>       }
>>    }
>> 
>>    is equivalent to, and could be replaced by, this:
>> 
>>    {
>> 
>>       "owl:ObjectProperty" : ":myProperty",
>> 
>>       ":mySubject" : { ":myProperty" : "http://example.org/anna" }
>>    }
>> 
>>    The token "http://example.org/anna" is necessarily a resource, not a
>>    literal. The line between semantic serialization and parsing and
>>    inference is subtle. The former is concerned with preservation of
>>    explicit statements of knowledge (or their absence) while using ex
>>    situ knowledge in a manner that improves the serialization or
>>    parsing; the latter is concerned with making statements explicit
>>    that may otherwise be necessarily-true yet only implicit (not
>>    stated). Our focus is on the former. (If a serialization is missing
>>    statements, we want to preserve that absence, since the action of
>>    serialization should maintain input->output data integrity [for
>>    example, cases of purposely "broken" data models for the purpose of
>>    testing]).
>> 
>>    A side-effect of the above is that in order to support streaming
>>    parsers, the order of statements in the document can be important
>>    (e.g., in the above example, if the declaration of myProperty
>>    occurred after its assignment, then the value
>>    "http://example.org/anna" would be considered a string literal, not
>>    a resource). This can be an issue, because RDF -> RDF/XML
>>    serializers may not give users control of the ordering of
>>    statements, nor even guarantee deterministic representations on
>>    successive invocations, thus RDF -> RDF/XML -> RDF/JSON -> RDF could
>>    fail to be informationally lossless. There are ways to address this,
>>    but at a minimum semantic serialization and parsing should be
>>    carefully weighed.
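
The ordering dependency can be made concrete with a small sketch. It assumes a streaming consumer that sees (key, value) entries in document order, that the declaring key is owl:ObjectProperty, and that objects are plain strings. The function classify_objects is hypothetical.

```python
def classify_objects(entries):
    """Label each object 'resource' or 'literal' using only the
    declarations seen so far, as a single-pass parser would."""
    object_properties = set()
    results = []
    for key, value in entries:
        if key == "owl:ObjectProperty":
            members = value if isinstance(value, list) else [value]
            object_properties.update(members)
        elif isinstance(value, dict):
            for pred, obj in value.items():
                kind = "resource" if pred in object_properties else "literal"
                results.append((key, pred, obj, kind))
    return results

declaration = ("owl:ObjectProperty", ":myProperty")
usage = (":mySubject", {":myProperty": "http://example.org/anna"})

declared_first = classify_objects([declaration, usage])
declared_last = classify_objects([usage, declaration])
```

The same two entries yield a resource in one order and a literal in the other, which is why a serializer that cannot control statement order puts the round trip at risk.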
>> 
>>    If we accept due diligence on a dependency of statement ordering in
>>    the document, then we can outline at least four ways to support
>>    semantic serialization and parsing:
>> 
>>    1. Recognize "rdf:RDF", "xmlns", etc. when they appear in the
>>    Subject position as document directives, not user-defined Subjects
>>    (see above).
>> 
>>    2. Predefine the xmlns namespaces rdf, rdfs, xsd, and owl (require
>>    no explicit assignments).
>> 
>>    3. Recognize the semantics of rdf:type, rdfs:range, rdfs:domain,
>>    rdfs:subClassOf, rdfs:subPropertyOf, etc.: the RDF Object of those
>>    predicates must be a resource (cannot be a literal). An exception
>>    and special semantics apply when the object is an XSD datatype
>>    (e.g., "rdfs:range xsd:integer").
>> 
>>    4. Allow the preservation of ex situ RDF comments with the keyword
>>    "comment" (or "@comment" or "//" or "#"). For example, if
>>    transliterating into RDF/XML, then the comments would be re-serialized
>>    as XML comments (<!-- -->). But if translating into N3, then the
>>    comments would be re-serialized as # comments.
>> 
>>    Example:
>> 
>>    {
>> 
>>       "?xml" : {
>>         "version" : "1.0",
>>         "encoding" : "UTF-8"
>>       },
>> 
>>       "xmlns" : {
>>         ""       : "http://example.org/",
>>         "dc"     : "http://purl.org/dc/terms/",
>>         "foaf"   : "http://xmlns.com/foaf/0.1/",
>>         "mySite" : "http://mySite.org/myTerms/"
>>       },
>> 
>>       "//" : "This is a comment",
>> 
>>       "rdf:Property" : [ "dc:title", "dc:creator" ],
>> 
>>       "owl:DatatypeProperty" : "mySite:aDatatypeProperty",
>> 
>>       "owl:ObjectProperty" : "mySite:hasHomepage",
>> 
>>       "owl:Class" : [ "mySite:myClass", "mySite:anotherClass" ],
>> 
>>       "mySite:aDatatypeProperty" : {
>>           "rdfs:range" : "xsd:string"
>>       },
>> 
>>       "mySite:anObjectProperty" : {
>>         "rdfs:range" : "mySite:myClass"
>>       },
>> 
>>       "mySite:anotherObjectProperty" : {
>>         "rdfs:subPropertyOf" : "mySite:anObjectProperty",
>>         "rdfs:domain" : "mySite:myClass"
>>       },
>> 
>>       ":about" : {
>>           "dc:title" : "Anna's Homepage",
>>           "dc:creator" : "_:anna",
>>           "mySite:hasHomepage" : "http://example.org/anna",
>>           "rdfs:comment" : [
>>             "This comment is an explicit property of the subject :about",
>>             "So is this one"
>>             ],
>>           "//" : [
>>             "This is not a property of the subject.",
>>             "It is equivalent to two XML comments <!-- --> within the :about element block when re-serialized as RDF/XML"
>>             ]
>>         }
>> 
>>    }
>> 
>> 
>>    I believe the above will allow the informationally lossless
>>    transliteration of thousands (millions?) of extant RDF/XML documents
>>    into RDF/JSON--though a more thorough analysis is first warranted.
>>    The mere proliferation of said documents conforming to RDF/JSON
>>    should aid in its adoption. And of course, de novo RDF -> RDF/JSON
>>    is also satisfied.
>> 
>> 
>>    Summary:
>> 
>>    There are many candidates for serializing RDF as JSON. If we want
>>    anything more than the null model of an array of triples, then we
>>    should identify the goals and prioritize the trade-offs. The
>>    proposal here attempts the following goals:
>> 
>>    1. RDF/JSON should enable RDF -> JSON serialization independent of any
>>    other RDF serialization (specifically, one should be able to go
>>    directly from an RDF data model into RDF/JSON without any
>>    intervening serialization).
>> 
>>    2. RDF/JSON should be able to be implemented as a streaming
>>    re-serializer on legacy RDF/XML without the need for building a
>>    complete, in-memory RDF data model. The special attention to RDF/XML
>>    is because it is already the W3C recommended serialization for RDF.
>> 
>> 
>> I don't understand why this needs to be a goal. I also did not
>> understand how your proposal enables it, as your examples do not explore
>> the full range of legal RDF/XML document syntax trees, some of which are
>> unnecessarily complex and really do not need to be replicated in any
>> other RDF serialisation. It may be useful to be able to transliterate
>> *from* RDF/XML to something else, although the use case would be very
>> thin, but there is no reason to be able to support reserialising back to
>> RDF/XML once you have gone away from it, so you don't need to preserve
>> the XMLisms in JSON.
>> 
>>    3. RDF/JSON should enable short-circuit parsing, if the provider
>>    chooses to serialize content so as to support it.
>> 
>> 
>> I am a little confused as to how the format you propose, which is not
>> really the simple Talis RDF/JSON anymore after the changes, could be
>> structured to *not* support short-circuit parsing anymore. The JSON
>> model does not allow repeated keys within an object, so there is no
>> simple way to use subjects as keys in any other way and I am not sure
>> what the other alternative is from your proposal.
>> 
>> In general though, I am a little confused about the need to ever do
>> short-circuit parsing. What documents are so large that you cannot pay
>> the cost of parsing an entire document to the RDF abstract model?
>> 
>>    4. RDF/JSON should be informationally lossless with respect to both
>>    RDF and to transliterations of RDF/XML.
>> 
>> 
>> Any RDF serialisation must be informationally lossless with respect
>> to RDF. Some serialisations support structures that cannot be translated
>> back to RDF triples (i.e., any quads format, JSON-LD with relaxed use of
>> blank nodes, and N3 with its extensions), but all of them are otherwise
>> only defined based on the RDF format, not on another syntax.
>> 
>> I fail to see what the benefit would be to having a consistent
>> transliteration from the huge variety of possible RDF/XML structures
>> without going through an RDF model.
>> 
>>    5. RDF/JSON should reflect a "natural" JSON representation: simple
>>    things should be "simply serialized" and complex things should be
>>    built from simple things. If one knows JSON, but doesn't really know
>>    RDF, then one should feel comfortable that JSON constructs are being
>>    used in intuitive, "natural" ways without the need for syntactic
>>    convolutions.
>> 
>> 
>> I think you would be more comfortable using JSON-LD, as it is designed
>> based on many of your goals, except for the RDF/XML transliteration
>> goal, and includes many of the features that you propose, except for
>> comments.
>> 
>>    6. As a proposed W3C recommendation, RDF/JSON should leverage RDF,
>>    RDFS, XSD, and OWL semantics when it can do so either without
>>    compromise to the above goals, or with clear and prioritized
>>    compromise (for example, identifying cases where reliance on
>>    statement ordering is acceptable).
>> 
>> 
>> Of the RDF serialisations, only N3, with its non-RDF extensions,
>> attempts to do anything other than provide a container for simple RDF
>> triples or quads.  How would your proposed format encode anything above
>> RDF triples while staying consistent with RDF?
>> 
>> Don't get me wrong, you could have a niche format for your own purposes.
>> However, I think the use cases, which heavily rely on being able to
>> represent a literal RDF/XML document in JSON, are very thin and would
>> not be of interest to many people who will simply pay the cost of
>> parsing an entire document to memory. Alternatively, parsing RDF/XML to
>> N-Triples can be done while streaming from disk to disk, and sorting the
>> document can be done easily with a fixed memory cost, before parsing it
>> and serialising to Talis RDF/JSON in a streaming method. This is all
>> possible without the legacy XML-specific information that will not be
>> practically useful to anyone using the JSON document, and they will not
>> want to preserve it, in general, just to support a translation back to
>> the exact XML document that was originally used to create it.
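
The final step of the pipeline Peter describes (subject-sorted triples written out as Talis RDF/JSON in a streaming fashion) could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names `write_talis_rdf_json` and `demo` are hypothetical, the input is assumed to be already sorted by subject (for example by an external sort over N-Triples), and each object is assumed to be already shaped as a Talis-style value dict with "type" and "value" keys.

```python
import io
import json
from itertools import groupby
from operator import itemgetter


def write_talis_rdf_json(sorted_triples, out):
    """Stream subject-sorted (s, p, o) triples to `out` as Talis-style RDF/JSON.

    `o` is assumed to be a dict such as {"type": "uri", "value": "..."}.
    Only the triples of the current subject are held in memory at any time.
    """
    out.write("{")
    first = True
    # groupby only merges adjacent items, which is why the input must be sorted.
    for subject, group in groupby(sorted_triples, key=itemgetter(0)):
        predicates = {}
        for _, predicate, obj in group:
            predicates.setdefault(predicate, []).append(obj)
        if not first:
            out.write(",")
        first = False
        out.write(json.dumps(subject) + ":" + json.dumps(predicates))
    out.write("}")


def demo():
    """Write three example triples and parse the result back as JSON."""
    triples = [
        ("_:anna", "http://xmlns.com/foaf/0.1/name",
         {"type": "literal", "value": "Anna"}),
        ("http://example.org/about", "http://purl.org/dc/elements/1.1/creator",
         {"type": "bnode", "value": "_:anna"}),
        ("http://example.org/about", "http://purl.org/dc/elements/1.1/title",
         {"type": "literal", "value": "Anna's Homepage"}),
    ]
    buf = io.StringIO()
    write_talis_rdf_json(triples, buf)
    return json.loads(buf.getvalue())
```

Memory use here is bounded by the largest single subject block rather than by the whole document, which is the property the sorting step is meant to buy.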
>> 
>> Of your proposed changes to Talis RDF/JSON, the namespace extensions
>> would be of most interest to me, although I would definitely not relate
>> it to the XML QName specification which is far too limited to be of any
>> use in a modern format.
>> 
>> Prior to that, if W3C is interested in continuing at all with RDF/JSON
>> standardisation, I will be proposing to add Graph/Quads support to the
>> specification based on the extension that Joshua Shinavier made to the
>> format for the Sesame RDF/JSON parser/writer. It adds an extra "graph"
>> key with an Array of URIs, added to the Object position, and is fairly
>> backwards compatible with the current Talis RDF/JSON specification as
>> long as parsers do not fault on the unrecognised "graph" key.
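
One possible reading of the "graph" extension described above is sketched below. The layout is an assumption drawn only from this message (a "graph" key holding an array of URIs on each object value); the actual Sesame parser/writer may use a different key name or shape, and the helper names `add_quad` and `strip_graphs` are hypothetical.

```python
def add_quad(doc, subject, predicate, obj, graph_uri):
    """Add one quad to a Talis-style RDF/JSON dict.

    The graph is recorded in a "graph" array on the object value
    (assumed layout, not a confirmed specification).
    """
    values = doc.setdefault(subject, {}).setdefault(predicate, [])
    for existing in values:
        core = {k: v for k, v in existing.items() if k != "graph"}
        if core == obj:
            # Same triple already present: just record the extra graph.
            graphs = existing.setdefault("graph", [])
            if graph_uri not in graphs:
                graphs.append(graph_uri)
            return doc
    values.append({**obj, "graph": [graph_uri]})
    return doc


def strip_graphs(doc):
    """Return what a triples-only parser sees if it ignores the "graph" key."""
    return {
        s: {p: [{k: v for k, v in o.items() if k != "graph"} for o in objs]
            for p, objs in preds.items()}
        for s, preds in doc.items()
    }
```

The second helper illustrates the backwards-compatibility point: a consumer that skips the unrecognised key still recovers ordinary Talis RDF/JSON triples.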
>> 
>> Peter
> 
> 

Received on Friday, 17 May 2013 21:31:07 UTC