Re: JSON-LD - experimenting with universal Linked Data markup for Web Services

On 05/30/2010 04:56 AM, Toby Inkster wrote:
> On Sat, 29 May 2010 21:06:31 -0400
> Manu Sporny <msporny@digitalbazaar.com> wrote:
> 
>> http://rdfa.digitalbazaar.com/specs/source/json-ld/
> 
> The document at several times uses the term "unambiguous", but I don't
> think it is. For example, it says:
> 
>     In order to differentiate between plain text and IRIs, the < and >
>     are used around IRIs.
> 
> But what about plain text that happens to start with "<" and end with
> ">"?

Escape characters:

http://rdfa.digitalbazaar.com/specs/source/json-ld/#escape-character

I'll need to beef up that section, but the general idea is that special
characters like "<", ">", and "^" MUST be escaped so that they are not
interpreted as IRIs or TypedLiterals.

Strictly speaking, "<" only needs to be escaped (to avoid starting an
IRI) if it's at the beginning of a string, ">" only needs to be escaped
if it's at the end of a string, and "^^" needs to be escaped if it is at
the beginning of a string /and/ that string is the second element in an
array.

I haven't quite worked through whether or not these values should
/always/ be escaped, or only in those instances. The key, though, is
that as long as they're escaped, the markup is unambiguous.
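A rough sketch of how a consumer might apply those minimal rules when
reading a string value. It assumes a value is an IRI only if it starts
with an unescaped "<" and ends with an unescaped ">", and that a
backslash in front of "<", ">" or "^" is simply dropped. The names
classify() and unescape() are hypothetical, not from the draft. Note the
Python strings hold the decoded JSON value, so they carry one backslash
where the JSON text shows two.

```python
import re
from typing import Tuple

# Hypothetical helpers illustrating the escaping rules; not from the spec draft.
_ESCAPED = re.compile(r"\\([<>^])")  # a backslash followed by "<", ">" or "^"

def unescape(value: str) -> str:
    """Drop the backslash in front of an escaped "<", ">" or "^"."""
    return _ESCAPED.sub(r"\1", value)

def classify(value: str) -> Tuple[str, str]:
    """Return ("iri", iri) or ("literal", text) for a decoded JSON string."""
    # A leading "<" is never escaped (an escaped one would start with "\"),
    # so only the trailing ">" needs an explicit escape check.
    if value.startswith("<") and value.endswith(">") and not value.endswith("\\>"):
        return ("iri", value[1:-1])
    return ("literal", unescape(value))

print(classify("<http://purl.org/dc/terms/title>"))  # ('iri', 'http://purl.org/dc/terms/title')
print(classify("\\<abbr\\> versus \\<acronym\\>"))    # ('literal', '<abbr> versus <acronym>')
```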

> For example:
> 
>     {
>       "dc:abstract" : "A discussion of the abbreviations in HTML.",
>       "dc:title" : "<abbr> versus <acronym>"
>     }

This would be:

{
   "dc:abstract" : "A discussion of the abbreviations in HTML.",
   "dc:title" : "\\<abbr\\> versus \\<acronym\\>"
}

or (if we employ some more involved escaping rules):

{
   "dc:abstract" : "A discussion of the abbreviations in HTML.",
   "dc:title" : "\\<abbr> versus <acronym\\>"
}

or

{
   "dc:abstract" : "A discussion of the abbreviations in HTML.",
   "dc:title" : "\\<abbr> versus <acronym>"
}
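Going the other way, an application that wants the minimal escaping
only has to touch a leading "<" and a trailing ">". A hypothetical
helper, assuming the text has no backslashes of its own (the escaping
section doesn't cover that case yet). json.dumps() doubles each
backslash, so the "dc:title" line comes out as
"\\<abbr> versus <acronym\\>", the same as the second form above.

```python
import json

def escape_literal(text: str) -> str:
    """Escape only the characters that could make plain text look like an IRI."""
    if text.startswith("<"):
        text = "\\" + text          # leading "<"  ->  "\<"
    if text.endswith(">"):
        text = text[:-1] + "\\>"    # trailing ">" ->  "\>"
    return text

doc = {
    "dc:abstract": "A discussion of the abbreviations in HTML.",
    "dc:title": escape_literal("<abbr> versus <acronym>"),
}
print(json.dumps(doc, indent=3))
```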

> Also, if you imagine the following two RDFa snippets, with different
> meanings, they seem to have the same representation in JSON-LD:
> 
>     <!-- snippet 1 -->
>     <div typeof="">
>       <span property="dc:modified"
>             datatype="xsd:dateTime">2010-05-29T14:17:39+02:00</span>
>     </div>
> 
>     <!-- snippet 2 -->
>     <div typeof="">
>       <span property="dc:modified">2010-05-29T14:17:39+02:00</span>
>       <span property="dc:modified">^^xsd:dateTime</span>
>     </div>
> 
> Both are represented as:
> 
>     {
>       "dc:modified" : ["2010-05-29T14:17:39+02:00", "^^xsd:dateTime"]
>     }

Not if they're escaped. Properly encoding the values is up to the
application, but the first would be:

{
   "dc:modified" : ["2010-05-29T14:17:39+02:00", "^^xsd:dateTime"]
}

and the second would be:

{
   "dc:modified" : ["2010-05-29T14:17:39+02:00", "\\^\\^xsd:dateTime"]
}
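To make the distinction concrete, here is one way a consumer might read
Toby's two cases, assuming the rule above: an unescaped "^^" at the start
of the second element of a two-element array marks a datatype. The name
parse_values() is hypothetical, and the strings are decoded JSON values
(single backslashes).

```python
from typing import List, Optional, Tuple, Union

LiteralValue = Tuple[str, Optional[str]]  # (lexical form, datatype or None)

def parse_values(values: Union[str, List[str]]) -> List[LiteralValue]:
    if isinstance(values, str):
        values = [values]
    # ["lexical form", "^^datatype"] is one typed literal.
    if len(values) == 2 and values[1].startswith("^^"):
        return [(values[0], values[1][2:])]
    # Anything else is a list of plain literals; "\^" unescapes to "^".
    return [(v.replace("\\^", "^"), None) for v in values]

typed = ["2010-05-29T14:17:39+02:00", "^^xsd:dateTime"]
plain = ["2010-05-29T14:17:39+02:00", "\\^\\^xsd:dateTime"]
print(parse_values(typed))   # one typed literal
print(parse_values(plain))   # two plain literals
```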

> This could possibly be addressed by representing datatyped values like
> this (i.e. similarly to RDF/JSON):
> 
>     {
>       "dc:modified" : {
>         "value"    : "2010-05-29T14:17:39+02:00",
>         "datatype" : "xsd:dateTime"
>         }
>     }

One of the goals of JSON-LD is being as terse as possible. The primary
issue I have with RDF/JSON is that it is incredibly verbose. That
verbosity turns most developers away because the JSON ends up being
huge for real-world uses. We tried using RDF/JSON for our web services
at one point and it ballooned the size of the data sent in API calls by
200%-500%.

So JSON-LD asserts the following lessons learned:
   1. Deeply nested structures are very bad.
   2. Terseness improves readability and reduces data size requirements.

The key concept that makes JSON-LD stand out from the pack is "The
Context". The Context makes compression of the JSON data possible.
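This message doesn't describe how the Context is written down, so the
following is only an illustration of the general idea: a made-up context
mapping the "dc" and "xsd" prefixes to the Dublin Core terms and XML
Schema namespaces, used to compact long IRIs into short prefixed names
and expand them again. The CONTEXT dictionary, expand() and compact() are
all hypothetical.

```python
from typing import Dict

# Hypothetical context: short prefixes mapped to full namespace IRIs.
CONTEXT: Dict[str, str] = {
    "dc": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

def expand(term: str, context: Dict[str, str] = CONTEXT) -> str:
    """Turn a prefixed name such as "dc:title" into a full IRI."""
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return term  # unknown prefix (e.g. "http"): leave the term alone

def compact(iri: str, context: Dict[str, str] = CONTEXT) -> str:
    """Inverse of expand(): shorten an IRI using the first matching namespace."""
    for prefix, namespace in context.items():
        if iri.startswith(namespace):
            return prefix + ":" + iri[len(namespace):]
    return iri

print(expand("dc:title"))                                    # http://purl.org/dc/terms/title
print(compact("http://www.w3.org/2001/XMLSchema#dateTime"))  # xsd:dateTime
```

Every repeated property name shrinks to a few characters this way, which
is where the size savings over fully spelled-out IRIs come from.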

> How language tags are represented is not mentioned in the document, but
> they could perhaps be handled similarly to datatypes.

Yeah, I haven't put enough thought into that yet, but this may be where
we end up:

{
   "dc:title" : ["Abbreviations in HTML", "@en"]
}

or even:

{
   "dc:title" : ["Abbreviations in HTML@en"],
}

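The second form is unambiguous if you use this algorithm (assuming
two-letter language tags):

1. Check the last 4 characters of the string.
2. If they start with "\" (as in "\@en"), the "@" is escaped and it's a
   PlainLiteral.
3. Otherwise, if the last 3 characters start with "@" (as in "@en"), it
   is a PlainLiteral with a language.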
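A minimal sketch of that suffix check, assuming two-letter language tags
such as "en". A tag plus its "@" is three characters, so the sketch looks
for "@" at the third-from-last position, and for an escaping backslash
just before it. parse_plain_literal() is a hypothetical name.

```python
from typing import Optional, Tuple

def parse_plain_literal(value: str) -> Tuple[str, Optional[str]]:
    """Split "text@xx" into (text, "xx"); assumes two-letter language tags."""
    # "\@xx" at the end: the "@" is escaped, so there is no language tag.
    if value[-4:-2] == "\\@":
        return (value[:-4] + value[-3:], None)  # drop the backslash
    # "@xx" at the end: a PlainLiteral with a language.
    if value[-3:-2] == "@":
        return (value[:-3], value[-2:])
    return (value, None)

print(parse_plain_literal("Abbreviations in HTML@en"))
print(parse_plain_literal("Abbreviations in HTML\\@en"))
```

Longer tags such as "en-US" would not fit a fixed-width check like this
one and would need a different rule.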

> It seems pretty far out of scope for HTMLWG. Perhaps SWIG?

SWIG can't publish REC-track documents, IIRC.

From HTML WG Charter (Scope):

"""Data and canvas are reasonable areas of work for the group."""

HTML WG is also the group that is publishing HTML+RDFa /and/ Microdata.
Doesn't hurt to ask... and one only needs 3 supporters to publish a
document via HTML WG. :)

WebApps may be another option... from WebApps Charter (Scope):

"""The scope of the Web Applications Working Group covers the
technologies related to developing client-side applications on the Web,
including both markup vocabularies for describing and controlling
client-side application behavior .... Additionally, server-side APIs for
support of client-side functionality will be defined as needed."""

W3C really needs a Working Group to create POSIX for the Web: a set of
Web APIs and calling conventions that enable websites to expose common
APIs (like login, logout, certificate registration, sign up, etc.).

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.2.2 - Good Relations and Ditching Apache+PHP
http://blog.digitalbazaar.com/2010/05/06/bitmunk-3-2-2/2/

Received on Sunday, 30 May 2010 15:58:10 UTC