- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Sun, 30 May 2010 11:57:40 -0400
- To: RDFa Community <public-rdfa@w3.org>
On 05/30/2010 04:56 AM, Toby Inkster wrote:
> On Sat, 29 May 2010 21:06:31 -0400
> Manu Sporny <msporny@digitalbazaar.com> wrote:
>
>> http://rdfa.digitalbazaar.com/specs/source/json-ld/
>
> The document at several times uses the term "unambiguous", but I don't
> think it is. For example, it says:
>
> In order to differentiate between plain text and IRIs, the < and >
> are used around IRIs.
>
> But what about plain text that happens to start with "<" and end with
> ">"?
Escape characters:
http://rdfa.digitalbazaar.com/specs/source/json-ld/#escape-character
I'll need to beef up that section, but the general idea is that any
special characters like "<", ">", and "^" MUST be escaped so that they
are not interpreted as IRIs or TypedLiterals.
Strictly speaking, "<" only needs to be escaped if it's at the
beginning of a string, ">" only needs to be escaped if it's at the end
of a string, and "^^" only needs to be escaped if it is at the
beginning of a string /and/ that string is the second element in an
array.
I haven't quite worked through whether or not these values should
/always/ be escaped, or only in those instances. The key, though, is
that as long as they're escaped, the markup is unambiguous.
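A minimal Python sketch of the "always escape" option, to make the idea
concrete. The function names are hypothetical, and escaping the
backslash itself is an assumption added here so that the round trip is
lossless; the draft does not yet say how backslashes are handled.

```python
SPECIAL = "<>^"  # characters named above as needing an escape


def escape_literal(text: str) -> str:
    """Backslash-escape every special character in a plain literal."""
    out = []
    for ch in text:
        # Assumption: the backslash is escaped too, so unescaping is lossless.
        if ch == "\\" or ch in SPECIAL:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def unescape_literal(text: str) -> str:
    """Reverse escape_literal by dropping each escaping backslash."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def is_iri(value: str) -> bool:
    """An unescaped leading "<" and trailing ">" mark the value as an IRI."""
    return len(value) >= 2 and value.startswith("<") and value.endswith(">")
```

Because an escaped literal can never begin with a bare "<", the
is_iri() check does not confuse it with an IRI.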
> For example:
>
> {
> "dc:abstract" : "A discussion of the abbreviations in HTML.",
> "dc:title" : "<abbr> versus <acronym>"
> }
This would be:
{
"dc:abstract" : "A discussion of the abbreviations in HTML.",
"dc:title" : "\\<abbr\\> versus \\<acronym\\>"
}
or (if we employ some more involved escaping rules):
{
"dc:abstract" : "A discussion of the abbreviations in HTML.",
"dc:title" : "\\<abbr> versus <acronym\\>"
}
or
{
"dc:abstract" : "A discussion of the abbreviations in HTML.",
"dc:title" : "\\<abbr> versus <acronym>"
}
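The second variant above (escape only a leading "<" and a trailing ">")
could be produced along these lines. This is a sketch, not something
from the draft; escape_minimal is a hypothetical name. The call to
json.dumps shows why the backslash appears doubled in the JSON text:
JSON itself escapes the backslash.

```python
import json


def escape_minimal(text: str) -> str:
    """Escape "<" only at the start and ">" only at the end of a literal."""
    if text.startswith("<"):
        text = "\\" + text
    if text.endswith(">"):
        text = text[:-1] + "\\>"
    return text


doc = {"dc:title": escape_minimal("<abbr> versus <acronym>")}
serialized = json.dumps(doc)  # the single backslash is doubled in JSON text
```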
> Also, if you imagine the following two RDFa snippets, with different
> meanings, they seem to have the same representation in JSON-LD:
>
> <!-- snippet 1 -->
> <div typeof="">
> <span property="dc:modified"
> datatype="xsd:dateTime">2010-05-29T14:17:39+02:00</span>
> </div>
>
> <!-- snippet 2 -->
> <div typeof="">
> <span property="dc:modified">2010-05-29T14:17:39+02:00</span>
> <span property="dc:modified">^^xsd:dateTime</span>
> </div>
>
> Both are represented as:
>
> {
> "dc:modified" : ["2010-05-29T14:17:39+02:00", "^^xsd:dateTime"]
> }
Not if they're escaped... properly encoding the values is up to the
application, but the first would be:
{
"dc:modified" : ["2010-05-29T14:17:39+02:00", "^^xsd:dateTime"]
}
and the second would be:
{
"dc:modified" : ["2010-05-29T14:17:39+02:00", "\\^\\^xsd:dateTime"]
}
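A short Python sketch of how an application might keep the two cases
apart. All function names are hypothetical, and only a leading "^^" is
escaped here, which is one of the options discussed above rather than a
settled rule.

```python
def encode_typed(value: str, datatype: str) -> list:
    """Snippet 1: one value with a datatype, marked by an unescaped "^^"."""
    return [value, "^^" + datatype]


def encode_plain(values: list) -> list:
    """Snippet 2: several plain literals; a leading "^^" gets escaped."""
    return ["\\^\\^" + v[2:] if v.startswith("^^") else v for v in values]


def decode(values: list) -> tuple:
    """Classify a two-element array as a typed literal or plain literals."""
    if len(values) == 2 and values[1].startswith("^^"):
        return ("typed", values[0], values[1][2:])
    plain = ["^^" + v[4:] if v.startswith("\\^\\^") else v for v in values]
    return ("plain", plain)


snippet1 = encode_typed("2010-05-29T14:17:39+02:00", "xsd:dateTime")
snippet2 = encode_plain(["2010-05-29T14:17:39+02:00", "^^xsd:dateTime"])
```

With the escape in place, snippet1 and snippet2 are different arrays and
decode() recovers the original meaning of each.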
> This could possibly be addressed by representing datatyped values like
> this (i.e. similarly to RDF/JSON):
>
> {
> "dc:modified" : {
> "value" : "2010-05-29T14:17:39+02:00",
> "datatype" : "xsd:dateTime"
> }
> }
One of the goals of JSON-LD is being as terse as possible. The primary
issue I have with RDF/JSON is that it is incredibly verbose. That
verbosity turns most developers away because the JSON ends up being
huge for real-world uses. We tried using RDF/JSON for our web services
at one point and it ballooned the data sent to API calls by 200%-500%.
So JSON-LD is built on the following lessons learned:
1. Deeply nested structures are very bad.
2. Terseness improves readability and reduces data size requirements.
The key concept that makes JSON-LD stand out from the pack is "The
Context". The Context makes compression of the JSON data possible.
> How language tags are represented is not mentioned in the document, but
> they could perhaps be handled similarly to datatypes.
Yeah, I haven't put enough thought into that yet, but this may be where
we end up:
{
"dc:title" : ["Abbreviations in HTML", "@en"]
}
or even:
{
"dc:title" : ["Abbreviations in HTML@en"]
}
The second is unambiguous if you use this algorithm:
1. Look at the last 4 characters of the string.
2. If they start with "\", the "@" is escaped and the string is a
PlainLiteral.
3. Otherwise, if the last 3 characters start with "@", it is a
PlainLiteral with a language.
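One possible reading of that algorithm as a Python sketch. It assumes
two-letter language tags (so the suffix is "@" plus two letters, and an
escaped suffix is "\@" plus two letters); longer tags such as "en-US"
are not handled. split_language is a hypothetical name.

```python
def split_language(text: str) -> tuple:
    """Return (literal, language); language is None for a plain literal."""
    tail = text[-3:]
    if len(text) >= 3 and tail[0] == "@" and tail[1:].isalpha():
        if len(text) >= 4 and text[-4] == "\\":
            # Escaped "@": a plain literal that merely ends in "@xx".
            return text[:-4] + tail, None
        return text[:-3], tail[1:]
    return text, None
```

For "Abbreviations in HTML@en" this yields the literal "Abbreviations in
HTML" with language "en", while a string ending in "\@en" comes back as
a plain literal with the backslash removed.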
> It seems pretty far out of scope for HTMLWG. Perhaps SWIG?
SWIG can't publish REC-track documents, IIRC.
From HTML WG Charter (Scope):
"""Data and canvas are reasonable areas of work for the group."""
HTML WG is also the group that is publishing HTML+RDFa /and/ Microdata.
Doesn't hurt to ask... and one only needs 3 supporters to publish a
document via HTML WG. :)
WebApps may be another option... from WebApps Charter (Scope):
"""The scope of the Web Applications Working Group covers the
technologies related to developing client-side applications on the Web,
including both markup vocabularies for describing and controlling
client-side application behavior .... Additionally, server-side APIs for
support of client-side functionality will be defined as needed."""
W3C really needs a Working Group to create POSIX for the Web: a set of
Web APIs and calling conventions that lets websites expose common
operations (like login, logout, certificate registration, sign up, etc.).
-- manu
--
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.2.2 - Good Relations and Ditching Apache+PHP
http://blog.digitalbazaar.com/2010/05/06/bitmunk-3-2-2/2/
Received on Sunday, 30 May 2010 15:58:10 UTC