Re: JSON Schema (json-schema.org) support? from David I. Lehn on 2013-08-15 (public-linked-json@w3.org from August 2013)

From: David I. Lehn <dil@lehn.org>
Date: Thu, 15 Aug 2013 16:24:43 -0400
To: Edwin Shao <eshao@eshao.es>
Cc: Linked JSON <public-linked-json@w3.org>
Message-ID: <CADcbRRN2y-Gx27HN2sav3va6_53UDYo4CK0i5QmXdQJ7iPZ84w@mail.gmail.com>

On Thu, Aug 15, 2013 at 6:54 AM, Edwin Shao <eshao@eshao.es> wrote:
> It strikes me that JSON-LD and JSON Schema are quite complementary. The
> first provides context, metadata, and a standard graph traversal mechanism.
> The second provides a way to describe and validate a given JSON object.
>

Although they may seem to work well together at first, there are some
considerable limitations in using JSON Schema as a long-term solution
for JSON-LD description and validation.  Despite this, our PaySwarm
server code currently uses JSON Schema for validation, so I'm familiar
with the idea and with how it can work within those limitations.

The main issue is that JSON Schema describes and validates JSON with a
known structure.  But JSON-LD is a flexible serialization of graph
data.  In the general sense, this makes the two somewhat incompatible.
There are many ways to serialize JSON-LD data, all of which are
equivalent at a low level.  But (any sane use of) JSON Schema only
works if the data is serialized with a certain structure.  In order to
properly validate arbitrary JSON-LD data with JSON Schema, you first
need to make a pass with something like the framing algorithm that is
a work-in-progress spec.  That would give you a structure that you
could then validate.  I'm not sure the framing algorithm or code was
optimized for this sort of use, but maybe it could be.
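
To make the structural problem concrete, here is a minimal Python
sketch.  The two documents are the compact and expanded forms of the
same one-triple graph.  The matches_fixed_shape function is only a
hypothetical stand-in for a JSON Schema check (it is not a real JSON
Schema validator), and the example IRIs are made up for illustration.

```python
# Two JSON-LD serializations of the same graph: one subject with one name.
compact = {
    "@context": {"name": "http://xmlns.com/foaf/0.1/name"},
    "@id": "http://example.com/people/alice",
    "name": "Alice",
}
expanded = [{
    "@id": "http://example.com/people/alice",
    "http://xmlns.com/foaf/0.1/name": [{"@value": "Alice"}],
}]


def matches_fixed_shape(doc):
    """Hypothetical stand-in for a schema that requires an object
    with a string-valued "name" key."""
    return isinstance(doc, dict) and isinstance(doc.get("name"), str)


print(matches_fixed_shape(compact))   # True
print(matches_fixed_shape(expanded))  # False, although the data is the same
```

A structure-based check accepts one serialization and rejects the
other, which is why a framing-like pass would be needed first.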

If you are, say, getting JSON-LD data via a web service POST call, you
could document that the JSON-LD data MUST be formatted in a certain
way and MUST use a certain context in order to be valid.  That is a
rather unfortunate limitation given how powerful this technology could
be.  For what it's worth, PaySwarm basically works like that
currently.

A better solution would be to leverage some of the RDF and OWL schema
work.  The first step would be to create a proper schema for your
semantic data using RDF, OWL, or similar.  Then a hypothetical web
service could take input as JSON-LD (in any structural form), N-Quads,
N3, Turtle, RDFa, etc., convert it to a low-level normalized form (such
as triples), and then run a validator on that data with the semantic
schema.  This is an interesting approach to take since you are now
validating the semantic data without concern for its format or
presented structure.  Once validated, you can run the JSON-LD framing
algorithm on the data to get it into a known structure that is easy to
internally process.
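
A rough Python sketch of the validation step, assuming the input has
already been converted to (subject, predicate, object) triples by
whatever parser handled the incoming format.  The REQUIRED table is a
made-up, drastically simplified stand-in for a real RDF/OWL schema,
and validate is a hypothetical helper, not an existing library call.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical rule set: class IRI -> properties every instance must have.
REQUIRED = {FOAF + "Person": [FOAF + "name"]}


def validate(triples):
    """Return sorted (subject, missing_property) pairs for the triples."""
    props = defaultdict(set)   # subject -> predicates seen
    types = defaultdict(set)   # subject -> declared classes
    for s, p, o in triples:
        props[s].add(p)
        if p == RDF_TYPE:
            types[s].add(o)
    errors = []
    for s, classes in types.items():
        for cls in classes:
            for req in REQUIRED.get(cls, []):
                if req not in props[s]:
                    errors.append((s, req))
    return sorted(errors)


alice = "http://example.com/people/alice"
bob = "http://example.com/people/bob"
triples = [
    (alice, RDF_TYPE, FOAF + "Person"),
    (alice, FOAF + "name", "Alice"),
    (bob, RDF_TYPE, FOAF + "Person"),  # bob has no name
]
print(validate(triples))
```

Because the check runs on triples, it does not matter which
serialization or JSON-LD structure the data arrived in.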

> ...
> If there is no canonical way currently (which seems to be the case), I would
> suggest including one in the upcoming spec, perhaps creating a new @schema
> keyword.
>

A keyword seems like overkill for this.  As Markus said, there may be
other, better mechanisms.  At one point we were discussing extension
methods.  Did that get forgotten?  Something like a "@context":
{"@meta": {...}} object that you could throw custom key/value pairs
into for custom processing.  That sort of thing would let you add
{"http://json-schema.org/schema": "http://example.com/foo.json"} as
metadata for processors that want to support it.  I suppose that could
just be in the raw data too, but that might be crufty.
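
As a sketch of what that might look like to a consumer, in Python:
note that "@meta" is purely hypothetical here.  It is not defined by
JSON-LD, so standard processors should not be expected to accept it,
and schema_hint is a made-up helper for illustration only.

```python
SCHEMA_KEY = "http://json-schema.org/schema"

# Hypothetical document carrying a schema hint inside an "@meta" block.
doc = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "@meta": {SCHEMA_KEY: "http://example.com/foo.json"},
    },
    "name": "Alice",
}


def schema_hint(document):
    """Return the schema URL from the hypothetical "@meta" block, or None."""
    context = document.get("@context")
    if not isinstance(context, dict):
        return None
    meta = context.get("@meta")
    if not isinstance(meta, dict):
        return None
    return meta.get(SCHEMA_KEY)


print(schema_hint(doc))
```

A processor that does not know about the hint would simply never call
anything like schema_hint, so support stays optional.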

-dave

Received on Thursday, 15 August 2013 20:25:11 UTC