Forms and principles of the JSON-LD context

Hi all!

In the last telecon we discussed changing @coerce to use the terms as
keys. AFAIK this is now agreed upon. This brought on a short
discussion about the current form of contexts.

As Manu explained, originally the current form was made for brevity,
but we now believe that contexts can turn out to be fairly large
anyway and will commonly be linked to as external documents. Thus, the
case for brevity is lessened.

I then went on to explore some options we may consider regarding how
contexts work right now. I promised to do some tests and raise this on
the mailing list. I've been slow in sending this because I've wavered
a lot in evaluating the results. It's a bit tricky to pick principles
against which to evaluate the role and form of the context definition.

This is thus not so much a suggestion for a change as an attempt at
discussing what we want the context to be. I believe that making
things dead simple and as flat as possible are very important
design goals. I also think that the current form of JSON-LD contexts
adheres to these quite well. But I'd like to illuminate what the
options are, and articulate the reasons for (and possibilities of)
various forms.

We may express some statements about the role and scope of contexts:

* It's more important to easily read contexts than to write them.

* A context has the following roles:
  - It maps terms to IRIs (including terms used as prefixes)
  - It can map terms to special processing keys (@iri, @type etc.)
  - It can define a default term base using @vocab
  - It can map a term to a coercion rule (defining how to interpret a value)
  - It can define a default language (requiring plain string values to
be explicitly coerced)

So let's look at an alternative to the current context. It is about
combining the declaration of the IRI for a term and an optional
coercion. Consider this example in the current form of a context
(using the new @coerce form mentioned above):

    "@context": {
        "@vocab": "http://purl.org/dc/terms/",
        "label": "http://www.w3.org/2000/01/rdf-schema#label",
        "Document": "http://xmlns.com/foaf/0.1/Document",
        "primaryTopic": "http://xmlns.com/foaf/0.1/primaryTopic",
        "@coerce": {
            "created": "dateTime",
            "creator": "@iri",
            "identifier": "string",
            "issued": "date",
            "updated": "dateTime",
            "primaryTopic": "@iri"
        }
    }

We could instead use a form where values can either be the IRI string
for the term, or an object defining both the @iri (or none to resolve
it to @vocab) and a @coerce rule. Like:

    "@context": {
        "@vocab": "http://purl.org/dc/terms/",
        "created": {
            "@coerce": "dateTime"
        },
        "creator": {
            "@coerce": "@iri"
        },
        "identifier": {
            "@coerce": "string"
        },
        "issued": {
            "@coerce": "date"
        },
        "updated": {
            "@coerce": "dateTime"
        },
        "label": "http://www.w3.org/2000/01/rdf-schema#label",
        "Document": "http://xmlns.com/foaf/0.1/Document",
        "primaryTopic": {
            "@iri": "http://xmlns.com/foaf/0.1/primaryTopic",
            "@coerce": "@iri"
        }
    }

Now, I see that this is contentious. While it does make both the IRI
of and the coercion for a term immediately visible to a reader, it
isn't a given that those questions have to be answered at once. The current
form (terms used as keys in "@coerce") is possibly superior for a
reader only looking for what datatype is used for a term.
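
As a rough illustration that the two forms carry the same information
(so the choice is mostly about readability), here is a minimal Python
sketch. It is not taken from any existing processor: the function name
normalize_context and the internal "iri"/"coerce" keys are made up for
this example, and the proposed object form is of course not agreed
syntax.

```python
def normalize_context(context):
    """Return {term: {"iri": ..., "coerce": ...}} for either context form."""
    vocab = context.get("@vocab")
    flat_coerce = context.get("@coerce", {})  # current form: term -> rule
    terms = {}

    def resolve(term, iri=None):
        # A term without an explicit IRI falls back to @vocab + term.
        if iri is not None:
            return iri
        return vocab + term if vocab is not None else None

    for key, value in context.items():
        if key.startswith("@"):
            continue  # @vocab and @coerce are handled separately
        if isinstance(value, str):
            # Plain string value: the IRI of the term.
            terms[key] = {"iri": value, "coerce": None}
        else:
            # Proposed form: an object with optional @iri and @coerce.
            terms[key] = {
                "iri": resolve(key, value.get("@iri")),
                "coerce": value.get("@coerce"),
            }

    # Current form: coercion rules live in a separate @coerce map.
    for term, rule in flat_coerce.items():
        entry = terms.setdefault(term, {"iri": resolve(term), "coerce": None})
        entry["coerce"] = rule

    return terms


current_form = {
    "@vocab": "http://purl.org/dc/terms/",
    "primaryTopic": "http://xmlns.com/foaf/0.1/primaryTopic",
    "@coerce": {"created": "dateTime", "primaryTopic": "@iri"},
}

proposed_form = {
    "@vocab": "http://purl.org/dc/terms/",
    "created": {"@coerce": "dateTime"},
    "primaryTopic": {
        "@iri": "http://xmlns.com/foaf/0.1/primaryTopic",
        "@coerce": "@iri",
    },
}

# Both forms describe the same terms once normalized.
assert normalize_context(current_form) == normalize_context(proposed_form)
```

Under these assumptions a processor ends up with the same term table
either way; what differs is only what a human sees when reading the
context.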

For the first (current) form above to be better, the principle could be:

* When interpreting a context, users are expected to only look at one
aspect at a time.

A counter-principle could be:

* When reading a context, everything applicable to the term should be
visible at once.

Considering that a context defines a term by mapping it to an IRI and
potentially a coercion rule, it might be more cohesive to merge the
declarations. (Consider also e.g. the case where you want to define
two terms with the same IRI but different coercion, such as
creatorName for dc:creator as string and creator for dc:creator as
@iri.)
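
A small Python sketch of that two-term case, using the proposed object
form. The interpret helper, the document values and the example.org
IRI are hypothetical and only serve to show both terms resolving to
the same property with different coercion rules.

```python
# Proposed (not agreed) form: two terms share one IRI but coerce differently.
context = {
    "creatorName": {
        "@iri": "http://purl.org/dc/terms/creator",
        "@coerce": "string",
    },
    "creator": {
        "@iri": "http://purl.org/dc/terms/creator",
        "@coerce": "@iri",
    },
}


def interpret(context, document):
    """Yield (property IRI, coercion rule, raw value) for each known term."""
    for term, value in document.items():
        definition = context.get(term)
        if definition is None:
            continue  # unknown terms are ignored in this sketch
        yield definition["@iri"], definition["@coerce"], value


document = {
    "creatorName": "Some Author",
    "creator": "http://example.org/people/some-author",
}

statements = list(interpret(context, document))
```

Each entry in statements points at dc:creator, but one value is to be
read as a plain string and the other as an IRI; each term's definition
is readable in a single place.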

However, if contexts were to use this richer term definition object,
are we on a path to defining a schema language? Are we opening the
door for more complexities? In support of the change, there are also
one or two requested features, not yet addressed, which do not fit
squarely into coerce, like support for inverse terms and declaring
whether there will be a single value or a set (as a JSON list). For
those features, this syntax may be more convenient (albeit not a
necessity).

In any case I still believe that the extent to which contexts resemble
schemas must be limited to the minimum needed to map JSON syntax to
RDF abstract syntax. From there on RDFS and OWL can give thorough
descriptions of properties and classes. (That means JSON-LD contexts
should reasonably not support e.g. the cardinality and syntactic
constraints of JSON-schema, nor any of the advanced concepts of
schemas, like the logical descriptive features of OWL.)

(To get a feel for these forms, I've put a gist with variations of the
context for the project I work with at:
<https://gist.github.com/1326420>. (The most glaring thing to me there
is the repetition of vocabulary bases (in both forms). But that's
another issue (possibly solved using CURIEs).))

Thoughts?

Best regards,
Niklas

Received on Sunday, 30 October 2011 21:53:18 UTC