Compact forms of language literals

Hi all!

The majority of scenarios where I use RDF involve lots of
language-tagged literals. Although there is support for this using the
"@language" + "@literal" object construct, I find that it is often
cumbersome to use in practice. Since I believe that language-tagged
literals are very common in general, I'd like to suggest some options.


1. Add JSON-LD support for declaring a document-wide language
directive (within the context or at the top level, i.e. where
"@context" and the optional "@base" appear). This is for scenarios
where there is one dominant language for a given resource.

This would enable us to use @language like this:

  {
    "@context": ...,
    "@language": "en",
    "@subject": "http://example.org/",
    "title": "The Example"
  }

yielding:

  <http://example.org/> :title "The Example"@en .

Multiple values would work as well, so:

  {
    ...,
    "keyword": ["Example", "Draft"]
  }

would yield:

  <http://example.org/> :keyword "Example"@en, "Draft"@en .


To make this play well with any plain literals (i.e. xsd:string
literals in RDF 1.1), there are two alternatives:

A) If "@language" is given at the top level, every plain string value
in the JSON would become language-tagged unless it's coerced (i.e. the
property used for a string is defined in a "@coerce" directive). Plain
literals in the RDF 1.0 sense could be coerced using an empty key (or
perhaps "@plain") in coerce if necessary.

Of course any literal given in "expanded form" would, just like in
RDFa, not be parsed using the top level language. I.e. it would
naturally not have any effect on literals with an explicit "@datatype"
(or empty "@language").
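
To make alternative A concrete, here is a rough Python sketch of how a
processor might apply it. It is not taken from any JSON-LD library: the
function name, the coerced_terms parameter, the "identifier" term and
the (term, value, language) tuples are all made up for illustration,
and it only handles a flat object whose expanded values use "@literal".

```python
def tag_plain_strings(node, coerced_terms=frozenset()):
    """Alternative A: tag every uncoerced plain string with the top-level language.

    Returns (term, value, language) tuples; language is None when no tag applies.
    Hypothetical helper, not part of any JSON-LD library.
    """
    default_lang = node.get("@language")
    result = []
    for term, value in node.items():
        if term.startswith("@"):
            continue  # skip keywords such as "@context" and "@subject"
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # Expanded form never inherits the top-level language.
                # An empty "@language" is treated as "no language".
                result.append((term, v.get("@literal"), v.get("@language") or None))
            elif isinstance(v, str) and term not in coerced_terms:
                result.append((term, v, default_lang))
            else:
                # Coerced terms and non-string values are left untagged here.
                result.append((term, v, None))
    return result


doc = {
    "@language": "en",
    "@subject": "http://example.org/",
    "title": "The Example",
    "keyword": ["Example", "Draft"],
    "identifier": "ex-1",  # hypothetical term, assumed to be listed in "@coerce"
}
for item in tag_plain_strings(doc, coerced_terms={"identifier"}):
    print(item)
```

In this sketch "title" and both "keyword" values pick up "en", while
the coerced "identifier" value is left without a language.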

B) If the majority of values in a document are expected to be
non-datatyped (and non-coerced) plain literals, it would be more
reasonable to provide a special coercion token, say "@langliteral" (or
perhaps "@localized"), which causes values for terms declared with this
coercion to be tagged with the top-level language. In the example
above, that'd mean adding:

  "@coerce": {
    "@langliteral": ["title", "keyword"]
  }

to the context.
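
As a rough illustration of alternative B, the following Python sketch
tags only the terms listed under "@langliteral". The function name, the
"identifier" term and the tuple output are hypothetical, the context is
reduced to just the "@coerce" part, and "@language" is read from the
top level as in the first example.

```python
def tag_langliterals(doc):
    """Alternative B: only terms coerced to "@langliteral" get the top-level language.

    Returns (term, value, language) tuples. Hypothetical helper for illustration.
    """
    coerce = doc.get("@context", {}).get("@coerce", {})
    lang_terms = set(coerce.get("@langliteral", []))
    lang = doc.get("@language")
    result = []
    for term, value in doc.items():
        if term.startswith("@"):
            continue  # skip keywords
        values = value if isinstance(value, list) else [value]
        for v in values:
            # Terms without the coercion stay plain literals.
            result.append((term, v, lang if term in lang_terms else None))
    return result


doc = {
    "@context": {"@coerce": {"@langliteral": ["title", "keyword"]}},
    "@language": "en",
    "@subject": "http://example.org/",
    "title": "The Example",
    "keyword": ["Example", "Draft"],
    "identifier": "ex-1",  # hypothetical term, not coerced, so it stays plain
}
for item in tag_langliterals(doc):
    print(item)
```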


2. There could also be a mechanism to make data where there are
literals in several languages easier to use. We could define a special
coercion called "@langmap", which expects values to be objects with
the languages as keys. Thus this:

  {
    "@context": {
      ...,
      "@coerce": {
        "@langmap": ["title"]
      }
    },
    "@subject": "http://example.org/",
    "title": {"en": "The Example", "sv": "Exemplet"}
  }

would yield:

  <http://example.org/> :title "The Example"@en, "Exemplet"@sv .

This mechanism would of course also work when there is only one
language for the value of a property.

Multiple values per language would look like:

  {
    ...,
    "keyword": {"en": ["Example", "Draft"], "sv": ["Exemplet", "Utkast"]}
  }

yielding:

  <http://example.org/> :keyword "Example"@en, "Draft"@en,
    "Exemplet"@sv, "Utkast"@sv .

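Expanding such a language map is simple. A minimal Python sketch,
assuming the processor already knows that the term is coerced to
"@langmap" (the function name and tuple output are hypothetical):

```python
def expand_langmap(term, langmap):
    """Turn {"lang": value-or-list} into (term, value, language) tuples.

    Hypothetical helper for illustration, not part of any JSON-LD library.
    """
    result = []
    for lang, value in langmap.items():
        # A single string and a list of strings are handled the same way.
        values = value if isinstance(value, list) else [value]
        for v in values:
            result.append((term, v, lang))
    return result


for item in expand_langmap("title", {"en": "The Example", "sv": "Exemplet"}):
    print(item)
for item in expand_langmap(
    "keyword", {"en": ["Example", "Draft"], "sv": ["Exemplet", "Utkast"]}
):
    print(item)
```

Going the other way (grouping language-tagged literals by language when
creating JSON from RDF) would be the reverse of the same loop.
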

(Of course, I think that the current expanded form is necessary at
times as well. But the suggestions I present here are meant to
facilitate a compact form, similar to the existing "@coerce" feature;
both serve to simplify data into a more JSON-"native" form. I have come
across needs for both of these forms when exposing data in web
services aimed at non-RDF-savvy developers (and e.g. when creating
JSON from RDF to index in ElasticSearch).)

If I had to pick one, I'd say #2 is more versatile. But I've found #1
very desirable to get the simplest kind of JSON out (in web services).
In any case, this issue is important to me.

Best regards,
Niklas

Received on Wednesday, 7 September 2011 21:43:37 UTC