API design principles - HTML/XML literals from John Walker on 2015-10-09 (public-hydra@w3.org from October 2015)

From: John Walker <john.walker@semaku.com>
Date: Fri, 9 Oct 2015 13:20:15 +0200 (CEST)
To: public-hydra@w3.org, public-linked-json@w3.org
Message-ID: <1963306494.227484.1444389615398.JavaMail.open-xchange@oxweb01.eigbox.net>

On a forthcoming project we need to transfer information/data/content from one
system to another.
I have questions about the best way to deal with HTML/XML markup.

My assumption is we will expose an API of some kind, most likely JSON (ideally
JSON-LD).
The question is whether it is good/best practice to
a. include HTML/XML markup in literal values, or
b. refer out to a separate resource for it.

An example of the first approach:

{
  "@context": "http://schema.org/",
  "@id": "#id",
  "@type": "Product",
  "mpn": "ABC123",
  "name": "ACME thingamyjig",
  "description": "the ACME thingamyjig is our <b>new</b> wonderful product with some <sub>subscript</sub> stuff.<br/>A new line"
}

For me this is bad because the "description" is a plain string, yet it contains
HTML markup such as <br/>.
How is a client to know how to process this?
Should the "<br/>" be displayed or rendered as a line break?
What if the content contains < characters (common for technical products),
should these be escaped as HTML entities &lt;?
Of course one could add the datatype rdf:HTML for this literal to indicate it is
HTML.
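
A minimal sketch of how a client could act on that datatype, assuming the
description is expressed as a JSON-LD value object with "@type": "rdf:HTML".
The inline context with its "rdf" prefix mapping and the helper name
render_literal are assumptions made for illustration, not part of any library.

```python
import html
import json

RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"

# Hypothetical product document; the inline context (including the "rdf"
# prefix mapping) is an assumption made so the example is self-contained.
doc = json.loads("""
{
  "@context": {
    "@vocab": "http://schema.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  },
  "@id": "#id",
  "@type": "Product",
  "name": "ACME thingamyjig",
  "description": {
    "@value": "our <b>new</b> wonderful product.<br/>A new line",
    "@type": "rdf:HTML"
  }
}
""")


def render_literal(value, context):
    """Return a string intended for insertion into an HTML page."""
    if isinstance(value, dict):
        datatype = value.get("@type", "")
        prefix, sep, local = datatype.partition(":")
        # Expand a compact IRI such as "rdf:HTML" using the context.
        if sep and isinstance(context.get(prefix), str):
            datatype = context[prefix] + local
        if datatype == RDF_HTML:
            # Declared as HTML: pass the markup through unchanged.
            # A real client would still sanitize untrusted markup here.
            return value["@value"]
        return html.escape(str(value.get("@value", "")))
    # Plain string: treat every character as text, so "<" becomes "&lt;".
    return html.escape(str(value))


print(render_literal(doc["description"], doc["@context"]))
print(render_literal("rated for loads < 5 kg", doc["@context"]))
```

With the datatype present the client has an unambiguous rule: typed HTML
literals are rendered as markup, and everything else is escaped.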

In our case these literals could be quite large and contain extensive markup.
Additionally, if we put these literals directly on the product entities, there
would be significant repetition, as many products share the same content (so
much for DRY).

The second option would be to refer to some external resource:

{
  "@context": "http://schema.org/",
  "@id": "#id",
  "@type": "Product",
  "mpn": "ABC123",
  "name": "ACME thingamyjig",
  "description": {"@id": "content/4y7dh2"}
}

This could support conneg, allowing multiple representations (e.g. HTML, DITA
and plain text) to be served from a single URL.
It would also reduce repetition and allow client-side caching of these
resources.
It would also potentially play nicely with things like HTML Imports [4] [5].
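
A rough sketch of what the client side of this could look like, assuming the
server honours the Accept header and responds in UTF-8. The function names
http_get and get_content, the in-memory cache and the example URL are all
hypothetical; a real client would more likely lean on HTTP caching headers.

```python
import urllib.request

# Very small in-memory cache keyed by (URL, requested media type).
_cache = {}


def http_get(url, accept):
    """Dereference url, asking for one representation via the Accept header."""
    request = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(request) as response:
        # Assumption: the server returns UTF-8 encoded text.
        return response.read().decode("utf-8")


def get_content(url, accept="text/html", fetch=http_get):
    """Return the content behind url, fetching each representation only once."""
    key = (url, accept)
    if key not in _cache:
        _cache[key] = fetch(url, accept)
    return _cache[key]
```

Two products pointing at the same content URL would then trigger a single
request per media type, for example
get_content("http://example.org/content/4y7dh2", "text/plain").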

IMHO, from a principled/architectural perspective, the second option is best.
However, we do not see this second option widely deployed.
Why is that?

To go to the other extreme, why not inline images as data URIs in the RDF?
Clearly this is possible, but quite uncommon.
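
For illustration, a minimal sketch of what that inlining would involve. The
helper name to_data_uri is hypothetical, and the bytes are only the 8-byte PNG
signature standing in for a real image.

```python
import base64


def to_data_uri(data: bytes, media_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# Placeholder bytes: the PNG file signature only, not a complete image.
png_bytes = b"\x89PNG\r\n\x1a\n"

product = {
    "@context": "http://schema.org/",
    "@type": "Product",
    "name": "ACME thingamyjig",
    # The image travels inside the document instead of being linked.
    "image": to_data_uri(png_bytes, "image/png"),
}
```

Base64 inflates the payload by roughly a third, and the image can no longer be
cached or negotiated separately from the product description.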

Developers are evidently comfy with the idea of images as resources, but not
textual content.
Is that a step too far, or is support lacking in programming
languages/libraries?

Thoughts/opinions welcome.

John

[4] http://www.w3.org/TR/html-imports/
[5] http://www.html5rocks.com/en/tutorials/webcomponents/imports/

Received on Friday, 9 October 2015 11:20:48 UTC