Re: API design principles - HTML/XML literals

From: Dave Longley <dlongley@digitalbazaar.com>
Date: Fri, 9 Oct 2015 10:49:56 -0400
To: John Walker <john.walker@semaku.com>, public-hydra@w3.org, public-linked-json@w3.org
Message-ID: <5617D414.8030507@digitalbazaar.com>
On 10/09/2015 07:20 AM, John Walker wrote:
> On a forthcoming project we need to transfer information/data/content
> from one system to another. I have questions about the best way to
> deal with HTML/XML markup.
> My assumption is we will expose an API of some kind, most likely JSON
> (ideally JSON-LD). The question is if it is good/best practice to a.
> include HTML/XML markup in literal values, or b. refer out to a
> separate resource for these
> An example of first approach:
> { "@context": "http://schema.org/", "@id": "#id", "@type":
> "Product", "mpn": "ABC123", "name": "ACME thingamyjig", 
> "description": "the ACME thingamyjig is our <b>new</b> wonderful
> product with some <sub>subscript</sub> stuff.<br/>A new line" }
> For me this is bad because the "description" is a string, but
> contains HTML markup <br/>. How is a client to know how to process
> this? Should the "<br/>" be displayed or rendered as a line break? 
> What if the content contains < characters (common for technical
> products), should these be escaped as HTML entities &lt;? Of course
> one could add the datatype rdf:HTML for this literal to indicate it
> is HTML.
> In our case these literals could be quite large and contain extensive
> markup. Additionally, if we had these literals directly on the
> product entities, there would be significant repetition as many
> products have the same content (DRY).
> The second option would be to refer to some external resource.
> { "@context": "http://schema.org/", "@id": "#id", "@type":
> "Product", "mpn": "ABC123", "name": "ACME thingamyjig", 
> "description": <content/4y7dh2> }
> This could support conneg allowing to serve multiple representations
> on a single URL (e.g. HTML, DITA and plain text). Would also reduce
> repetition and allow for client side caching of these resources. Also
> would potentially play nicely with things like HTML Imports [4] [5].

IMO, neither option is the best approach, but the second is better.

I think you have two better options:

1. Provide the description as a URL and let clients decide the
presentation they want through content negotiation when they request it.

2. Provide the description as plain text and include another property
whose meaning is "the value is a URL that specifies an HTML template
and/or `partial`", then put a URL to your presentation-specific
display there.
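
To make option 1 a bit more concrete, here is a rough Python sketch. It
is only an illustration under assumptions: the `example.com` URL, the
`{"@id": ...}` shape for "description", and both helper names are made
up, and the `Accept` parsing is deliberately simplified (it ignores
partial wildcards such as `text/*` and the specificity rules a real
server would apply).

```python
import urllib.request

# Hypothetical product document: "description" holds only a link.
# The absolute URL is invented for illustration.
product = {
    "@context": "http://schema.org/",
    "@id": "#id",
    "@type": "Product",
    "mpn": "ABC123",
    "name": "ACME thingamyjig",
    "description": {"@id": "http://example.com/content/4y7dh2"},
}


def description_request(doc, media_type):
    """Client side: build (but do not send) a GET request that asks for
    one particular representation of the description."""
    url = doc["description"]["@id"]
    return urllib.request.Request(url, headers={"Accept": media_type})


def pick_media_type(accept_header, available):
    """Server side: return the media type to serve, or None if nothing
    the client accepts is available. Simplified q-value handling."""
    prefs = []
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((q, fields[0]))
    # sorted() is stable, so equal q-values keep their header order.
    for q, media_type in sorted(prefs, key=lambda p: -p[0]):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*" and available:
            return available[0]
    return None


if __name__ == "__main__":
    req = description_request(product, "text/plain")
    print(req.full_url, req.get_header("Accept"))
    print(pick_media_type("text/html;q=0.8, text/plain",
                          ["text/html", "text/plain"]))
```

The point is that the product data carries only the link. Whether a
client ends up with HTML, DITA or plain text is decided per request and
not baked into the data.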

I suppose it doesn't have to be a template that requires some processing
to get your description into it, but that could make it more reusable
and more cleanly separate presentation from data. You could also specify
other information that the client could use to understand how to consume
those templates. Anything else very strongly ties the data to a specific
application and/or presentation of it.
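
A minimal sketch of option 2, again in Python and again under
assumptions: the property name `descriptionTemplate`, the template URL,
and the `$name` / `$description` placeholder syntax are all hypothetical.
The template body is hard-coded here as a stand-in for whatever a client
would fetch from that URL.

```python
import html
from string import Template

# Hypothetical document: the description is plain text, and a made-up
# property ("descriptionTemplate") points at an HTML template.
product = {
    "name": "ACME thingamyjig",
    "description": "Rated for loads < 5 kg.\nA new line",
    "descriptionTemplate": "http://example.com/templates/product-description",
}

# Stand-in for the body a client would fetch from the template URL.
TEMPLATE_BODY = (
    '<section class="product"><h2>$name</h2>'
    "<p>$description</p></section>"
)


def render_html(doc, template_body):
    """Escape the plain-text fields, turn newlines into <br/>, and
    substitute the results into the template."""
    text = html.escape(doc["description"]).replace("\n", "<br/>")
    return Template(template_body).substitute(
        name=html.escape(doc["name"]), description=text
    )


def render_plain(doc):
    """A client that ignores the template can still use the data."""
    return doc["name"] + "\n" + doc["description"]


if __name__ == "__main__":
    print(render_html(product, TEMPLATE_BODY))
    print(render_plain(product))
```

Because the description itself is plain text, the question of whether
a `<` needs escaping is answered by whoever renders it, and a client
that does not care about the HTML presentation can skip the template
entirely.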

Due to the nature of HTML, how particular elements are rendered is
largely dependent on the context of the document and a number of other
independent style inputs. By directly embedding it in your data, your
data is no longer consumable by just any client; instead, a client must
adhere to your overall presentation style to do anything with the data.
Either don't make that part of your data linked, or present the data in
a way that other clients could at least reasonably do something with.

> IMHO from a principled/architectural perspective the second option is
> best. However we do not see this second option as a widely deployed
> pattern. Why is that?

This is a hard problem. It bleeds into Web Components design and other
areas. I can imagine Web Components being described as Linked Data --
because it's easier to insert them into presentations and theme them,
etc. But you need a lot more information than an HTML snippet to
construct a proper Web Component that you could reasonably insert into
a page and make useful there.

> To go to other extreme, why not inline images as data URIs in the
> RDF? Clearly this is possible, but quite uncommon.
> Clearly developers are comfy with the idea of images as resources,
> but not textual content. Is that a step too far, is the support
> lacking in programming languages/libraries?
> Thoughts/opinions welcome?
> John
> [4] http://www.w3.org/TR/html-imports/
> [5] http://www.html5rocks.com/en/tutorials/webcomponents/imports/

Dave Longley
Digital Bazaar, Inc.
Received on Friday, 9 October 2015 14:50:28 UTC