Re: API design principles - HTMLXML literals from Karol Szczepański on 2015-10-09 (public-hydra@w3.org from October 2015)

From: Karol Szczepański <karol.szczepanski@gmail.com>
Date: Fri, 9 Oct 2015 20:05:00 +0200
To: "John Walker" <john.walker@semaku.com>, <public-hydra@w3.org>, <public-linked-json@w3.org>, "Dave Longley" <dlongley@digitalbazaar.com>
Message-ID: <4001C2B83BF54ED8847FC5AF032A328B@Alien>
There is also the possibility of using a typed literal. The RDF 1.1 spec
(http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/) defines an rdf:HTML
(and an rdf:XMLLiteral) datatype which can be applied to a literal to denote
that it contains HTML (or XML) markup. This is not as clean as a separate,
content-negotiated referenced resource, but it's an option. Still, it doesn't
solve issues like "I'd also like to have a BBCode or Markdown description".
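
A minimal sketch of what such a typed literal could look like in JSON-LD, built here as a Python dict so it can be checked and serialized. The product data comes from John's example below; typing only the description value, and shortening its text, are choices made for illustration.

```python
import json

# IRI of the rdf:HTML datatype defined in RDF 1.1 Concepts.
RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"

product = {
    "@context": "http://schema.org/",
    "@id": "#id",
    "@type": "Product",
    "mpn": "ABC123",
    "name": "ACME thingamyjig",
    # A JSON-LD value object: "@value" carries the markup as a string,
    # "@type" marks the literal as rdf:HTML instead of a plain string.
    "description": {
        "@value": "the ACME thingamyjig is our <b>new</b> wonderful product",
        "@type": RDF_HTML,
    },
}

print(json.dumps(product, indent=2))
```

A client that understands the datatype can then decide to render the markup, while one that does not can still treat the value as an opaque string.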

Best

Karol

-----Original Message-----
From: Dave Longley
Sent: Friday, October 9, 2015 4:49 PM
To: John Walker ; public-hydra@w3.org ; public-linked-json@w3.org
Subject: Re: API design principles - HTMLXML literals

On 10/09/2015 07:20 AM, John Walker wrote:
> On a forthcoming project we need to transfer information/data/content
> from one system to another. I have questions about the best way to
> deal with HTML/XML markup.
>
> My assumption is we will expose an API of some kind, most likely JSON
> (ideally JSON-LD). The question is whether it is good/best practice to
> (a) include HTML/XML markup in literal values, or (b) refer out to a
> separate resource for these.
>
> An example of first approach:
>
> { "@context": "http://schema.org/", "@id": "#id", "@type":
> "Product", "mpn": "ABC123", "name": "ACME thingamyjig",
> "description": "the ACME thingamyjig is our <b>new</b> wonderful
> product with some <sub>subscript</sub> stuff.<br/>A new line" }
>
> For me this is bad because the "description" is a string, but
> contains HTML markup <br/>. How is a client to know how to process
> this? Should the "<br/>" be displayed or rendered as a line break?
> What if the content contains < characters (common for technical
> products), should these be escaped as HTML entities &lt;? Of course
> one could add the datatype rdf:HTML for this literal to indicate it
> is HTML.
>
> In our case these literals could be quite large and contain extensive
> markup. Additionally, if we had these literals directly on the
> product entities, there would be significant repetition as many
> products have the same content (DRY).
>
> The second option would be to refer to some external resource.
>
>
> { "@context": "http://schema.org/", "@id": "#id", "@type":
> "Product", "mpn": "ABC123", "name": "ACME thingamyjig",
> "description": { "@id": "content/4y7dh2" } }
>
> This could support content negotiation (conneg), allowing multiple
> representations to be served from a single URL (e.g. HTML, DITA and plain
> text). It would also reduce repetition and allow client-side caching of
> these resources. It would also potentially play nicely with things like
> HTML Imports [4] [5].

IMO, neither option is the best approach, but the second is better.

I think you have two better options:

1. Provide the description as a URL and let clients decide the
presentation they want through content negotiation when they request it.

2. Provide the description as plain text and then include another
property that means "the value is a URL that specifies an HTML template
and/or `partial`" and then put a URL to your presentation-specific
display there.
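
As a rough sketch of option 1 from the client's side, the same description URL can be requested with different Accept headers. The host name is hypothetical and only the path segment comes from John's example; the requests are built but not sent, so nothing here depends on a real server.

```python
import urllib.request


def build_description_request(url: str, media_type: str) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for one representation."""
    return urllib.request.Request(url, headers={"Accept": media_type})


# Hypothetical URL for the description resource.
DESCRIPTION_URL = "http://example.com/content/4y7dh2"

# Same URL, two different representations requested via the Accept header.
html_request = build_description_request(DESCRIPTION_URL, "text/html")
text_request = build_description_request(DESCRIPTION_URL, "text/plain")

print(html_request.get_header("Accept"))  # text/html
print(text_request.get_header("Accept"))  # text/plain
```

Passing either request to urllib.request.urlopen() would perform the actual fetch; which media types the server offers is up to the publisher.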

I suppose it doesn't have to be a template that requires some processing
to get your description into it, but that could make it more reusable
and more cleanly separate presentation from data. You could also specify
other information that the client could use to understand how to consume
those templates. Anything else very strongly ties the data to a specific
application and/or presentation of it.
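
One possible reading of option 2, sketched under several assumptions: the property name descriptionTemplate, its vocabulary IRI, the template URL, and the $description placeholder syntax are all made up for illustration, and the template body is inlined here as a stand-in for what a client would fetch.

```python
import html
from string import Template

product = {
    "@context": [
        "http://schema.org/",
        # Hypothetical term: its value is a URL pointing at an HTML template.
        {
            "descriptionTemplate": {
                "@id": "http://example.com/vocab#descriptionTemplate",
                "@type": "@id",
            }
        },
    ],
    "@id": "#id",
    "@type": "Product",
    # Plain text only: no markup, so "<" is just a character.
    "description": "For loads < 5 kg. A new line",
    "descriptionTemplate": "http://example.com/templates/product-description",
}

# Stand-in for the template a client would retrieve from the URL above.
FETCHED_TEMPLATE = Template('<p class="description">$description</p>')


def render_description(data: dict, template: Template) -> str:
    """Escape the plain-text description and place it into the template."""
    return template.substitute(description=html.escape(data["description"]))


print(render_description(product, FETCHED_TEMPLATE))
```

Because the description stays plain text, a client that ignores the template still has usable data, and the escaping question is settled on the presentation side.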

Due to the nature of HTML, how particular elements are rendered is
largely dependent on the context of the document and a number of other
independent style inputs. By directly embedding it in your data, your
data is no longer consumable by any client, but rather, a client must
adhere to your overall presentation style to do anything with the data.
Either don't make that part of your data linked or present the data in a
way that other clients could at least reasonably do something with it.

>
> IMHO from a principled/architectural perspective the second option is
> best. However we do not see this second option as a widely deployed
> pattern. Why is that?

This is a hard problem. It bleeds into Web Components design and other
areas. I can imagine Web Components being described as Linked Data --
because it's easier to insert them into presentations and theme them,
etc. But you need a lot more information than an HTML snippet to
construct a proper Web Component that you could reasonably insert into
and make useful in a page.

>
> To go to other extreme, why not inline images as data URIs in the
> RDF? Clearly this is possible, but quite uncommon.
>
> Clearly developers are comfy with the idea of images as resources,
> but not textual content. Is that a step too far, is the support
> lacking in programming languages/libraries?
>
> Thoughts/opinions welcome?
>
> John
>
> [4] http://www.w3.org/TR/html-imports/ [5]
> http://www.html5rocks.com/en/tutorials/webcomponents/imports/
>


-- 
Dave Longley
CTO
Digital Bazaar, Inc.
Received on Friday, 9 October 2015 18:05:04 UTC