Re: API design principles - HTML/XML literals

Hi John,

see below.

On 09.10.2015 at 13:20, John Walker wrote:
> On a forthcoming project we need to transfer information/data/content from one
> system to another.
> I have questions about the best way to deal with HTML/XML markup.
>
> My assumption is we will expose an API of some kind, most likely JSON (ideally
> JSON-LD).
> The question is if it is good/best practice to
> a. include HTML/XML markup in literal values, or
> b. refer out to a separate resource for these
>
> An example of first approach:
>
> {
>    "@context": "http://schema.org/",
>    "@id": "#id",
>    "@type": "Product",
>    "mpn": "ABC123",
>    "name": "ACME thingamyjig",
>    "description": "the ACME thingamyjig is our <b>new</b> wonderful product with
> some <sub>subscript</sub> stuff.<br/>A new line"
> }
>
> For me this is bad because the "description" is a string, but contains HTML
> markup <br/>.
> How is a client to know how to process this?
> Should the "<br/>" be displayed or rendered as a line break?
> What if the content contains < characters (common for technical products),
> should these be escaped as HTML entities &lt;?
> Of course one could add the datatype rdf:HTML for this literal to indicate it is
> HTML.
>
> In our case these literals could be quite large and contain extensive markup.
> Additionally, if we had these literals directly on the product entities, there
> would be significant repetition as many products have the same content (DRY).
>
> The second option would be to refer to some external resource.
>
>
> {
>    "@context": "http://schema.org/",
>    "@id": "#id",
>    "@type": "Product",
>    "mpn": "ABC123",
>    "name": "ACME thingamyjig",
>    "description": <content/4y7dh2>
> }
>
> This could support conneg allowing to serve multiple representations on a single
> URL (e.g. HTML, DITA and plain text).
> Would also reduce repetition and allow for client side caching of these
> resources.
> Also would potentially play nicely with things like HTML Imports [4] [5].

We had exactly the same problem, and we also wanted to support other
textual formats such as Markdown.
We use the second option you described, with a JSON-LD based
resource and the following context:

{
     "schema": "http://schema.org/",
     "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
     "dct": "http://purl.org/dc/terms/",
     "type": "dct:format",
     "fragment": "schema:WebPageElement"
}

and an actual fragment may look like this:

{
     "type": "text/html",
     "fragment": "<some IRI>",
     ...
}
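
To make the pattern a bit more concrete, here is a minimal Python sketch
of a product that points to a shared fragment resource instead of
embedding the markup. The example.org IRI, the "body" key and the helper
names are assumptions made up for illustration; they are not our actual
vocabulary or code.

```python
import json

# Hypothetical fragment store: one entry per piece of shared content.
# The IRI and the "body" key are illustrative assumptions.
FRAGMENTS = {
    "http://example.org/content/4y7dh2": {
        "type": "text/html",
        "body": "the ACME thingamyjig is our <b>new</b> wonderful product",
    },
}


def make_product(mpn, name, description_iri):
    """Build a product node whose description is a reference, not a literal."""
    return {
        "@context": "http://schema.org/",
        "@type": "Product",
        "mpn": mpn,
        "name": name,
        # An object carrying only "@id" is a JSON-LD node reference.
        "description": {"@id": description_iri},
    }


def resolve_description(product, fragments=FRAGMENTS):
    """Look up the referenced fragment and return (media_type, body)."""
    fragment = fragments[product["description"]["@id"]]
    return fragment["type"], fragment["body"]


# Two products share one fragment, so the markup is stored only once.
SHARED = "http://example.org/content/4y7dh2"
products = [
    make_product("ABC123", "ACME thingamyjig", SHARED),
    make_product("ABC124", "ACME thingamyjig XL", SHARED),
]

if __name__ == "__main__":
    print(json.dumps(products[0], indent=2))
    print(resolve_description(products[1]))
```

In a real deployment the lookup would be an HTTP request against the
fragment IRI rather than a dictionary access, which is also where
client-side caching would come in.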


Also, I don't agree with what Dave Longley wrote about this in his
response.
HTML (or other formats like Markdown) can be consumed and processed as
fragments (as opposed to full documents). And there are good reasons for
doing so, mostly because it is a DSL for text with embedded formatting.
The fact that the embedding context might interfere with the fragment is
just a technical detail and should not prevent this.
That's why there are ways to overcome it, such as the HTML shadow DOM.
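
As a rough illustration that a fragment can be processed on its own,
without a surrounding document, here is a small Python sketch built on
the standard library's html.parser. It reduces a fragment to plain text
and turns <br> into a newline. The class and function names are
hypothetical, and a real client would probably want sanitising as well.

```python
from html.parser import HTMLParser


class FragmentText(HTMLParser):
    """Collect the plain text of an HTML fragment, mapping <br> to newlines."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; in text nodes.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing syntax like <br/>.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def fragment_to_text(fragment):
    """Return the text content of a bare HTML fragment (no <html>/<body>)."""
    parser = FragmentText()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


if __name__ == "__main__":
    print(fragment_to_text(
        "some <sub>subscript</sub> stuff.<br/>A new line"
    ))
```

The same fragment could just as well be handed to a browser for
rendering; the point is only that nothing here needs a full document.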

>
> IMHO from a principled/architectural perspective the second option is best.
> However we do not see this second option as a widely deployed pattern.
> Why is that?
I think the only reason is that most Web pages are still rendered on
the server side,
so they don't need to serve textual fragments via a "Web interface".
>
> To go to other extreme, why not inline images as data URIs in the RDF?
> Clearly this is possible, but quite uncommon.
People do this a lot with HTML/CSS to squeeze out the last drop of
performance,
and certainly not for the sake of a clean architecture/design.
So if performance is of utmost importance, I would even do this with
RDF-based data.
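
For what it's worth, inlining is mechanically trivial. The following
Python sketch (helper names and the stand-in payload are assumptions for
illustration) encodes some bytes as a base64 data URI, puts the result
into a schema.org "image" property and decodes it again.

```python
import base64
import json


def to_data_uri(payload: bytes, media_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def from_data_uri(uri: str):
    """Split a base64 data URI back into (media_type, bytes)."""
    header, encoded = uri.split(",", 1)
    media_type = header[len("data:"):].split(";", 1)[0]
    return media_type, base64.b64decode(encoded)


# Stand-in payload: just the 8-byte PNG signature, not a complete image.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

product = {
    "@context": "http://schema.org/",
    "@type": "Product",
    "mpn": "ABC123",
    "image": to_data_uri(PNG_SIGNATURE, "image/png"),
}

if __name__ == "__main__":
    print(json.dumps(product, indent=2))
    print(from_data_uri(product["image"]))
```

Keep in mind that base64 inflates the payload by roughly a third and
that the inlined bytes can no longer be cached separately, so this only
pays off when saving round trips matters more than anything else.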
>
> Clearly developers are comfy with the idea of images as resources, but not
> textual content.
> Is that a step too far, is the support lacking in programming
> languages/libraries?
>
> Thoughts/opinions welcome?
>
> John
>
> [4] http://www.w3.org/TR/html-imports/
> [5] http://www.html5rocks.com/en/tutorials/webcomponents/imports/
>

Received on Friday, 9 October 2015 19:34:43 UTC