Re: API design principles - HTML/XML literals from John Walker on 2015-10-10 (public-linked-json@w3.org from October 2015)

From: John Walker <john.walker@semaku.com>
Date: Sat, 10 Oct 2015 19:24:33 +0200 (CEST)
To: public-hydra@w3.org, Dave Longley <dlongley@digitalbazaar.com>, public-linked-json@w3.org
Message-ID: <358758550.245430.1444497873051.JavaMail.open-xchange@oxweb01.eigbox.net>
Hi Dave,

> On October 9, 2015 at 4:49 PM Dave Longley <dlongley@digitalbazaar.com> wrote:
>
>
> On 10/09/2015 07:20 AM, John Walker wrote:
> > On a forthcoming project we need to transfer information/data/content
> > from one system to another. I have questions about the best way to
> > deal with HTML/XML markup.
> >
> > My assumption is we will expose an API of some kind, most likely JSON
> > (ideally JSON-LD). The question is if it is good/best practice to a.
> > include HTML/XML markup in literal values, or b. refer out to a
> > separate resource for these
> >
> > An example of first approach:
> >
> > { "@context": "http://schema.org/", "@id": "#id", "@type":
> > "Product", "mpn": "ABC123", "name": "ACME thingamyjig",
> > "description": "the ACME thingamyjig is our <b>new</b> wonderful
> > product with some <sub>subscript</sub> stuff.<br/>A new line" }
> >
> > For me this is bad because the "description" is a string, but
> > contains HTML markup <br/>. How is a client to know how to process
> > this? Should the "<br/>" be displayed or rendered as a line break?
> > What if the content contains < characters (common for technical
> > products), should these be escaped as HTML entities &lt;? Of course
> > one could add the datatype rdf:HTML for this literal to indicate it
> > is HTML.
> >
> > In our case these literals could be quite large and contain extensive
> > markup. Additionally, if we had these literals directly on the
> > product entities, there would be significant repetition as many
> > products have the same content (DRY).
> >
> > The second option would be to refer to some external resource.
> >
> >
> > { "@context": "http://schema.org/", "@id": "#id", "@type":
> > "Product", "mpn": "ABC123", "name": "ACME thingamyjig",
> > "description": { "@id": "content/4y7dh2" } }
> >
> > This could support conneg, allowing multiple representations to be
> > served on a single URL (e.g. HTML, DITA and plain text). It would also
> > reduce repetition and allow for client side caching of these resources.
> > It would also potentially play nicely with things like HTML Imports [4] [5].
>
> IMO, neither option is the best approach, but the second is better.
>
> I think you have two better options:
>
> 1. Provide the description as a URL and let clients decide the
> presentation they want through content negotiation when they request it.
 
Uhhh... isn't this the same as my 2nd option?
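
To make the conneg idea concrete, here is a minimal sketch of how a client might ask for the description in its preferred format. The host name, the helper name description_request and the q-value scheme are assumptions for illustration only; the request is built but not sent.

```python
from urllib.request import Request


def description_request(url, media_types):
    """Build a GET request whose Accept header ranks the wanted formats.

    Hypothetical helper: media types listed first get a higher q-value.
    """
    parts = []
    for rank, media_type in enumerate(media_types):
        # Decrease the weight by 0.1 per position, never below 0.1.
        q = max(1.0 - 0.1 * rank, 0.1)
        parts.append(f"{media_type};q={q:.1f}")
    return Request(url, headers={"Accept": ", ".join(parts)})


# example.com is a placeholder host, not a real endpoint.
req = description_request(
    "http://example.com/content/4y7dh2", ["text/html", "text/plain"]
)
print(req.get_header("Accept"))
```

The server would then pick the best representation it has for that one URL, which is what lets the same link serve HTML to one client and plain text to another.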

>
> 2. Provide the description as plain text and then include another
> property that means "the value is a URL that specifies an HTML template
> and/or `partial`" and then put a URL to your presentation-specific
> display there.
 
The content cannot be accurately represented as plain text; it requires *some*
kind of markup, be it HTML, Markdown or whatever.
 
> I suppose it doesn't have to be a template that requires some processing
> to get your description into it, but that could make it more reusable
> and more cleanly separate presentation from data. You could also specify
> other information that the client could use to understand how to consume
> those templates. Anything else very strongly ties the data to a specific
> application and/or presentation of it.
 
For the most part presentation is not the primary concern here, the markup
in the content conveys semantics/meaning, not style/presentation.

> Due to the nature of HTML, how particular elements are rendered is
> largely dependent on the context of the document and a number of other
> independent style inputs. By directly embedding it in your data, your
> data is no longer consumable by any client, but rather, a client must
> adhere to your overall presentation style to do anything with the data.
> Either don't make that part of your data linked or present the data in a
> way that other clients could at least reasonably do something with it.
 
I would not consider this content as structured data and, as such, would not
try to model it in RDF. So we can either include it as a blob in the RDF, or
pull it out to a separate resource.
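
As a rough sketch of those two choices, the snippet below builds both shapes as JSON-LD: the blob as a literal typed rdf:HTML, and the separate resource as a node reference. The helper names are made up for illustration, and the relative URL is the placeholder from my earlier example.

```python
import json

# Datatype IRI for HTML literals in RDF 1.1.
RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"


def inline_description(markup):
    """Blob in the RDF: keep the markup as a typed literal."""
    return {"@value": markup, "@type": RDF_HTML}


def linked_description(url):
    """Separate resource: point at the content by URL."""
    return {"@id": url}


def product(description):
    """Wrap either kind of description in the example Product."""
    return {
        "@context": "http://schema.org/",
        "@id": "#id",
        "@type": "Product",
        "mpn": "ABC123",
        "name": "ACME thingamyjig",
        "description": description,
    }


inline = product(
    inline_description("the ACME thingamyjig is our <b>new</b> wonderful product")
)
linked = product(linked_description("content/4y7dh2"))

print(json.dumps(inline, indent=2))
print(json.dumps(linked, indent=2))
```

With the typed literal a client at least knows the string is HTML and not text to be escaped; with the reference, many products can share one content resource.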

> >
> > IMHO from a principled/architectural perspective the second option is
> > best. However we do not see this second option as a widely deployed
> > pattern. Why is that?
>
> This is a hard problem. It bleeds into Web Components design and other
> areas. I can imagine Web Components being described as Linked Data --
> because it's easier to insert them into presentations and theme them,
> etc. But you need a lot more information than an HTML snippet to
> construct a proper Web Component that you could reasonably insert into
> and make useful in a page.
>
> >
> > To go to other extreme, why not inline images as data URIs in the
> > RDF? Clearly this is possible, but quite uncommon.
> >
> > Clearly developers are comfy with the idea of images as resources,
> > but not textual content. Is that a step too far, is the support
> > lacking in programming languages/libraries?
> >
> > Thoughts/opinions welcome?
> >
> > John
> >
> > [4] http://www.w3.org/TR/html-imports/ [5]
> > http://www.html5rocks.com/en/tutorials/webcomponents/imports/
> >
>
>
> --
> Dave Longley
> CTO
> Digital Bazaar, Inc.
 
John
Received on Saturday, 10 October 2015 17:25:04 UTC