Re: API design principles - HTML/XML literals

The approach that we have taken in the Web Annotation Working Group [1]
(and elsewhere) is to have an embedded resource with value, language and
format properties:

{
  "@type": "EmbeddedContent",
  "value": "<span>This is some <b>marked up</b> content.</span>",
  "language": "en",
  "format": "text/html"
}

As an RDF 1.1 literal can carry either a language tag or a datatype, but not
both, there is no way to attach both language and format to a single literal
value, so this is the best that we could do.
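A minimal Python sketch of how a producer and a consumer might handle such an embedded resource. It assumes the property names from the example above (value, language, format). The helper names embedded_content and to_plain_text, and the crude tag-stripping fallback, are illustrative only and not part of the Web Annotation model:

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data and drops all tags (a crude plain-text fallback)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def embedded_content(value: str, language: str, fmt: str) -> dict:
    """Producer side: keep value, language and format together on one resource."""
    return {"@type": "EmbeddedContent", "value": value, "language": language, "format": fmt}


def to_plain_text(node: dict) -> str:
    """Consumer side: the format property tells the client how to treat the value."""
    fmt = node.get("format")
    if fmt == "text/html":
        parser = _TextExtractor()
        parser.feed(node["value"])
        parser.close()
        return "".join(parser.parts)
    if fmt == "text/plain":
        return node["value"]
    raise ValueError(f"unsupported format: {fmt!r}")


body = embedded_content(
    "<span>This is some <b>marked up</b> content.</span>", "en", "text/html"
)
print(json.dumps(body, indent=2))
print(to_plain_text(body))  # This is some marked up content.
```

The point of the design is that no guessing is needed: a client that only understands plain text can still branch on "format" instead of sniffing the string for markup.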

Hope that helps,

Rob

[1] http://www.w3.org/TR/annotation-model/#body-and-target-metadata


On Sat, Oct 10, 2015 at 10:24 AM, John Walker <john.walker@semaku.com>
wrote:

> Hi Dave,
>
> > On October 9, 2015 at 4:49 PM Dave Longley <dlongley@digitalbazaar.com>
> wrote:
> >
> >
> > On 10/09/2015 07:20 AM, John Walker wrote:
> > > On a forthcoming project we need to transfer information/data/content
> > > from one system to another. I have questions about the best way to
> > > deal with HTML/XML markup.
> > >
> > > My assumption is we will expose an API of some kind, most likely JSON
> > > (ideally JSON-LD). The question is whether it is good/best practice to
> > > (a) include HTML/XML markup in literal values, or (b) refer out to a
> > > separate resource for these.
> > >
> > > An example of first approach:
> > >
> > > { "@context": "http://schema.org/", "@id": "#id",
> > >   "@type": "Product", "mpn": "ABC123", "name": "ACME thingamyjig",
> > >   "description": "the ACME thingamyjig is our <b>new</b> wonderful product with some <sub>subscript</sub> stuff.<br/>A new line" }
> > >
> > > For me this is bad because the "description" is a string but
> > > contains HTML markup such as <br/>. How is a client to know how to
> > > process this? Should the "<br/>" be displayed literally or rendered as
> > > a line break? What if the content contains < characters (common for
> > > technical products); should these be escaped as the HTML entity &lt;?
> > > Of course, one could add the datatype rdf:HTML to this literal to
> > > indicate that it is HTML.
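A rough Python sketch of the rdf:HTML option, assuming JSON-LD's value-object syntax ("@value" plus "@type") with the full IRI of the rdf:HTML datatype. The html_literal and is_html helpers and the sample text are hypothetical. The < characters in the source text are escaped with the standard library's html.escape before being placed into the HTML fragment:

```python
import html
import json

# Full IRI of the rdf:HTML datatype (RDF 1.1).
RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"


def html_literal(fragment: str) -> dict:
    """Wrap an HTML fragment as a JSON-LD value object typed rdf:HTML (hypothetical helper)."""
    return {"@value": fragment, "@type": RDF_HTML}


def is_html(value) -> bool:
    """True only if a property value is explicitly marked as an rdf:HTML literal."""
    return isinstance(value, dict) and value.get("@type") == RDF_HTML


# Text containing '<' must be escaped before it goes into markup.
spec = "Operating range: 0 < V < 5"
fragment = f"the ACME thingamyjig is our <b>new</b> product.<br/>{html.escape(spec)}"

product = {
    "@context": "http://schema.org/",
    "@id": "#id",
    "@type": "Product",
    "mpn": "ABC123",
    "name": "ACME thingamyjig",
    "description": html_literal(fragment),
}

print(json.dumps(product, indent=2))
print(is_html(product["description"]), is_html(product["name"]))  # True False
```

A client can then branch on the datatype: a value typed rdf:HTML is parsed as an HTML fragment, while a bare string is displayed as-is.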
> > >
> > > In our case these literals could be quite large and contain extensive
> > > markup. Additionally, if we had these literals directly on the
> > > product entities, there would be significant repetition as many
> > > products have the same content (DRY).
> > >
> > > The second option would be to refer to some external resource.
> > >
> > >
> > > { "@context": "http://schema.org/", "@id": "#id",
> > >   "@type": "Product", "mpn": "ABC123", "name": "ACME thingamyjig",
> > >   "description": { "@id": "content/4y7dh2" } }
> > >
> > > This could support conneg, allowing multiple representations (e.g.
> > > HTML, DITA and plain text) to be served from a single URL. It would
> > > also reduce repetition and allow client-side caching of these
> > > resources, and would potentially play nicely with things like HTML
> > > Imports [4] [5].
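A minimal Python sketch of the client side of the second option, assuming the description is a JSON-LD node reference ({"@id": ...}) resolved against a hypothetical base IRI (http://example.com/products/). The description_request helper is illustrative; it only builds the conneg request and does not perform any network call:

```python
import urllib.request
from urllib.parse import urljoin

# Hypothetical base IRI of the document the product data was served from.
BASE = "http://example.com/products/"

product = {
    "@context": "http://schema.org/",
    "@id": "#id",
    "@type": "Product",
    "mpn": "ABC123",
    "name": "ACME thingamyjig",
    # The description is a reference to a separate resource, not a literal.
    "description": {"@id": "content/4y7dh2"},
}


def description_request(doc: dict, base: str, accept: str) -> urllib.request.Request:
    """Resolve the description reference and build a content-negotiated request."""
    url = urljoin(base, doc["description"]["@id"])
    # The Accept header lists the client's preferred representations.
    return urllib.request.Request(url, headers={"Accept": accept})


req = description_request(product, BASE, "text/html, text/plain;q=0.5")
print(req.full_url)  # http://example.com/products/content/4y7dh2
print(req.get_header("Accept"))  # text/html, text/plain;q=0.5
# urllib.request.urlopen(req) would perform the fetch; the server would then
# choose which representation to return. Not executed in this sketch.
```

Because products that share content all point at the same URL, the shared description only needs to be fetched (and cached) once, which is the DRY benefit described above.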
> >
> > IMO, neither option is the best approach, but the second is better.
> >
> > I think you have two better options:
> >
> > 1. Provide the description as a URL and let clients decide the
> > presentation they want through content negotiation when they request it.
>
> Uhhh... isn't this the same as my 2nd option?
>
> >
> > 2. Provide the description as plain text and then include another
> > property that means "the value is a URL that specifies an HTML template
> > and/or `partial`" and then put a URL to your presentation-specific
> > display there.
>
> The content cannot be accurately represented as plain text; it requires
> *some* kind of markup, be it HTML, Markdown or whatever.
>
> > I suppose it doesn't have to be a template that requires some processing
> > to get your description into it, but that could make it more reusable
> > and more cleanly separate presentation from data. You could also specify
> > other information that the client could use to understand how to consume
> > those templates. Anything else very strongly ties the data to a specific
> > application and/or presentation of it.
>
> For the most part presentation is not the primary concern here; the markup
> in the content conveys semantics/meaning, not style/presentation.
>
> > Due to the nature of HTML, how particular elements are rendered is
> > largely dependent on the context of the document and a number of other
> > independent style inputs. By directly embedding it in your data, your
> > data is no longer consumable by just any client; rather, a client must
> > adhere to your overall presentation style to do anything with the data.
> > Either don't make that part of your data linked or present the data in a
> > way that other clients could at least reasonably do something with it.
>
> I would not consider this content as structured data and, as such, would not
> try to model it in RDF. So we can either include it as a blob in the RDF, or
> pull it out to a separate resource.
>
> > >
> > > IMHO from a principled/architectural perspective the second option is
> > > best. However we do not see this second option as a widely deployed
> > > pattern. Why is that?
> >
> > This is a hard problem. It bleeds into Web Components design and other
> > areas. I can imagine Web Components being described as Linked Data --
> > because it's easier to insert them into presentations and theme them,
> > etc. But you need a lot more information than an HTML snippet to
> > construct a proper Web Component that you could reasonably insert into
> > and make useful in a page.
> >
> > >
> > > To go to the other extreme, why not inline images as data URIs in the
> > > RDF? Clearly this is possible, but quite uncommon.
> > >
> > > Clearly developers are comfy with the idea of images as resources,
> > > but not textual content. Is that a step too far, or is support
> > > lacking in programming languages/libraries?
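For comparison, a small Python sketch of the data URI option (RFC 2397). The data_uri helper is hypothetical, and the payload is an HTML fragment rather than an image, to show that the inlining mechanism is identical for textual content:

```python
import base64


def data_uri(payload: bytes, media_type: str) -> str:
    """Encode bytes as an RFC 2397 data URI using base64."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# A textual fragment can be inlined exactly as an image would be.
fragment = "our <b>new</b> wonderful product".encode("utf-8")
uri = data_uri(fragment, "text/html;charset=utf-8")

# Splitting at the first comma separates the header from the payload;
# decoding the payload recovers the original bytes.
header, _, body = uri.partition(",")
assert base64.b64decode(body) == fragment
print(header)  # data:text/html;charset=utf-8;base64
```

Note that the inlined value is no longer shared or cacheable on its own, which is the same trade-off as embedding large HTML literals directly on each product.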
> > >
> > > Thoughts/opinions welcome?
> > >
> > > John
> > >
> > > [4] http://www.w3.org/TR/html-imports/
> > > [5] http://www.html5rocks.com/en/tutorials/webcomponents/imports/
> > >
> >
> >
> > --
> > Dave Longley
> > CTO
> > Digital Bazaar, Inc.
>
> John
>



-- 
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305

Received on Sunday, 11 October 2015 09:28:30 UTC