Re: API design principles - HTML/XML literals from John Walker on 2015-10-12 (public-hydra@w3.org from October 2015)

From: John Walker <john.walker@semaku.com>
Date: Mon, 12 Oct 2015 10:56:10 +0200 (CEST)
To: Robert Sanderson <azaroth42@gmail.com>
Cc: public-hydra@w3.org, Dave Longley <dlongley@digitalbazaar.com>, Linked JSON <public-linked-json@w3.org>
Message-ID: <654813132.4627.1444640170769.JavaMail.open-xchange@oxweb01.eigbox.net>
Hi Rob,

> On October 11, 2015 at 11:28 AM Robert Sanderson <azaroth42@gmail.com> wrote:
> 
> 
>  The approach that we have taken in the Web Annotation Working Group [1] (and
> elsewhere) is to have an embedded resource with value, language and format
> properties:
>   
>  {
>    "@type": "EmbeddedContent",
>    "value": "<span>This is some <b>marked up</b> content.</span>",
>    "language": "en",
>    "format": "text/html"
>  }
>   
>  As RDF 1.1 does not allow both language and format to be associated with a
> literal value, this is the best that we could do.
>   
>  Hope that helps,
> 

Thanks for the input.
Very relevant as we also need to deal with multilingual content.
Did you consider putting the lang="en" attribute in the HTML itself?
If so, what was the reason for going with the chosen approach?
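
To make the question concrete, here is a rough sketch of the two options in
Python. The first dictionary copies the shape of your example; the second is my
assumption of what the lang-attribute variant would look like. The language_of
helper and RootLangParser class are made up for illustration and are not part
of the annotation model.

```python
from html.parser import HTMLParser

# Option A: language carried as a separate property (shape from Rob's example).
separate = {
    "@type": "EmbeddedContent",
    "value": "<span>This is some <b>marked up</b> content.</span>",
    "language": "en",
    "format": "text/html",
}

# Option B (assumed variant): language carried inside the markup itself.
inline = {
    "@type": "EmbeddedContent",
    "value": '<span lang="en">This is some <b>marked up</b> content.</span>',
    "format": "text/html",
}


class RootLangParser(HTMLParser):
    """Records the lang attribute of the first element encountered."""

    def __init__(self):
        super().__init__()
        self.seen_root = False
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if not self.seen_root:
            self.seen_root = True
            self.lang = dict(attrs).get("lang")


def language_of(resource):
    """Hypothetical helper: use the explicit property, else parse the markup."""
    if "language" in resource:
        return resource["language"]
    parser = RootLangParser()
    parser.feed(resource["value"])
    return parser.lang


print(language_of(separate))  # read directly from the property
print(language_of(inline))    # requires parsing the HTML first
```

With option A a client can filter on language without touching the markup,
whereas option B seems to require an HTML parser just to find out which
language the content is in.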

This brings up some interesting questions about whether we might look at
language-based content negotiation.
It would be nice in theory, but I am not sure how widely this is supported.
Also, considering the translation processes, the different languages could well
be based on different versions of the primary content. How do we deal with this
in a clean manner?
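
As a minimal sketch of what I have in mind, assuming a server that picks a
translation from the Accept-Language request header: the TRANSLATIONS table,
its based_on field, and the negotiate and is_stale helpers are all hypothetical
names for illustration, and the matching is a simplified prefix lookup rather
than a complete implementation of HTTP language negotiation.

```python
# Hypothetical store: each translation records which version of the
# primary (English) content it was translated from.
PRIMARY_LANGUAGE = "en"
TRANSLATIONS = {
    "en": {"based_on": 3, "body": "<p>The ACME thingamyjig ...</p>"},
    "nl": {"based_on": 2, "body": "<p>De ACME thingamyjig ...</p>"},
}


def parse_accept_language(header):
    """Parse an Accept-Language header into (tag, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        params = params.strip()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(header, available, default=PRIMARY_LANGUAGE):
    """Return the best available language for the header, else the default."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            return default
        if tag in available:
            return tag
        primary_subtag = tag.split("-")[0]
        if primary_subtag in available:
            return primary_subtag
    return default


def is_stale(language, available, primary=PRIMARY_LANGUAGE):
    """True if the translation is based on an older version of the primary."""
    return available[language]["based_on"] < available[primary]["based_on"]


chosen = negotiate("nl-NL, en;q=0.5", TRANSLATIONS)
print(chosen, is_stale(chosen, TRANSLATIONS))
```

Here the Dutch translation is selected but lags one version behind the English
primary, which is exactly the case I am unsure how to expose cleanly to
clients.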


>   
>  Rob 
>   
>  [1] http://www.w3.org/TR/annotation-model/#body-and-target-metadata
>   
> 
>  On Sat, Oct 10, 2015 at 10:24 AM, John Walker <john.walker@semaku.com
> <mailto:john.walker@semaku.com> > wrote:
> >    Hi Dave,
> > 
> >    > On October 9, 2015 at 4:49 PM Dave Longley <dlongley@digitalbazaar.com
> >    > <mailto:dlongley@digitalbazaar.com> > wrote:
> >    >
> >    >
> >    > On 10/09/2015 07:20 AM, John Walker wrote:
> >    > > On a forthcoming project we need to transfer information/data/content
> >    > > from one system to another. I have questions about the best way to
> >    > > deal with HTML/XML markup.
> >    > >
> >    > > My assumption is we will expose an API of some kind, most likely JSON
> >    > > (ideally JSON-LD). The question is if it is good/best practice to a.
> >    > > include HTML/XML markup in literal values, or b. refer out to a
> >    > > separate resource for these
> >    > >
> >    > > An example of first approach:
> >    > >
> >    > > { "@context": "http://schema.org/", "@id": "#id", "@type":
> >    > > "Product", "mpn": "ABC123", "name": "ACME thingamyjig",
> >    > > "description": "the ACME thingamyjig is our <b>new</b> wonderful
> >    > > product with some <sub>subscript</sub> stuff.<br/>A new line" }
> >    > >
> >    > > For me this is bad because the "description" is a string, but
> >    > > contains HTML markup <br/>. How is a client to know how to process
> >    > > this? Should the "<br/>" be displayed or rendered as a line break?
> >    > > What if the content contains < characters (common for technical
> >    > > products), should these be escaped as HTML entities &lt;? Of course
> >    > > one could add the datatype rdf:HTML for this literal to indicate it
> >    > > is HTML.
> >    > >
> >    > > In our case these literals could be quite large and contain extensive
> >    > > markup. Additionally, if we had these literals directly on the
> >    > > product entities, there would be significant repetition as many
> >    > > products have the same content (DRY).
> >    > >
> >    > > The second option would be to refer to some external resource.
> >    > >
> >    > >
> >    > > { "@context": "http://schema.org/", "@id": "#id", "@type":
> >    > > "Product", "mpn": "ABC123", "name": "ACME thingamyjig",
> >    > > "description": <content/4y7dh2> }
> >    > >
> >    > > This could support conneg allowing to serve multiple representations
> >    > > on a single URL (e.g. HTML, DITA and plain text). Would also reduce
> >    > > repetition and allow for client side caching of these resources. Also
> >    > > would potentially play nicely with things like HTML Imports [4] [5].
> >    >
> >    > IMO, neither option is the best approach, but the second is better.
> >    >
> >    > I think you have two better options:
> >    >
> >    > 1. Provide the description as a URL and let clients decide the
> >    > presentation they want through content negotiation when they request
> >    > it.
> >     
> >    Uhhh... isn't this the same as my 2nd option?
> > 
> >    >
> >    > 2. Provide the description as plain text and then include another
> >    > property that means "the value is a URL that specifies an HTML template
> >    > and/or `partial`" and then put a URL to your presentation-specific
> >    > display there.
> >     
> >    The content cannot be accurately represented as plain text, it requires
> >    *some* kind of markup be it HTML, Markdown or whatever.
> >     
> >    > I suppose it doesn't have to be a template that requires some
> >    > processing to get your description into it, but that could make it
> >    > more reusable and more cleanly separate presentation from data. You
> >    > could also specify other information that the client could use to
> >    > understand how to consume those templates. Anything else very
> >    > strongly ties the data to a specific application and/or
> >    > presentation of it.
> >     
> >    For the most part presentation is not the primary concern here, the
> >    markup in the content conveys semantics/meaning, not style/presentation.
> > 
> >    > Due to the nature of HTML, how particular elements are rendered is
> >    > largely dependent on the context of the document and a number of other
> >    > independent style inputs. By directly embedding it in your data, your
> >    > data is no longer consumable by any client, but rather, a client must
> >    > adhere to your overall presentation style to do anything with the data.
> >    > Either don't make that part of your data linked or present the data
> >    > in a way that other clients could at least reasonably do something
> >    > with it.
> >     
> >    I would not consider this content as structured data and, as such,
> >    would not try to model it in RDF. So we can either include it as a blob
> >    in the RDF, or pull it out to a separate resource.
> > 
> >    > >
> >    > > IMHO from a principled/architectural perspective the second option is
> >    > > best. However we do not see this second option as a widely deployed
> >    > > pattern. Why is that?
> >    >
> >    > This is a hard problem. It bleeds into Web Components design and other
> >    > areas. I can imagine Web Components being described as Linked Data --
> >    > because it's easier to insert them into presentations and theme them,
> >    > etc. But you need a lot more information than an HTML snippet to
> >    > construct a proper Web Component that you could reasonably insert into
> >    > and make useful in a page.
> >    >
> >    > >
> >    > > To go to other extreme, why not inline images as data URIs in the
> >    > > RDF? Clearly this is possible, but quite uncommon.
> >    > >
> >    > > Clearly developers are comfy with the idea of images as resources,
> >    > > but not textual content. Is that a step too far, is the support
> >    > > lacking in programming languages/libraries?
> >    > >
> >    > > Thoughts/opinions welcome?
> >    > >
> >    > > John
> >    > >
> >    > > [4] http://www.w3.org/TR/html-imports/ [5]
> >    > > http://www.html5rocks.com/en/tutorials/webcomponents/imports/
> >    > >
> >    >
> >    >
> >    > --
> >    > Dave Longley
> >    > CTO
> >    > Digital Bazaar, Inc.
> >     
> >    John
> >
> 
>   
>  --
>  Rob Sanderson
>  Information Standards Advocate
>  Digital Library Systems and Services
>  Stanford, CA 94305
> 

John
Received on Monday, 12 October 2015 08:56:54 UTC