- From: John Walker <john.walker@semaku.com>
- Date: Sat, 10 Oct 2015 19:24:33 +0200 (CEST)
- To: public-hydra@w3.org, Dave Longley <dlongley@digitalbazaar.com>, public-linked-json@w3.org
- Message-ID: <358758550.245430.1444497873051.JavaMail.open-xchange@oxweb01.eigbox.net>
Hi Dave,

> On October 9, 2015 at 4:49 PM Dave Longley <dlongley@digitalbazaar.com> wrote:
>
>
> On 10/09/2015 07:20 AM, John Walker wrote:
> > On a forthcoming project we need to transfer information/data/content
> > from one system to another. I have questions about the best way to
> > deal with HTML/XML markup.
> >
> > My assumption is we will expose an API of some kind, most likely JSON
> > (ideally JSON-LD). The question is whether it is good/best practice to
> > a. include HTML/XML markup in literal values, or b. refer out to a
> > separate resource for these.
> >
> > An example of the first approach:
> >
> > { "@context": "http://schema.org/", "@id": "#id", "@type":
> > "Product", "mpn": "ABC123", "name": "ACME thingamyjig",
> > "description": "the ACME thingamyjig is our <b>new</b> wonderful
> > product with some <sub>subscript</sub> stuff.<br/>A new line" }
> >
> > For me this is bad because the "description" is a string, but
> > contains HTML markup such as <br/>. How is a client to know how to
> > process this? Should the "<br/>" be displayed or rendered as a line
> > break? What if the content contains < characters (common for technical
> > products), should these be escaped as HTML entities (&lt;)? Of course
> > one could add the datatype rdf:HTML for this literal to indicate it
> > is HTML.
> >
> > In our case these literals could be quite large and contain extensive
> > markup. Additionally, if we had these literals directly on the
> > product entities, there would be significant repetition as many
> > products have the same content (DRY).
> >
> > The second option would be to refer to some external resource.
> >
> >
> > { "@context": "http://schema.org/", "@id": "#id", "@type":
> > "Product", "mpn": "ABC123", "name": "ACME thingamyjig",
> > "description": { "@id": "content/4y7dh2" } }
> >
> > This could support conneg, allowing us to serve multiple representations
> > on a single URL (e.g. HTML, DITA and plain text).
> > It would also reduce repetition, allow for client-side caching of these
> > resources, and potentially play nicely with things like HTML Imports [4] [5].
>
> IMO, neither option is the best approach, but the second is better.
>
> I think you have two better options:
>
> 1. Provide the description as a URL and let clients decide the
> presentation they want through content negotiation when they request it.

Uhhh... isn't this the same as my 2nd option?

>
> 2. Provide the description as plain text and then include another
> property that means "the value is a URL that specifies an HTML template
> and/or `partial`" and then put a URL to your presentation-specific
> display there.

The content cannot be accurately represented as plain text; it requires
*some* kind of markup, be it HTML, Markdown or whatever.

> I suppose it doesn't have to be a template that requires some processing
> to get your description into it, but that could make it more reusable
> and more cleanly separate presentation from data. You could also specify
> other information that the client could use to understand how to consume
> those templates. Anything else very strongly ties the data to a specific
> application and/or presentation of it.

For the most part, presentation is not the primary concern here: the markup
in the content conveys semantics/meaning, not style/presentation.

> Due to the nature of HTML, how particular elements are rendered is
> largely dependent on the context of the document and a number of other
> independent style inputs. By directly embedding it in your data, your
> data is no longer consumable by any client, but rather, a client must
> adhere to your overall presentation style to do anything with the data.
> Either don't make that part of your data linked or present the data in a
> way that other clients could at least reasonably do something with it.

I would not consider this content to be structured data and, as such, would
not try to model it in RDF.
So we can either include it as a blob in the RDF, or pull it out into a
separate resource.

> >
> > IMHO, from a principled/architectural perspective the second option is
> > best. However, we do not see this second option as a widely deployed
> > pattern. Why is that?
>
> This is a hard problem. It bleeds into Web Components design and other
> areas. I can imagine Web Components being described as Linked Data --
> because it's easier to insert them into presentations and theme them,
> etc. But you need a lot more information than an HTML snippet to
> construct a proper Web Component that you could reasonably insert into
> and make useful in a page.
>
> >
> > To go to the other extreme, why not inline images as data URIs in the
> > RDF? Clearly this is possible, but quite uncommon.
> >
> > Clearly developers are comfy with the idea of images as resources,
> > but not textual content. Is that a step too far, or is the support
> > lacking in programming languages/libraries?
> >
> > Thoughts/opinions welcome?
> >
> > John
> >
> > [4] http://www.w3.org/TR/html-imports/
> > [5] http://www.html5rocks.com/en/tutorials/webcomponents/imports/
> >
>
> --
> Dave Longley
> CTO
> Digital Bazaar, Inc.

John
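For comparison, here is a minimal sketch of the two options discussed above, using only the Python standard library. Option (a) embeds the markup as a JSON-LD value object typed with rdf:HTML; option (b) makes "description" a node reference to a separate resource that a server then content-negotiates. The content store, the payloads behind content/4y7dh2, and every function name are hypothetical, and the Accept-header matching is deliberately simplified rather than a full implementation of HTTP content negotiation.

```python
import json

# rdf:HTML datatype IRI (RDF 1.1 Concepts).
RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"

# Hypothetical content store: one shared description resource with one
# representation per media type (a DITA variant is omitted for brevity).
CONTENT = {
    "content/4y7dh2": {
        "text/html": "the ACME thingamyjig is our <b>new</b> wonderful product "
                     "with some <sub>subscript</sub> stuff.<br/>A new line",
        "text/plain": "the ACME thingamyjig is our new wonderful product "
                      "with some subscript stuff.\nA new line",
    }
}


def product(mpn, name, description):
    """Build the schema.org Product node used in the examples above."""
    return {
        "@context": "http://schema.org/",
        "@id": "#id",
        "@type": "Product",
        "mpn": mpn,
        "name": name,
        "description": description,
    }


def inline_html(markup):
    """Option (a): a literal explicitly typed as rdf:HTML."""
    return {"@value": markup, "@type": RDF_HTML}


def reference(content_id):
    """Option (b): a node reference to a separate, shareable resource."""
    return {"@id": content_id}


def parse_accept(header):
    """Return (media_type, q) pairs, highest q first, ties in header order."""
    prefs = []
    for index, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((q, index, media))
    prefs.sort(key=lambda p: (-p[0], p[1]))
    return [(media, q) for q, _, media in prefs]


def negotiate(accept, available):
    """Pick the first offered type matching the best-ranked acceptable range.

    Simplified: ignores media-type parameters and the more-specific-range
    precedence rules of RFC 7231, so e.g. "text/html;q=0, */*" may still
    yield text/html.
    """
    for media, q in parse_accept(accept):
        if q <= 0:
            continue
        for offered in available:
            if (media == "*/*" or media == offered
                    or (media.endswith("/*")
                        and offered.split("/")[0] == media[:-2])):
                return offered
    return None


def serve(content_id, accept="*/*"):
    """Return (status, content_type, body) for a GET on a content resource."""
    representations = CONTENT[content_id]
    media = negotiate(accept, list(representations))
    if media is None:
        return 406, "text/plain", "Not Acceptable"
    return 200, media, representations[media]


if __name__ == "__main__":
    html = CONTENT["content/4y7dh2"]["text/html"]
    print(json.dumps(product("ABC123", "ACME thingamyjig", inline_html(html)), indent=2))
    print(json.dumps(product("ABC123", "ACME thingamyjig", reference("content/4y7dh2")), indent=2))
    print(serve("content/4y7dh2", "text/html;q=0.5, text/plain"))
```

Option (b) keeps each product node small and lets many products share one cacheable description, which is the DRY argument made in the thread; option (a) keeps everything in a single response but relies on clients recognising the rdf:HTML datatype.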
Received on Saturday, 10 October 2015 17:25:05 UTC