- From: Robert Sanderson <azaroth42@gmail.com>
- Date: Sun, 11 Oct 2015 02:28:01 -0700
- To: John Walker <john.walker@semaku.com>
- Cc: public-hydra@w3.org, Dave Longley <dlongley@digitalbazaar.com>, Linked JSON <public-linked-json@w3.org>
- Message-ID: <CABevsUHfpWCibT360Y_p9zMorcKTQmsbRe=pt=Q7Ncd4LRcqdQ@mail.gmail.com>
The approach that we have taken in the Web Annotation Working Group [1] (and elsewhere) is to have an embedded resource with value, language and format properties:

{
  "@type": "EmbeddedContent",
  "value": "<span>This is some <b>marked up</b> content.</span>",
  "language": "en",
  "format": "text/html"
}

As RDF 1.1 does not allow a single literal to carry both a language tag and a datatype (such as rdf:HTML to indicate its format), this is the best that we could do.

Hope that helps,

Rob

[1] http://www.w3.org/TR/annotation-model/#body-and-target-metadata

On Sat, Oct 10, 2015 at 10:24 AM, John Walker <john.walker@semaku.com> wrote:

> Hi Dave,
>
> > On October 9, 2015 at 4:49 PM Dave Longley <dlongley@digitalbazaar.com> wrote:
> >
> > On 10/09/2015 07:20 AM, John Walker wrote:
> > > On a forthcoming project we need to transfer information/data/content
> > > from one system to another. I have questions about the best way to
> > > deal with HTML/XML markup.
> > >
> > > My assumption is we will expose an API of some kind, most likely JSON
> > > (ideally JSON-LD). The question is whether it is good/best practice to
> > > a. include HTML/XML markup in literal values, or b. refer out to a
> > > separate resource for these.
> > >
> > > An example of the first approach:
> > >
> > > {
> > >   "@context": "http://schema.org/",
> > >   "@id": "#id",
> > >   "@type": "Product",
> > >   "mpn": "ABC123",
> > >   "name": "ACME thingamyjig",
> > >   "description": "the ACME thingamyjig is our <b>new</b> wonderful product with some <sub>subscript</sub> stuff.<br/>A new line"
> > > }
> > >
> > > For me this is bad because the "description" is a string, but
> > > contains HTML markup such as <br/>. How is a client to know how to
> > > process this? Should the "<br/>" be displayed or rendered as a line
> > > break? What if the content contains < characters (common for technical
> > > products)? Should these be escaped as the HTML entity &lt;? Of course
> > > one could add the datatype rdf:HTML for this literal to indicate it
> > > is HTML.
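John's question (how a client knows whether a description is markup) and Rob's EmbeddedContent pattern both come down to dispatching on a declared format. The following is a minimal Python sketch of that client-side decision. It assumes the description arrives as one of three shapes: a bare string, a JSON-LD value object typed rdf:HTML, or an EmbeddedContent-style object with a "format" key. The helper name `to_html_fragment` is hypothetical.

```python
import html

# IRI of the rdf:HTML datatype defined in RDF 1.1 Concepts.
RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"


def to_html_fragment(description):
    """Return a fragment that is safe to treat as HTML (hypothetical helper).

    Accepted shapes (assumptions for this sketch):
    - a bare JSON string: plain text, so markup characters get escaped;
    - a JSON-LD value object typed rdf:HTML (full IRI, or the compact
      "rdf:HTML" when the context maps "rdf" to the RDF namespace);
    - an EmbeddedContent-style object with "value" and "format" keys.
    """
    if isinstance(description, str):
        # No declared datatype: "<" becomes "&lt;" and is displayed as text.
        return html.escape(description)
    if description.get("@type") in (RDF_HTML, "rdf:HTML"):
        # Declared as HTML: pass the markup through. A real client would
        # still sanitize untrusted markup before inserting it into a page.
        return description["@value"]
    if description.get("format") == "text/html":
        return description["value"]
    # Any other declared format: fall back to treating the value as text.
    return html.escape(description.get("@value", description.get("value", "")))


embedded = {
    "@type": "EmbeddedContent",
    "value": "<span>This is some <b>marked up</b> content.</span>",
    "language": "en",
    "format": "text/html",
}
print(to_html_fragment(embedded))
print(to_html_fragment("the thingamyjig accepts inputs < 5 V"))
```

In this sketch an untyped string is treated as text, which is the conservative default. The `<br/>` in the first example would then be displayed literally unless the publisher declares rdf:HTML or a text/html format.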
> > >
> > > In our case these literals could be quite large and contain extensive
> > > markup. Additionally, if we had these literals directly on the
> > > product entities, there would be significant repetition as many
> > > products have the same content (DRY).
> > >
> > > The second option would be to refer to some external resource.
> > >
> > > {
> > >   "@context": "http://schema.org/",
> > >   "@id": "#id",
> > >   "@type": "Product",
> > >   "mpn": "ABC123",
> > >   "name": "ACME thingamyjig",
> > >   "description": { "@id": "content/4y7dh2" }
> > > }
> > >
> > > This could support content negotiation (conneg), allowing multiple
> > > representations to be served from a single URL (e.g. HTML, DITA and
> > > plain text). It would also reduce repetition and allow client-side
> > > caching of these resources, and would potentially play nicely with
> > > things like HTML Imports [4] [5].
> >
> > IMO, neither option is the best approach, but the second is better.
> >
> > I think you have two better options:
> >
> > 1. Provide the description as a URL and let clients decide the
> > presentation they want through content negotiation when they request it.
>
> Uhhh... isn't this the same as my 2nd option?
>
> > 2. Provide the description as plain text and then include another
> > property that means "the value is a URL that specifies an HTML template
> > and/or `partial`" and then put a URL to your presentation-specific
> > display there.
>
> The content cannot be accurately represented as plain text; it requires
> *some* kind of markup, be it HTML, Markdown or whatever.
>
> > I suppose it doesn't have to be a template that requires some processing
> > to get your description into it, but that could make it more reusable
> > and more cleanly separate presentation from data. You could also specify
> > other information that the client could use to understand how to consume
> > those templates. Anything else very strongly ties the data to a specific
> > application and/or presentation of it.
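Under the second option, the server behind a description URL such as content/4y7dh2 has to pick one representation per request. The following is a minimal Python sketch of that selection. It rests on a simplified reading of the HTTP Accept header: only q-values are honoured, and other media type parameters are ignored. The function names, and the use of application/dita+xml as the DITA media type, are assumptions and are not taken from the thread.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs.

    Simplified: media type parameters other than q are ignored.
    """
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def specificity(media_range: str, media_type: str) -> Optional[int]:
    """Return 2 for an exact match, 1 for type/*, 0 for */*, None otherwise."""
    if media_range == media_type:
        return 2
    if media_range.endswith("/*") and media_type.split("/")[0] == media_range[:-2]:
        return 1
    if media_range == "*/*":
        return 0
    return None


def choose_representation(accept: str, available: List[str]) -> Optional[str]:
    """Pick the available media type with the highest q-value.

    The most specific matching range sets each type's q; ties keep the
    server's order in `available`. Returns None if nothing is acceptable
    (a server might then answer 406 Not Acceptable or use a default).
    """
    ranges = parse_accept(accept) or [("*/*", 1.0)]  # empty header: anything
    best, best_q = None, 0.0
    for media_type in available:
        matches = []
        for media_range, q in ranges:
            s = specificity(media_range, media_type)
            if s is not None:
                matches.append((s, q))
        if not matches:
            continue
        _, q = max(matches)  # most specific range wins, then higher q
        if q > best_q:
            best, best_q = media_type, q
    return best


AVAILABLE = ["text/html", "application/dita+xml", "text/plain"]
print(choose_representation("text/html;q=0.5, text/plain;q=0.9", AVAILABLE))
```

The response differs by Accept header, so such a server would also send `Vary: Accept`. That lets the client-side caches mentioned above store each representation of the same URL separately.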
>
> For the most part, presentation is not the primary concern here; the markup
> in the content conveys semantics/meaning, not style/presentation.
>
> > Due to the nature of HTML, how particular elements are rendered is
> > largely dependent on the context of the document and a number of other
> > independent style inputs. By directly embedding it in your data, your
> > data is no longer consumable by an arbitrary client; rather, a client must
> > adhere to your overall presentation style to do anything with the data.
> > Either don't make that part of your data linked or present the data in a
> > way that other clients could at least reasonably do something with it.
>
> I would not consider this content as structured data and, as such, would
> not try to model it in RDF. So we can either include it as a blob in the
> RDF, or pull it out to a separate resource.
>
> > > IMHO from a principled/architectural perspective the second option is
> > > best. However, we do not see this second option as a widely deployed
> > > pattern. Why is that?
> >
> > This is a hard problem. It bleeds into Web Components design and other
> > areas. I can imagine Web Components being described as Linked Data --
> > because it's easier to insert them into presentations and theme them,
> > etc. But you need a lot more information than an HTML snippet to
> > construct a proper Web Component that you could reasonably insert into
> > and make useful in a page.
> >
> > > To go to the other extreme, why not inline images as data URIs in the
> > > RDF? Clearly this is possible, but quite uncommon.
> > >
> > > Clearly developers are comfy with the idea of images as resources,
> > > but not textual content. Is that a step too far, or is the support
> > > lacking in programming languages/libraries?
> > >
> > > Thoughts/opinions welcome?
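The following is a minimal Python sketch of the data URI alternative. It uses the standard base64 module to build and decode an RFC 2397 data URI, then places the result in a schema.org Product. The helper names and the placeholder bytes are hypothetical. Base64 emits 4 characters for every 3 input bytes, so inlining inflates the payload by about a third. That overhead is one cost to weigh against linking to a separate resource.

```python
import base64
import json
from typing import Tuple


def to_data_uri(payload: bytes, media_type: str) -> str:
    """Encode raw bytes as a base64 data URI (RFC 2397)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def from_data_uri(uri: str) -> Tuple[str, bytes]:
    """Decode a base64 data URI of the form produced by to_data_uri.

    Percent-encoded (non-base64) data URIs are deliberately not handled.
    """
    header, sep, data = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("only base64 data URIs are handled in this sketch")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(data)


# Placeholder bytes standing in for a real image file (hypothetical).
image_bytes = b"\x89PNG\r\n\x1a\n"

product = {
    "@context": "http://schema.org/",
    "@id": "#id",
    "@type": "Product",
    "mpn": "ABC123",
    "image": to_data_uri(image_bytes, "image/png"),
}

print(json.dumps(product, indent=2))
```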
> > >
> > > John
> > >
> > > [4] http://www.w3.org/TR/html-imports/
> > > [5] http://www.html5rocks.com/en/tutorials/webcomponents/imports/
> >
> > --
> > Dave Longley
> > CTO
> > Digital Bazaar, Inc.
>
> John

--
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305
Received on Sunday, 11 October 2015 09:28:31 UTC