
Re: HTML Entities and Escaping in JSON-LD Literals

From: Thad Guidry <thadguidry@gmail.com>
Date: Fri, 19 Jun 2015 09:07:09 -0500
Message-ID: <CAChbWaMgRnBOhua7xOtgja4xLAOxaow-E6cG8pLc7Sx5unnB-Q@mail.gmail.com>
To: "mfhepp@gmail.com" <mfhepp@gmail.com>
Cc: "schema.org Mailing List" <public-schemaorg@w3.org>, W3C Web Schemas Task Force <public-vocabs@w3.org>, Manu Sporny <msporny@digitalbazaar.com>
Per the JSON spec at http://json.org/ (btw, the bottom of that page has
a whole listing of useful libraries):

A *string* is a sequence of zero or more Unicode characters, wrapped in
double quotes, using backslash escapes. A character is represented as a
single character string. A string is very much like a C or Java string.

Inside a string value you cannot use " (double quote) or \ (reverse solidus)
without escaping them, since those characters are reserved; control
characters must be escaped as well. Other than that, you should be fine
keeping HTML-encoded values as they are.
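
To make the rule above concrete, here is a minimal sketch in Python (my choice of language; any of the libraries listed on json.org should behave the same way). The literal and the schema.org "Thing" document are made up for illustration. The serializer escapes the double quotes and the backslash, and leaves the HTML entity and the numeric character reference untouched:

```python
import json

# Hypothetical literal, as it might come from a backend that stores
# HTML-encoded text: it contains &amp;, &#160;, double quotes and a backslash.
raw_name = 'Tom &amp; Jerry&#160;say "hi" \\ bye'

# Hypothetical JSON-LD document; the type and property are illustrative only.
doc = {
    "@context": "http://schema.org",
    "@type": "Thing",
    "name": raw_name,
}

# json.dumps applies the JSON string escapes (\" and \\) for us.
encoded = json.dumps(doc)

# Parsing the result gives back exactly the original literal,
# HTML encoding included.
decoded = json.loads(encoded)

print(encoded)
```

Whether a consumer then decodes `&amp;` back to `&` is a separate question that JSON itself does not answer; the sketch only shows that the HTML-encoded text survives a JSON round trip unchanged.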


Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>

On Fri, Jun 19, 2015 at 5:01 AM, mfhepp@gmail.com <mfhepp@gmail.com> wrote:

> Dear all:
>
> I think we need to clarify in the schema.org documentation whether
> HTML entities and numeric character references for Unicode characters in
> literals, namely text, should or can be kept as they are, or need to be
> unescaped inside JSON-LD values. I assume the answer might be different for
>
> a) stand-alone JSON-LD documents and
> b) when JSON-LD is embedded inside HTML via <script> elements.
>
> In particular, I would like to know whether they must, should, or can be
> left in their HTML-encoded forms.
>
> Literals provided by backend databases will often be encoded for HTML
> environments and contain, e.g., HTML entities like &amp; for the
> ampersand character or numeric character references like &#160; for a
> non-breaking space.
>
> Developers will often face the task of reusing a template variable that
> contains such escaped characters in JSON-LD code in <script> elements.
>
> The Google Structured Data Testing Tool seems pretty tolerant of this,
> but I would like to know the proper way of encoding text in JSON-LD values.
>
> The only guidance I found online was the simple statement
>
>     "Depending on how the HTML document is served, certain strings may
> need to be escaped."
>
> in
>
>     http://www.w3.org/TR/json-ld/
>
> To make things more complicated, it seems that JSON-LD introduces novel
> escaping requirements for <, >, @ and ^:
>
>     http://json-ld.org/spec/ED/json-ld-syntax/20100529/#escape-character
>
> Does anybody know a definite reference for this?
>
> Best wishes
>
> Martin
>
> -----------------------------------
> martin hepp  http://www.heppnetz.de
> mhepp@computer.org          @mfhepp
>
Received on Friday, 19 June 2015 14:07:37 UTC
