Re: What problem is this task force trying to solve and why?

One additional point about <foreignContent> - this opens up the option of
embedding JSON, YAML or other similar content into the HTML without the
browser rendering that content. It provides a way of embedding metadata
such as RDF. It could even be used for embedding binhex or similar
content. It's semantically neutral - the browser doesn't need to apply any
processing to it. It's an escape hatch for HTML, something that provides a
way to extend the language if necessary but to do so in a consistent manner.
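
To make the idea concrete, here is a rough sketch of how a developer might
pull such an island out of a page and parse it. Everything in it is an
assumption for illustration: <foreignContent> does not exist in HTML5, the
id and type attributes are hypothetical, and Python's html.parser stands in
for whatever the page author would really use. It also assumes the embedded
text contains no "<" characters, since this parser would treat them as
markup.

```python
import json
from html.parser import HTMLParser


class IslandExtractor(HTMLParser):
    """Collect the raw text of each (hypothetical) <foreignContent> element by id."""

    TAG = "foreigncontent"  # html.parser reports tag names in lowercase

    def __init__(self):
        super().__init__()
        self.islands = {}
        self._current_id = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.TAG:
            self._current_id = dict(attrs).get("id", "")
            self._chunks = []

    def handle_data(self, data):
        # Text may arrive in several pieces, so accumulate it.
        if self._current_id is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == self.TAG and self._current_id is not None:
            self.islands[self._current_id] = "".join(self._chunks)
            self._current_id = None


def extract_islands(html):
    """Return a dict mapping island id to its unparsed text content."""
    parser = IslandExtractor()
    parser.feed(html)
    parser.close()
    return parser.islands


PAGE = """<html><body>
<p>Visible text.</p>
<foreignContent id="meta" type="application/json">
{"title": "Example", "tags": ["a", "b"]}
</foreignContent>
</body></html>"""

islands = extract_islands(PAGE)
# Interpreting the island is the developer's job; if it is malformed,
# json.loads raises and "it breaks".
metadata = json.loads(islands["meta"])
```

The host page only carries the text; all interpretation happens in the
developer's own code, which is the whole point of the element.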

Kurt Cagle
XML Architect
*Lockheed / US National Archives ERA Project*

On Tue, Jan 4, 2011 at 4:00 PM, Kurt Cagle <> wrote:

> One other possibility that comes to mind is simply to create a
> <foreignContent> element in HTML5. SVG has a similar element (usually for
> holding HTML, oddly enough). This would simply tell the processor to not
> display the content in question, not to parse it, not to do anything with
> it. From the standpoint of HTML5, it's non-displayed text. It would be the
> responsibility of the web developer to parse this content into something
> meaningful, and if it breaks, then it breaks.
> Yes, it's a data island. If the HTML5 working group feels so strongly about
> the purity of the language, a data island is the minimal subset necessary to
> ensure some form of extension. Throwing XML content into a <script
> type="application/xml" id="foo"> block works better, because it performs parsing of
> corresponding documents, but either way, embedding XML in HTML is not a hard
> problem. The only hard problem is getting past this phobia about XML content
> ending up in HTML.
> Kurt Cagle
> XML Architect
> *Lockheed / US National Archives ERA Project*
> On Tue, Jan 4, 2011 at 2:51 PM, John Cowan <> wrote:
>> Henri Sivonen scripsit:
>> > On Dec 20, 2010, at 17:50, David Carlisle wrote:
>> > It sure has. Hixie ran an analysis over a substantial quantity of
>> > Web pages in Google's index and found existing text/html content that
>> > contained an <svg> tag or a <math> tag. The justification is making
>> > the algorithm not break a substantial quantity of pages like that.
>> A number would be nice.  One person's "substantial" is another person's
>> "trivial", unfortunately.
>> > Web authors do all sorts of crazily bizarre things. It's really not
>> > useful to try to apply logic to try to reason what kind of existing
>> > content there should be.
>> Amen.
>> > > Editing tools also use nsgmls (perhaps just in the background)
>> > > It isn't really true to say it is "just the w3c validator".
>> >
>> > Which tools? Is the plural really justified or is this about one
>> > Emacs mode?
>> You are confusing nsgmls itself with the Emacs mode (which employs
>> nsgmls).  Nsgmls is a stand-alone SGML validator that outputs an
>> ESIS equivalent to the document being validated.  ESIS is a textual
>> representation of SAX-style events, one line per event.  It's the core
>> of any reasonably modern SGML system.
>> > More precisely, my (I'm hesitant to claim this as a general HTML5
>> > world view) world view says that using vocabularies that the receiving
>> > software doesn't understand is a worse way of communicating than using
>> > vocabularies that the receiving software understands. (And if you
>> > ship a JavaScript or XSLT program to the recipient, the interface
>> > of communication to consider isn't the input to your program but
>> > its output. For example, if you serve FooML with <?xml-stylesheet?>
>> > that transforms it to HTML, you are effectively communicating with
>> > the recipient in HTML--not in FooML.)
>> This argument strikes me as a defense of putting arbitrary XML on the
>> wire, since it is not (in your sense of the term) the interface of
>> communication.
>> Once that is accepted, it seems plausible to allow mixtures of HTML and
>> arbitrary XML as well.
>> > This is a bit of a dirty open secret of HTML5. We pretend in rhetoric
>> > that #1 is true, but in practice, if you consider elements introduced
>> > by HTML5 and how they behaved in pre-HTML5 browsers, #2 is true.
>> Thanks for the explanation.  I will henceforth disregard claims of #1.
>> > More to the point, DocBook is not XHTML+MathML if we consider that
>> > to mean "XHTML and MathML and nothing more". If you aren't allowed to
>> > dump DocBook content as a child of an HTML element, it doesn't really
>> > make sense to enable dumping it inside annotation-xml.
>> However, if the day came in which DocBook was an equal-partner vocabulary
>> (unlikely as that may seem, stranger things have already happened),
>> we would have to add yet another hack to make it work inside MathML.
>> It is one thing to say it's not valid HTML to incorporate a foreign
>> vocabulary inside MathML-in-HTML annotations.  It's another thing to
>> ensure that such vocabularies are already broken.
>> > There are security incentives that work against starting to repair
>> > broken JavaScript where "broken" is what's broken per ES3. However, I
>> > wouldn't be at all surprised if we ended up in a situation where every
>> > vendor has an incentive not to enforce the ES5 Strict Mode in order to
>> > "work" with more Web content than a competing product that halts on
>> > Strict Mode violations and the ES5 Strict Mode effort collapsed.
>> Strict mode is a programmer choice, not an implementer choice.  The code
>> has to contain a "use strict" directive.
>> --
>> John Cowan
>> Original line from The Warrior's Apprentice by Lois McMaster Bujold:
>> "Only on Barrayar would pulling a loaded needler start a stampede toward
>> one."
>> English-to-Russian-to-English mangling thereof: "Only on Barrayar you risk
>> to lose support instead of finding it when you threat with the charged
>> weapon."

Received on Tuesday, 4 January 2011 21:16:02 UTC