Re: Use cases from Benjamin Hawkes-Lewis on 2011-01-03 (public-html-xml@w3.org from January 2011)

From: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
Date: Mon, 3 Jan 2011 02:14:42 +0000
To: John Cowan <cowan@mercury.ccil.org>
Cc: Julian Reschke <julian.reschke@gmx.de>, Norman Walsh <ndw@nwalsh.com>, public-html-xml@w3.org
Message-ID: <AANLkTimHf-Fpxc2pmwtY0qGnn=fOUPe7Bs7iqZkbOSGE@mail.gmail.com>
On Sun, Jan 2, 2011 at 8:07 PM, John Cowan <cowan@mercury.ccil.org> wrote:
> Benjamin Hawkes-Lewis scripsit:
>
>> But the implementors of the client software of the world wide web
>
> If we are going to be accurate, let us say "a few of the implementors of
> a small fraction of the client software of the WWW".

Measured as bits of independent software, sure.

But:

1. All implementors are welcome to participate.

2. The implementors involved cover a broad range of categories
(including web servers, authoring tools, search engines, screen readers,
email clients, conformance tools, desktop browsers, mobile browsers,
speaking browsers).

3. The implementors involved produce the software employed by most
end-users and targeted by most web authors.

I think that's as good a safeguard of the uniform interface as we're
ever going to get.

>> >> Also, in controlled environments you can just use other media types
>> >> including all the text/html vocabularies if you want arbitrary
>> >> XML vocabularies, so this *cannot* be a use case for adding such
>> >> functionality to text/html.
>> >
>> > Just because there's more than one way doesn't mean the other way
>> > "can't" be used.
>>
>> No need has been demonstrated.

[snip]

> So far, HTML has had two vocabularies of the many possible ones that may
> be useful incorporated into it by ad hoc means (not the root elements,
> which can be made systematic, but the magic parsing properties whereby
> apparent HTML elements within the SVG or MathML islands are surfaced).
> Without a general means for incorporating new vocabularies, evolution
> to include them will be at best slow and difficult, at worst no longer
> possible because of unbreakable backward-compatibility constraints.

I object to the implicit insertion of XML into the phrase "means for
incorporating new vocabularies". ;)

We need to distinguish the following capabilities:

1. The ability for W3C to expand the text/html vocabulary in the future.
The processing model for unknown elements allows for this already.

2. The ability for third parties to experiment with expanding the
text/html vocabulary, with a view to future standardization.
Vendor-specific attributes allow for this already.

3. The ability for third parties to expand the text/html vocabulary and
claim to be serving conforming text/html. The clause allowing other
applicable specifications to define conforming text/html allows for this
already.

XML is just one potential source of vocabularies; other sources include
RDF, microformats, and formats from beyond the world of the W3C like
abc, vCard, and SRT. It's not obvious to me that we need to optimize for
the XML case.

Selectively HTMLizing external vocabularies is arguably a better option
than importing them entire, as it produces a language with less
duplication and inconsistency.

IIRC the main reason militating against this in the case of SVG and
MathML was the desire to be interoperable with lots of existing tools.
This may not always apply.

Anyways, assuming any parsing rules we concoct are compatible with the
existing web corpus (this is a big if), I have no strong objections to
supporting the introduction of additional XML vocabularies by W3C as
part of meeting capability (1).

By supporting the introduction of additional XML vocabularies by W3C we
would also be ensuring that gobbledygook served as text/html is
consistently treated at the parsing level, even if it produces
inaccessible garbage content at the human level.

I do object to supporting vendor-specific elements (because
attribute-based experimentation is more future-proof) and to making
arbitrary elements conforming beyond the existing "applicable
specifications" clause. I don't understand the apparent
interest of gobbledygook producers in the HTML validity badge.

>> That suggests we sometimes need to expand the core vocabulary by means
>> of the standards process, not that we need to bypass the standards
>> process.
>
> De facto, the vocabulary is expanded first and the standards process
> follows.  I happen to think that's a Good Thing: I much prefer
> retrospective to prospective standardization.

I prefer a third way: people develop features in a gradually widening
sandbox, refining them in continual dialog with their peers until the
features are either dropped or standardized.

> But standards should define a clean, non-ad-hoc way to expand the
> vocabulary rather than letting it happen under the table, which has
> been the history of HTML and has made it the collection of hacks we
> see today.

Adding the mess of XML dialects to the mess of HTML just makes a bigger mess.

Sometimes that mess may be worth it.

>> I disagree that the information that might be represented via these
>> vocabularies *cannot* be represented, albeit sometimes cumbersomely,
>> with text marked up with today's generic text/html semantics.
>
> Well, that's trivially true.  Any information whatsoever *can* be
> represented with plain text, or for that matter with strings of 1s and 0s.
> (Oh, wait, it *is* represented with strings of 1s and 0s.)  The point
> is not what's possible but what's expressive.

I think text/html is already reasonably expressive for the typical
problem domains of such vocabularies, even if it could (and over the
long term will) be improved to be more expressive.

>> Resources that could be described using 3D graphics can also be
>> projected in SVG and described using text.
>
> Only with loss of information: no 2-D projection of 3-D information can
> do otherwise.  Google for "Ames room".

Cool reference. :)

Converting between media, translating, marking up, describing, reading,
and looking all involve human selection of what information is key, and
that selection is necessarily lossy. I think the uniform interface is
worth that level of information degradation.
Higher-grade information can always be made available via HTML's
excellent linking facilities.

--
Benjamin Hawkes-Lewis
Received on Monday, 3 January 2011 02:15:16 UTC