Re: NU’s polyglot possibilities (Was: The non-polyglot elephant in the room) from Alex Russell on 2013-01-26 (www-tag@w3.org from January 2013)

From: Alex Russell <slightlyoff@google.com>
Date: Fri, 25 Jan 2013 20:50:28 -0500
To: David Sheets <kosmo.zb@gmail.com>
Cc: "www-tag@w3.org List" <www-tag@w3.org>, public-html WG <public-html@w3.org>, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, "Michael[tm] Smith" <mike@w3.org>
Message-ID: <CANr5HFWbmeccJj2ffFHYL0JLKmNs9rVaPyxWGqMZsMh0=NicoA@mail.gmail.com>
On Jan 25, 2013 7:36 PM, "David Sheets" <kosmo.zb@gmail.com> wrote:
>
> On Fri, Jan 25, 2013 at 2:11 PM, Alex Russell <slightlyoff@google.com> wrote:
> >
> > On Fri, Jan 25, 2013 at 4:16 PM, David Sheets <kosmo.zb@gmail.com> wrote:
> >>
> >> On Fri, Jan 25, 2013 at 11:48 AM, Alex Russell <slightlyoff@google.com>
> >> wrote:
> >> > On Thu, Jan 24, 2013 at 11:46 PM, David Sheets <kosmo.zb@gmail.com>
> >> > wrote:
> >> >>
> >> >> On Thu, Jan 24, 2013 at 4:44 PM, Alex Russell <slightlyoff@google.com>
> >> >> wrote:
> >> >> > On Thu, Jan 24, 2013 at 6:29 PM, David Sheets <kosmo.zb@gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> On Thu, Jan 24, 2013 at 2:14 PM, Alex Russell
> >> >> >> <slightlyoff@google.com>
> >> >> >> wrote:
> >> >> >> > I find myself asking (without an obvious answer): who benefits
> >> >> >> > from the creation of polyglot documents?
> >> >> >>
> >> >> >> Polyglot consumers benefit from only needing an HTML parser *or*
> >> >> >> an XML parser for a single representation.
> >> >> >
> >> >> > That's just a tautology. "People who wish to consume a set of
> >> >> > documents known to be in a single encoding only need one decoder".
> >> >> > It doesn't illuminate any of the questions about the boundaries
> >> >> > between producers/consumers that I posed.
> >> >>
> >> >> "People who wish to consume a set of documents known to
> >> >> simultaneously be in multiple equivalent encodings only need one of
> >> >> several decoders."
> >> >>
> >> >> That doesn't appear tautological to me. Check your cardinality. The
> >> >> Axiom of Choice comes to mind.
> >> >
> >> > It appears to me that you've skipped a step ahead of answering my
> >> > question and are dismissing it on an assumption I'm not making (hence
> >> > you think it's not a tautology).
> >>
> >> Let us find our common misunderstanding and resolve it.
> >>
> >> > You posit a group of consumers who have one preference or another (a
> >> > hard preference, at that) and wish me to treat this binary-separable
> >> > group as uniform. You then posit a producer who would like to address
> >> > this group of consumers. You further wish me (AFAICT) to assume that
> >> > these demanding consumers are fully aware of the polyglot nature of
> >> > the producer's content through unspecified means.
> >>
> >> Suppose you are publishing technical documentation. You already have a
> >> toolchain constructed to ensure very specific invariants on your
> >> output documents. Your consumers are savvy and may wish to script
> >> against your documentation. You perceive that for a small cost
> >> (reading polyglot spec and tweaking to emit it), you can simplify
> >> consumption for your user base.
> >
> > This works with a single producer and consumer who have a fixed
> > contract.
>
> This works with any number of producers and consumers who have a
> "fixed contract". For simplicity, let's call this "fixed contract" a
> "standard".
>
> > That's sort of the definition of a closed system...and it's not the web.
>
> Any strictly standardized communication format is a closed system? The
> internet isn't standardized? The web is closed? Clearly in every
> large-scale system some emitters will be in error and some consumers
> will be lenient. That doesn't obviate the need for standards or excuse
> lack of quality control.
>
> > Why aren't they just publishing as one or the other?
>
> Why must they pick between broad compatibility and automation if both
> are possible, trivially, in a single representation?
>
> > And if the tweaks are so small (but necessary), why isn't this a job
> > for software?
>
> The tweaks are small because of a shared heritage which allows
> significant intersection in conforming representations. After
> publication of a Polyglot Recommendation, new systems which elect to
> conform will not need tweaking.
>
> Why do people write portable C? Why not write platform-specific C and
> then write some software to make the small tweaks?
>
> > Consumers who want to process more than a single producer's content
> > either have to:
> >
> > 1. Have a reliable way to know that what they consume isn't going to be
> >    broken (as HTML in XML parsing is)
> > 2. Have a way of consuming a superset of any individual publisher's
> >    formats
> >
> > Either work, but polyglot precludes #1 on the basis that #2 shouldn't
> > have to happen, against all the evidence of how this sort of thing is
> > sorted out every day by real world software.
>
> I think you are mistaken in your belief that polyglot precludes #1.
> This is like saying that writing portable C makes a strictly
> conforming C compiler impossible or worthless.
>
> There is lots of real world software working every day that supports a
> superset of any individual publisher's format in various XML
> vocabularies. You have some evidence of consuming systems that strive
> to be maximally general. This evidence does not negate the evidence
> that there are systems that produce and consume content that strictly
> adheres to standards. Not everyone is an HTML parser implementor or
> has easy access to a plug-in HTML parser. Not everyone wants or needs
> to deal with broken representations. Not everyone holds their
> consumers in such contempt as to force them to adopt HTML parsers.
>
> Can the web have sub-communities using document standards or is it
> Google's "good enough" way only?
>
> Should W3C remain silent on how their standards interact?
>
> >> > What I'm asking is this: does this happen in the real world?
> >>
> >> Yes.
> >>
> >> > Under what circumstances?
> >>
> >> Structured document repositories
> >> Legal case files
> >> Digital archives
> >> Database views
> >> Email repositories
> >> Software specifications
> >> Anything projecting well-defined data structures into HTML
> >
> > So "programs writing programs for programs".
>
> HTML documents are programs now? I thought you were just arguing that
> they shared nothing with free software?
>
> And is the concept of "programs writing data for programs for humans"
> so foreign to require indignation? Doesn't this describe essentially
> every standard data format ever devised?
>
> >> > How frequently?
> >>
> >> Every time a programmatic producer wishes to serve an XML consumer and
> >> an HTML consumer with fewer special cases.
> >>
> >> > On the open web (where I expect that the contract about what is and
> >> > isn't XML is even more important), or inside closed systems and
> >> > organizations?
> >>
> >> When you publicly publish something and declare your intent, you are
> >> on the "open web".
> >
> > I think you'll struggle to get most W3C members to accept that
> > definition.
>
> What definition do you suggest "most W3C members" would accept for
> "open web"? Does "open web" exclude some transports? Some formats?
> Perhaps we have different ideas on what "open" means.
>
> >> > I don't see that the TAG has any duty to the latter, so it's an
> >> > honest question.
> >>
> >> Even "closed" systems export data and use off-the-shelf browsers.
> >> Furthermore, many of these "closed" systems will be opening up in the
> >> future. The TAG has a responsibility to guide publishers and
> >> implementors who wish to support W3C standard formats in their systems
> >> that do or may interact with the web.
> >
> > Our job is not to sell the web to a possible new audience -- it doesn't
> > need our help and we're the last group I can imagine being effective as
> > salespeople
>
> You are responding to a figment. I mentioned nothing of sales or
> marketing. The publishers and implementors are already sold and "wish
> to support W3C standard formats in their systems that do or may
> interact with the web".
>
> > -- it's to help publishers understand how the rules work so that
> > they can join it and to help spec authors make sure the rules are sane
> > in the long-run.
>
> I believe that we agree here.
>
> Do you feel that the polyglot document does not help publishers
> understand the (X)HTML syntax?

This is the core question worth answering, and as far as I can tell from
the responses here, I have no reason to think that polyglot markup aids
understanding. It lacks the signage that clearly-HTML or clearly-XML
documents provide, and without that clarity its value seems washed out
beyond recognition.

> I believe that the polyglot document serves precisely this purpose.
>
> Do you feel that the polyglot document hurts long-term viability of
> the standards?

I don't know, but it does impose constraints without demonstrated value.

> I believe that the polyglot document decreases fragmentation and
> guides spec authors to more sane rules.

We disagree.

> Do you feel that the unelected, top-down structure of HTML
> standardization should be given greater leeway to further fragment
> implementations and introduce special cases? On what grounds?

Yes. Because HTML is what gives the web value.

> >> > My personal experience leads me away from assuming that this is
> >> > common.
> >>
> >> Mine as well. I didn't realize that only the most common case deserves
> >> attention. What is your threshold for consideration?
> >>
> >> > I'm looking for countering evidence in order to be able to form an
> >> > informed opinion. So the question is open (ISTM): who are the
> >> > consumers that do not adapt to publishers?
> >>
> >> Why cannot publishers decide to publish content with maximal
> >> compatibility?
> >
> > Why can't I publish a binary stream of bytes that's both a PNG and a
> > BMP?
>
> You probably can but probably not so that the representation is
> simultaneously conforming. These formats are much farther apart than
> HTML and XHTML. The HTML5 specification defines both HTML and XHTML
> syntaxes in a single document with many overlapping concepts.
>
> What do you find objectionable about publishers leveraging this fact?
>
> > I'm honestly trying to understand the real-world harm in giving up on
> > polyglot.
>
> What is the real-world harm in giving up on XML?
>
> Wasted labor, fragmented syntax, requisite reimplementation, no
> suitable replacement...
>
> > So far I don't sense that there's much to be lost that can't be
> > easily won again through common and well-understood strategies -- the
> > sorts of things that browsers and all sorts of other off-the-shelf
> > software already do.
>
> You are trading away a very cheap improvement that yields simplicity
> benefits to some consumers for an expensive, global improvement and
> the general adoption of data format fatalism and software to support
> it. Why are you taking away publishers' choice?
>
> >> If niche publishers assume that consumers will adapt, they may find
> >> that the hassle of adaptation has hurt their reach.
> >
> > What hassle? Seriously, if you're consuming from a single fixed producer
> > *you know what you're getting* and can build your software accordingly.
>
> Until the producer changes their output and your hacky regexes don't
> work or your assumption about their page structure becomes invalid. If
> several producers instead say "here is a standard method that we
> encourage you to use and share among our community" and they then
> stick to this promise (through, say, publishing software that enforces
> it), who are you to tell them "no"?
>
> > From the producer's side, of course you're going to publish for the
> > maximum reach and capability *across the existing population of
> > consumers*. If transcoding is needed and can be automated (which it can
> > here)...why is this an issue?
>
> When the publisher's need to serve their consumers can be guaranteed
> to be met by a single format, why necessitate transcoding?
>
> >> If it costs a publisher 1 hour of labor to tweak their systems to
> >> output polyglot and this offers their consumers access to a new
> >> ecosystem of tools and libraries, is it not worth it?
> >
> > If they could spend that hour slotting in a transcoder that publishes
> > in the other one, addressing that same new market, is it not worth it?
>
> Why increase the number of representations published? Why deal with a
> transcoder? If they can have a single pipeline that serves all their
> users' needs, why force them to support multiple representations?
>
> >> Should each consumer adapt individually? Should the producer generate
> >> and disseminate 2x the documents for XML vs. HTML consumers? A subset
> >> of the syntax and semantics are provably compatible.
> >>
> >> Suppose a niche publisher has 10 consumers. It costs the publisher k
> >> to ensure polyglot invariants on their product. It costs each consumer
> >> in excess of k to wire together a lenient parser. How is that
> >> efficient?
> >>
> >> I don't understand: how does polyglot burden you?
> >
> > That's the bar to be met. The question is: what's the value to the web
> > of demanding that we add it as a constraint on the development of HTML?
>
> How does it unreasonably constrain the development of HTML?
>
> What's the value to the web of throwing away thousands of man-years of
> effort into XML tooling?
>
> What's the value to the web of mandating an increasingly baroque
> language with an increasingly idiosyncratic and complex parser?
>
> Who benefits from mandating complexity barriers? Is it the independent
> developer or the public corporation?
>
> >> How is it
> >> detrimental? If there is detriment, does it exceed the harmless desire
> >> of some producers to produce maximally compatible content?
> >>
> >> > I observe many consumers that adapt and few producers who do
> >> > (particularly granted the time-shifted nature of produced content and
> >> > the availability of more transistors every year).
> >>
> >> And so we must reinforce the status quo by vetoing publication of
> >> guidelines for maximal compatibility?
> >
> > I'm not saying what I *wish* would happen, I'm saying this *does* happen
> > over and above the objections of system authors who loathe the
> > additional complexity and all the rest.
>
> And you are using this generalization of "the wild" to justify a
> formal stance of "striving for quality is pointless so we shouldn't
> tell producers how" which will undermine existing communities with
> minimal resources which, for purposes of self-preservation, already
> favor adherence to standards.
>
> Is there any place for those who wish to adhere to standards? Why does
> HTML5 specify authoring constraints that are stricter than what
> conformant HTML5 parsers will accept?
>
> How are you so certain that we must dissuade producers from publishing
> polyglot documents? What clear and present danger does Recommendation
> of polyglot present to the technical architecture of the WWW?
>
> >> >> >> Polyglot producers benefit from only needing to produce a single
> >> >> >> representation for both HTML and XML consumers.
> >> >> >
> >> >> > What's the value to them in this? Yes, producers want to enable
> >> >> > wide consumption of their content, but nearly every computer sold
> >> >> > can parse both HTML and XML with off-the-shelf software. The
> >> >> > marginal gain is...what?
> >> >>
> >> >> 1. Smaller library dependency in software consumers
> >> >
> >> > But evidence suggests that valuable content is transformed by eager
> >> > producers, not rejected. Consuming code that yields more value (can
> >> > consume
> >> > more content) does better in the market.
> >>
> >> A significant fraction of consuming code is not on the market.
> >>
> >> > How is the value manifested for users of this code?
> >>
> >> Invariants are preserved and can be relied on.
> >>
> >> Interpreted languages typically provide invariants regarding machine
> >> security that native executables do not. Declarative representations
> >> provide invariants regarding interpretation (termination) that
> >> imperative representations do not.
> >>
> >> Likewise, adherence to XML's syntax provides guarantees that
> >> interpretability by an HTML parser does not. This guarantee has value
> >> for consumers in the form of broader choice and faster time to
> >> construct consuming software.
> >
> > So this is about welcoming our XML overlords?
>
> You are free not to use anything related to XML.
>
> That said, I'd rather have the *option* of XML overlords than
> corporate overlords deciding which guidelines for interoperability
> between standards may be published.
>
> > I think that ship sailed (and sank).
>
> Perhaps in the mass market that is true. Many niche communities still
> find significant value in a standard set of useful tools, derision by
> pop culture programmers notwithstanding.
>
> Do you have a replacement for XML you would like to offer? What harm
> will you experience if polyglot becomes a REC?
>
> David
Received on Saturday, 26 January 2013 01:50:58 UTC
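
Illustration (not part of the original message): the thread turns on the
claim that a polyglot consumer needs only an HTML parser *or* an XML
parser for a single representation, and on the observation that ordinary
HTML breaks under XML parsing. A minimal Python sketch of both points
follows. The two sample documents and all function names are hypothetical,
the first sample is merely written in the polyglot style (no conformance
check is performed), and the standard-library html.parser used here is a
lenient tokenizer, not a full HTML5 parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample in the polyglot style: well-formed XML that an HTML
# parser can also read (explicit namespace, self-closed void elements).
POLYGLOT_DOC = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><meta charset="utf-8"/><title>Spec</title></head>
<body><p>Hello<br/>world</p></body>
</html>"""

# Hypothetical HTML-only sample: the unclosed <br> is fine for an HTML
# parser but makes the document ill-formed as XML.
HTML_ONLY_DOC = (
    "<html><head><title>Spec</title></head>"
    "<body><p>Hello<br>world</p></body></html>"
)


def xml_parses(doc):
    """Return True if the document is well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


def title_via_xml(doc):
    """Extract the title using only an XML parser."""
    root = ET.fromstring(doc)
    title = root.find(f"{XHTML_NS}head/{XHTML_NS}title")
    return title.text if title is not None else None


class _TitleGrabber(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def title_via_html(doc):
    """Extract the title using only a lenient HTML tokenizer."""
    grabber = _TitleGrabber()
    grabber.feed(doc)
    grabber.close()
    return grabber.title


if __name__ == "__main__":
    print(title_via_xml(POLYGLOT_DOC), title_via_html(POLYGLOT_DOC))
    print(xml_parses(POLYGLOT_DOC), xml_parses(HTML_ONLY_DOC))
```

Either extraction path works on the polyglot-style sample, while the
HTML-only sample is readable only through the HTML path. Whether that
property is worth a published guideline is exactly what the two
correspondents dispute.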