- From: Alex Russell <slightlyoff@google.com>
- Date: Fri, 25 Jan 2013 17:11:56 -0500
- To: David Sheets <kosmo.zb@gmail.com>
- Cc: "Michael[tm] Smith" <mike@w3.org>, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, public-html WG <public-html@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
- Message-ID: <CANr5HFX-hvaJyfiGWkoZa1aU+UDMpwZ1-Y4JW=R-HeYMdexxNA@mail.gmail.com>
On Fri, Jan 25, 2013 at 4:16 PM, David Sheets <kosmo.zb@gmail.com> wrote:

> On Fri, Jan 25, 2013 at 11:48 AM, Alex Russell <slightlyoff@google.com> wrote:
> > On Thu, Jan 24, 2013 at 11:46 PM, David Sheets <kosmo.zb@gmail.com> wrote:
> >>
> >> On Thu, Jan 24, 2013 at 4:44 PM, Alex Russell <slightlyoff@google.com> wrote:
> >> > On Thu, Jan 24, 2013 at 6:29 PM, David Sheets <kosmo.zb@gmail.com> wrote:
> >> >>
> >> >> On Thu, Jan 24, 2013 at 2:14 PM, Alex Russell <slightlyoff@google.com> wrote:
> >> >> > I find myself asking (without an obvious answer): who benefits from the creation of polyglot documents?
> >> >>
> >> >> Polyglot consumers benefit from only needing an HTML parser *or* an XML parser for a single representation.
> >> >
> >> > That's just a tautology. "People who wish to consume a set of documents known to be in a single encoding only need one decoder". It doesn't illuminate any of the questions about the boundaries between producers/consumers that I posed.
> >>
> >> "People who wish to consume a set of documents known to simultaneously be in multiple equivalent encodings only need one of several decoders."
> >>
> >> That doesn't appear tautological to me. Check your cardinality. The Axiom of Choice comes to mind.
> >
> > It appears to me that you've skipped a step ahead of answering my question and are dismissing it on an assumption I'm not making (hence you think it's not a tautology).
>
> Let us find our common misunderstanding and resolve it.
>
> > You posit a group of consumers who have one preference or another (a hard preference, at that) and wish me to treat this binary-separable group as uniform. You then posit a producer who would like to address this group of consumers. You further wish me (AFAICT) to assume that these demanding consumers are fully aware of the polyglot nature of the producer's content through unspecified means.
>
> Suppose you are publishing technical documentation. You already have a toolchain constructed to ensure very specific invariants on your output documents. Your consumers are savvy and may wish to script against your documentation. You perceive that for a small cost (reading the polyglot spec and tweaking to emit it), you can simplify consumption for your user base.

This works with a single producer and consumer who have a fixed contract. That's sort of the definition of a closed system...and it's not the web.

Why aren't they just publishing as one or the other? And if the tweaks are so small (but necessary), why isn't this a job for software?

Consumers who want to process more than a single producer's content either have to:

1. Have a reliable way to know that what they consume isn't going to be broken (as HTML parsed as XML is)
2. Have a way of consuming a superset of any individual publisher's formats

Either works, but polyglot precludes #1 on the basis that #2 shouldn't have to happen, against all the evidence of how this sort of thing is sorted out every day by real-world software.

> > What I'm asking is this: does this happen in the real world?
>
> Yes.
>
> > Under what circumstances?
>
> Structured document repositories
> Legal case files
> Digital archives
> Database views
> Email repositories
> Software specifications
> Anything projecting well-defined data structures into HTML

So "programs writing programs for programs".
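For concreteness, a minimal sketch of the kind of document being argued about: markup intended to produce the same tree whether it is handed to an HTML5 parser or an XML parser (the element content here is illustrative only):

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta charset="UTF-8"/>
        <title>Example</title>
      </head>
      <body>
        <p>The same bytes parse with either an HTML5 parser or an XML parser.</p>
      </body>
    </html>

The polyglot constraints show up as the explicit xmlns declaration, UTF-8 encoding, and XML-style well-formedness layered on top of ordinary HTML5.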
> > How frequently?
>
> Every time a programmatic producer wishes to serve an XML consumer and an HTML consumer with fewer special cases.
>
> > On the open web (where I expect that the contract about what is and isn't XML is even more important), or inside closed systems and organizations?
>
> When you publicly publish something and declare your intent, you are on the "open web".

I think you'll struggle to get most W3C members to accept that definition.

> > I don't see that the TAG has any duty to the latter, so it's an honest question.
>
> Even "closed" systems export data and use off-the-shelf browsers. Furthermore, many of these "closed" systems will be opening up in the future. The TAG has a responsibility to guide publishers and implementors who wish to support W3C standard formats in their systems that do or may interact with the web.

Our job is not to sell the web to a possible new audience -- it doesn't need our help, and we're the last group I can imagine being effective as salespeople -- it's to help publishers understand how the rules work so that they can join it, and to help spec authors make sure the rules are sane in the long run.

> > My personal experience leads me away from assuming that this is common.
>
> Mine as well. I didn't realize that only the most common case deserves attention. What is your threshold for consideration?
>
> > I'm looking for countering evidence in order to be able to form an informed opinion. So the question is open (ISTM): who are the consumers that do not adapt to publishers?
>
> Why cannot publishers decide to publish content with maximal compatibility?

Why can't I publish a binary stream of bytes that's both a PNG and a BMP?

I'm honestly trying to understand the real-world harm in giving up on polyglot. So far I don't sense that there's much to be lost that can't be easily won again through common and well-understood strategies -- the sorts of things that browsers and all sorts of other off-the-shelf software already do.

> If niche publishers assume that consumers will adapt, they may find that the hassle of adaptation has hurt their reach.

What hassle? Seriously, if you're consuming from a single fixed producer *you know what you're getting* and can build your software accordingly.

From the producer's side, of course you're going to publish for the maximum reach and capability *across the existing population of consumers*. If transcoding is needed and can be automated (which it can here)...why is this an issue?

> If it costs a publisher 1 hour of labor to tweak their systems to output polyglot and this offers their consumers access to a new ecosystem of tools and libraries, is it not worth it?

If they could spend that hour slotting in a transcoder that publishes in the other one, addressing that same new market, is it not worth it?
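The transcoder alluded to here is the sort of thing that already exists off the shelf. A rough sketch, assuming the third-party html5lib and lxml Python packages are available (the function name and details are illustrative, not a reference implementation):

    import html5lib           # HTML5 parser (third-party)
    from lxml import etree    # XML tree and serializer (third-party)

    def html_to_xml(html_bytes):
        # Parse arbitrary HTML (tag soup included) with an HTML5 parser into
        # an lxml tree; elements land in the XHTML namespace by default.
        tree = html5lib.parse(html_bytes, treebuilder="lxml")
        # Re-serialize the same tree as well-formed XML for XML-only consumers.
        return etree.tostring(tree, encoding="UTF-8", xml_declaration=True)

Whether a producer hangs this on the publishing pipeline or a consumer runs it as a front-end is exactly the cost-allocation question the two sides disagree on.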
> Should each consumer adapt individually? Should the producer generate and disseminate 2x the documents for XML vs. HTML consumers? A subset of the syntax and semantics are provably compatible.
>
> Suppose a niche publisher has 10 consumers. It costs the publisher k to ensure polyglot invariants on their product. It costs each consumer in excess of k to wire together a lenient parser. How is that efficient?
>
> I don't understand: how does polyglot burden you?

That's not the bar to be met. The question is: what's the value to the web of demanding that we add it as a constraint on the development of HTML?

> How is it detrimental? If there is detriment, does it exceed the harmless desire of some producers to produce maximally compatible content?
>
> > I observe many consumers that adapt and few producers who do (particularly granted the time-shifted nature of produced content and the availability of more transistors every year).
>
> And so we must reinforce the status quo by vetoing publication of guidelines for maximal compatibility?

I'm not saying what I *wish* would happen, I'm saying this *does* happen, over and above the objections of system authors who loathe the additional complexity and all the rest.

> >> >> Polyglot producers benefit from only needing to produce a single representation for both HTML and XML consumers.
> >> >
> >> > What's the value to them in this? Yes, producers want to enable wide consumption of their content, but nearly every computer sold can parse both HTML and XML with off-the-shelf software. The marginal gain is...what?
> >>
> >> 1. Smaller library dependency in software consumers
> >
> > But evidence suggests that valuable content is transformed by eager producers, not rejected. Consuming code that yields more value (can consume more content) does better in the market.
>
> A significant fraction of consuming code is not on the market.
>
> > How is the value manifested for users of this code?
>
> Invariants are preserved and can be relied on.
>
> Interpreted languages typically provide invariants regarding machine security that native executables do not. Declarative representations provide invariants regarding interpretation (termination) that imperative representations do not.
>
> Likewise, adherence to XML's syntax provides guarantees that interpretability by an HTML parser does not. This guarantee has value for consumers in the form of broader choice and faster time to construct consuming software.

So this is about welcoming our XML overlords? I think that ship sailed (and sank).
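The guarantee being pointed at is concrete enough to sketch: if published pages really do hold to XML well-formedness, a consumer can script against them with nothing but a stock XML parser. A minimal sketch using Python's standard library, where the choice of the h2 element and the function name are purely illustrative:

    import xml.etree.ElementTree as ET

    # Polyglot documents declare the XHTML namespace, so lookups are qualified.
    XHTML = "{http://www.w3.org/1999/xhtml}"

    def section_headings(polyglot_bytes):
        # A ParseError here is exactly the failure mode the stricter
        # producer-side contract is meant to rule out.
        root = ET.fromstring(polyglot_bytes)
        return [h.text for h in root.iter(XHTML + "h2")]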
> > And are we supposed to assume that disk space is more limited year-over-year (vs the historical trend)?
>
> "Smaller" in terms of complexity/cost/availability. It is cheaper for consumers to select one of XML or HTML vs. HTML only, by definition. If the consumer wants to do front-end transformation as you describe, they now require BOTH XML and HTML, which is larger and more complicated than either parser in isolation.
>
> >> 2. Wider interoperability with deployed systems
> >
> > But that hinges on assuming that consumers do not adapt, but rather that producers do (and retroactively!?)
>
> Why should deployed consumers be forced to adapt if a producer can anticipate this need? I don't understand what you mean by retroactively in this context.
>
> >> 3. Choice of internal data model, choice of parse strategy
> >
> > Who is this valuable to? And isn't that value preserved by transformation?
>
> It is valuable to some consumer implementors. Requiring transformation denies consumers the ability to choose their transformation parse strategy, which in turn denies consumers the ability to choose their intermediate representation. If your target repository gives you invariants, why increase the complexity of your intake system?
>
> >> > Again, is this about production in a closed system or between systems/groups/organizations?
> >>
> >> Nothing is closed. Communication requires two parties. It should not be assumed that those parties co-operate. This applies even in a "closed" system. Send and receive systems evolve independently. Your distinction lacks a difference.
> >
> > I don't think it does. Cooperating parties are more likely to settle on stricter, more complete contracts (even if only through shared, unstated assumptions). Parties further away in space and time must find ways to adapt.
>
> Producers can anticipate consumers' needs in the future. Not all producers are careless about document quality.
>
> > I'm noting that this has led most systems that scale beyond one "sphere of control" to be more forgiving about what they accept over time, not less.
>
> I do not deny this. This does not legitimize enforcing ignorance of maximally compatible techniques for those producers who wish to use them.
>
> > Here at Google we run MASSIVE systems that communicate over very fiddly protocols.
>
> That's nice.
>
> > We can do this because we control the entire ecosystem in which these systems live...in theory. But even as we've evolved them, we've found that we must build multiple parsers into our binaries for even "restrictive" data encodings. It just seems to happen, no matter intention or policy.
>
> I understand this systemic tendency. Ideally, a consumer delegates this parsing concern to a module that handles discrepancies. Judging by the complexity of the complete HTML5 parser vs. the XML parser, XML parsers are easier to construct even with their quirks and input discrepancies. This suggests that XML parsers will be more widely available, of higher quality, and more standard in their output. Do you have evidence to suggest otherwise?
>
> >> > If the content is valuable, it is consumers who invariably adapt.
> >>
> >> Free software is often valuable but consumers do not "invariably adapt" due to practical barriers. In much the same way, publishers may have user bases that are best served by providing additional guarantees on the well-formedness (resp. ease-of-use) of their documents.
> >
> > I'm trying to understand what the real-world costs are.
>
> Costs of what? Adopting polyglot? They are minimal in my experience. Cost of consumer adaptation due to arbitrary documents? They are higher than the cost of consumer adaptation to well-formed documents, as well-formed documents are a subset of arbitrary documents.
>
> > Free software isn't comparable, as it's not content per se.
>
> I disagree. Free software provides a description of some useful computation or coordination. Similarly, many classes of content provide descriptions of structured data to be consumed by both humans and machines. You appear to be advocating for making it harder to produce structured documents to be easily consumed by machines (programmed by time-constrained humans).
>
> > A book or movie might be. Does the free software make it easier to read the book or movie? That's the analog.
>
> Is the body of US law a "book" or "free software"? It appears to share traits with both. Would publication of US law in a strict format make it easier or harder for people and bots to consume?
>
> >> > This is how the incentives and rewards in time-delayed consumption are aligned.
> >>
> >> Your market has perfect information?
> >
> > My questions are all about how information-deprived consumers will get through the day.
>
> And mine are all about why you feel it is necessary to deprive producers of the knowledge to provide maximally compatible content to those voracious consumers.
> >> Consumers experience no switching costs? Nobody has lock-in or legacy? No field deployment? No corporate hegemony games? Are you advocating O(N) where N = number of consumer adaptations instead of O(1) where 1 = producer adaptation?
> >>
> >> Or perhaps you regard O(N) = O(1) because the agency of the *average* End User has been reduced to a choice between a handful of general-purpose browsers?
> >
> > I think at this point you've convinced me that you're not interested in answering the question
>
> That's odd. My answering of your question should demonstrate my interest. If my answer is not to your satisfaction or you do not understand some tenet on which I base my answer, that is a different matter.
>
> > and, perhaps frustratingly for both of us, helped me understand that Polyglot isn't a real-world concern
>
> Maximal compatibility is not a concern?
>
> > (although, do feel free to convince me otherwise with better arguments and data...I'm keenly interested to see them).
>
> I cannot force you to understand the utility of high-quality content. What kind of data and arguments would you find more convincing? How much evidence is necessary before it becomes OK to tell people about techniques for maximal compatibility?
>
> >> > Keep in mind that Postel's Law isn't a nice-to-have, it's a description of what invariably happens when any system hits scale.
> >>
> >> Great! Why are you advocating censoring how to be more conservative in what you emit? We have "hit scale" and, for some publishers, that includes allowing for consumers which only understand XML or only understand HTML.
> >>
> >> "Be conservative in what you send, liberal in what you accept."
> >>
> >> It takes both halves to make it work.
> >
> > I was inaccurate. The first half of the law *is* a nice-to-have (by definition). The second is a description of what happens when systems hit scale, invariably. I should have been clearer. Apologies.
>
> The first half of the law is what producers who want to be maximally compatible do. These producers take communication failure very seriously and, to them, it is not simply "nice-to-have", it is a requirement. Just because general consumers must accept liberal input does not legitimize denying producers the information required to produce conservative output.
>
> David
Received on Friday, 25 January 2013 22:12:54 UTC