Re: Another viewpoint on validation from Arved Sandstrom on 2012-04-02 (public-ppl@w3.org from April 2012)

From: Arved Sandstrom <asandstrom2@eastlink.ca>
Date: Mon, 02 Apr 2012 20:54:45 -0300
To: public-ppl@w3.org
Message-id: <4F7A3C45.9050105@eastlink.ca>

My first attempt seems not to have gone through...

On 12-03-31 03:23 AM, Dave Pawson wrote:

> > On 31 March 2012 03:45, Arved Sandstrom <asandstrom2@eastlink.ca> wrote:
>> >> I sort of touched on this in a previous post (I think), but I'll recast
>> >> it. Rather than focus solely on schema-based validation of XSL-FO XML,
>> >> why not consider that running a conforming processor (drawn from
>> >> http://www.w3.org/community/ppl/wiki/XSL-FO_Processors that Tony
>> >> posted), or running a suitable small set of processors, on the XSL-FO
>> >> XML in question, is perhaps the most practical validation method?
> > Time to run such a check? Especially with jvm load times.
That's a performance consideration, Dave. In other words, don't worry
about it until it's shown to be an issue.

In any case, with modern JVMs (and I mean going back years now) initial
startup is not usually a concern. And the situation is improving year by
year. In the final analysis you avoid all of that for medium or high
throughput by having the JVM up and running continuously, which will
frequently be the default case for a variety of reasons.

> > What if 2 processors disagree?
I wasn't thinking of fault tolerance or NASA-style computer voting when
I mentioned multiple processors. The idea was that the more processors
that independently confirm that some XSL-FO is OK, the more assurance
you've got. In such a scenario I'd say that disagreement simply means
that one or more processors failed the XSL-FO: I'd abort the processing.
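
To make the "all must agree, otherwise abort" idea concrete, here is a
minimal sketch, in Python for brevity. Everything in it is an assumption
for illustration: a "processor" is just a callable that raises when it
cannot handle the input, and the two toy processors are stand-ins, not
wrappers around any real FO processor API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a processor is any callable that takes XSL-FO text
# and raises an exception if it cannot process it. In practice each one
# would wrap an invocation of a real FO processor.
Processor = Callable[[str], None]


def check_with_processors(
    fo_text: str, processors: Dict[str, Processor]
) -> Tuple[bool, List[str]]:
    """Return (ok, failures); ok is True only if every processor accepted."""
    failures: List[str] = []
    for name, run in processors.items():
        try:
            run(fo_text)
        except Exception as exc:  # any failure counts as a rejection
            failures.append(f"{name}: {exc}")
    return (not failures, failures)


# Toy stand-ins, for illustration only.
def lenient(fo_text: str) -> None:
    if "fo:root" not in fo_text:
        raise ValueError("no fo:root element")


def strict(fo_text: str) -> None:
    lenient(fo_text)
    if "fo:layout-master-set" not in fo_text:
        raise ValueError("no fo:layout-master-set")
```

A caller would abort as soon as `ok` comes back false, and the list of
failures says which processors rejected the input and why.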

>> >> A lot of other technology areas have reference implementations. We don't
>> >> have that per se for XSL-FO but we do have a situation where many of the
>> >> processors do publish compliance/conformance information, which can be
>> >> vetted. As a user I think I would be substantially less concerned about
>> >> the theoretical validity of an XSL-FO file than I would be in (1)
>> >> whether a processor nominally supports the formatting objects I need and
>> >> (2) if so does this processor support them correctly?
> > As Tony noted, many users are interested in 'is feature X supported
> > by Y processor'
As am I. And "support" means that feature X is "validly" supported, that
is, according to spec. Maybe incompletely, e.g. processor Y doesn't
support all the properties (by spec) for a given FO, but you know that
the properties you can use on that FO with processor Y are correctly
implemented.

This again gets back to what I'm saying. The available processors are in
fact all we've got and all we'll ever have, to do the real work. In the
case of FO or CSS the XML+XSLT combo is just phase one, producing the
input for the browser or the FO processor. But as users what we really
care about is the output of the browser or FO processor: people have
always cared more about what real web browsers do with
HTML/XHTML+JavaScript+CSS than about the theoretical validity of
the input.

How many people actually validate their XHTML web pages before feeding
them to a web browser? I don't know anyone who does that. I never have,
and I've written thousands of web pages in dozens of web apps. So why
worry so much about validating XSL-FO before feeding it to an FO
processor? Let's just maybe concentrate on the processors as the source
of validity information.

> >
>> >> I realize this may sound somewhat heretical. I use XML in real life
>> >> quite a lot on various jobs for clients, and more often than not
>> >> validation is essential - it allows applications to dispense with a
>> >> great deal of checking that would otherwise have to be done in code, and
>> >> concentrate on happy-path processing of XML input that is known to
>> >> correctly conform to a schema. In the case of FO processors, though -
>> >> correct me if my dated assumptions are wrong - I believe that they are
>> >> effectively written as if they were validators as well. After all you
>> >> still design and implement primarily off the specification. So a given
>> >> module of code that deals with a given FO has expectations as to what it
>> >> will see for attributes and child elements: if things aren't right you
>> >> get a planned error or exception in a well-written processor. That *is*
>> >> validation.
> > Yes, for their definition/ interpretation of the spec. We know there
> > have been disagreements, dual interpretations.
Absolutely, which is maybe where we can contribute. In any case, is it
not true that if there is a disagreement about the interpretation of the
spec - and I certainly remember a few from my day :-) - then this also
affects an approach that relies on schema-based validation?

> >
>> >> I'd be interested in hearing the arguments for validation separate from
>> >> simply running a good FO processor on the input document in question.
>> >> For example, I am professionally interested in using XSL-FO in the IBM
>> >> FileNet DITA publishing space, but I am still not convinced of the
>> >> practical utility of separately running a schema-based validator on the
>> >> input documents first. Not when I can simply attempt the processing and
>> >> catch exceptions: for my purposes I simply care that I could, or could
>> >> not, process the document.
> > If error handling were a part of the spec, I'd agree with this. #fail
> > isn't defined?
> >
> >
> > regards
> >
It's not completely true that there is no error-handling terminology in
the spec. You see language defining error conditions (e.g. "this is an
error") and often enough you'll see may/should error recovery procedures
defined in the spec.

I would agree that it's likely not comprehensive and air-tight from an
error-handling perspective. But I think #fail is there: I interpret
"this is an error" with no suggested recovery process as being a
full-stop abort of processing of that document.
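
As a rough sketch of that interpretation (mine, not something the spec
spells out): model each error condition as carrying an optional recovery
action, and treat a missing recovery as a full stop. All names here are
hypothetical and the descriptions are placeholders, not spec wording.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class ProcessingAborted(Exception):
    """Raised when an error condition has no defined recovery."""


@dataclass
class ErrorCondition:
    description: str
    # None models "this is an error" with no suggested recovery.
    recover: Optional[Callable[[], None]] = None


def handle(conditions: List[ErrorCondition]) -> List[str]:
    """Apply recoveries where defined; abort on the first unrecoverable one."""
    recovered: List[str] = []
    for cond in conditions:
        if cond.recover is None:
            raise ProcessingAborted(cond.description)
        cond.recover()
        recovered.append(cond.description)
    return recovered
```

Under this reading, a processor that raises on an unrecoverable error is
doing exactly what I'd want: I catch the exception and know the document
could not be processed.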

Arved
Received on Monday, 2 April 2012 23:55:15 UTC