Re: XML and InkML from Simon St.Laurent on 2003-08-28 (www-multimodal@w3.org from August 2003)

From: Simon St.Laurent <simonstl@simonstl.com>
Date: Thu, 28 Aug 2003 16:49:55 -0400
To: www-multimodal@w3.org
Message-ID: <r02000000-1026-2DE62DA4D99911D79BF60003937A08C2@[192.168.124.11]>
mf@w3.org (Max Froumentin) writes:
>"Simon St.Laurent" <simonstl@simonstl.com> wrote:
>> I still have no way to tell what the channels are except on a format
>> by format basis.  It does basically nothing for human readability
>> and only simplifies the parse slightly.
>
>You make a valid point, although I think that it's more valid about
>the parsing than about the readability. If a human is going to read
>the file, I suppose it'll be for debugging and you can expect that
>they know enough of the format that it makes little difference.

It makes a heck of a difference to be able to hand someone an XML file
and say "use this".  InkML currently requires a lot of learning both for
humans and computers.

>> This is a long and old thread.  W3C XML Schema's datatypes are a
>> disaster for many reasons.
>
>Let's keep that permathread to xml-dev. My argument was only about the
>lexical representation. I could very well have written about
>xsd:decimal, which requires parsing according to ISO822 (or something)
>rules.

It's interesting to me that while this seems only a bit more complicated
than, for instance, the xs:dateTime type, it isn't being defined as a
datatype per se.

>> I do think, however, that such types deserve to have an XML
>> representation (which Reg Frag is designed to provide), and that we
>> need to think very carefully before creating lexical types.
>
>I still think that we want only just one syntax, whether it's XML
>or a lexical one. 

If you stick to the lexical syntax, someone will quite likely produce an
XML representation of it.  I suspect that it might be better for interop
(not to mention PR) if the W3C was the organization that did that.

>> The CSS style attribute is probably the case most frequently
>> discussed. On the unfortunate side, it requires a separate parse,
>> sometimes has multiple layers of depth, and puts multiple layers of
>> content into a single attribute.
>
>CSS has its own syntax for reasons that don't apply to InkML. And the
>style attribute (which some people still regard as a bad thing) is
>merely a convenience.

True on both counts, but I think the comparison favors CSS quite
completely, leaving me still wondering what the advantage is to using a
lexical approach for InkML traces.

>> On the bright side, it's human-readable, and software for processing
>> it is very widely available.  InkML has neither of those advantages.
>
>Unlike HTML, SVG and others, InkML is not at all meant to be
>hand-written or even read by humans, apart from debugging purposes. And
>developers could claim that packing trace data into an element using a
>compact syntax makes the rest of the markup more visible (because
>whatever the syntax, a human can't make much of trace data, and no,
>there aren't traces in the syntax).

I suspect this comes down to values.  From what I can gather, between
this conversation and the specifications, the InkML WG doesn't value
human readability very much.  To me, that raises a real question of why
you're bothering to use XML at all.  

>Also, I'm very much afraid of the
>size a full-markup DOM would have. 

Someone somewhere is likely to want an object model of their InkML.  I
don't think DOM is necessarily appropriate for that - I'd tend to do
this kind of work using SAX, most likely.  In any case, I think your
fears are misplaced.  If people are going to use InkML through the DOM
in any useful way, you'll either need extensions to let them access the
traces as objects, or watch as their code parses InkML into other forms
without the assistance you might have provided.
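
As a rough illustration of the SAX route, here is a minimal sketch in
Python. It assumes a hypothetical full-markup form of a trace in which
each sample is a <point> element with x and y attributes. InkML defines
no such markup; the element and attribute names, and the PointCollector
and collect_points helpers, are invented for this example only.

```python
import xml.sax

# Hypothetical full-markup trace. InkML does not define <point>;
# the names here are assumptions made purely for illustration.
DOC = (b'<trace>'
       b'<point x="10" y="0"/>'
       b'<point x="9" y="14"/>'
       b'<point x="8" y="28"/>'
       b'</trace>')


class PointCollector(xml.sax.ContentHandler):
    """Collects (x, y) pairs as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.points = []

    def startElement(self, name, attrs):
        # Each sample arrives as its own event, so no tree is ever built.
        if name == "point":
            self.points.append((float(attrs["x"]), float(attrs["y"])))


def collect_points(data):
    handler = PointCollector()
    xml.sax.parseString(data, handler)
    return handler.points


points = collect_points(DOC)
```

Because the samples are delivered as events, memory use depends on what
the handler chooses to keep, not on the size of a DOM tree.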

>And of course InkML isn't widely
>available. CSS wasn't available when it was developed, that didn't
>stop it having a non-SGML syntax.

I suspect that at this point, when XML processing is cheap and readily
available and InkML processing is not, InkML might want to opt for the
XML processing rather than using "not widely available" as a
justification to go its own way.

>> it's a lousy way to share information about the traces with a wider
>> variety of processors.
>
>Most processors (with the exception of converters) that will work on
>InkML are going to implement much more than just trace data
>parsing. Why wouldn't trace data parsing be a (small) part of that
>general application dependent process?

Because it's a pain in the ass, to put it bluntly.  InkML seems pretty
plainly a format meant to be consumed by other applications.  Why make
them work harder to get at the information you're so generously
providing?
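
To make the extra work concrete, here is a minimal sketch in Python of
the second parse a consumer has to write once the XML parser has handed
back the trace as a string. It assumes a deliberately simplified trace
syntax (samples separated by commas, channel values separated by
whitespace, channels X and Y in that order); the actual draft syntax is
configurable and richer than this, and parse_trace is a hypothetical
helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Simplified lexical trace: an assumption for illustration, not the
# full syntax the InkML draft allows.
DOC = "<trace>10 0, 9 14, 8 28</trace>"


def parse_trace(text, channels=("X", "Y")):
    """Second-stage parse: split the character data into named samples."""
    points = []
    for chunk in text.split(","):
        values = chunk.split()
        if not values:
            continue  # tolerate a trailing comma or empty content
        if len(values) != len(channels):
            raise ValueError(f"expected {len(channels)} values, got {values!r}")
        points.append(dict(zip(channels, map(float, values))))
    return points


# Stage one: the XML parser only yields an opaque string.
trace = ET.fromstring(DOC)
# Stage two: every consumer needs its own code like parse_trace, and
# must already know the channel order from somewhere else.
points = parse_trace(trace.text)
```

Every application that consumes the format needs its own equivalent of
parse_trace, adjusted to whatever channel configuration the producer
chose.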

>> I have a hard time seeing why the W3C should care about
>> vendor-dependent formats, while making ink traces readily
>> processable by many applications - without the need for extra code
>> libraries - seems like a more congenial project.
>
>It's not about vendor dependence, it's about application dependence.

I don't see much of a distinction between the costs of vendor dependence
and the costs of application dependence.  InkML's "write it any way you
like" approach to the traces suggests pretty strongly that this is
designed so that vendors (and yes, their applications) can use it with
minimal change to their existing structures.  

InkML sends a very confusing message about independence and dependence.
On the one hand, it claims to be XML, whose traditional benefit has been
application independence.  On the other hand, it uses a highly
configurable lexical syntax for the core of its information structures,
which seems to march directly toward application dependence and requires
more work of applications which hope to consume InkML.

>If you did mean application dependence, then fair enough, we're back
>to the original question, and again I'll ask the WG that they consider
>your arguments in our upcoming discussions.

That would be why I've spent several hours working on the messages I've
posted on this subject to the comments list.
Received on Thursday, 28 August 2003 16:49:57 UTC