Re: Namespaces and ixml from C. M. Sperberg-McQueen on 2022-05-04 (public-ixml@w3.org from May 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Wed, 04 May 2022 20:54:45 +0200
To: Steven Pemberton <steven.pemberton@cwi.nl>
Cc: public-ixml@w3.org
Message-ID: <87tua5w2kq.fsf@blackmesatech.com>

Steven Pemberton writes:

> ixml is about taking implicitly structured (textual) data, recognising
> that implicit structure, and making it explicit in some way or another
> on output.
>
> XML is one of the targets for that explicit output, and currently the
> best for representing the abstractions. It need not be the only one,
> ...

I'm not quite sure what to do with an argument of this kind.

My understanding of hermeneutics (which can perhaps be thought of as the
study of what is involved in sentences of the form "X is about Y"), such
as it is, inclines me to believe that time will tell what ixml is about,
and it won't start telling until ixml is actually finished.  That does
not, of course, prevent us from attempting to interpret ixml now, in its
unfinished state.  But it does suggest that some caution is advisable.

My understanding of collaborative work makes me think that when X is the
prospective work product of a collective, "what X is about" is
determined by the group, and not by any one member of the group.  Again,
that does not prevent anyone in the group from trying to explain to
others what they think are the particular core ideas that make the
project worthwhile, in the hopes of persuading others to set an equally
high valuation on those ideas. But again, some caution may be advisable.
My understanding of collaborative work also tells me that I should not
resent it when other members of the group appear to discount the
collaborative nature of the effort and tell me what I should think
instead of attempting to persuade me of a position -- but, being human, I
often fail to do what I should or to refrain from what I should not, and
this may be a case in which I fall short of perfection.

I will only observe that it does not appear that ixml has always been
understood thus -- or, at least, that it has not always been described
thus.  The paper Pemberton 2013, which I believe first introduced the
idea of ixml to the world, describes ixml this way:

    Is it not possible to combine the best of both worlds, and have
    authorable formats, that can still use the XML tool chain? Couldn't
    XML become the underlying format for everything?  

    The Approach

    The approach presented here is to add one more step to the XML
    processing chain, an initial one. This step takes any textual
    document, and a (reference to) a suitable syntax description, parses
    the document using the syntax description, and produces as output a
    parse tree that can be treated as an XML document with no further
    parsing necessary (or alternatively, the document can be serialised
    out to XML).

I see nothing here about making the structure visible "some way or
another", or about XML being "one of" multiple output formats.
Invisible-XML processors are described as a first step in an XML tool
chain, not as a broader, more general replacement for yacc and lex and
similar tools.

> ... even though I recognise that some of you are involved only because
> of the XML aspect.

I have spent many happy parts of my career writing grammars, working
with parsers, and learning about parsing.  I might well be interested in
an effort to make general parsing more easily accessible.  But you are
probably right to suspect that as a historical fact ixml attracted my
interest because it was described as a step in an XML processing flow.


>     input -> ixml -> output

> The real ixml is that middle bit.

> However, ixml is not XML, nor, contrary to what you may think, does it
> contain any XML-specific items:

I'm not quite sure what any of these sentences mean.  That "ixml is not
XML" appears to be self-evidently true, and not worth saying if it bears
that self-evident meaning.  ixml is a notation for context-free grammars
and a set of rules for using grammars in that notation to parse data
streams into XML. XML is a notation for documents which ensures that
documents have a particular set of properties, can trivially be parsed,
and can usefully be processed in various ways.  I infer that the
intended force of the utterance is to mean something different from the
surface meaning of the words "ixml is not XML".  But I do not know what
that something might be.  Nor do I understand what an "XML-specific"
item might be.

Perhaps you are saying that nothing in ixml is designed to work well
with XML.  I refer you to Pemberton 2013, which seems to me to say
something rather different.


>   ^ represents structured data, and was initially chosen because it
>   looks like a tree, and has the added benefit of looking like an XML
>   bracket on its side.

So ... not 'insertion', then.  That is a useful clarification, and makes
complete nonsense of the arguments recently given for using ^ for
injection of literals into the output.

>   @ represents data that is made unstructured on output (you could say
>   it is destructured). I had several candidates for the mark, such as
>   =, which looks flattened and un-tree-like, but in the end I chose @
>   as the symbol used in XML for flat data.

> Namespaces are not a concept anywhere within ixml, nor do they map to
> any concept within ixml. It is purely a feature of XML, and one that
> was not even originally in the design of ixml ("not for generating any
> particular version of XML"). Adding explicit notation for namespaces
> somewhat fouls the ixml nest, making it specifically about a
> particular output format.

I think you made it about a particular output format when you described
ixml processing as producing XML documents.

The original description of the project could easily have been about
replacing yacc and lex with something more general.

It could easily have been about a general-purpose format conversion
tool, although in that case the proposal would have been exposed to
awkward questions about round-tripping, and it would be easy to predict
that the project would meet roughly the same fate as all the absolutely
general-purpose anything-to-anything format conversion projects I have
encountered over the last forty years.  (Hint: I cannot remember most of
their names.)

Defining ixml as a method of using context-free grammars to parse input
into XML made it concrete enough to be interesting and small enough to
be tractable.

If ixml is intended to produce XML output, I do not see why we should
spend so much time and space in the ixml spec (and CG) re-litigating so
many details of the XML specification, from name structure (nonterminal
names could just be defined as NCNames, but we seem to be attached to
the idea that it's better to design yet another identifier syntax,
because Lord knows the world doesn't have enough of them yet), to the
existence or non-existence of namespaces, to the existence or
non-existence of processing instructions.

> I proposed a way of doing namespaces using the existing mechanisms:
>
>     data: @xmlns:iso, iso:date+.
>     @xmlns:iso: ^"http://example.com/ns/date".
>     iso:date: ...etc...
>

Thank you for the reminder -- another topic to relitigate: structure of
qualified names.  XML has settled on allowing at most one colon.  Making
non-terminal names follow that rule would be trivially easy, but why
should we do that? Let's allow multiple colons!  Why not redesign XML
names yet again?  After all, none of us has anything better to do with
our time.

Oh, wait.

I do have better things to do.

I should go do them.


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Wednesday, 4 May 2022 18:55:08 UTC