Re: Intent of ER-XML from David Carlisle on 2012-02-26 (public-xml-er@w3.org from February 2012)

From: David Carlisle <davidc@nag.co.uk>
Date: Sun, 26 Feb 2012 21:16:07 +0000
To: public-xml-er@w3.org
Message-ID: <4F4AA117.5080603@nag.co.uk>

On 26/02/2012 16:16, David Lee wrote:
>
> So I'd like to discuss: What is the expected purpose/use case of an
> implementation of XML-ER?
>
> Possible answers ?
>
> A) XML-ER is a 'Processor'
I suspect it has to be this.
>
> 1) A drop-in replacement for an XML parser.
>
> --> Implies: It must do *everything* an XML parser does (plus the ER stuff)
Well, no. It implies that it has to be a full parser spec, but we could
decide that it doesn't do everything an XML parser would do in all
circumstances. We could define, for example, that it never fetches
external entities (compatible with a configuration of XML 1.x) and
always skips entity definitions, even in a local subset (not compatible
with XML 1.x).

>
> --> Output: An API ? An abstract data model ? (INFOSET)

I'd say any kind of abstract tree model. The current draft uses the
terminology of the DOM, which isn't my favourite tree model, but if we
think DOM-based browsers are a likely user of this spec, then using the
terminology of the DOM (while saying somewhere that any tree model is OK)
makes some kind of sense to me.


>
> 2) A pre-processor for an XML parser.
>
> --> Input : "Stuff" (TBD)
>
> --> Output : Well-formed XML - defined as an abstract data model?
>
> --> Implies an XML parser then may be used to fill in the stuff that
> ER-XML doesn't define,

It only implies that if we constrain the fix-up that can be performed
to fix-up that doesn't require XML parsing. If a document is not well
formed because entities or parameter entities are messed up, then either
you need a full XML parse (more or less) to untangle the entities and
fix what needs fixing, or you unconditionally remove all DTD references,
or you can't guarantee that the output is well formed.
I'm not necessarily opposed to this model of spec, but I currently can't
see how it would work.
>
> For example: parsing DTDs, external entities etc.
See above: if you don't parse the DTD during the fix-up stage, do you
remove the DTD, or just leave it unfixed?

 > ...

> --> Example: The Namespaces spec doesn't define a 'processor'.

True, although many of the problems of namespaces are arguably due to
that fact. It tries to layer itself above the XML parse (like a
schema validator), but naming rules are fundamental, and that layering
violation shows through in all the rough edges around namespace
declarations looking like attributes and being attributes in some models
(DOM) but not in others (XDM).

>
> IMHO we need to clarify exactly what the XML-ER specification is
> intended for before we can make much more progress.

yes I guess so:-)

David

To make things concrete:
what would you _want_ the output from this to be?


<!DOCTYPE foo [
<!ENTITY a "a">
<!ENTITY b "<b>">
]>
<foo>
&a;&b;
</foo>



My suggestion is that doctype declarations be parsed only to the extent
needed to skip them, and that the only entity references recognised be
the html/mathml ones. So I'd suggest that the output (whether you think
of this as the XML document produced by fix-up, or as a representation
of the output tree of an xml-er processor) be:

<foo>
&amp;a;&amp;b;
</foo>
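
A minimal sketch of that suggestion, in Python, might look like the
following. The function name er_fixup, the naive DOCTYPE regex, and the
use of Python's html.entities.html5 table as a stand-in for "the
html/mathml ones" are all assumptions made for illustration; none of
this comes from the current draft.

```python
import re
from html.entities import html5  # keys look like "amp;" (a few also lack the ";")

# Naive DOCTYPE matcher: an optional internal subset running to the first "]".
# Assumption: adequate for simple inputs only; it does not really parse the DTD.
DOCTYPE = re.compile(r"<!DOCTYPE[^\[>]*(\[.*?\])?\s*>", re.DOTALL)
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def er_fixup(text):
    """Hypothetical fix-up: skip the DOCTYPE, keep only HTML-named entity refs."""
    text = DOCTYPE.sub("", text, count=1).lstrip()

    def keep_or_escape(match):
        name = match.group(1)
        if name + ";" in html5:
            return match.group(0)       # known name: leave the reference alone
        return "&amp;" + name + ";"     # unknown name: treat it as literal text

    return ENTITY_REF.sub(keep_or_escape, text)


doc = """<!DOCTYPE foo [
<!ENTITY a "a">
<!ENTITY b "<b>">
]>
<foo>
&a;&b;
</foo>
"""
print(er_fixup(doc))
```

Because the internal subset is skipped rather than read, the declared
replacement text of a and b never influences the result.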

Note that I would suggest getting the _same_ result if the input were


<!DOCTYPE foo [
<!ENTITY a "a">
<!ENTITY b "b">
]>
<foo>
&a;&b;
</foo>


which, unlike the first document, is well formed.
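
For comparison, a conforming XML parser does distinguish the two inputs.
A quick way to check is Python's standard xml.etree.ElementTree, which
sits on expat and expands internal entities; the helper name
is_well_formed and the TEMPLATE string are just illustrative. I would
expect it to reject the first document and accept the second.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if ElementTree (expat) parses the text without error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The two documents differ only in the replacement text of entity b.
TEMPLATE = """<!DOCTYPE foo [
<!ENTITY a "a">
<!ENTITY b "%s">
]>
<foo>
&a;&b;
</foo>
"""
first = TEMPLATE % "<b>"   # replacement text opens an element it never closes
second = TEMPLATE % "b"    # plain text replacement

print(is_well_formed(first))
print(is_well_formed(second))
print(repr(ET.fromstring(second).text))
```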

Received on Sunday, 26 February 2012 21:16:28 UTC