
Re: Concrete syntax, character sets



Glad to see the discussion starting. Martin raised some interesting points and 
Tim's responses prompted me to reply.

Tim Bray wrote:

> XML should have *no* concept of quantities.  Names, nesting depths, whatever,
> can be as large as required to meet the requirements of the application.

This I like.

> One straightforward way to do this and preserve compatibility
> with SGML is to require an XML processor to have the capability of writing
> an appropriate SGML declaration to set the quantities high enough to make
> a particular XML DTD valid.

This I don't like. Requiring that XML processors have this capability (feature?) 
seems overly restrictive. Noting that it can be done would be sufficient for me. 
A reference application would be nice - perhaps something available from the W3C.

> If you want to use anything but 7-bit ASCII in markup, use real SGML.
> XML should have the reference concrete syntax hardwired in.

I think we should recognize that 7-bit ASCII isn't sufficient for something that
professes to be "World Wide". I'm not aware of any major technical problems with
using other encodings in markup, but I do know that the 7-bit ASCII restriction
is an issue for many people. I'd like to see XML support other encodings in
markup.

> *Good* point... with modern parsing and encoding technology, it seems like
> it would be easy, and it would certainly be desirable, for XML 
> data not to be limited to small old character sets.  On the other hand, with
> XML, ultimate flexibility is of less importance than ease of implementation;
> would it be thinkable to say that "all XML data is always in UTF8"?  It 
> seems this would break almost nothing and allow almost anything you'd want 
> to do.

I don't have a problem with UTF8 for data. Why not for markup as well?
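
To make the markup question a bit more concrete, here is a rough Python sketch
of why UTF8 in markup looks harmless to me. It relies on two properties of the
encoding: ASCII characters keep their single-byte values, and every byte of a
multi-byte sequence is 0x80 or above, so a non-ASCII name can never be mistaken
for a delimiter such as "<" or ">". The sample document and the helper names
are made up for illustration; nothing here is proposed XML syntax.

```python
# Hypothetical sample: a non-ASCII element name with non-ASCII content.
doc = "<résumé>naïve café</résumé>"
encoded = doc.encode("utf-8")


def ascii_bytes_unchanged(text: str) -> bool:
    """True if every ASCII character encodes to its own single byte value."""
    return all(
        ch.encode("utf-8") == bytes([ord(ch)])
        for ch in text
        if ord(ch) < 128
    )


def no_ascii_in_multibyte(text: str) -> bool:
    """True if no byte of a multi-byte UTF-8 sequence is in the ASCII range."""
    return all(
        b >= 0x80
        for ch in text
        if ord(ch) >= 128
        for b in ch.encode("utf-8")
    )


if __name__ == "__main__":
    print(ascii_bytes_unchanged(doc))
    print(no_ascii_in_multibyte(doc))
    # Delimiter bytes appear only where the document has real delimiters.
    print(encoded.count(b"<"), encoded.count(b">"))
```

A parser that scans bytes for the reference concrete syntax delimiters would
therefore find them in the same places whether names are ASCII or not.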