
Re: equivalent power in SGML and XML



James said:
> I just don't think [deleting OMITTAG] is practical.
> HTML relies on OMITTAG (notably for P, LI, DT, DD).
> I don't see how we can possibly achieve our ease
> of implementation goals if we support OMITTAG in XML.

I can certainly see how we could make HoTMetaL Pro have an option to
save HTML files in XML.  It isn't that easy to suggest a general converter
using SGML tools, since most HTML documents are not valid according to
any particular HTML DTD.  But I can imagine public domain conversion utilities
that could be fairly painless.

Would that be too much of a compromise?

With no SHORTTAG and no OMITTAG and no EMPTY elements and no MIXED content,
XML is becoming about as far from HTML as HTML is from RTF :-)

It seems to me that limited versions of some of these things could usefully
be retained:
* allow an omitted end tag immediately before an end tag:
	<P><em>stuff</P> (is this worth it? it's easy to parse)
	
* allow </> (easy to parse, and safer than NET)

* allow mixed content with | but not with ",", so as to avoid the difficulties
  associated with whitespace in element context in a mixed content model
  being taken as PCDATA.  I think it is a mistake to suppose that using
  pseudo-elements solves very much.  Consider:
  <P><text>here is my</text>
  <em>very</em>
  <text>important document</text>
  </P>
  where in fact the newlines should be inside the <text> elements, not
  between <text> and <em>.  This is hard to explain.
  If comments and processing instructions make this complex, delete them and
  use elements instead.  Heck, you can use elements instead of
  marked sections and attributes and entities and get a much cleaner syntax!

  Think of combining the wonderful work done by Tommie and Debbie on Pinnacles
  Reflections with the careful TEI WSD spec...  entities as they could have
  been :-) -- as elements, using ID/IDREF to `insert' them.
  
* consider a naming convention for EMPTY elements, if they are allowed at
  all -- e.g. <E.BR>, <E.HR> etc. -- if SGML is to be modified to allow
  end tags on EMPTY elements, this is probably irrelevant.
  
* allow default attributes to be omitted, so that arch forms can be used
  (HTML IMG has 22 attributes according to HoTMetaL Pro 3.0, and almost all
  of them have default values; a few are actually #IMPLIED, I think).
  Having to put DIR="ltr" on every element would be a pain, for example...
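
To give a feel for how little machinery the first two suggestions need,
here is a rough sketch in Python.  It is only an illustration under some
loud assumptions: tags carry no attributes, names are case-sensitive, and
the function name normalize and the TAG pattern are made up for this
example.  It keeps a stack of open elements, expands </> to the end tag of
the current element, and supplies any end tags that were omitted right
before an explicit end tag.  (It will close several omitted elements in a
row, which may be more generous than what I proposed above.)

```python
import re

# Hypothetical, simplified tag pattern: <name>, </name>, or </> (no attributes).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.]*)?>")


def normalize(text):
    """Return text with </> expanded and omitted end tags filled in."""
    out = []      # pieces of the normalized document
    stack = []    # names of currently open elements, innermost last
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])   # character data before this tag
        pos = m.end()
        closing, name = m.group(1), m.group(2)
        if not closing:
            if name is None:
                # "<>" (SGML empty start tag) is deliberately not supported here.
                raise ValueError("empty start tag is not supported")
            stack.append(name)
            out.append("<%s>" % name)
        elif name is None:
            # "</>" closes whatever element is currently open.
            if not stack:
                raise ValueError("</> with no open element")
            out.append("</%s>" % stack.pop())
        else:
            if name not in stack:
                raise ValueError("end tag for unopened element: " + name)
            # Supply the end tags omitted immediately before this end tag.
            while stack[-1] != name:
                out.append("</%s>" % stack.pop())
            out.append("</%s>" % stack.pop())
    out.append(text[pos:])
    if stack:
        raise ValueError("unclosed elements: " + ", ".join(stack))
    return "".join(out)


if __name__ == "__main__":
    print(normalize("<P><em>stuff</P>"))
    print(normalize("<P>some <em>text</> here</>"))
```

The whole thing is one stack and one loop, with no need to consult a DTD,
which is the sense in which I mean "easy to parse".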

Lee