Re: Stupid NET Tricks

On Sun, 15 Sep 1996, Martin Bryan wrote:


> > ((again) We can still use un-minimized end tags. (/again)/)
>  
> Note that the above example is incorrect if you use TAGC on start-tags as
> well as end-tags, because it should read:
> 
> ((again)/) We can still use un-minimized end tags. (/again)/)

I don't think it is incorrect.  The idea is _every_ start tag uses NET
rather than TAGC, unless we want to make explicit that it is EMPTY.
My understanding is that an element which is begun with a NET enabling
start tag may be closed with an un-minimized close tag as well as a
NET.

I'd rather eliminate un-minimized end tags entirely.  The above
example was just to show that a simple paren matching algorithm would
not need any special case handling for their presence.

Let me clarify my intent in the exercise. I wanted to see if one could,
within today's 8879,

1) use a single character to terminate elements, and

2) enable the use of a simple character counting algorith to
   track your location in an XML document, and

3) solve Arjun Ray's problem with tag-soupers' spanning element
   tendency.

It turns out that NET can be that single character, and if you use a
sequence of two of its mate for STAGO, you have something that appears
to meet the three criteria.  As it happens, by defining a rule which
say's TAGC is only used on end tags and start tags with EMPTY content,
you have the flip-side of the trapeze act, and no need to alter 8879
to distinguish ETAGC and STAGC.

Now, just to be a bit more radical, if we do decide that un-minimized
end tags are more harmful than they're worth, and we decide it is not
necessary to identify EMPTY elements, then we no longer have any use
for either ETAGO or TAGC.
  

> I, for one, am not using ( and ) all over the place!
> I might consider:
> 
> [[again]We can still use un-minimized end tags (aka [[cite]ISO 8879:1986
> .... (SGML)[/cite]/]).[/again]/]


or, 
 [[again]We can still use un-minimized end tags (aka [[cite]ISO 8879:1986
 .... (SGML)]).]

You are right, there are problems with just about any delimiter
characters we choose.  They either are too common in text, or don't
offer the visual cues we'd like.  (I'm not too sure that <foo></foo>
really provides any visual cues beyond what we've trained ourselvees
to see.)

There are several ways around this within 8879, none particularly 
attractive:

1) sacrifice the visual cue, choosing a rarely encountered character.

2) require lots of character entity references.  How about &L; and &R;?
   Hey, we never said we had to make it easy on the typist. 

3) allow the declaration of an alternate concrete syntax. 

4) invent some stupid MSOCHAR/MSICHAR/MSSCHAR trick for escaping.

5) Use some non-text characters which resemble brackets when viewed,
   and teach our text editors to deal with 'em.
  (offered by someone who wishes to remain anonymous) 

Any others?

-Bill

Received on Sunday, 15 September 1996 16:30:33 UTC