Re: International chars in HTML files

Mike Meyer (mwm@contessa.phone.net)
Tue, 23 Jan 1996 08:29:03 PST


In-Reply-To: <31045FEB.FBA@videodiscovery.com>
Message-Id: <19960123.73F7750.77F8@contessa.phone.net>
To: www-html@w3.org

> >2) Codes or names -must- be used to replace characters which would otherwise
> >be interpreted as mark-up. There are four [<>&"], and they conform to ISO
> >standards for their codes and names. Other codes or names from 8859-1 may
> >be used to avoid similar confusion, e.g, [/\-_].
> 
> Your phrase "otherwise be interpreted as mark-up" is the key, but it's also ambiguous. As
> far as I understand (and you may have meant this), only < needs to always be replaced by
> its entity (&lt;). The others [>&"] only need to be replaced by their entities (&gt; &amp;
> and &quot;) if they're inside a tag.

Not quite. Outside a tag, & needs to be replaced if it's followed by a
name character or a '#' in any context where entities are recognized -
which means most contexts in HTML. For example, to present the string
'&amp' you need to use '&amp;amp'.
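
To make the ampersand case concrete, here is a small Python sketch. It
leans on the standard library's html.unescape, which follows modern
HTML5 decoding rules rather than the SGML rules HTML 2.0 is defined by,
so take it as an illustration of the idea and not as a model of any
particular browser.

```python
import html

# A bare '&amp' is still read as an entity reference, so the literal
# text the author wanted is lost when the document is decoded.
bare = html.unescape("&amp")

# Escaping the ampersand itself preserves the literal string '&amp'.
escaped = html.unescape("&amp;amp")

print(repr(bare))
print(repr(escaped))
```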

Greater than should only occur inside a tag if it's inside of quotes.
Some browsers incorrectly terminate the tag early if they see a '>'
inside of quotes and so require the replacement. So the replacement is
a good idea most of the time. Other browsers don't require the quotes
and require the replacement as well. Depending on a browser
interpreting an illegal construct a specific way is a bad idea.

Similar comments apply to double quotes, except that more browsers
fail to recognize single quotes as a valid mechanism for quoting
attribute values.

If you want a simple phrasing that catches all required instances:
replace '<' and '&' anywhere they occur, and '>' and '"' inside of
double-quoted attribute values.
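
That rule can be sketched in a few lines of Python. The function names
escape_text and escape_attr are made up for this example and are not
part of any library; the sketch assumes attribute values are always
written inside double quotes.

```python
def escape_text(s: str) -> str:
    """Escape '&' and '<' for use in ordinary document text."""
    # '&' is handled first so the '&' in '&lt;' is not escaped again.
    return s.replace("&", "&amp;").replace("<", "&lt;")


def escape_attr(s: str) -> str:
    """Escape '&', '<', '>' and '"' for a double-quoted attribute value."""
    return escape_text(s).replace(">", "&gt;").replace('"', "&quot;")


text = escape_text("AT&T <b> x > y")
attr = escape_attr('say "hi" & a > b')

print(text)
print('<a title="' + attr + '">')
```

Escaping '>' and '"' in plain text as well would do no harm; the split
only mirrors the minimal rule stated above.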

	<mike