Re: Validating XHTML5 with XML entities from Jeff Schiller on 2008-08-27 (public-html@w3.org from August 2008)

From: Jeff Schiller <codedread@gmail.com>
Date: Wed, 27 Aug 2008 12:49:42 -0500
To: "Robert J Burns" <rob@robburns.com>
Cc: "HTML WG" <public-html@w3.org>
Message-ID: <da131fde0808271049y8218507n7bf0affdbbba552c@mail.gmail.com>

Thanks Robert.

Can you share more thoughts and/or address my other question
concerning XHTML5 adopting all HTML entities?

Regards,
Jeff

On 8/27/08, Robert J Burns <rob@robburns.com> wrote:
> Hi Jeff,
>
>  On Aug 27, 2008, at 4:14 PM, Jeff Schiller wrote:
>
>
> > Hi Robert,
> >
> > On Wed, Aug 27, 2008 at 2:04 AM, Robert J Burns <rob@robburns.com> wrote:
> >
> > >
> > > >
> > > > I'd appreciate some insight.  Yes, I can continue to hack on WordPress
> > > > and get it to emit "&#160;" instead of "&nbsp;" and then go through my
> > > > database and replace all instances for the last several years, but...
> > > >
> > >
> > > Can't you have WordPress emit U+00A0, or are you using a charset
> > > encoding other than a UTF encoding?
> > >
> > >
> >
> > Again, maybe I don't understand what you're suggesting.
> >
> > I'm using UTF-8.  I can go through the WordPress source and change all
> > their PHP files that use &nbsp; &raquo; and &laquo; to their
> > equivalent numeric references but there are over 100 instances of
> > this.
> >
> > I can create a ticket and submit a 100-line patch to the WP project,
> > but I'm worried that getting this accepted by the WordPress
> > powers-that-be will be challenging, especially considering my last few
> > patches that languished for months (and those patches prevented Yellow
> > Screens of Death - the XHTML equivalent of a 'segfault').  What are
> > the chances of a 100-line patch that has no observable user benefit
> > (since declaring these entities is a quick 3-line fix that can be done
> > by the theme creator)?
> >
> > So if that patch doesn't get accepted (or it takes a long chunk of
> > time), then next time I upgrade to the new version of WP (happens
> > every 6 months or so), I have to remember to manually search/replace
> > those three entities.
> >
>
>  Well, this isn't really the list to discuss WordPress development issues.
> However, this is a problem that WordPress should solve by emitting literal
> Unicode characters rather than named or numeric character entity
> references. The reason to use character entity references is to facilitate
> documents in non-UTF encodings (or perhaps where the author is concerned the
> document will be converted or round-tripped through non-UTF encodings). For
> pure UTF charset documents, it's advisable to simply use the literal
> characters (and not references to them). Some like the source readability of
> named character references, but that readability depends solely on the
> reader's familiarity with the characters. If I'm a reader of a
> Cyrillic-script-based language, I'm not going to find reading the source easier if
> all of the characters are replaced with named references to the characters.
>
>  In terms of your present problem, I don't know enough about WordPress. If
> it cannot be fixed through configuration tweaks, it is still something that
> is better handled in the long term by WordPress through literal characters
> rather than references.
>
>  Take care,
>  Rob
>
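
For reference, the search/replace discussed in the thread could be sketched
roughly as below. This is only an illustration, not anything taken from
WordPress: the function name replace_named_entities and the mapping table
are hypothetical, and it covers just the three entities mentioned above
(&nbsp; &raquo; &laquo;). It assumes the text is already decoded and will
be served as UTF-8.

```python
import re

# Hypothetical mapping, limited to the three entities named in the thread.
ENTITY_TO_CODEPOINT = {
    "nbsp": 0x00A0,   # NO-BREAK SPACE
    "raquo": 0x00BB,  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    "laquo": 0x00AB,  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
}

# Match only the entities in the table; everything else is left untouched.
_ENTITY_RE = re.compile(r"&(nbsp|raquo|laquo);")


def replace_named_entities(text, numeric=False):
    """Replace the three named entities with literal characters,
    or with decimal numeric references when numeric=True."""
    def _sub(match):
        codepoint = ENTITY_TO_CODEPOINT[match.group(1)]
        if numeric:
            return "&#%d;" % codepoint
        return chr(codepoint)

    return _ENTITY_RE.sub(_sub, text)
```

Calling replace_named_entities(markup) yields the literal characters Rob
suggests; passing numeric=True yields the "&#160;" style instead. Entities
defined by XML itself, such as &amp; and &lt;, are deliberately not touched.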

Received on Wednesday, 27 August 2008 17:50:24 UTC