Re: Validating XHTML5 with XML entities

Hi Jeff,

On Aug 27, 2008, at 4:14 PM, Jeff Schiller wrote:

> Hi Robert,
>
> On Wed, Aug 27, 2008 at 2:04 AM, Robert J Burns <rob@robburns.com> wrote:
>>>
>>> I'd appreciate some insight.  Yes, I can continue to hack on WordPress
>>> and get it to emit "&#160;" instead of "&nbsp;" and then go through my
>>> database and replace all instances for the last several years, but...
>>
>> Can't you have WordPress emit U+00A0, or are you using a charset
>> encoding other than a UTF encoding?
>>
>
> Again, maybe I don't understand what you're suggesting.
>
> I'm using UTF-8.  I can go through the WordPress source and change all
> their PHP files that use &nbsp; &raquo; and &laquo; to their
> equivalent numeric references but there are over 100 instances of
> this.
>
> I can create a ticket and submit a 100-line patch to the WP project,
> but I'm worried that getting this accepted by the WordPress
> powers-that-be will be challenging, especially considering my last few
> patches that languished for months (and those patches prevented Yellow
> Screens of Death - the XHTML equivalent of a 'segfault').  What are
> the chances of a 100-line patch that has no observable user benefit
> (since declaring these entities is a quick 3-line fix that can be done
> by the theme creator)?
>
> So if that patch doesn't get accepted (or it takes a long chunk of
> time), then next time I upgrade to the new version of WP (happens
> every 6 months or so), I have to remember to manually search/replace
> those three entities.

Well, this isn't really the list for discussing WordPress development
issues. However, this is a problem that WordPress should solve by
emitting literal Unicode characters rather than named or numeric
character references. The reason to use character references is to
support documents in non-UTF encodings (or cases where the author is
concerned the document will be converted to, or round-tripped through,
a non-UTF encoding). For documents that stay in a UTF encoding, it's
advisable to simply use the literal characters (and not references to
them). Some people like the source readability of named character
references, but that readability depends solely on the reader's
familiarity with the characters. If I'm a reader of a language written
in Cyrillic script, I'm not going to find the source easier to read if
all of the characters are replaced with named references to them.
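
To make the suggestion concrete, here is a minimal sketch (in Python,
not WordPress's own PHP) of swapping named references for the literal
characters. The function name literalize_named_refs is hypothetical,
and the sketch assumes the output is then serialized as UTF-8. It
leaves the five XML-predefined references alone, since those must stay
escaped.

```python
import re
from html.entities import name2codepoint

# References predefined by XML itself; these must remain references,
# otherwise markup-significant characters would leak into the output.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named reference such as &nbsp; or &raquo;
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def literalize_named_refs(text: str) -> str:
    """Replace HTML named references with the literal characters.

    Hypothetical helper: unknown names and the XML-predefined
    references are returned unchanged.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return _NAMED_REF.sub(replace, text)


if __name__ == "__main__":
    sample = "&laquo;&nbsp;Older posts &amp; pages&nbsp;&raquo;"
    # repr() makes the no-break spaces visible as \xa0
    print(repr(literalize_named_refs(sample)))
```

The same idea could be applied once to stored content and once to the
templates, but whether that fits WordPress's architecture is something
I can't judge.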

As for your present problem, I don't know enough about WordPress to
say. Even if it cannot be fixed through configuration tweaks, it is
still something that is better handled in the long term by WordPress
emitting literal characters rather than references.
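
For what it's worth, the difference is easy to check with any XML
parser. The sketch below uses Python's standard xml.etree.ElementTree;
the helper name is_well_formed is just for illustration. With no DTD
declaring it, &nbsp; is an undefined entity and the parse fails, while
the literal U+00A0 and the numeric reference &#160; both parse.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    candidates = {
        "named reference": "<p>a&nbsp;b</p>",
        "numeric reference": "<p>a&#160;b</p>",
        "literal U+00A0": "<p>a\u00a0b</p>",
    }
    for label, fragment in candidates.items():
        print(label, is_well_formed(fragment))
```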

Take care,
Rob

Received on Wednesday, 27 August 2008 17:22:41 UTC