- From: Robert J Burns <rob@robburns.com>
- Date: Wed, 27 Aug 2008 20:21:59 +0300
- To: "Jeff Schiller" <codedread@gmail.com>
- Cc: "HTML WG" <public-html@w3.org>
Hi Jeff,

On Aug 27, 2008, at 4:14 PM, Jeff Schiller wrote:

> Hi Robert,
>
> On Wed, Aug 27, 2008 at 2:04 AM, Robert J Burns <rob@robburns.com> wrote:
>>> I'd appreciate some insight. Yes, I can continue to hack on WordPress
>>> and get it to emit "&#160;" instead of "&nbsp;" and then go through my
>>> database and replace all instances for the last several years, but...
>>
>> Can't you have WordPress emit U+00A0, or are you using a charset encoding
>> other than a UTF encoding?
>
> Again, maybe I don't understand what you're suggesting.
>
> I'm using UTF-8. I can go through the WordPress source and change all
> their PHP files that use &raquo; and &laquo; to their
> equivalent numeric references but there are over 100 instances of
> this.
>
> I can create a ticket and submit a 100-line patch to the WP project,
> but I'm worried that getting this accepted by the WordPress
> powers-that-be will be challenging, especially considering my last few
> patches that languished for months (and those patches prevented Yellow
> Screens of Death - the XHTML equivalent of a 'segfault'). What are
> the chances of a 100-line patch that has no observable user benefit
> (since declaring these entities is a quick 3-line fix that can be done
> by the theme creator)?
>
> So if that patch doesn't get accepted (or it takes a long chunk of
> time), then next time I upgrade to the new version of WP (happens
> every 6 months or so), I have to remember to manually search/replace
> those three entities.

Well, this isn't really the list to discuss WordPress development issues. However, this is a problem that WordPress should solve by emitting Unicode characters rather than named or numeric character references. The reason to use character references is to support documents in non-UTF encodings (or cases where the author is concerned that the document will be converted to, or round-tripped through, a non-UTF encoding).

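As a rough illustration of emitting literal characters instead of references, here is a minimal sketch in Python. It is not anything WordPress provides: the function name `literalize` and the regular expression are assumptions made for this example, and it relies only on the standard library's `html.entities.html5` table. The five references predefined by XML are left escaped, since the characters they stand for are significant to the markup.

```python
import re
from html.entities import html5

# References predefined by XML; these must stay escaped in the output.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named reference such as &nbsp; or &raquo; (hypothetical pattern).
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def literalize(markup: str) -> str:
    """Replace named character references with the literal characters."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        # Keys in html5 for well-formed references end with ";".
        # Unknown names are left untouched.
        return html5.get(name + ";", match.group(0))

    return NAMED_REF.sub(substitute, markup)
```

For example, `literalize("&laquo;Older&nbsp;posts &amp; pages&raquo;")` yields the guillemets and a U+00A0 no-break space as literal characters while keeping `&amp;` intact. The result is only safe to serve if the document is then written out and declared in a UTF encoding such as UTF-8.
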
For documents in a pure UTF charset, it's advisable to simply use the literal characters (and not references to them). Some like the source readability of named character references, but that readability depends solely on the reader's familiarity with the characters. If I'm a reader of a language written in Cyrillic script, I'm not going to find the source easier to read if all of the characters are replaced with named references to them.

As for your present problem, I don't know enough about WordPress. If it cannot be fixed through configuration tweaks, it is still something that WordPress would better handle in the long term by emitting literal characters rather than references.

Take care,
Rob
Received on Wednesday, 27 August 2008 17:22:41 UTC