Re: newbie &nbsp representation question

On Tue, 18 Aug 2009, Irene Vatton wrote:
> Le lundi 03 août 2009 à 16:36 -0600, J. Waldram a écrit :
>> I have searched the archives and have seen only one related post from 2006
>> on an older version of Amaya.  Amaya 11.2, with the "Keep multiple
>> Spaces" option, the &nbsp is represented as a hex byte 0xa0 instead of
>> &nbsp; or &#xa0;.  Is this a mis-translation for the
>> &nbsp that is rendered by browsers?
>
> The nbsp character is represented as:
> 1) a hex byte 0xa0 when the document is utf-8 encoded
> 2) &nbsp; when the document is not utf-8 encoded and there is a doctype
>    that declares the entity
> 3) &#160; or &#xa0; when the document is not utf-8 encoded and there is
>    not a doctype that declares the entity
>
> If Amaya generates a hex byte, the document is considered as utf-8
> encoded. If it's not, the bug is there.

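To check that I am reading the three cases above correctly, here is a small
Python sketch of that rule.  The function name and its parameters are my own
invention for illustration and are not taken from the Amaya sources.  One
caveat: in UTF-8 the character U+00A0 is the two-byte sequence 0xC2 0xA0, while
a lone 0xA0 byte is its ISO-8859-1 encoding, so the sketch emits the UTF-8
bytes in case 1.

```python
import codecs

NBSP = "\u00a0"  # the no-break space character


def serialize_nbsp(encoding: str, doctype_declares_nbsp: bool) -> bytes:
    """Bytes written for one nbsp under the three cases listed above.

    Hypothetical helper for illustration; not Amaya's actual code.
    """
    # Normalise the charset label ("UTF8", "utf_8", ...) before comparing.
    if codecs.lookup(encoding).name == "utf-8":
        return NBSP.encode("utf-8")   # case 1: raw character
    if doctype_declares_nbsp:
        return b"&nbsp;"              # case 2: named entity
    return b"&#xa0;"                  # case 3: numeric character reference
```

With my document (charset=iso-8859-1, XHTML 1.0 Strict doctype) I would expect
case 2 from this rule, not a raw byte.
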
I believe there is a bug in case 3).
Doesn't the &#xa0; notation let the renderer/browser interpret the
character regardless of the document's character encoding?
The "Keep multiple Spaces" case also has to cover runs of regular spaces
(used for formatting) that the browser may collapse or break a line at.
An example is the two spaces before the beginning of this sentence.  I see
the need for nbsp, and would suggest that inserting one be a specific,
explicit action, as it is for math symbols and <br/>.  Alternative logic
such as:
  if (preserve multiple spaces) then
    if (preserve multiple spaces as nbsp) then
      put &#xa0; in document
    else
      do not remove the multiple "regular" spaces and
      allow the browser to format the document as needed
    fi
  else
    remove multiple spaces from document
  fi
could be used.  I am not sure why removing multiple spaces is a good
default action.  Is this a style convention?
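
For concreteness, the logic I am suggesting could look something like the
Python sketch below.  The function and option names are hypothetical, and the
choice to turn every space in a run except the last into &#xa0; (so that one
line-break opportunity remains) is my own assumption; I do not know how Amaya
handles this internally.

```python
import re

MULTI_SPACE = re.compile(r" {2,}")  # a run of two or more regular spaces


def write_spaces(text: str, preserve_multiple: bool, as_nbsp: bool) -> str:
    """Apply the suggested space-handling options to one text node.

    Hypothetical helper for illustration; not Amaya's actual code.
    """
    if not preserve_multiple:
        # Collapse each run to a single regular space.
        return MULTI_SPACE.sub(" ", text)
    if as_nbsp:
        # Assumption: all but the last space in a run become &#xa0;.
        return MULTI_SPACE.sub(
            lambda m: "&#xa0;" * (len(m.group()) - 1) + " ", text
        )
    # Leave the regular spaces alone and let the browser format them.
    return text
```

With this, a raw nbsp would appear in the file only when the author explicitly
asks for it.
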
>
>> Since XML does not allow &nbsp
>> and the &#xa0 is allowed could this be recoded as a text string not a hex
>> byte?  Amaya source code views show it as a "~" (tilde) which is less
>> obvious in coding and some browsers render it as a literal character.
>
> Amaya source code views show it as a "~" (tilde) to let know to the
> author that there is a nbsp which is not managed as a single space.
> A single space can be removed or allows a line break; this is not true
> for a nbsp.
>
>> This is in a
>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>>        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>>
>> <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
>>    <meta http-equiv="Content-Style-Type" content="text/css" />
>>
>> I did not want to muck with the source code.  I do not
>> know enough to see the ramifications of changes.
>> Are the relevant parts in translate.c the place to make changes?
>>
>> Thank you,
>>
>> -Jim Waldram
> -- 
> Irene Vatton <Irene.Vatton@inria.fr>
> INRIA

-Jim Waldram

Received on Wednesday, 19 August 2009 23:01:52 UTC