
Processing characters displayed as a sharp sign (#)

From: Iskren Pushkarov <iskrensp@sirma.bg>
Date: Mon, 23 Jul 2007 10:34:41 +0300
To: <www-xsl-fo@w3.org>
Message-ID: <000c01c7ccfb$f19c7a10$d180a8c0@sirma.int>
Hi,

There are two cases of characters that come out as a sharp sign (#):

1) the character does not appear in http://xmlgraphics.apache.org/fop/fo/fonts.fo.pdf
2) the character does appear in that document

Case 1), for example &#x2010; (U+2010, hyphen):

I found that the string functions documented at
http://developer.marklogic.com/pubs/3.2/apidocs/StringBuiltins.html
(fn:contains(), for example) don't work with such entities, or perhaps I just don't know how to handle them.

So here is a sample workaround:

define function format-symbol($title as xs:string, $symbol as xs:integer,
    $new-symbol as xs:integer) as xs:string {
    (: Replace every codepoint equal to $symbol with $new-symbol.
       Mapping over all codepoints handles any number of occurrences;
       fn:remove() accepts only a single position, so it cannot be fed
       the full result of fn:index-of() when $symbol appears twice. :)
    fn:codepoints-to-string(
        for $c in fn:string-to-codepoints($title)
        return if ($c eq $symbol) then $new-symbol else $c)
}

For the entity above, the function call is

format-symbol(fn:data($title), 8208, 45)

(thanks to CQ), where 8208 is the decimal codepoint of U+2010 and 45 is the ASCII hyphen-minus (-). The output is then correct: beforeString-afterString, not beforeString#afterString.

But:

- this will not work for dynamic PDF generation, where you don't know what's coming in;

- I implemented the above only for the PDF chapter title, but the problem entity occurs more than 50 times in the XML content. Processing all of the content with a function like the one above would be an ugly approach with serious performance consequences, because a single XML document contains hundreds of thousands of characters (or more).
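
One possible direction for the whole-document case, sketched here under assumptions and untested against MarkLogic 3.2 or FOP 0.93: copy the document once, recursively, and run the standard fn:translate() function over every text node. fn:translate() replaces each character of its second argument with the character at the same position in its third, so several problem characters can be handled in one built-in call. The function names fix-chars and fix-node are hypothetical, and the mapping below lists only U+2010. This still visits every text node, so it does not by itself settle the performance question, and it only helps for characters whose replacement is known in advance.

```xquery
(: Hypothetical helper: map characters the PDF font cannot render to
   substitutes it can. Position N of $from is replaced by position N of $to;
   extend both strings together to handle more characters. :)
define function fix-chars($s as xs:string) as xs:string {
  let $from := fn:codepoints-to-string(8208)  (: U+2010 HYPHEN :)
  let $to := "-"                              (: U+002D HYPHEN-MINUS :)
  return fn:translate($s, $from, $to)
}

(: Hypothetical recursive copy: rebuild each element with its original
   attributes, run fix-chars() over every text node, and return all other
   nodes (comments, processing instructions, ...) unchanged. :)
define function fix-node($n as node()) as node()* {
  typeswitch ($n)
    case element() return
      element { fn:node-name($n) } {
        $n/@*,
        for $c in $n/node() return fix-node($c)
      }
    case text() return text { fix-chars(fn:string($n)) }
    default return $n
}
```

Document nodes fall through to the default branch, so the sketch would be called on the root element, e.g. fix-node($doc/*) for a hypothetical source document $doc, before the result is handed to the XSL-FO stage.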

Do you have any ideas for a better workaround?

In case 2), the font for the character must be set explicitly for it to display correctly, which is also a problem for dynamic PDF generation: you just don't know which characters are coming, so you cannot set the appropriate default font.

I use FOP 0.93, and the project I'm working on is not a proof of concept but a commercial one, so # symbols appearing in the final PDF are completely undesirable.

I would appreciate any comments, help, or workarounds on this topic.

Thanks in advance.
Received on Monday, 23 July 2007 23:29:39 GMT