
Re: Searching for text in an SVG image?

From: Doug Schepers <schepers@w3.org>
Date: Thu, 11 Jun 2009 18:30:03 -0400
Message-ID: <4A31856B.70907@w3.org>
To: Helder Magalhães <helder.magalhaes@gmail.com>
CC: "DuCharme, Bob" <BDuCharme@innodata-isogen.com>, www-svg@w3.org
Hi, Bob-

As Helder says, it would be very hard (and often inaccurate) to 
programmatically guess at the author's intent, and authoring tools 
should try to reflect that intent as accurately as they can, in the 
manner described by the specification (and Helder ^_^ ).

In addition, the SVG WG is writing an SVG spec that deals in part with 
accessibility, which should clarify this further for authoring tools.

However, all that said... if you could analyze the patterns used by 
major SVG authoring tools (like Inkscape, Illustrator, CorelDraw, XaraX, 
etc.) in laying out text, it might be possible to see how they decompose 
text into elements and make judgments accordingly. I would hope we could 
avoid that through improvements in authoring tools, though.

Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs


Helder Magalhães wrote (on 6/9/09 7:47 AM):
> Hi Bob,
>
>
> First of all, note that the www-svg mailing list "is for technical
> discussion on Scalable Vector Graphics (SVG) and its specifications"
> [1]. For general SVG support there are more appropriate mailing lists
> such as svg-developers [2]. ;-)
>
>
>>  I know that when I see "hello" in an SVG file, it may have been put there
>>  with a single text element, but it may have been put there with 5 text
>>  elements, each placing a single letter in the image. The latter would
>>  obviously be more difficult to search for.
>
> Well, I'd say that placing 5 text elements would be semantically
> incorrect, as they would no longer be conceptually seen as a whole
> word or sentence; also, when glyphs are placed separately, the only way
> (I can currently think of) to try linking them would be through
> position heuristics after rendering (or based on text coordinates after
> taking all transformations, character dimensions, etc. into
> consideration), which is basically reverse engineering to guess what
> the author meant... :-|
>
> Note that SVG has several interesting text layout features, such as
> alignment properties and text on a path [3], which should help achieve
> the precise glyph placement needed for the desired result. I'll take
> this opportunity to suggest a couple of interesting articles on "SVG
> and Typography" [4] [5]. ;-)
>
>
>>  Has anyone heard of a SVG
>>  programming library that makes such searches easier?
>
> No. I'm aware that some SVG implementations, such as Batik (Squiggle)
> [6], implement text search functionality (in Squiggle, through the
> "Edit" menu by choosing "Find..."), but I imagine none has
> implemented the heuristics described above, which seem to be what
> you are seeking.
>
> If you really need to go in that direction (for example, if you don't
> control the generated SVG input and can't change how it is generated,
> either by changing authoring habits or by changing the SVG output of
> some tool), then you may want to take a look at the "Machine
> Accessibility" [7] section of the SVGIG wiki, focusing on the "XSLT
> File" subsection, which can be used as a starting point. :-)
>
>
> [below in the original message]
>>  Disclaimer:
> [...]
>
> I'd suggest not posting email disclaimers to mailing lists (the
> TortoiseSVN mailing list etiquette [8] has a "Note about e-Mail
> disclaimers"). I'm not sure what the guidelines are for this
> particular mailing list, but this is a general suggestion. ;-)
>
>
>>  thanks,
>>  Bob
>
> Hope this helps,
>   Helder
>
>
> [1] http://lists.w3.org/Archives/Public/www-svg/
> [2] http://tech.groups.yahoo.com/group/svg-developers/
> [3] http://www.w3.org/TR/SVG/text.html
> [4] http://www.xml.com/pub/a/2004/04/07/svgtype.html
> [5] http://www.xml.com/pub/a/2004/05/12/svg.html
> [6] http://xmlgraphics.apache.org/batik/
> [7] http://www.w3.org/Graphics/SVG/IG/wiki/Accessibility_Activity#Machine_Accessibility
> [8] http://tortoisesvn.tigris.org/list_etiquette.html
>
>

-- 
Received on Thursday, 11 June 2009 22:30:09 GMT
