Re: How is parsing element and attribute names a string matching problem?

When you parse a document, finding the various tokens (the "breaking into pieces" part) is only half the job. You must also match those tokens against the element and attribute names defined by the markup language, both to tell whether the document is valid and to build the document's structure (e.g. the DOM). Normalization affects the tokenization part (combining marks can, for example, interact with the angle brackets and other delimiters) as well as the matching part, and the matching part is what makes parsing a string matching problem.
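
To make both effects concrete, here is a minimal Python sketch. It is only an illustration, not how any particular parser works: the element name "café", the helper names_match, and the sample tag are all made up for this example, and it assumes the standard unicodedata module is an acceptable stand-in for whatever normalization a real processor might apply.

```python
import unicodedata

# Hypothetical element name, spelled two ways that render identically.
declared = "caf\u00e9"       # precomposed: U+00E9 LATIN SMALL LETTER E WITH ACUTE
in_document = "cafe\u0301"   # decomposed: "e" + U+0301 COMBINING ACUTE ACCENT


def names_match(a, b, form=None):
    """Compare two names code point by code point, optionally normalizing first."""
    if form is not None:
        a = unicodedata.normalize(form, a)
        b = unicodedata.normalize(form, b)
    return a == b


# Matching: the raw strings differ, the normalized strings do not.
raw_match = names_match(declared, in_document)          # False
nfc_match = names_match(declared, in_document, "NFC")   # True

# Tokenization: content that starts with a combining mark can absorb the
# delimiter in front of it. Here U+0338 COMBINING LONG SOLIDUS OVERLAY
# follows ">", and NFC composes the pair into U+226F NOT GREATER-THAN.
tag = "<p>\u0338"
normalized_tag = unicodedata.normalize("NFC", tag)
delimiter_survives = ">" in normalized_tag              # False

print(raw_match, nfc_match, delimiter_survives)
```

The first comparison shows why two names that look the same on screen can fail to match; the last line shows why normalizing before tokenizing can change where the tags appear to end.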

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

"Costello, Roger L." <costello@mitre.org> wrote:

Hi Folks,

In section 3.1.1 [1] of the document,

    Character Model for the World Wide Web 1.0: Normalization

it says:

    Examples of string matching abound: parsing
    element and attribute names in Web documents ...

How is parsing element and attribute names a string matching problem?

When I think of "parsing" I think of breaking up a string into parts: here's a start tag, here's content, here's an end tag. I don't see it as a string matching problem. Would you please explain how parsing element and attribute names is a string matching problem?

/Roger

[1] http://www.w3.org/TR/charmod-norm/#sec-WhyNormalization

Received on Sunday, 27 January 2013 18:12:18 UTC