RE: Issue JW24d (xml:lang)

> I think Jim already provided a good example, but RFC2277 has another
> similar one:  one might reasonably wish to do a large search for
> documents with the name of a specific tree in Norwegian.  The name of
> the tree is 'ask'.  It's useless to get all the English documents with
> the word 'ask' in response to that query.  If there *are* body or
> properties typed as Norwegian, then our search syntax must be able to
> specify that the search engine should match these first.

Though I'm no expert on ideographic languages, I think there may be cases
where the meaning of a specific Unicode character varies depending on the
language tag.

As for a specific proposal, here's a sketch of one:

           |   server can do lang-    server cannot do lang-
           |   specific searching     specific searching
-----------+----------------------------------------------
xml:lang   |   perform search using   either:
present    |   xml:lang info          (a) reject request
           |                          (b) use default search
           |                          but inform client that
           |                          xml:lang was ignored
           |
xml:lang   |   perform search using   perform search using
not present|   server's default       server's default
           |   search technique       search technique
           |   (character-match,
           |   indep. of language)
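
To make the four cells of the table concrete, here is a rough Python
sketch of the decision logic. It is only an illustration, not proposed
spec text: the names plan_search, SearchPlan, LangNotSupported and the
reject_if_unsupported flag are all hypothetical, and the choice between
options (a) and (b) is modeled as a simple server-side switch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchPlan:
    """Hypothetical description of how the server will run the search."""
    technique: str            # "lang-specific" or "default"
    lang: Optional[str]       # language tag actually used for matching
    lang_ignored: bool        # True if xml:lang was supplied but not used


class LangNotSupported(Exception):
    """Option (a): the server rejects a request whose xml:lang it cannot honor."""


def plan_search(xml_lang: Optional[str],
                server_supports_lang: bool,
                reject_if_unsupported: bool = False) -> SearchPlan:
    # Bottom row of the table: no xml:lang, so every server falls back
    # to its default (character-match) technique.
    if xml_lang is None:
        return SearchPlan("default", None, False)

    # Top-left cell: xml:lang present and the server can use it.
    if server_supports_lang:
        return SearchPlan("lang-specific", xml_lang, False)

    # Top-right cell, option (a): reject the request outright.
    if reject_if_unsupported:
        raise LangNotSupported(xml_lang)

    # Top-right cell, option (b): default search, but flag that
    # xml:lang was ignored so the client can be informed.
    return SearchPlan("default", None, True)
```

For the Norwegian 'ask' query, plan_search("no", True) yields a
language-specific plan, while plan_search("no", False) yields a default
plan with lang_ignored set, which is the signal the server would pass
back to the client under option (b).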

I don't think it makes sense to make a server's use or non-use of language
information discoverable.

- Jim

Received on Tuesday, 14 January 2003 12:13:28 UTC