Re: A question for RDF parser implementers - whitespace

/ Graham Klyne <> was heard to say:
| I've just noticed something in the RDF syntax which has me wondering
| how RDF parser implementers are dealing with whitespace in literals.

Goldfarb's Law: If there's a problem in a text processing system, it
involves whitespace.

| The RDF syntax spec
| (, section
| 6.1.9 on typed literals, mentions "In XML Schema (part 1)
| [XML-SCHEMA1], white space normalization occurs during validation
| according to the value of the whiteSpace facet. The syntax mapping
| used in this document occurs after this, so the whiteSpace facet
| formally has no further effect."

Fair enough. The RDF spec is using the schema-normalized-value of
typed literals.

| But, given that RDF/XML has an open-ended tag set, schema validation of
| RDF/XML doesn't make a lot of sense.

Well, I'm not sure that follows. If you invent an RDF/XML vocabulary,
you could write a W3C XML Schema for it.

| Further, I don't have an XML
| schema processor to do such validation.

Ah, well, that's a different issue :-)

| How are other implementers dealing with this?  My inclination is to
| pass the original literal, with whitespace intact, and allow the
| subsequent datatype processing to treat it with the same effect as if
| the whiteSpace had been eliminated by schema validation.  For this
| purpose, the whitespace facet is implicitly part of the datatype.
| Does anyone have any other ideas on this?

I think if you are parsing a typed literal and you know you're parsing
a typed literal, you should collapse the whitespace before passing the
value on to downstream applications.
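
As a rough illustration of what I mean, here is a minimal Python sketch
of applying the whiteSpace facet in the parser itself, without a schema
processor. The function name and the lookup table are hypothetical, not
taken from any RDF toolkit. It assumes the facet values from XML Schema
Part 2 (xsd:string is "preserve", xsd:normalizedString is "replace", the
other built-in types are "collapse"), and it assumes that literals with
a datatype outside the XSD namespace should be left untouched, which is
a judgment call rather than something the RDF spec states.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical table: the two built-in XSD types whose whiteSpace facet
# is not "collapse". Every other built-in XSD type is treated as "collapse".
NON_COLLAPSING = {
    XSD + "string": "preserve",
    XSD + "normalizedString": "replace",
}


def whitespace_facet(datatype_uri):
    """Return the assumed whiteSpace facet for a datatype URI."""
    if datatype_uri in NON_COLLAPSING:
        return NON_COLLAPSING[datatype_uri]
    if datatype_uri.startswith(XSD):
        return "collapse"
    # Assumption: unknown (non-XSD) datatypes keep their lexical form as-is.
    return "preserve"


def normalize_typed_literal(lexical, datatype_uri):
    """Apply XML Schema whitespace normalization to a typed literal."""
    facet = whitespace_facet(datatype_uri)
    if facet == "preserve":
        return lexical
    # "replace": each tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", lexical)
    if facet == "replace":
        return replaced
    # "collapse": after replacing, squeeze runs of spaces to one space
    # and drop leading and trailing spaces.
    return re.sub(r" +", " ", replaced).strip(" ")


if __name__ == "__main__":
    print(repr(normalize_typed_literal("  42\n", XSD + "integer")))
    print(repr(normalize_typed_literal(" a\tb ", XSD + "string")))
```

With something like this in the parser, a downstream consumer of an
xsd:integer literal sees "42" whether or not the document author put
the value on its own indented line.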

Given that the RDF spec says that whitespace is eliminated by
validation, I can easily imagine writing an application that assumes
typed values like integers and URIs won't have insignificant
whitespace around them.

                                        Be seeing you,

Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.

Received on Friday, 9 July 2004 07:57:05 UTC