- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Mon, 01 Aug 2005 13:38:37 +0100
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Bjoern Hoehrmann wrote:
> Dear RDF Data Access Working Group,
>
>   http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050721/ section 10.1
> notes "IRIs are ordered by comparing the character strings making up
> each IRI" it's however not clear how character strings are compared,
> I would have expected that a `string < string` operator is defined, but
> section 11.1 only defines such an operator for numeric and dateTime
> types. Please change the draft such that ordering of IRIs is clear.
>
> regards,

The current grammar does have a rather open production for QuotedIRIref
(anything except space and >). An IRI reference can be relative. There
is a comment referring to RFC 3987 in the grammar.

An implementation is going to have to additionally process IRI references
anyway to make them absolute. Without including the whole of the IRI/URI
grammar, we just parse IRIs.

RFC 2396 defined "excluded characters" as:

  control = <US-ASCII coded characters 00-1F and 7F hexadecimal>
  space   = <US-ASCII coded character 20 hexadecimal>
  delims  = "<" | ">" | "#" | "%" | <">

RFC 3986 defines:

  pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
  pct-encoded = "%" HEXDIG HEXDIG
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
  reserved    = gen-delims / sub-delims
  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

RFC 3987 adds the characters of the UCS beyond U+007F to unreserved:

  ipchar      = iunreserved / pct-encoded / sub-delims / ":" / "@"
  iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
  ucschar     = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
              / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
              / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
              / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
              / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
              / %xD0000-DFFFD / %xE1000-EFFFD
  iprivate    = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD

iprivate characters can appear in the query component but not in the
rest of the IRI.

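As a rough illustration of the RFC 3987 ranges quoted above, the sketch
below classifies a single character as ucschar, iprivate, or neither. The
names UCSCHAR_RANGES, IPRIVATE_RANGES and classify are invented for this
example; only the numeric ranges come from the ABNF.

```python
# Inclusive code point ranges copied from the RFC 3987 ABNF quoted above.
# Planes 1 through D each contribute xx0000-xxFFFD; plane E starts at E1000.
UCSCHAR_RANGES = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    + [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
    + [(0xE1000, 0xEFFFD)]
)
IPRIVATE_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def classify(ch):
    """Return "ucschar", "iprivate", or None for a one-character string."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES):
        return "ucschar"
    if any(lo <= cp <= hi for lo, hi in IPRIVATE_RANGES):
        return "iprivate"
    # ASCII characters and the excluded code points both end up here.
    return None
```

ASCII characters return None here because they are covered by the other
productions (unreserved, gen-delims, sub-delims), not by ucschar.
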
So the characters in an IRI reference are:

  ALPHA / DIGIT / "-" / "." / "_" / "~"
  ":" / "/" / "?" / "#" / "[" / "]" / "@"
  "!" / "$" / "&" / "'" / "(" / ")"
  "*" / "+" / "," / ";" / "="
  ucschar
  iprivate
  "%"

and the rq23 grammar becomes:

  QuotedIRIref ::= '<' IRICHAR* '>'   /* An IRI reference : RFC 3987 */

  IRICHAR ::= [A-Z] | [a-z] | [0-9] | '-' | '.' | '_' | '~'
            | ':' | '/' | '?' | '#' | '[' | ']' | '@'
            | '!' | '$' | '&' | "'" | '(' | ')'
            | '*' | '+' | ',' | ';' | '='
            | '%'
            | [#xA0-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFEF]
            | [#x10000-#x1FFFD] | [#x20000-#x2FFFD] | [#x30000-#x3FFFD]
            | [#x40000-#x4FFFD] | [#x50000-#x5FFFD] | [#x60000-#x6FFFD]
            | [#x70000-#x7FFFD] | [#x80000-#x8FFFD] | [#x90000-#x9FFFD]
            | [#xA0000-#xAFFFD] | [#xB0000-#xBFFFD] | [#xC0000-#xCFFFD]
            | [#xD0000-#xDFFFD] | [#xE1000-#xEFFFD]
            | [#xE000-#xF8FF] | [#xF0000-#xFFFFD] | [#x100000-#x10FFFD]

[I would be very grateful if someone checked this]

An alternative is to exclude the illegal characters. For RFC 3986 that is:

  0x00-0x20, 0x7F, '<', '>', "`"

but with RFC 3987 it isn't that short:

  FDD0-FDEF
  FFF0-FFFF
  1FFFE, 1FFFF
  2FFFE, 2FFFF
  etc. for 3,4,5,6,7,8,9,A,B,C,D,E,F
  10FFFE, 10FFFF
  200000 onwards

	Andy
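
For anyone wanting to check the IRICHAR production mechanically, it can be
approximated as a regular expression. This is a minimal sketch, assuming
digits and '-' are meant to be allowed (as in the character list above);
ASCII_PART, NON_ASCII_PART, QUOTED_IRIREF and is_quoted_iriref are names
made up for the example, not part of the rq23 grammar.

```python
import re

# ASCII characters from the list above: unreserved, gen-delims,
# sub-delims and "%". "-", "[" and "]" are escaped for use in a class.
ASCII_PART = r"A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%"

# Non-ASCII ranges: ucschar first, then iprivate (RFC 3987).
NON_ASCII_PART = (
    "\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF"
    + "".join(
        "{}-{}".format(chr(p << 16), chr((p << 16) | 0xFFFD))
        for p in range(0x1, 0xE)  # planes 1 through D
    )
    + "\U000E1000-\U000EFFFD"
    + "\uE000-\uF8FF\U000F0000-\U000FFFFD\U00100000-\U0010FFFD"
)

# QuotedIRIref ::= '<' IRICHAR* '>'
QUOTED_IRIREF = re.compile("<[" + ASCII_PART + NON_ASCII_PART + "]*>")


def is_quoted_iriref(token):
    """True if the whole token matches '<' IRICHAR* '>'."""
    return QUOTED_IRIREF.fullmatch(token) is not None
```

Like the production itself, this only checks the character set. It does
not check IRI structure, so it does not enforce that iprivate characters
appear only in the query part.
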
Received on Monday, 1 August 2005 12:39:30 UTC