
Re: collations, etc (a list-builtins issue)

From: Jos de Bruijn <debruijn@inf.unibz.it>
Date: Mon, 04 May 2009 16:23:28 +0200
Message-ID: <49FEFA60.30209@inf.unibz.it>
To: Sandro Hawke <sandro@w3.org>
CC: public-rif-wg@w3.org
I assumed you were talking only about the list operators, so yes, I do mean 1b.

> When you say users "have to define the functions themselves", you mean
> using rules to re-implement member, index-of, etc?

Yes.


Jos

Sandro Hawke wrote:
>> I am in favor of option 1: the list operators simply work on the values
>> in the lists, rather than performing all kinds of conversions.  If users
>> want something more, they have to define the functions themselves.
> 
> Sorry, I guess I need more detail within option 1.  Which do you mean:
> 
>     1a.   Remove all 'collation' parameters from DTB, including on
>           the string compare builtin.
> 
>     1b.   Remove 'collation' parameters from the list builtins, but
>           leave them in the non-list builtins.
> 
>     1c.   Keep 'collation' in the list builtins (so strings are compared
>           using the compare builtin) but otherwise compare values using
>           RIF equality
> 
> (I'd guess you mean 1b.)
> 
> When you say users "have to define the functions themselves", you mean
> using rules to re-implement member, index-of, etc?
> 
>      -- Sandro
> 
>> Jos
>>
>> Sandro Hawke wrote:
>>> In writing the spec for the List builtins, I've come across a difficult
>>> design choice concerning how literals are compared.  (Some of this might
>>> be considered already decided, but it seems to me there's a fair amount
>>> of new information here, relevant stuff I didn't know at the F2F.)
>>>
>>> Background:
>>>
>>>   - In RIF, you have two ways you can compare most literals:
>>>
>>>         (1) You can use rif:equals, which is true iff the elements in
>>>             the value space for the two literals are the same.
>>>             Literals whose types have disjoint value spaces will never
>>>             compare as equal:
>>>
>>>                   true:    "01"^^xs:int = "1"^^xs:int
>>>                   false:   "1"^^xs:int = "1"^^xs:float
>>>                   false:   "1"^^xs:double = "1"^^xs:float
>>>                   false:   "2002-04-02T12:00:00-01:00"^^xs:dateTime
>>>                            = "2002-04-02T17:00:00+04:00"^^xs:dateTime
>>>                   false:   "Strasse" = "Straße"
>>>
>>>         (2) You can use a builtin comparator like numeric-equal,
>>>             dateTime-equal, date-equal, time-equal, duration-equal,
>>>             XMLLiteral-equal, compare, and text-compare.  These
>>>             builtins allow more values to be considered equal, for
>>>             example:
>>>
>>>                   true:    op:numeric-equal("1"^^xs:int, "1"^^xs:float)
>>>                   true:    op:numeric-equal("1"^^xs:double, "1"^^xs:float)
>>>                   true:    op:dateTime-equal(
>>>                              "2002-04-02T12:00:00-01:00"^^xs:dateTime,
>>>                              "2002-04-02T17:00:00+04:00"^^xs:dateTime)
>>>
>>>             In addition, for the comparison of strings and text, an
>>>             optional 'collation' parameter is available:
>>>
>>>                   false:   0 == compare("Strasse", "Straße")
>>>                   true:    0 == compare("Strasse", "Straße", "deutsch")
>>>
>>>    - As I understand it, the 'collation' is an extensibility
>>>      point. XPath-Functions uses the examples 'deutsch',
>>>      "http://www.example.com/collations/French1", and
>>>      "http://www.example.com/collations/French2", but only defines
>>>      (and requires) one collation, the default:
>>>      "http://www.w3.org/2005/xpath-functions/collation/codepoint", which
>>>      compares strings code point by code point, with no normalization.
>>>      See http://www.w3.org/TR/xpath-functions/#collations and
>>>      http://www.unicode.org/unicode/reports/tr10/
>>>
>>>    - Some list builtins need to compare values.
>>>
>>>           * member; this is not in XPath-Functions.
>>>
>>>           * index-of and distinct-values; these take a collation
>>>             parameter in XPath-Functions.  The collation is defined to
>>>             apply whenever the values are strings.
>>>
>>>           * union, intersect, except; these conceptually compare
>>>             elements, but in XPath-Functions they only operate on
>>>             sequences of nodes, so string comparison doesn't come into
>>>             play, and no collation is passed; I added collation
>>>             parameters to them when drafting the text for DTB.
>>>
>>>       Also, some non-list functions in DTB use collations: compare,
>>>       substring-before, substring-after, contains, starts-with,
>>>       ends-with, and text-compare.
>>>
>>>    - Although formally a "collation" is a total preorder (a "compare"
>>>      function returning -1, 0, or 1 for each pair of values, like a
>>>      total order but allowing distinct values to tie), our primary use
>>>      for it is merely to determine equal/not-equal, not to sort things
>>>      into order.
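The two ways of comparing literals described in the Background can be sketched in Python. This is a minimal, hypothetical model rather than RIF's normative semantics: a literal is a (lexical form, datatype) pair, only the datatypes from the examples are covered, and the names rif_equals, numeric_equal and datetime_equal are illustrative.

```python
from datetime import datetime
from decimal import Decimal
from typing import Callable, Dict, Tuple

Literal = Tuple[str, str]  # (lexical form, datatype), e.g. ("01", "xs:int")

# Hypothetical value-space table covering only the example datatypes.
# xs:int is derived from xs:decimal, so both share the "decimal" value space;
# xs:float and xs:double each have their own, disjoint value space.
VALUE_SPACES: Dict[str, Tuple[str, Callable[[str], object]]] = {
    "xs:int": ("decimal", Decimal),
    "xs:decimal": ("decimal", Decimal),
    "xs:float": ("float", Decimal),    # exact parsing is enough for these examples
    "xs:double": ("double", Decimal),
    "xs:string": ("string", str),
}
NUMERIC = {"xs:int", "xs:decimal", "xs:float", "xs:double"}


def rif_equals(a: Literal, b: Literal) -> bool:
    """Identity in the value space: disjoint value spaces are never equal."""
    space_a, parse_a = VALUE_SPACES[a[1]]
    space_b, parse_b = VALUE_SPACES[b[1]]
    return space_a == space_b and parse_a(a[0]) == parse_b(b[0])


def numeric_equal(a: Literal, b: Literal) -> bool:
    """Roughly op:numeric-equal: compares numeric values across numeric types.
    Simplification: XPath would first promote both operands to a common type."""
    if a[1] not in NUMERIC or b[1] not in NUMERIC:
        raise TypeError("numeric_equal expects numeric literals")
    return Decimal(a[0]) == Decimal(b[0])


def datetime_equal(a: str, b: str) -> bool:
    """Roughly op:dateTime-equal for timezoned values: same instant in time."""
    # Aware datetimes compare by their UTC instant (parser needs Python 3.7+).
    return datetime.fromisoformat(a) == datetime.fromisoformat(b)
```

Here rif_equals(("1", "xs:int"), ("1", "xs:float")) is False while numeric_equal on the same pair is True, and the two dateTime literals from the example both denote 13:00 UTC, so datetime_equal returns True for them.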
>>>
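A collation-aware compare, and its use purely for equality in index-of and distinct-values, might look like the following sketch. Assumptions: a collation is modeled as a sort-key function from which compare derives -1, 0 or 1; the codepoint collation is Python's native string ordering (which is by code point); "deutsch" is a made-up stand-in based on str.casefold, which maps "ß" to "ss" but also ignores case, so it is not a real German collation. Following XPath-Functions, the collation applies only when both values are strings; other values fall back to Python ==, a simplification.

```python
from typing import Callable, Dict, List

CODEPOINT = "http://www.w3.org/2005/xpath-functions/collation/codepoint"

# Hypothetical collation registry: each collation is a sort-key function.
COLLATIONS: Dict[str, Callable[[str], str]] = {
    CODEPOINT: lambda s: s,   # Python orders str by code point
    "deutsch": str.casefold,  # stand-in only: "Straße".casefold() == "strasse"
}


def compare(a: str, b: str, collation: str = CODEPOINT) -> int:
    """Return -1, 0 or 1, like the compare builtin with an optional collation."""
    key = COLLATIONS[collation]
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


def _same(x, y, collation: str) -> bool:
    # The collation is consulted only for equality, and only for strings.
    if isinstance(x, str) and isinstance(y, str):
        return compare(x, y, collation) == 0
    return x == y


def index_of(seq: list, item, collation: str = CODEPOINT) -> List[int]:
    """1-based positions of item in seq, as in XPath fn:index-of."""
    return [i for i, v in enumerate(seq, start=1) if _same(v, item, collation)]


def distinct_values(seq: list, collation: str = CODEPOINT) -> list:
    """Keep the first of each group of equal values (quadratic, for clarity)."""
    out: list = []
    for v in seq:
        if not any(_same(v, w, collation) for w in out):
            out.append(v)
    return out
```

Only compare(...) == 0 is ever consulted by index_of and distinct_values, which reflects the point that the ordering a collation provides is incidental to these builtins.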
>>> The Question:
>>>
>>>    Which type of literal comparison should the list builtins use?  If,
>>>    like XPath-Functions (and DTB, right now), we let users choose how
>>>    strings are compared, isn't it odd not to give them the same
>>>    flexibility for numbers and dates, where there is an equally obvious
>>>    choice to make (rif:equals versus the builtin comparators)?
>>>
>>>    Specifically:
>>>
>>>         Question 1: What ways, if any, do we provide for users
>>>                     to specify how literals are compared?
>>>
>>>         Question 2: What happens if/when users do not specify how
>>>                     literals are compared?  Specifically, do we default
>>>                     to rif:equal or to the builtin comparators?
>>>
>>>    Note also that if the rule author doesn't specify a collation, then
>>>    the rule system *user* might.  That is, the ruleset might not pay
>>>    attention to language, but the user might say they want the French
>>>    collation etc.
>>>
>>> Options for Question 1:
>>>
>>>    1.  No rule-author control.  Get rid of collations in the API.
>>>        (This might be considered unacceptable by i18n folks; I don't
>>>        know.)
>>>
>>>    2.  As written in DTB now: users can offer a URI indicating how
>>>        strings are to be compared.  Only one such URI is defined and
>>>        required, so this feature is only useful in environments that
>>>        implement additional collations as an extension.  There is no
>>>        way to control which kind of comparison is done for other literals.
>>>
>>>    3.  Extend the notion of collations to cover all our literals.
>>>        Instead of passing 'French', you could pass a collation that
>>>        indicated which kind of numerical comparison should be used.
>>>        (The problem is that if you do that, then what happens to the
>>>        user's local use-French-collation setting?)
>>>
>>>    4.  Keep 'collations' for strings, and add a similar but different
>>>        comparison parameter.   For example, member would be:
>>> =20
>>>              member(item, list)
>>>              member(item, list, comparator)
>>>              member(item, list, comparator, collation)
>>>
>>>        Here, I'm imagining comparator to be a term which for now could
>>>        only be one of certain pre-defined rif:iri constants (like
>>>        collations) -- one for each of the two types of RIF equality.
>>>        But in dialects with higher-order functions, they could be
>>>        functions defined in the ruleset.
>>>
>>>        Technical notes:
>>> =20
>>>             - we have to put the comparator argument before the
>>>               collation argument, because we have no way of omitting
>>>               any argument but the last one(s), and sometimes you need
>>>               to supply a comparator but no collation (eg, when you
>>>               want to let the user supply the collation); you never
>>>               need to supply the collation and not the comparator,
>>>               since we'll define a fixed value for the default
>>>               comparator.  (The difference is that while end-users
>>>               might control collations, they're not going to be
>>>               controlling comparators.)
>>>
>>>             - when we let users define their own comparators, the
>>>               obvious thing to ask them to define is an "equal"
>>>               predicate with two parameters.  This approach seriously
>>>               impacts the complexity class of the builtins; for
>>>               example, member has to be done as a linear search,
>>>               instead of using a binary search or a hash table (in the
>>>               common cases where the list is known to have some
>>>               structure/ordering.)  Instead, users should either
>>>               define a "compare" function (returning -1, 0, or 1) so
>>>               sorting can be done, or a "fold" function (returning a
>>>               string which is the same for all "equal" values) so
>>>               hashing can be done.
>>>
>>>        This lack of higher-order function syntax is a pain, but I think
>>>        we can live with it here by defining two comparator IRIs, maybe
>>>        func:literal-compare and func:value-compare, which in our
>>>        existing dialects can only be used as a comparator argument.  If
>>>        you could use them as functions, literal-compare would be like
>>>        the ordered version of rif:equal for literals -- I'd suggest the
>>>        ordering between disjoint value spaces just be alphabetical order
>>>        of the datatype IRI.  Similarly, value-compare would be one big
>>>        expression that tests every datatype guard and then applies the
>>>        builtin comparator for that type.
>>>
>>>        It is pretty goofy to define these two functions and say you can
>>>        only use them as a parameter -- you can't really call them -- but
>>>        that's still the best option I see right now.
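Option 4 can be sketched in Python to make the argument order concrete. Everything here is hypothetical: func:literal-compare and func:value-compare are only the names proposed above; literals are modeled as (datatype IRI, value) pairs, with the datatype IRI standing in for the value space (so derived types such as xs:int and xs:decimal are not unified); "deutsch" is again an illustrative casefold-based collation. The optional trailing parameters mirror member(item, list), member(item, list, comparator) and member(item, list, comparator, collation).

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

XS = "http://www.w3.org/2001/XMLSchema#"
Literal = Tuple[str, object]  # (datatype IRI, parsed value)
NUMERIC = {XS + "int", XS + "decimal", XS + "float", XS + "double"}

# Hypothetical collations, used only when both values are xs:string.
COLLATIONS: Dict[str, Callable[[str], str]] = {
    "codepoint": lambda s: s,
    "deutsch": str.casefold,
}


def _cmp(x, y) -> int:
    return (x > y) - (x < y)


def literal_compare(a: Literal, b: Literal, fold: Callable[[str], str]) -> int:
    """Ordered counterpart of rif:equal: different datatypes are ordered
    alphabetically by datatype IRI; the same datatype is ordered by value."""
    if a[0] != b[0]:
        return _cmp(a[0], b[0])
    if a[0] == XS + "string":
        return _cmp(fold(a[1]), fold(b[1]))
    return _cmp(a[1], b[1])


def value_compare(a: Literal, b: Literal, fold: Callable[[str], str]) -> int:
    """Like the builtin comparators: numbers compare by numeric value."""
    if a[0] in NUMERIC and b[0] in NUMERIC:
        return _cmp(a[1], b[1])
    return literal_compare(a, b, fold)


COMPARATORS = {
    "func:literal-compare": literal_compare,
    "func:value-compare": value_compare,
}


def member(item: Literal, lst: Sequence[Literal],
           comparator: str = "func:literal-compare",
           collation: Optional[str] = None) -> bool:
    # A missing collation falls back to codepoint; an end user's own
    # collation setting could be substituted at this point instead.
    fold = COLLATIONS[collation or "codepoint"]
    cmp = COMPARATORS[comparator]
    return any(cmp(item, x, fold) == 0 for x in lst)
```

Because the collation is the last parameter, a ruleset can pass a comparator and still leave the collation open for the end user, which is the case the technical notes rely on.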
>>>
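The complexity point in the technical notes can be illustrated with three ways of answering member, depending on what a user-defined comparator supplies: an "equal" predicate, a -1/0/1 "compare" function, or a "fold" function. This is a sketch with hypothetical helper names; the case-insensitive comparator used with it is only an example.

```python
from functools import cmp_to_key
from typing import Callable, List, Sequence, Set

Equal = Callable[[str, str], bool]
Compare = Callable[[str, str], int]
Fold = Callable[[str], str]


def member_by_equal(item: str, lst: Sequence[str], equal: Equal) -> bool:
    # Only a two-argument "equal" predicate: nothing to exploit, O(n) per lookup.
    return any(equal(item, x) for x in lst)


def sort_for_lookup(lst: Sequence[str], compare: Compare) -> List[str]:
    # A -1/0/1 "compare" lets the list be sorted once, up front.
    return sorted(lst, key=cmp_to_key(compare))


def member_by_compare(item: str, sorted_lst: Sequence[str], compare: Compare) -> bool:
    # Binary search over the pre-sorted list: O(log n) per lookup.
    lo, hi = 0, len(sorted_lst)
    while lo < hi:
        mid = (lo + hi) // 2
        if compare(sorted_lst[mid], item) < 0:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sorted_lst) and compare(sorted_lst[lo], item) == 0


def build_fold_index(lst: Sequence[str], fold: Fold) -> Set[str]:
    # A "fold" maps all equal values to the same string, so it can be hashed.
    return {fold(x) for x in lst}


def member_by_fold(item: str, index: Set[str], fold: Fold) -> bool:
    return fold(item) in index  # expected O(1) per lookup
```

With a case-insensitive comparator, all three agree that "STRASSE" is a member of ["zebra", "Straße", "Apfel"] (since "Straße".casefold() is "strasse"), but only the last two avoid scanning the whole list on every lookup.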
>>> Thoughts?
>>>
>>>      - Sandro
>>>
>> -- 
>> +43 1 58801 18470        debruijn@inf.unibz.it
>>
>> Jos de Bruijn,        http://www.debruijn.net/
>> ----------------------------------------------
>> Many would be cowards if they had courage
>> enough.
>>   - Thomas Fuller



Received on Monday, 4 May 2009 14:24:20 GMT
