Re: [XRI] IRI thread

Hi Felix,

Unfortunately I am not managing to be any more prompt:)

On 21-Jul-08, at 4:03 AM, Felix Sasaki wrote:

> Hi John (putting Martin Duerst into the loop),
> sorry for my late reply, a mixture of holiday and travel is my excuse.
> John Bradley さんは書きました:
>> Hi Felix,
>> Thanks for the input.
>> IRI  is related to the xri: scheme discussion.
>> One of the reasons (perhaps not the only reason) people don't want  
>> xri: to be a scheme is the concern that the IRI form of XRI will be  
>> used for XML namespace declarations.
>> This seems to be a general problem not specific to XRI.
>> There is a community that believes that all XML namespace  
>> declarations should be http: URLs  unless there is some super  
>> compelling reason to do something else.  I may even fall into that  
>> camp myself.
>> I think David Orchard and I agree that if someone wants to use XRI  
>> versioning or something else in an XML namespace declaration they  
>> MUST use the HXRI form; that way it is just a normal URL from an  
>> XML processing perspective.
>> I have put this to members of the XRI-TC and the above is generally  
>> uncontroversial.
>> The question becomes how do you stop people from using the xri:  
>> scheme.
>> One effective way is to not issue the scheme.    This seems to be  
>> the preferred solution by some W3C TAG members.
>> The perhaps unintended byproduct is that without a scheme we can't  
>> represent a XRI as a IRI in other places that might be more  
>> appropriate like a UI.
>> One thing we could still do is use the IRI form of the HXRI; this  
>> would be a normal IRI with the http: scheme.
>> This has certain problems in that we have specified NFKC  
>> normalization rather than the NFC normalization that http uses for  
>> the path.
> From my understanding (which is mainly based on discussion with  
> Martin) NFKC includes NFC. The main difference is that NFKC is  
> getting rid of characters that have a compatibility decomposition,  
> see e.g. appendix J of
> So you could require NFC, add a recommendation like
> "Characters which have a compatibility decomposition (those with a  
> "compatibility formatting tag" in field 5 of the Unicode Character  
> Database -- marked by field 5 beginning with a "<") should not be  
> used in names. This suggestion does not apply to #x0E33 THAI  
> CHARACTER SARA AM or #x0EB3 LAO CHARACTER AM, which despite their  
> compatibility decompositions are in regular use in those scripts."
> and be fine.

Yes the main difference between NFKC and NFC is that NFKC has fewer  
compatibility characters.

We chose NFKC as our normalization for the XRI IRI scheme given that  
this is being used for normalizing user input for authority resolution.
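
A minimal sketch of the distinction discussed above, using only Python's standard `unicodedata` module. The helper names `has_compat_decomposition` and `normalize_input` are made up for illustration and are not part of any XRI specification. The first applies the test quoted from Felix (a decomposition field beginning with "<"); the second shows that NFC leaves such characters in place while NFKC folds them away.

```python
import unicodedata


def has_compat_decomposition(ch: str) -> bool:
    """True if the character's decomposition field begins with a "<tag>"."""
    return unicodedata.decomposition(ch).startswith("<")


def normalize_input(name: str, form: str = "NFKC") -> str:
    """Normalize user input; NFKC is the form chosen for xri: IRIs."""
    return unicodedata.normalize(form, name)


# "id" typed with FULLWIDTH LATIN SMALL LETTER I and D (illustrative input)
typed = "\uff49\uff44"

print([has_compat_decomposition(c) for c in typed])  # [True, True]
print(normalize_input(typed, "NFC") == typed)        # True: NFC keeps them
print(normalize_input(typed, "NFKC"))                # id
```

THAI CHARACTER SARA AM (U+0E33), one of the exceptions named in the quoted recommendation, also reports a compatibility decomposition, which is why a blanket NFKC rule and the NFC-plus-recommendation approach are not equivalent for every script.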

The registration process for a global iName is out of scope for the  
XRI spec.

However it is probably useful to describe the process for people as a  

iNames may start with a number of Global Context symbols.
@ and = prefix names registered in the GRS administered by XDI.ORG.
Names can also start with a IP or DNS cross-reference.

If the first XRI authority subsegment is = or @ then the second  
subsegment is registered under the following rules.

1. There are three character tables - Latin, Hangul and Combined- 
2. Each of the tables contain a subset of ASCII characters permissible  
for registration, plus the additional characters for each script  
included in the table.
3. An i-name to be registered must contain only characters that are  
from a single table. For example, an i-name that contains a Hangul  
character and a Katakana character will not be registrable.
4. An i-name must not be visually similar to an existing registered  
i-name. This is determined by using data from the Unicode Consortium ( 
). For example, if an existing i-name called =résumé is already  
registered, the global registry will not permit =resume to be  
registered.

There will be other character tables added as demand dictates.  I  
am personally interested in adding Arabic.
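
The single-table rule and the visual-similarity rule above can be sketched roughly as follows. Everything here is an assumption made for illustration: the `TABLES` contents are invented code point ranges (the real registry tables are not reproduced in this message, and the third table is left out), and `skeleton` is a crude stand-in that strips accents and case rather than consulting the Unicode Consortium data the registry actually uses.

```python
import unicodedata


def _ascii_allowed(ch: str) -> bool:
    # Hypothetical ASCII subset shared by every table.
    return ch.isascii() and (ch.isalnum() or ch in "-.")


# Hypothetical character tables, keyed by name. Not the registry's real data.
TABLES = {
    "Latin": lambda ch: _ascii_allowed(ch)
    or ("\u00c0" <= ch <= "\u024f" and ch.isalpha()),
    "Hangul": lambda ch: _ascii_allowed(ch) or "\uac00" <= ch <= "\ud7a3",
}


def tables_for(name: str) -> list:
    """Names of the tables that contain every character of the i-name."""
    return [t for t, allowed in TABLES.items() if all(allowed(c) for c in name)]


def skeleton(name: str) -> str:
    """Crude similarity key: decompose, drop combining marks, fold case."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def registrable(name: str, registered: set) -> bool:
    """Apply rule 3 (single table) and a rough version of rule 4."""
    if not tables_for(name):
        return False
    return skeleton(name) not in {skeleton(r) for r in registered}


print(registrable("resume", {"r\u00e9sum\u00e9"}))  # False: collides with =résumé
print(registrable("\ud55c\u30ab", set()))            # False: Hangul mixed with Katakana
```

A real implementation would need the registry's published tables and proper confusable-character data; accent stripping alone misses cross-script look-alikes.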

For other subsegments NFKC rules apply to name creation.

The real problem occurs when someone enters an iName in a local  
character set and it must be normalized as a part of the input  
process.  This happens with openIDs.

If we don't use a IRI xri: scheme,  and try to fit everything into the  
http: IRI normalization rules for the path segment,  we wind up with  
the possibility of failing match rules because of the extra  
compatibility characters included in NFC.
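
To make the matching concern concrete, here is a small sketch assuming an identifier is stored in one form and later typed on a device that emits compatibility characters. The `matches` helper and the sample strings are illustrative only and do not come from any resolver implementation.

```python
import unicodedata


def matches(stored: str, typed: str, form: str) -> bool:
    """Compare two identifiers after normalizing both to the given form."""
    return unicodedata.normalize(form, stored) == unicodedata.normalize(form, typed)


# The same Katakana word, stored with ordinary characters and typed with
# halfwidth compatibility characters (illustrative sample data).
stored = "\u30ac\u30a4\u30c9"
typed = "\uff76\uff9e\uff72\uff84\uff9e"

print(matches(stored, typed, "NFC"))   # False: NFC preserves the halfwidth forms
print(matches(stored, typed, "NFKC"))  # True: NFKC folds them to the same string
```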

I admit that I may be the only person who sees this as a problem.

I want to use my @id*五里霧中 iName as an openID  and I can now at  
a number of sites.

I have just added it as a OSIS interop feature test for RPs.

xri://@id*五里霧中  makes it clear that it is an IRI with the  
specific normalization rules that apply to the scheme.
*五里霧     may well be interpreted with http  
normalization rules.
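
For reference, a rough sketch of taking the xri: IRI above to an ASCII-only URI: normalize with NFKC, then percent-encode the UTF-8 bytes of every non-ASCII character in the generic RFC 3987 style. The function name `xri_iri_to_uri` and the set of characters kept unescaped are assumptions; this is not the XRI-TC's normative transform, which has additional rules.

```python
import unicodedata
from urllib.parse import quote

# Characters left untouched: URI delimiters plus "%" so that existing
# escapes are not double-encoded (an assumption made for this sketch).
_KEEP = ":/?#[]@!$&'()*+,;=%"


def xri_iri_to_uri(iri: str) -> str:
    """NFKC-normalize an IRI, then percent-encode non-ASCII as UTF-8."""
    normalized = unicodedata.normalize("NFKC", iri)
    return quote(normalized, safe=_KEEP)


print(xri_iri_to_uri("xri://@id*\u4e94\u91cc\u9727\u4e2d"))
# xri://@id*%E4%BA%94%E9%87%8C%E9%9C%A7%E4%B8%AD
```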

I can tell you at the moment the only IRIs that work in openID are the  
ones using the proposed xri: scheme.

There may be good reasons to not want XRI: scheme IRIs turning up in  
XML namespace declarations,  however trying to properly support IRI  
was I think a good goal of XRI.

I just hate giving up progress on the IRI front.

Best Regards
John Bradley

>> Without a scheme and defining our IRI transforms against the scheme  
>> IRI is certainly more awkward to deal with.
>> I want to know if there are opinions for or against having a IRI  
>> form of a HXRI; let's call it an IHXRI.
> I don't see the need to make a separation between identifiers for  
> namespaces and others, but I won't oppose it. I do see a value to  
> allow identifiers which include non-ASCII characters.
> Regards, Felix.
>> Your feedback is greatly appreciated.
>> Regards
>> John Bradley
>> 五里霧中
>> PS my 漢字 sucks I cheat and get Nat Sakimura to translate if I  
>> need to:)
>> On 17-Jul-08, at 3:41 AM, Felix Sasaki wrote:
>>> Hello John,
>>> John Bradley さんは書きました:
>>>> I want to raise a question to the group in general.
>>>> The XRI-TC has defined 7 forms for the representation of XRI, and  
>>>> the transformations between them.
>>>> I reviewed them in response to a question by David Orchard on  
>>>> this thread July 14.
>>>> Three of those forms involve using the xri: scheme indicator at  
>>>> the start.
>>>> Three forms, one scheme? How can that be?
>>>> I think this question is causing some of the push back.
>>>> People are concerned that strings that have a valid scheme  
>>>> prepended are not valid URIs.
>>>> This is true, however this situation is NOT the invention of the  
>>>> XRI-TC, nor is it unique to XRI.
>>>> The XRI-TC followed RFC 3987 to allow internationalized forms of  
>>>> XRIs.
>>> I personally think that this is the right approach (without  
>>> judging XRIs in general).
>>>> XRI has two IRI forms:
>>>> 1. IRI-Normal This allows UTF-16 Though UTF-8 is recommended
>>>> 2. IRI-UTF8 A more restrictive form allowing only UTF-8
>>>> The one difference between a http: IRI and a xri: IRI is that XRI  
>>>> specifies the more restrictive NFKC normalization across the  
>>>> entire string, whereas http uses two separate normalizations:  
>>>> Punycode for the authority segment and NFC for the path, and  
>>>> don't ask about the query string:)
>>>> XRI has one and only one URI form. The transforms to and from  
>>>> this form are clearly defined.
>>>> This is the form that is used anyplace a URI is required. A IRI  
>>>> is NOT a URI, it would be WRONG to use a IRI in an XML document  
>>>> for name-spacing.
>>>> The XML specs are clear and unambiguous: use a URI.
>>>> XRI clearly differentiates between the two things.
>>>> I am currently getting surprising push back on defining IRIs for  
>>>> use with openID. With ICANN's recent decisions on DNS http: IRIs  
>>>> are coming.
>>>> If we had something other than a URI scheme to identify a IRI  
>>>> that might address some of the issues.
>>>> I am tempted to ask if people are opposed to IRI RFC3987 in some  
>>>> way? However that would probably be impolitic.
>>>> Yes, there are many open questions regarding XRI's fundamental  
>>>> right to exist.
>>>> However is there an issue around our use of IRI that is going  
>>>> unspoken?
>>>> If there was no IRI form would anyone think that having a xri:  
>>>> scheme was a more reasonable thing.
>>> I don't see any issues and, seeing no responses to your question  
>>> in this thread I think others agree silently with that.
>>>> I don't want to dismiss the opinion expressed on this thread that  
>>>> having a scheme is the appropriate way to represent a protocol  
>>>> other than http being used for a URI.
>>>> I think there are three major options at this point:
>>>> 1. Use a URI scheme to indicate that a string is an XRI, Plus  
>>>> HXRI for backwards compatibility with browsers and click behavior.
>>>> 2. HXRI with special coding in the authority segment
>>>> 3. HXRI with special encoding in the Path.
>>>> I suppose there is a fourth possibility which is only using xri:  
>>>> on the URI form and not having an IRI form.
>>>> I suppose we could always define a http: IRI form?
>>>> So I would appreciate your thoughts on how IRI plays into this  
>>>> discussion on XRI.
>>> IMO IRIs are unrelated to the main topic of this discussion.
>>> Felix
>>>> Best Regards
>>>> John Bradley

Received on Wednesday, 23 July 2008 22:07:35 UTC