- From: Tim Kindberg <timothy@hpl.hp.com>
- Date: Tue, 19 Oct 2004 13:35:49 +0100
- To: <uri@w3.org>
- Cc: "'Sandro Hawke'" <sandro@w3.org>, "'Martin Duerst'" <duerst@w3.org>, <hardie@qualcomm.com>, "'Tim Kindberg'" <timothy@hpl.hp.com>
Following Martin Duerst's helpful comments (below), I've produced a new
draft of "tag" and submitted it as an Internet-Draft
(draft-kindberg-tag-uri-06). There's a delay in I-D processing at the
moment, so in the meantime it can be found at http://taguri.org/06/ .

The update consists mainly of changes relating to
"Internationalization"*. I've also made some minor textual improvements
& additions.

As usual, all comments are welcome.

Cheers,

Tim.

*"Getting internationalisation right" consisted of balancing, on the one
hand, the requirement to avoid minting tag URIs/IRIs with
percent-encoded characters -- in the interests of human-friendliness;
and, on the other hand, the requirement to include percent-encoded
characters in the syntax nonetheless: (a) to make it possible to verify
the conformance of tag IRIs and (b) to provide a way for software that
handles URIs but not IRIs to handle tags, albeit with some consequent
issues.

Tim Kindberg
hewlett-packard laboratories
filton road
stoke gifford
bristol bs34 8qz
uk

purl.org/net/TimKindberg
timothy@hpl.hp.com
voice +44 (0)117 312 9920
fax +44 (0)117 312 8003

> -----Original Message-----
> From: Martin Duerst [mailto:duerst@w3.org]
> Sent: 30 August 2004 10:18
> To: Tim Kindberg
> Cc: uri@w3.org; Sandro Hawke
> Subject: draft-kindberg-tag-uri
>
> Hello Tim,
>
> I have finally got around to commenting on the newest version of your
> TAG draft, a pre-draft at
> http://taguri.org/06/draft-kindberg-tag-uri-06.txt.
>
> The main comment is that you try to have two separate definitions,
> one for TAG URIs and the other for TAG IRIs, but that isn't how the
> URI spec and the IRI spec work. For further background, please also
> see the issue and discussion at
> http://www.w3.org/International/iri-edit#iri-scheme-38
>
> I also give some comments on general issues that I found, mostly
> editorial.
>
>
> At 13:27 04/08/24 +0900, Martin Duerst wrote:
> >Network Working Group                                     T. Kindberg
> >Internet-Draft                            Hewlett-Packard Corporation
> >Expires: January 27, 2005                                    S. Hawke
> >                                            World Wide Web Consortium
> >                                                        July 29, 2004
> >
> >
> >                        The 'tag' URI scheme
> >                      draft-kindberg-tag-uri-06
>
> [snip]  [also snipped all page breaks]
>
> >Abstract
> >
> >    This document describes the "tag" Uniform Resource Identifier (URI)
> >    scheme,
>
> This comma is somewhat confusing. It's probably best to end the
> sentence here and integrate the points in the remaining clause into
> the rest of the paragraph.
>
> >for identifiers that are unique across space and time.  Tag
> >    URIs (also known as "tags") are distinct from most other URIs in that
> >    there is no authoritative resolution mechanism.  A tag may be used
> >    purely as an entity identifier.  Unlike UUIDs or GUIDs
>
> Abbreviations shouldn't appear without expansion (see RFC
> guidelines). Also, there should be references for these terms, but
> referencing doesn't fit well into an abstract. I'd concentrate on the
> description of tags themselves in the abstract, in positive terms
> (what tags do, not what they don't), and put comparison with other
> schemes into a section in the body of the document, with references.
>
> >such as "uuid"
>
> So the uuid scheme is a UUID? Or a GUID? Or both? Some readers will
> be confused by such minor term differences without clear explanation.
>
> >    URIs and "urn:oid" URIs, tags are designed to be tractable to humans.
> >
> >    Furthermore, using tags has some advantages over the common practice
> >    of using "http" URIs as identifiers for non-HTTP-accessible
> >    resources.
>
> [snip]
>
> >1. Introduction
> >
> >    A tag is a type of Uniform Resource Identifier (URI) [1] designed to
> >    meet the following requirements:
> >
> >    1.  Identifiers are likely to be unique across space and time,
>
> How likely? Very likely? Designed to make it easy to be?
>
> >and
> >        come from a practically inexhaustible supply.
> >    2.  Identifiers are relatively convenient for humans to mint
> >        (create), read, type, remember etc.
> >    3.  No registration is necessary,
>
> -> no central registration is necessary
>
> >at least for holders of domain
> >        names or email addresses;
>
> I think that each such holder who creates tags has to keep their own
> registry to avoid local conflicts. The draft should be quite a bit
> more explicit about this.
>
> >and there is negligible cost to mint
> >        each new identifier.
> >    4.  The identifiers are independent of any particular resolution
> >        scheme.
> >
> >    For example, the above requirements may apply in the case of a user
> >    who wants to place identifiers on their documents:
>
> These are the requirements met by tags, yes? It'd be better to just
> say so.
>
> >    a.  They
>
> Who? The documents? The identifiers? The users? Please rework the
> whole list so that all the items follow the same syntactic structure.
>
> >want to be reasonably sure that the identifier is unique.
> >        Global uniqueness is valuable because it prevents identifiers
> >        from becoming unintentionally ambiguous.
> >    b.  It is useful for the identifier to be tractable to humans:
>
> 'to humans' -> 'by humans'?
>
> >they
> >        should be able to mint new identifiers conveniently, and to type
> >        them into emails and forms.
>
> For more aspects of this (memorize,...), see the 'overview and
> motivation' section of IRIs.
>
> >    c.  They do not want to have to communicate with anyone else in order
> >        to mint identifiers for their documents.
> >    d.  The user wants to avoid identifiers that might be taken to imply
> >        the existence of an electronic resource accessible via a default
> >        resolution mechanism, when no such electronic resource exists.
> >
> >    Existing identification schemes satisfy some but not all of the
> >    general requirements above.
>
> Why 'general'? I read it as if these requirements would always apply.
>
> >For example:
> >
> >    UUIDs [8], [9] are hard for humans to read.
> >
> >    OIDs [10], [11] and Digital Object Identifiers [12] require naming
> >    authorities to register themselves,
>
> 'themselves': If the identifiers register themselves, that would be
> great. But the problem is that registration requires work by a user.
>
> >even if they already hold a
> >    domain name registration.
>
> So 'they' is users, not ids? But users don't register themselves,
> they register some ids or schemes,...
>
> >    URLs (in particular, "http" URLs) are sometimes used as identifiers
> >    that satisfy most of our requirements.
>
> 'our': Who is 'we'? Better avoid.
>
> >Many users and organisations
> >    have already registered a domain name, and the use of the domain name
> >    to mint identifiers comes at no additional cost.  But there are
> >    drawbacks to URLs-as-identifiers:
> >
> >    o  An attempt may be made to resolve a URL-as-identifier, even though
> >       there is no resource accessible at the "location".
> >    o  Domain names change hands and the new assignee of a domain name
> >       can't be sure that they are minting new names.  For example, if
> >       example.org is assigned first to a user Smith and then to a user
> >       Jones, there is no systematic way for Jones to tell whether Smith
> >       has already used a particular identifier such as http://
> >       example.org/9999.
> >    o  Entities could rely on purl.org
>
> add: or a similar service.
> Also, use 'http://purl.org' rather than just 'purl.org', or provide
> a reference.
>
> >as a (first-come, first-served)
> >       assigner of unique URIs; but a solution without reliance upon
> >       another entity such as the Online Computer Library Center (OCLC,
> >       which runs purl.org) may be preferable.
> >
> >    Lastly, many entities -- especially individuals -- are assignees of
> >    email addresses but not domain names.  It would be preferable to
> >    enable those entities to mint unique identifiers.
> >
> >2. Tag Syntax and Rules
> >
> >    This section first specifies the syntax of tag URIs and gives
> >    examples.  It then describes a set of rules for minting tags designed
> >    to make them unique.  Finally, it discusses the resolution and
> >    comparison of tags.
> >
> >2.1 Tag Syntax and Examples
> >
> >    The general syntax of a tag URI, in ABNF, is:
>
> You need a reference to the ABNF RFC
> (http://www.ietf.org/rfc/rfc2234.txt), and to check the ABNF with
> some tool (see advice to Internet Draft and RFC authors).
>
> >       tagURI = "tag:" taggingEntity ":" [specific]
>
> Is it possible for 'specific' to be empty? In that case, is the ':'
> necessary? Is there any specific meaning for this case? If this is
> allowed, please provide an example. Also, later, 'specific' is
> defined as *(...), so the [] brackets are not at all necessary.
>
> >    Where:
> >
> >       taggingEntity = authorityName "," date
> >       authorityName = DNSname / emailAddress
> >       date = 4dig ["-" 2dig ["-" 2dig ]] ; see ISO8601 [2]
>
> It would be much clearer if this were:
>     date = year ["-" month ["-" day ]] ; see ISO8601 [2]
> and then
>     year = 4*DIGIT
>     month = "01" / "02" / "03" / ...
>     day = ("0" %x31-39) / (("1" / "2") DIGIT) / "30" / "31"
> or some such. This easily catches a lot of illegal stuff, and makes
> the semantics much more obvious.
>
> >       DNSname = DNScomp / DNSname "." DNScomp ; see RFC1035 [3]
>
> It's much better to write this rule in a non-recursive fashion:
>
>     DNSname = DNScomp *( "." DNScomp )
>
> And you had better not cite RFC 1035 directly.
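
As a rough illustration of the date rule proposed above, the following Python sketch transcribes it into a regular expression. It is only a sketch under stated assumptions: the year is taken to be exactly four digits (read literally, the suggested "4*DIGIT" would permit more), the names TAG_DATE and is_tag_date are hypothetical, and the check is purely syntactic, so an impossible date such as 2001-02-31 still passes, just as it would under the ABNF.

```python
import re

# year ["-" month ["-" day]], following the ABNF suggested in the review.
# Assumption: exactly four year digits; months "01"-"12"; days "01"-"31".
TAG_DATE = re.compile(
    r"[0-9]{4}"
    r"(?:-(?:0[1-9]|1[0-2])"
    r"(?:-(?:0[1-9]|[12][0-9]|3[01]))?)?"
)


def is_tag_date(text: str) -> bool:
    """Return True if text matches the tag date syntax (no calendar check)."""
    return TAG_DATE.fullmatch(text) is not None
```

Spelling out the month and day alternatives, rather than accepting any two digits, is what lets the pattern reject values such as "2001-13" or "2001-09-32" without any further logic.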
>
> >       DNScomp = alphaNum [*(alphaNum /"-") alphaNum]
>
> To allow Internationalized Domain Names, you have to add pct-encoded
> here:
>
>     DNScomp = dnsChar [*(dnsChar / "-") dnsChar]
>     dnsChar = alphaNum / pct-encoded
>
> >       emailAddress = 1*(alphaNum /"-"/"."/"_") "@" DNSname
>
> I'd strongly recommend also adding pct-encoded here, making this
> future-proof for potential internationalization of the LHS:
>
>     emailAddress = 1*(alphaNum /"-"/"."/"_"/pct-encoded) "@" DNSname
>
> >       alphaNum = DIGIT / ALPHA
> >       specific = *( pchar / "/" / "?" ) ; pchar from RFCXXXX [1]
>
> pchar includes pct-encoded, so this is okay in terms of basic syntax.
>
> >       ALPHA = %x41-5A / %x61-7A   ; any char in the range "A"-"Z"
> >                                   ; or "a"-"z"
> >       DIGIT = %x30-39 ; any char in the range "0" through "9"
>
> Just import ALPHA and DIGIT from the ABNF RFC, don't repeat them here.
>
>
> At this point, you should say some general things about pct-encoded.
> What you want to say probably is:
> - pct-encoded (including in the case of pchar) is only allowed for
>   octets above %7F.
> - pct-encoded (including in the case of pchar) is only allowed in
>   sequences that are valid UTF-8 octet sequences.
> - pct-encoded is used to encode characters using UTF-8.
> - There may be additional restrictions for each of the components
>   allowing pct-encoded.
> - That pct-encoded is only allowed to allow the minting of tag IRIs,
>   but that tags created as URIs from the start should/must never
>   contain any pct-encoded pieces, and that tag IRIs also should/must
>   never contain any pct-encoded pieces.
>
> >    The component "taggingEntity" is the name space part of the URI.  To
> >    avoid ambiguity, the domain name in "authorityName" (whether an email
> >    address or a simple domain name) MUST be fully qualified.  It is
> >    RECOMMENDED that the domain name should be in lowercase form.
> >    Alternative formulations of the same authority name will be counted
> >    as distinct
>
> 'counted' -> 'treated', or even better just say that these *are*
> different tags.
>
> >and hence tags containing them will be unequal (see
> >    Section 2.4).  For example, tags beginning "tag:HP.com,2000:" are
> >    never equal to those beginning "tag:hp.com,2000:", even though they
> >    refer to the same domain name.
> >
> >    Authority names could, in principle, belong to any syntactically
> >    distinct namespaces whose names are assigned to a unique entity at a
> >    time.  Those include, for example, certain IP addresses, certain MAC
> >    addresses, and telephone numbers.  However, to simplify the tag
> >    scheme, we restrict authority names to be domain names and email
> >    addresses.  Future standards efforts may allow use of other authority
> >    names following syntax that is disjoint from this syntax.  To allow
> >    for such developments, software that processes tags MUST NOT reject
> >    them on the grounds that they are outside the syntax for
> >    authorityName defined above.
>
> Here, say that a DNSName must, after decoding of percent-encoding and
> interpretation of the resulting octet sequence as UTF-8, be an
> Internationalized Domain Name according to IDNA [RFC 3490].
> You may also want to say that a DNSName, after decoding of
> percent-encoding and interpretation of the resulting octet sequence
> as UTF-8, should be normalized as defined by Nameprep [RFC 3491] to
> avoid producing TAGs that look very similar but are not the same.
>
> Also, say that pct-encoded is allowed on the left hand side of
> emailAddress (before the "@") for future-compatibility, and is only
> to be used if and when there is an IETF Standards-Track document
> specifying how internationalized email address left hand sides are
> handled.
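
The first two pct-encoded restrictions listed above (only octets above %7F, and only runs that form valid UTF-8) are mechanical enough to sketch in Python. This is an illustration rather than anything taken from the draft: the names PCT_RUN and pct_encoding_ok are hypothetical, and the IDNA and Nameprep checks also asked for above are deliberately left out.

```python
import re

# One or more consecutive percent-encoded octets, e.g. "%C3%A9".
PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def pct_encoding_ok(component: str) -> bool:
    """Check that pct-encoding is used only for valid non-ASCII UTF-8."""
    # A "%" that is not followed by two hex digits is malformed.
    if "%" in PCT_RUN.sub("", component):
        return False
    for run in PCT_RUN.findall(component):
        octets = bytes.fromhex(run.replace("%", ""))
        # Restriction 1: only octets above %7F may be percent-encoded.
        if any(octet <= 0x7F for octet in octets):
            return False
        # Restriction 2: each run must be a valid UTF-8 octet sequence.
        try:
            octets.decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True
```

Checking each run as a unit works because every literal character between runs is ASCII, so a multi-octet UTF-8 sequence can never straddle a run boundary.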
>
> >    The component "specific" is the name-space-specific part of the URI:
> >    it is a string of URI characters (see restrictions in syntax
> >    specification) chosen by the minter of the URI.  It is RECOMMENDED
> >    that specific identifiers should be human-friendly.
>
> Add some text here that after decoding of percent-encoding and
> interpretation of the resulting octet sequence as UTF-8, "specific"
> should be in NFC and preferably even in NFKC.
>
> >    Examples of tag URIs are:
> >
> >       tag:timothy@hpl.hp.com,2001:web/externalHome
> >       tag:sandro@w3.org,2004-05:Sandro
> >       tag:my-ids.com,2001-09-15:TimKindberg:presentations:UBath2004-05-19
> >       tag:blogger.com,1999:blog-555
> >       tag:yaml.org,2002:int
>
> An example without 'specific', and some I18N examples, should be
> added (I can help).
>
> >2.2 Rules for Minting Tags
> >
> >    As Section 2.1 has specified, each tag consists of a "tagging entity"
> >    followed, optionally, by a specific identifier.  The tagging entity
> >    is designated by an "authority name" -- a fully qualified domain name
> >    or an email address containing a fully qualified domain name --
> >    followed by a date.  The date is chosen to make the tagging entity
> >    globally unique, exploiting the fact that domain names and email
> >    addresses are assigned to at most one entity at a time.  That entity
> >    then ensures that it mints unique identifiers.
>
> The following paragraph can be reworded (and probably simplified)
> once the changes to the syntax rules have been made.
>
> >    The date specifies, according to the Gregorian calendar and UTC, any
> >    particular day on which the authority name was assigned to the
> >    tagging entity at 00:00 UTC (the start of the day).  The date MAY be
> >    a past or present date on which the authority name was assigned at
> >    that moment.  The date is specified using one of the "YYYY",
> >    "YYYY-MM" and "YYYY-MM-DD" formats allowed by the ISO 8601 standard
> >    [2].  The tag specification permits no other formats.  Tagging
> >    entities MUST ascertain the date with sufficient accuracy
> >    to avoid accidentally using a date on which the authority name was
> >    not in fact assigned (many computers and mobile devices have poorly
> >    synchronised clocks).  The date MUST be reckoned from UTC -- which
> >    may differ from the date in the tagging entity's local timezone at
> >    00:00 UTC.
>
> I think some readers may be confused by "reckoned from UTC". Why not
> just say that the date is always in UTC?
>
> >That distinction can generally be safely ignored in
> >    practice, but not on the day of the authority name's assignment.  In
> >    principle it would otherwise be possible on that day for the previous
> >    assignee and the new assignee to use the same date and thus mint the
> >    same tags.
> >
> >    In the interests of brevity, the month and day default to 01.  A day
> >    value of 01 MAY be omitted; a month value of 01 MAY be omitted unless
> >    it is followed by a day value other than 01.
>
> I'd quote all the 01 (i.e. "01") for easier readability. It is easy
> here to confuse MAY with the month of May.
>
> >For example, "2001-07"
> >    is the date 2001-07-01 and "2000" is the date 2000-01-01.  All date
> >    formulations specify a moment (00:00 UTC) of a single day, and not a
> >    period of a day or more such as "the whole of July 2001" or "the
> >    whole of 2000".  Assignment at that moment is all that is required to
> >    use a given date formulation.
>
> formulation -> format? or just 'use a given date'?
>
> >    Tagging entities should be aware that alternative formulations of the
> >    same date will be counted as distinct and hence tags containing them
> >    will be unequal.  For example, tags beginning "tag:hp.com,2000:" are
> >    never equal to those beginning "tag:hp.com,2000-01-01:", even though
> >    they refer to the same date (see Section 2.4).
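
The interaction between the default rule and the equality rule in the quoted paragraphs can be made concrete with a small Python sketch. The function names expand_tag_date and tags_equal are hypothetical and not part of the draft; the sketch assumes its input is already a syntactically valid tag date and does no further validation.

```python
def expand_tag_date(date: str) -> str:
    """Fill in the default month and day ("01") that the short forms omit."""
    parts = date.split("-")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a tag date: {date!r}")
    return "-".join(parts + ["01"] * (3 - len(parts)))


def tags_equal(a: str, b: str) -> bool:
    """Tag equality as in Section 2.4: plain character-by-character comparison."""
    return a == b
```

Under these assumptions expand_tag_date("2000") and expand_tag_date("2000-01-01") give the same string, yet tags_equal("tag:hp.com,2000:", "tag:hp.com,2000-01-01:") is false. A minter therefore has to pick one formulation of a date and keep to it, since nothing in the comparison step normalizes dates.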
>
> Here and elsewhere: The IETF prefers to use domain names such as
> example.com.
>
> >    An entity MUST NOT mint tags under an authority name that was
> >    assigned to a different entity at 00:00 UTC on the given date, and it
> >    MUST NOT mint tags under a future date.
> >
> >    An entity that acquires an authority name immediately after a period
> >    during which the name was unassigned MAY mint tags as if the entity
> >    was assigned the name during the unassigned period.  This practice
> >    has considerable potential for error and MUST NOT be used unless the
> >    entity has substantial evidence that the name was unassigned during
> >    that period.  The authors are currently unaware of any mechanism that
> >    would count as evidence, other than daily polling of the "whois"
> >    registry.
> >
> >    For example, Hewlett-Packard holds the domain registration for hp.com
> >    and may mint any tags rooted at that name with a current or past date
> >    when it held the registration.  It must not mint tags such as
> >    "tag:champignon.net,2001:" under domain names not registered to it.
> >    It must not mint tags dated in the future, such as
> >    "tag:hp.com,2999:".  If it obtains assignment of
> >    "extremelyunlikelytobeassigned.org" on 2001-05-01, then it must not
> >    mint tags under "extremelyunlikelytobeassigned.org,2001-04-01" unless
> >    it has evidence proving that that name was continuously unassigned
> >    between 2001-04-01 and 2001-05-01.
> >
> >    A tagging entity mints specific identifiers that are unique within
> >    its context, in accordance with any internal scheme that uses only
> >    URI characters.  Some tagging entities (e.g. corporations, mailing
> >    lists) consist of many people, in which case group decision-making
> >    and record-keeping procedures SHOULD be used to achieve uniqueness.
>
> Record-keeping is important for individuals, too.
>
> >2.3 Resolution of Tags
> >
> >    There is no authoritative resolution mechanism for tags.  Unlike most
> >    other URIs, tags can only be used as identifiers, and are not
> >    designed to support resolution.  If authoritative resolution is a
> >    desired feature, a different URI scheme should be used.
> >
> >2.4 Equality of Tags
> >
> >    Tags are simply strings of characters and are considered equal if and
> >    only if they are completely indistinguishable in their machine
> >    representations.  That is, one can compare tags for equality by
> >    comparing the numeric codes of their characters, in sequence, for
> >    numeric equality.  This equality-criterion allows for
> >    simplification
>
> equality-criterion -> equality criterion
>
> >    of tag-handling software, which does not have to transform tags in
> >    any way to compare them.
> >
> >3. Internationalisation
> >
> >    So far, we have considered tags as URIs, which are represented in a
> >    subset of US-ASCII characters.  As befits our requirement for
> >    identifiers to be tractable to humans, tags can also be minted as
>
> The 'can also be minted as' probably needs some more explanation.
> In general, any URI scheme that allows pct-encoded in the right way
> can also be used with IRIs. See below.
>
> >    Internationalized Resource Identifiers (IRIs) [4].  That is, they can
> >    be minted in languages that use any characters from the Universal
> >    Character Set.
>
> Does a tag have a language? I think it's better to just say:
> they can be minted using any characters from ...
>
>
> The following procedure can probably be removed. If not, the
> following details should be fixed:
>
> >    The procedure for minting tags as IRIs is to use the specification of
> >    Section 2 but with the following syntactic changes:
> >    o  An International Domain Name (IDN) [5] represented according to
> >       the rules of 'nameprep' [6] may be used in place of a domain name
> >       in authorityName.  That includes a domain name appearing on the
> >       right-hand side of an email address.
> >    o  If a standard arises for expressing email addresses in
> >       international form -- that is, including the left-hand side of
> >       email addresses -- then that form will be allowed in
> >       authorityName.
> >    o  An international authorityName MUST appear in at least Normalized
> >       Form C (NFC) and SHOULD appear in Normalized Form KC (NFKC) [7].
>
> This should not be necessary, because Nameprep takes care of this.
> But it may be a good thing to say for 'specific'.
>
> >    o  The specific component of a tag IRI may be any string allowed by
> >       the ABNF term *( ipchar / "/" / "?" ) defined in [4].
>
> I recommend adding some normalization restrictions here, for the
> benefit of transcribability,...
>
> >    Two tag IRIs are equal if and only if they are identical as character
> >    sequences -- and thus that their machine representations are
> >    identical when using the same character encodings.
>
> It may be a good idea to repeat here explicitly that:
> - The use of pct-encoding in the syntax rules is only allowed in
>   order to define the syntax of IRIs allowed in the tag scheme.
> - pct-encoding should not be used in tags generated using only
>   US-ASCII characters.
> - pct-encoding should not be used in tags generated including
>   non-ASCII characters (i.e. IRIs).
> - A tag IRI is not equivalent to the tag URI resulting after
>   mapping the IRI to a URI according to Section 3.1 of [IRI].
> To reduce any problems resulting from this:
> - tags should be used mainly with technology that can transport and
>   handle IRIs (such as RDF).
> - If tags are temporarily converted to URIs because they have
>   to be passed to some infrastructure that isn't able to handle
>   IRIs, they should be converted back to IRIs when being received
>   back from that infrastructure.
>
> >4. Security Considerations
> >
> >    Minting a tag, by itself, is an operation internal to the tagging
> >    entity with no external consequences.  The consequences of using an
> >    improperly minted tag (due to malice or error) in an application
> >    depends on the application, and must be considered in the design of
> >    any application that uses tags.
> >
> >    There is a significant possibility of minting errors by people who
> >    fail to apply the rules governing dates, or who use a shared
> >    (organizational) authority-name without prior organization-wide
> >    agreement.  Tag-aware software MAY help catch and warn against these
> >    errors.  As stated in Section 2, however, to allow for future
> >    expansion, software MUST NOT reject tags which do not conform to the
> >    syntax specified in Section 2.
> >
> >    A malicious party could make it appear that the same domain name or
> >    email address was assigned to each of two or more entities.  Tagging
> >    entities SHOULD use reputable assigning authorities, and verify
> >    assignment wherever possible.
> >
> >    Entities SHOULD also avoid the potential for malicious exploitation
> >    of clock skew, by using authority names that were assigned
> >    continuously from well before to well after 00:00 UTC on the date
> >    chosen for the tagging entity -- preferably by intervals in the order
> >    of days.
> >
> >5. References
> >
> >5.1 Normative References
> >
> >    [1]  Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource
> >         Identifier (URI): Generic Syntax (Note to the RFC Editor: Please
> >         update this reference with the RFC resulting from
> >         draft-fielding-uri-rfc2396bis-xx.txt, and remove this Note)",
> >         draft-fielding-uri-rfc2396bis-06 (work in progress), July 2004.
> >
> >    [2]  "Data elements and interchange formats -- Information
> >         interchange -- Representation of dates and times", ISO
> >         (International Organization for Standardization) ISO 8601:1988,
> >         1988.
> >
> >    [3]  Mockapetris, P., "Domain names - implementation and
> >         specification", STD 13, RFC 1035, November 1987.
> >
> >    [4]  Duerst, M. and M. Suignard, "Internationalized Resource
> >         Identifiers (IRIs)", draft-duerst-iri-09 (work in progress),
> >         July 2004.
>
> This should have a similar RFC Editor comment as [1].
>
> >    [5]  Faltstrom, P., Hoffman, P. and A. Costello, "Internationalizing
> >         Domain Names in Applications (IDNA)", RFC 3490, March 2003.
> >
> >    [6]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for
> >         Internationalized Domain Names (IDN)", RFC 3491, March 2003.
> >
> >    [7]  Duerst, M. and M. Davis, "Unicode Normalization Forms", Unicode
> >         Standard Annex #15 http://www.unicode.org/unicode/reports/tr15/
> >         tr15-23.html, April 2003.
> >
> >5.2 Informative References
> >
> >    [8]   Leach, P. and R. Salz, "UUIDs and GUIDs", draft-leach-uuids-01
> >          (work in progress), 1997.
> >
> >    [9]   "Information technology - Open Systems Interconnection - Remote
> >          Procedure Call (RPC)", ISO (International Organization for
> >          Standardization) ISO/IEC 11578:1996, 1996.
> >
> >    [10]  "Specification of abstract syntax notation one (ASN.1)", ITU-T
> >          recommendation X.208, (see also RFC 1778), 1988.
> >
> >    [11]  Mealling, M., "A URN Namespace of Object Identifiers", RFC
> >          3061, February 2001.
> >
> >    [12]  Paskin, N., "Information Identifiers", Learned Publishing Vol.
> >          10, No. 2, pp. 135-156, (see also www.doi.org), April 1997.
>
> [snip]
>
>
> Regards,    Martin.
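
The point made in the Section 3 comments, that a tag IRI and the URI it maps to are different tags and that a round trip through URI-only infrastructure should end with a conversion back, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it percent-encodes every non-ASCII character as UTF-8, including any in the domain name, and does not attempt the IDNA or normalization steps discussed above. The function names and the example tag are hypothetical.

```python
import re


def tag_iri_to_uri(iri: str) -> str:
    """Percent-encode each non-ASCII character as its UTF-8 octets."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        for ch in iri
    )


# Runs of percent-encoded octets above %7F; ASCII escapes are left alone.
_NON_ASCII_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def tag_uri_to_iri(uri: str) -> str:
    """Undo tag_iri_to_uri; raises UnicodeDecodeError on invalid UTF-8."""
    def decode(match: re.Match) -> str:
        return bytes.fromhex(match.group(0).replace("%", "")).decode("utf-8")
    return _NON_ASCII_RUN.sub(decode, uri)


# Hypothetical tag, written with an escape so the code point is unambiguous.
iri = "tag:example.org,2004:caf\u00e9"
uri = tag_iri_to_uri(iri)

# Under string equality (Section 2.4) the two forms are different tags,
# so software that was handed the URI form should convert it back.
round_trip_ok = uri != iri and tag_uri_to_iri(uri) == iri
```

Only octets above %7F are decoded on the way back, in line with the restriction that pct-encoded should never stand for an ASCII character in a tag.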
Received on Tuesday, 19 October 2004 12:36:36 UTC