Comment on ITS 2.0 specification WD

Dear Multilingual Web LT group,

Below is a collection of comments on the Last Call draft. These comments are not directly related to internationalization, so I don't expect the Internationalization WG to track or endorse them.

I've also submitted through the issue tracker of the Internationalization WG a number of issues today that I consider internationalization issues (I18N-ISSUE-238 through I18N-ISSUE-247). Note that the working group has not reviewed these issues yet, so at this point they should be considered personal comments.

All comments are on


Global comments:

- A number of headings are missing spaces between section number and title.

- Numerous sentences appear to be missing articles. I've noted some of them below, but some careful copy-editing seems necessary.

Status of this Document
J Acknowledgements (Non-Normative)

The name of the "MultilingualWeb-LT" Working Group should be spelled out. What does "LT" stand for?

1 Introduction

- "NIF" should be spelled out on first use - I didn't know what it is.

1.2 Motivation for ITS

- Links for DocBook and DITA would be useful.

Analytics

- "These types of users": Do you mean "this type of service"?

Workflow Managers

- concerend -> concerned

- "bitext format": what's that? (Yes, I found out. Still, I first thought this was a typo...)

1.4.1 Support for legacy HTML content

- "migrate their content to HTML" -> "migrate their content to HTML5"

- "in older versions of HTML ... its-* attributes will be marked as invalid in validators": The W3C validator also rejects its-* attributes in HTML5, and I don't see anything in the HTML5 spec that would allow such attributes in conforming HTML5 documents. (The spec allows for conforming HTML5+XXX documents, but there's no way to tell the validator that you'd like to apply HTML5+ITS rules and what they are.)

2.1 Selection

- This section needs to clearly specify what it means by "node". This term is not defined in the XML specification, and it is defined differently in XPath (which includes attributes) and in HTML5/DOM (which excludes them). I guess you want to follow the XPath definition.

- "CSS and other query languages": I guess you mean Selectors (formerly CSS selectors)?

- "supported by application" -> "supported by the application"

- " /docbook" -> ""

3.7 The Term HTML

- This section requires a normative reference to the HTML5 specification; a non-normative one is inadequate.

4 Conformance

- servers -> serves

4.4 Conformance Class for HTML5+ITS documents

- This section should refer to HTML5 section 2.2.3 Extensibility.

- This section should note that conforming HTML5+ITS documents in HTML syntax that include ITS markup are not conforming HTML5 documents.

5.3.3 CSS Selectors

- CSS selectors are now simply called Selectors, even though they originated in CSS.

- This makes the identifier "css" a bit unfortunate.

- This section requires a normative reference to the Selectors Level 3 specification, but there is none in Appendix A.

5.3.4 Additional query languages

- The "MAY" after "Future versions of this specification" is probably not the MAY of RFC 2119.

5.7 Conversion to NIF

- This section requires a normative reference to the NIF specification, but there is none in Appendix A.

6 Using ITS Markup in HTML

- This section should clarify that by "HTML" it really means "HTML5 (or successor) in HTML syntax". It's not HTML 4, because that doesn't have a translate attribute. It's also not HTML5 in XHTML syntax, because that is case sensitive and has real namespaces.

- This section requires a normative reference to the HTML5 specification, but there is none in Appendix A.

6.1 Mapping of Local Data Categories to HTML

- Is it really necessary to use case-insensitive matching for attribute values? A long discussion with the CSS group has convinced us that case-insensitive matching is generally a bad idea. Case-insensitive matching of attribute names in HTML syntax is unfortunately already decided...

- "Name of HTML attribute" -> "The name of the HTML attribute", "the name of attribute" -> "the name of the attribute"

- "will gets" -> "will get"
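To illustrate why "case-insensitive" needs a precise definition if it is kept, here is a minimal Python sketch comparing three plausible interpretations. The helper names are hypothetical and not taken from the draft; the first one corresponds to the "ASCII case-insensitive" matching that HTML uses for many keywords.

```python
def ascii_fold(s: str) -> str:
    """ASCII case-insensitive folding: only A-Z are mapped to a-z."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def unicode_lower(s: str) -> str:
    """Default Unicode lowercasing (Python's str.lower)."""
    return s.lower()


def unicode_casefold(s: str) -> str:
    """Unicode full case folding (Python's str.casefold)."""
    return s.casefold()


def same(a: str, b: str, fold) -> bool:
    """Compare two attribute values after applying the given folding."""
    return fold(a) == fold(b)


if __name__ == "__main__":
    for a, b in [("yes", "YES"), ("été", "ÉTÉ"), ("straße", "STRASSE")]:
        print(a, b, [same(a, b, f) for f in (ascii_fold, unicode_lower, unicode_casefold)])
    # yes YES [True, True, True]
    # été ÉTÉ [False, True, True]
    # straße STRASSE [False, False, True]
```

The same pair of values can match under one definition and fail under another, so a spec that only says "case-insensitive" leaves implementations free to disagree.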

6.3 Standoff Markup in HTML

- The forward references to unexplained but complex sounding concepts are rather unfortunate.

7 Using ITS Markup in XHTML

- I assume this section is also meant to cover HTML5 documents in XHTML syntax. If so, this should be called out.

8.7 Language Information
E References (Non-Normative)

- The 5th edition of the XML specification refers to BCP 47, so there's no need to discuss RFC 3066 anymore.

8.9 Domain

- The substeps of step 3-1-2 and step 3-2 are the same. The algorithm would be simpler if these substeps only occurred once. Step 3-1-2 iterates over a string list of length > 1, step 3-2 over a string list of length = 1; that should be easy to merge.

- Step 3-1-2-5 refers to a mapping without saying where it comes from and how it's defined. This should be clarified based on the later description of domainMapping.

- Step 3-1-2-5 says "the mapping is case-insensitive". This should say that the string being processed is matched against the left part of the pair in a case-insensitive manner (but see also I18N-ISSUE-242).

- Steps 4 and 5 refer to "the resulting string". This should be "the resulting string list".

- The recommendation to use <meta name="keywords"> for HTML seems misguided. Typically the keywords don't contain domain information (e.g., automotive), but are stuffed with words that the authors hope search engines will match against user input (e.g., Toyota Camry, VW Passat, Honda Accord).
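Regarding the suggestion to merge steps 3-1-2 and 3-2: a minimal Python sketch of the merged form, assuming the substeps are trimming whitespace, removing surrounding quotes, and replacing a string that matches the left part of a domainMapping pair. The function name, the quote handling, the use of casefold, and the duplicate removal standing in for steps 4 and 5 are all my assumptions, not the draft's exact wording.

```python
def domain_values(raw: str, mapping: list[tuple[str, str]]) -> list[str]:
    """Hypothetical merged version of steps 3-1-2 and 3-2.

    Splitting a value that contains no comma yields a one-element list,
    so a single loop covers both the "length > 1" and "length = 1" cases.
    """
    result: list[str] = []
    for item in raw.split(","):
        # Assumed substep: remove leading and trailing whitespace.
        item = item.strip()
        # Assumed substep: remove one pair of surrounding single or double quotes.
        if len(item) >= 2 and item[0] == item[-1] and item[0] in "'\"":
            item = item[1:-1]
        # Assumed substep: match the string against the left part of each
        # mapping pair case-insensitively (casefold is just one possible
        # choice; see I18N-ISSUE-242) and replace it by the right part.
        for left, right in mapping:
            if item.casefold() == left.casefold():
                item = right
                break
        # Assumed steps 4 and 5: the resulting string list keeps no duplicates.
        if item not in result:
            result.append(item)
    return result
```

For example, with the mapping `[("auto", "automotive"), ("car", "automotive")]`, the value `'Auto', Car, medical` yields `["automotive", "medical"]`, and the single value `Car` goes through the same loop and yields `["automotive"]`.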

8.10 Disambiguation

- "what WordNet services do", "such as DBpedia": What are WordNet and DBpedia? (Yes, I found out, but informative references would help, if these services really need to be mentioned.)

- "serialize in RDFa Lite or Microdata": need informative references.

8.11 Locale Filter

- "can include the wildcard extended language range '*'": this is part of the definition of extended language ranges in BCP 47 and doesn't need to be stated here.

- "included in any local" -> "included in any locale"
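To show that the wildcard is already handled by BCP 47, here is a minimal Python sketch of the extended filtering algorithm from RFC 4647 section 3.3.2, which is part of BCP 47. The function name is hypothetical; it matches a single extended language range against a single language tag.

```python
def extended_filter(language_range: str, language_tag: str) -> bool:
    """Sketch of RFC 4647 extended filtering for one range and one tag."""
    rng = language_range.lower().split("-")  # matching is case-insensitive
    tag = language_tag.lower().split("-")
    # The first subtags must match, unless the range starts with "*".
    if rng[0] != "*" and rng[0] != tag[0]:
        return False
    i, j = 1, 1
    while i < len(rng):
        if rng[i] == "*":
            i += 1                      # wildcard: skip it in the range
        elif j >= len(tag):
            return False                # tag ran out of subtags
        elif rng[i] == tag[j]:
            i += 1                      # subtags match: advance both
            j += 1
        elif len(tag[j]) == 1:
            return False                # singleton (e.g. "x") blocks skipping
        else:
            j += 1                      # skip a non-matching tag subtag
    return True


if __name__ == "__main__":
    print(extended_filter("de-DE", "de-Latn-DE"))  # True
    print(extended_filter("de-DE", "de-x-DE"))     # False
    print(extended_filter("*-CH", "fr-CH"))        # True
```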

8.15 Id Value

- Different parts of this section seem contradictory: First, an id value is supposed to be a "unique identifier for a given part of the content", but then there's a selector that "selects the nodes to which this rule applies", with "nodes" in plural. If the selector selects multiple nodes, then the identifier isn't unique. A name that can be used to select multiple nodes is called a "class" in HTML. So, should this section be about classes, or should the selector be required to select a single node?

- "xml:id (which is defined by XML)": I can't find this in the XML specification. Can you provide a reference?

8.16 Preserve Space

- "not applicable to HTML documents": Not quite correct - it is applicable to HTML documents in XHTML syntax. On the other hand, the non-applicability should be mentioned in normative text, in the Definition section: "The Preserve Space data category does not apply to HTML documents in HTML syntax."

8.21 Storage Size

- "character set encoding" -> "character encoding" (multiple times)

- Example 94: It would be worth pointing out that CONTINUE doesn't fit.
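Since the character encoding determines the storage needed, a minimal Python sketch may help, assuming the Storage Size limit counts bytes of the encoded text. The helper name and the UTF-8 default are assumptions for illustration, and line-break handling is ignored.

```python
def fits_storage(text: str, storage_size: int, encoding: str = "utf-8") -> bool:
    """Hypothetical check: does `text`, encoded in `encoding`, fit in
    `storage_size` bytes? Line-break handling is deliberately ignored."""
    return len(text.encode(encoding)) <= storage_size


if __name__ == "__main__":
    # "Größe" needs 7 bytes in UTF-8 (ö and ß take 2 bytes each)
    # but only 5 bytes in ISO-8859-1.
    print(fits_storage("Größe", 5, "iso-8859-1"))  # True
    print(fits_storage("Größe", 5))                # False
```

The same string can fit in one character encoding and overflow in another, which is why the encoding is part of the Storage Size information.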

A References

- The XML 1.0 reference should be to the 5th edition.

Appendix B:

- File extensions are commonly specified with leading period, i.e., ".its".

- .its is used for some other file types - I don't know whether that's likely to cause problems:

C Values for the Localization Quality Issue Type

- locale-violation: Both YYYY-MM-DD and DD.MM.YYYY are valid date formats in Germany according to DIN 5008.
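For a checker that reports locale-violation issues in German content, this means both formats have to be accepted. A minimal Python sketch; the constant and function names are hypothetical and not part of ITS, and the format list covers only the two formats mentioned above.

```python
from datetime import datetime

# The two numeric date formats cited above as valid in Germany (DIN 5008).
GERMAN_NUMERIC_DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y")


def is_valid_german_numeric_date(text: str) -> bool:
    """Hypothetical locale check: accept a date if it parses in either format."""
    for fmt in GERMAN_NUMERIC_DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False
```

With this check, both `2013-01-19` and `19.01.2013` are accepted, while `01/19/2013` is rejected.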

Received on Saturday, 19 January 2013 07:05:32 UTC