Re: Eurocentrism, incorrect unit abbreviations, and proprietary Royalist Engish (sic) terms

On 05 Jul 2018, at 03:29, Joe Duarte wrote:
> All I care about re: American units is that the abbreviations are correct. American websites and publishers will use American units. Chinese publishers will want to use metric and sometimes traditional Chinese units (e.g. tael for precious metals weight).

The values of are **codes**, not "abbreviations". Codes are unique sequences of symbols that computers can understand more reliably because there is exactly one sequence per meaning (i.e. no syntactic/lexical variability) and the very same sequence always refers to the same meaning (i.e. no code collision).
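To illustrate the difference between a code and an abbreviation, here is a minimal Python sketch. The table is a small hand-picked sample for illustration, not the full UN/CEFACT Common Code list, and the helper names `to_code` and `spellings_for` are hypothetical. It shows several human spellings collapsing into exactly one code per meaning.

```python
# Illustrative sample only: a few spellings mapped to UN/CEFACT Common Codes.
# KGM = kilogram, MTR = metre. Not a complete or authoritative table.
SPELLINGS_TO_CODE = {
    "kg": "KGM",
    "kilogram": "KGM",
    "kilograms": "KGM",
    "kilo": "KGM",
    "m": "MTR",
    "meter": "MTR",
    "metre": "MTR",
}


def to_code(text):
    """Return the code for a free-text unit string, or None if unknown."""
    return SPELLINGS_TO_CODE.get(text.strip().lower())


def spellings_for(code):
    """Return every sample spelling that maps to the given code, sorted."""
    return sorted(s for s, c in SPELLINGS_TO_CODE.items() if c == code)


if __name__ == "__main__":
    print(to_code(" Kilograms "))   # many spellings, one code
    print(spellings_for("MTR"))     # the variability a consumer must handle
    print(to_code("tael"))          # unknown strings cannot be resolved
```

A consumer that receives the code can skip this guessing step entirely; a consumer that receives free text has to maintain a table like the one above and will still miss spellings it has never seen.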

> I assume we'd want a Chinese version of
In terms of translations of the documentation: Yes, maybe. In terms of a conceptually different variant in a new URI space, likely not. tries to foster information extraction from Web content all over the world.

> Martin said:
> As for units of quantitative data, you can use any unit of measurement you want, as long as there is either a UN/CEFACT Common Code or a URI for it, or if you can establish a standard prefix plus unique identifiers for it.
> This is what I'm concerned about. First, the units are already established as are their abbreviations.

We are not touching the issue of whether the units are "established" in here or not. 

> We know that a kilogram is kg, a meter is m, a millisecond is ms, etc.

"We" might "know", but computers have a harder time dealing with these strings from human language, in particular for the not-so-frequently-used units of measurement.

> This is all established by the SI or the International System of Units. SI trumps UN/CEFACT any day of the week. It would incredibly unwise to deviate from SI standards for unit symbols/abbreviations.

Nobody deviates from SI units in here. The UN/CEFACT Common Code provides **codes** for all SI units plus many other types of units. See pp. 19f. of

And you can always use the strings you like with It will just be a bit harder for consumers of your data to process it. 

> Note also that UN/CEFACT is a mostly European endeavor.

I do not think that matters in here. It was, after a careful review about a decade ago, the most useful standard code for units of measurement and hence adopted for GoodRelations and later on

Please also note that the World Wide Web was initially a European endeavor developed by an English engineer and computer scientist in a lab in Switzerland ;-).

> Europe is more bureaucratic than the US, and CEFACT was an effort to have a centralized... specification I guess we'd call it, for business processes and trade. For example, they have more than a hundred heavy XML schemas like these:
>  • Lodging House Information Request
>  • Lodging House Information Response
>  • Lodging House Reservation Request
>  • Lodging House Reservation Response
>  • Lodging House Travel Product Information
> …with the idea that all "lodging houses" would use these XML schemas, across countries and countless other contexts. This kind of thing would never occur to Americans as something we need to do – centrally planning the transaction structure and data transmission for random industries (especially in XML). For another example, see ISO 20222, which desperately needs to be reformulated in YAML.

The UN/CEFACT might have produced some standards that failed to gain adoption. But the UN/CEFACT Common Code is a good one. Many standardisation bodies, including many in North America, developed XML-based standards in the era of XML hype that were complicated, brittle and useless.

> But my main point is: please don't use non-standard unit abbreviations.

As I have explained, the UN/CEFACT Common Code was at the time, and in my opinion still is, the best available standard for **codes** of units of measurement. Plus, allows using plain text as an alternative range, so you can use "kg" and "gal" if you prefer.
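The two options can be sketched in Python as follows. This assumes the vocabulary under discussion is schema.org and uses the property names `unitCode` and `unitText` from its `QuantitativeValue` type. The helper `quantity` is hypothetical and exists only for this example.

```python
import json


def quantity(value, unit_code=None, unit_text=None):
    """Build a QuantitativeValue node as a plain dict (hypothetical helper)."""
    node = {"@type": "QuantitativeValue", "value": value}
    if unit_code is not None:
        node["unitCode"] = unit_code   # machine-oriented code, e.g. KGM
    if unit_text is not None:
        node["unitText"] = unit_text   # free-text string, e.g. "kg"
    return node


# Option 1: a UN/CEFACT Common Code (KGM = kilogram).
coded = quantity(2.5, unit_code="KGM")

# Option 2: plain text, easier to write but harder for consumers to process.
plain = quantity(2.5, unit_text="kg")

if __name__ == "__main__":
    print(json.dumps(coded, indent=2))
    print(json.dumps(plain, indent=2))
```

Nothing prevents a publisher from supplying both properties on the same node, which keeps the data readable for humans while giving consumers an unambiguous code.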

> There's no reason for to do that. And for American units, Wikipedia is a good source for the references: NIST and ANSI will be the sources for some of them, but there are no surprises here. American units and their abbreviations haven't changed in decades. (The definition of a yard and pound, and by extension related units, was established by the International Agreement on the Yard and Pound in the 1950s, which gave them exact metric definitions/conversion factors.)

That Wikipedia page also includes a perfect illustration of why standard codes for globally standardized units are useful (image taken from

> If it's about flexibility, that's great, but it doesn't address my concerns about the Eurocentrism of the schema, or the Royalist English. (FYI, campsite is the American term for a "camping pitch")

The English Wikipedia says the following:

"A campsite or camping pitch is a place used for overnight stay in an outdoor area. In UK English, a campsite is an area, usually divided into a number of pitches, where people can camp overnight using tents or camper vans or caravans; this UK English use of the word is synonymous with the US English expression campground. In American English, the term campsite generally means an area where an individual, family, group, or military unit can pitch a tent or park a camper; a campground may contain many campsites.

" is the type for the exact pitch on which you set up your tent.

The problem with terminology here is that the word "campsite" means an individual pitch in American English but the whole campground in British English, and we needed a unique term for the actual pitches. The term was chosen not because it is from British English, but because it cannot be confused with the broader camping area.

> If you confuse American publishers with a bunch of Royalist English, non-standard unit abbreviations, etc. they'll be less likely to embrace (beyond just having page-level metadata in a JSON-LD script in the head).

The challenge for is to strike a balance between many conflicting requirements:

1. The resulting data should be most useful for computer processing.
2. The data structures should be easy to populate from existing back-end data sources.
3. The partitioning of classes and properties should be precise and ontologically sound, yet understandable to wide audiences.
4. The terminology and textual elements should be precise yet understandable for developers from all over the world, the majority being non-native speakers of English.

And is already a huge success. It could be made simpler, but then the data would be less useful for computers. It could be made more complicated, but then the learning costs for average developers would rise. So our main option is to

1. gradually improve and extend the schema
2. while at the same time increasing the incentive to use it, mainly through new search engine features,
3. and continue to explain the ecosystem to the world.

In my mails on this topic, I have tried to explain the technical aspects of the issues you raise, and I think I have said what I can contribute to this topic.

Best wishes
Martin Hepp

Received on Thursday, 5 July 2018 06:30:43 UTC