Re: Eurocentrism, incorrect unit abbreviations, and proprietary Royalist English terms

Dear Joe:

Thanks for your email!

> On 04 Jul 2018, at 09:54, Joe Duarte <songofapollo@gmail.com> wrote:
> 
> Hi all,
> 
> I have a few threads of feedback:
> 
> 1. The schema is littered with incorrect abbreviations of American units. Examples:
>  • In the Vehicle schema, for cargoVolume, it gives FTQ as the abbreviation for cubic feet.
>  • For fuelCapacity, it gives GLL for gallons.
> I couldn't find any reference on the web that gave these abbreviations, so I'm stumped where they came from. Cubic feet can just be written as cubic feet, or cubic ft., or ft³.
> 
> And gallon is abbreviated gal.
> 

As for the unit codes you criticize: these are the official UN/CEFACT Common Codes:

http://wiki.goodrelations-vocabulary.org/Documentation/UN/CEFACT_Common_Codes
https://www.unece.org/cefact/codesfortrade/codes_index.html

This is a global standard, and it was selected after a careful analysis of alternative standard representations for unit of measurement codes. A bit of background is on p. 20 and p. 26 of

http://www.heppnetz.de/projects/goodrelations/GoodRelations-TR-final.pdf

Computers benefit from unique codes for meanings, so the UN/CEFACT Common Code strings are better suited for schema.org than free-form abbreviations. Some consumers of schema.org data may be able to handle other encodings for units of measurement, but it is better not to rely on that.


> For fuelConsumption, the schema doesn't even try to account for the US, and only refers to a European measure (liters per 100 km). (The US measure is miles per gallon, abbreviated as MPG.)

Please read the spec carefully:

    https://schema.org/fuelConsumption

clearly says:

"Note 2: There are two ways of indicating the fuel consumption, fuelConsumption (e.g. 8 liters per 100 km) and fuelEfficiency (e.g. 30 miles per gallon). They are reciprocal."

So simply use 

    https://schema.org/fuelEfficiency
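
To illustrate the "reciprocal" relationship between the two properties, here is a minimal Python sketch of the conversion. It assumes US gallons and statute miles; the function names are hypothetical and not part of schema.org.

```python
# Conversion constants (both exact by definition).
LITERS_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344


def l_per_100km_to_mpg(liters_per_100km: float) -> float:
    """Convert fuel consumption (L/100 km) to fuel efficiency (miles per US gallon)."""
    if liters_per_100km <= 0:
        raise ValueError("fuel consumption must be positive")
    km_per_liter = 100.0 / liters_per_100km
    return km_per_liter * LITERS_PER_US_GALLON / KM_PER_MILE


def mpg_to_l_per_100km(mpg: float) -> float:
    """Convert fuel efficiency (miles per US gallon) to fuel consumption (L/100 km)."""
    if mpg <= 0:
        raise ValueError("fuel efficiency must be positive")
    km_per_liter = mpg * KM_PER_MILE / LITERS_PER_US_GALLON
    return 100.0 / km_per_liter


if __name__ == "__main__":
    print(round(l_per_100km_to_mpg(8.0), 1))
    print(round(mpg_to_l_per_100km(30.0), 1))
```

With these constants, 8 liters per 100 km comes out at roughly 29.4 miles per US gallon, which matches the two example figures in the note above reasonably well.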

> 2. Which leads to a related observation. The schema is vividly Eurocentric, in that it seems designed around European norms rather than American ones. This is odd, since Schema.org sponsored mostly by American search companies, and there are 325 million people in the US vs. 65 million in Britain, for example.

Schema.org is a global, international standard. It serves 7.6 billion people on the planet.

Also, as for the term "sponsored": Schema.org relies heavily on community contributions. People have dedicated a lot of time to this standard without being paid a single cent by the "American" search companies. Also, Yandex has been a schema.org sponsor (= endorser and user of respective data) from early on.

> Since the schema is in the English language, and the user base will be overwhelmingly American, I think it's more appropriate that we use American English by default, unless there's a contextual reason to use Royalist English. Here are some examples:

In general, schema.org tries to use a "global" form of English. Due to the many contributors, this may be neither perfect American nor perfect British English. As for spelling, we mostly try to use the American form. If there are spelling inconsistencies, the most effective approach is to file a pull request on GitHub.

> CampingPitch is not a term Americans will be familiar with. It's a Royalist* term.

Please propose a better term for the area you can rent temporarily on a campground to put your tent on.

> Under Car, there are but two properties (from Car). I would expect these to be important, fundamental properties at the right level of ontological abstraction,

The properties that are also relevant for other vehicles sit one level up in the hierarchy; see

    https://schema.org/Vehicle

> but rather they are:
> 
> acrissCode – this is a coding system used by European car rental businesses. The description is so deeply Eurocentric that it doesn't even mention Europe, or the fact these codes are only used in Europe. It's as if the rest of the world doesn't exist. Any codes that are only used in certain countries or continents should be identified as so encumbered.

ACRISS is used in Europe, Middle East and Africa, and respective data can be valuable in rental car mark-up in those markets.

It is a principle of schema.org to provide properties that may not be applicable in all contexts, i.e. the existence of a property does not imply that it must or should be used. It is simply an option.

If there is another coding system for rental car categories in the US or other parts of the world, we could add additional properties or rename this one and expand the range of values.

> roofLoad – this is the second of the two core properties from Car. And again it has unit errors, this time across the board. It claims that a kilogram is abbreviated KGM, and pounds as LBR. The SI abbreviation for kilogram is kg, and for American customary units, pounds are lbs. This can be confirmed in any appropriate reference, including Wikipedia.

It may not be obvious from the documentation, but the range of this property is

    https://auto.schema.org/QuantitativeValue

The unit is encoded using the property

    https://auto.schema.org/unitCode

So you can use any UN/CEFACT Common Code unit code for a weight.

The spec tries to explain that with:

"The unit of measurement given using the UN/CEFACT Common Code (3 characters) or a URL. Other codes than the UN/CEFACT Common Code may be used with a prefix followed by a colon."
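
As a rough sketch of what this looks like in practice, the following Python snippet builds JSON-LD for a Car whose quantities are QuantitativeValue nodes with a unitCode. The lookup table only contains the four codes mentioned in this thread; the helper name, the car name, and the numeric values are made up for illustration.

```python
import json

# Human-readable abbreviation -> UN/CEFACT Common Code.
# Only the codes discussed in this thread; a real table would be much larger.
UN_CEFACT_CODES = {
    "kg": "KGM",        # kilogram
    "lb": "LBR",        # pound
    "gal": "GLL",       # gallon
    "cubic ft": "FTQ",  # cubic foot
}


def quantitative_value(value: float, unit: str) -> dict:
    """Build a schema.org QuantitativeValue node (hypothetical helper)."""
    return {
        "@type": "QuantitativeValue",
        "value": value,
        "unitCode": UN_CEFACT_CODES[unit],
    }


# All values below are invented example data.
car = {
    "@context": "https://schema.org",
    "@type": "Car",
    "name": "Example car",
    "roofLoad": quantitative_value(75, "kg"),
    "fuelCapacity": quantitative_value(13.2, "gal"),
    "cargoVolume": quantitative_value(15.1, "cubic ft"),
}

if __name__ == "__main__":
    print(json.dumps(car, indent=2))
```

A site can keep showing "kg" or "gal" to human readers; the codes only appear in the machine-readable markup.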

> Note also that roofLoad is likely to be a European concern – Americans don't talk about it, and it's never advertised by carmakers here. In any case, we have properties for Car, and they are a European rental coding system and roofLoad.

It seems that loading stuff on the roof is more popular in Europe than in North America, since cars in Europe tend to be smaller so the need for additional storage capacity is a more frequent consumer need. But the laws of physics will also apply to roof loads in North America, and a quick Google search reveals that user manuals of American cars contain respective load limits.

So just omit it when you have no such data or do not need the information.

And as said, most of the properties you seem to miss are at the level of Vehicle, since they apply to motorbikes, coaches, and maybe boats and airplanes alike.

> The Car schema is in pretty bad shape.

I think that is a bold statement, and inappropriate.


> There are a lot more errors in the Schema, including repetitions of the bizarre, incorrect unit abbreviations.

The UN/CEFACT Common Codes are neither bizarre nor incorrect but instead widely used in e-commerce data exchange.

Alternative unit coding schemes can be used by a prefix followed by a colon.
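
A consumer of the data could tell the three cases apart (plain UN/CEFACT code, prefixed code, URL) along the following lines. This is only a sketch of one possible reading of the rule; the function name is hypothetical, and the "ucum" prefix in the usage example is an illustrative assumption, not something schema.org defines.

```python
from typing import Tuple


def split_unit_code(unit_code: str) -> Tuple[str, str]:
    """Return (scheme, code) for a unitCode value.

    - URLs are returned as ("url", <the URL>).
    - "prefix:code" is split at the first colon.
    - Anything else is assumed to be a UN/CEFACT Common Code.
    """
    if unit_code.startswith(("http://", "https://")):
        return ("url", unit_code)
    prefix, separator, code = unit_code.partition(":")
    if separator:
        return (prefix, code)
    return ("UN/CEFACT", unit_code)


if __name__ == "__main__":
    # "ucum" is a made-up prefix for illustration only.
    for example in ("KGM", "ucum:kg", "https://example.org/units/kilogram"):
        print(example, "->", split_unit_code(example))
```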

> 3. The English-only instantiation of Schema.org also raises some important long-term questions. Do you plan to expand or mirror Schema.org into other major languages (French, Spanish, German, Russian, Simplified Chinese, Japanese, Korean, etc.)? Or is it meant to be mostly Western focused? That still implicates some European languages. Moreover, if we create Schema.org for country-specific languages like French and German, we'll need to be sure to avoid the mistakes in the current schema, and have the French version littered with British assumptions, for example.

There have been a lot of discussions about translations of global data schemas in general and schema.org in particular.

First, keep in mind that IT systems should be able to process the data independent of the location and language of the respective Web sites. So we strive to find classes and properties that strike a compromise between familiarity (which is often bound to cultural contexts) and cross-cultural usefulness.

Translations make it easier for developers to use the standard, but they are difficult to maintain and can introduce additional ambiguity.

Most programming languages and most other standards in the history of computing have been maintained in English, and so far this has worked out well.

Cultural bias can be a problem, but we can only minimize it, never avoid it entirely. And there are tradeoffs. So the most effective way to contribute is through tangible change requests on GitHub.

> In summary, I think there are lots of problems with the schema right now, from bizarrely incorrect units, sections that do not contemplate the existence of the United States, and messy structures and hierarchies that do not meet normal ontological – or just logical – standards.

I have tried to explain that I do not share your assessment here.

> 
> Is the team too small? Is it perhaps based in Europe? I'm happy to help. I'm working on a metadata schema for scientific research right now, and a couple of other schemas that I might propose for inclusion in Schema.org. In any case, I'm happy to help. I can look for pull request opportunities, and you might want to just hire me – I'm a social scientist who specializes intellectual and cultural diversity and how it helps science and teams. It would probably help to have someone on the team who knows mainstream American norms, with a rural background, who isn't white, who loves and knows cars very well, as well as ontology in general. Schema.org won't reach its full potential if it's run by a handful of urbanites in the Bay Area, Europe, etc. There would be too much cultural sameness and bias, and giving semantics to the web is a job for a culturally diverse team.

Joining the team is easy, and you have already made the first step ;-)

Just get a GitHub account, fork schema.org, create proposals for improvement, and submit a pull request to the main repository. My advice would be to start small, with commits that fix typos or wording, before investing a lot of time in major structural modifications. The latter are more challenging than they appear at first sight, because there are many aspects to consider when designing globally shared data schemas.

Except for a few people employed by the big search engine companies, everyone involved is a volunteer. So I think turning it into a paid occupation will be a rather thorny road.

Best wishes

Martin



> 
> Cheers,
> 
> Joe Duarte, PhD
> Phoenix, AZ, USA
> 
> * By Royalist English, I mean that which is spoken in countries where they bend the knee to the underemployed fashion models of the House of Windsor.
> 

Received on Wednesday, 4 July 2018 08:58:20 UTC