- From: Michael Andrews <nextcontent01@gmail.com>
- Date: Wed, 4 Jul 2018 20:56:13 +0530
- To: Martin Hepp <mfhepp@gmail.com>
- Cc: Joe Duarte <songofapollo@gmail.com>, "schema.org Mailing List" <public-schemaorg@w3.org>
- Message-ID: <CAF9ZrJ3=A02wipoyiygvk9bjXaCgvA1bytFhzN5kv-J5cKFE-g@mail.gmail.com>
Thank you, Martin Hepp, for providing a very detailed and useful answer to these points! I'm a big admirer of your work introducing the UN/CEFACT Common Code into schema.org, and I feel that it is underutilized, perhaps because the current documentation isn't as helpful as it might be for newcomers to this approach. I also appreciate you addressing the differences in terminology -- even within English. I've lived in 4 English-speaking countries, and can attest that English is not a uniform language. Maybe schema.org can do more to explain what a property might mean to various audiences. The property name and URI are just a common reference to a concept that might have many different labels within a language, or among languages. Maybe I can work with Thad to reuse these explanations to improve schema.org documentation. I imagine both the issues raised by the question, and the answers explaining the status and rationale of choices, will interest a broader audience.

On Wed, Jul 4, 2018 at 2:27 PM, Martin Hepp <mfhepp@gmail.com> wrote:

> Dear Joe:
>
> thanks for your email!
>
> > On 04 Jul 2018, at 09:54, Joe Duarte <songofapollo@gmail.com> wrote:
> >
> > Hi all,
> >
> > I have a few threads of feedback:
> >
> > 1. The schema is littered with incorrect abbreviations of American units. Examples:
> > • In the Vehicle schema, for cargoVolume, it gives FTQ as the abbreviation for cubic feet.
> > • For fuelCapacity, it gives GLL for gallons.
> > I couldn't find any reference on the web that gave these abbreviations, so I'm stumped as to where they came from. Cubic feet can just be written as cubic feet, or cubic ft., or ft³.
> >
> > And gallon is abbreviated gal.
>
> As for the unit codes you criticize: These are the official UN/CEFACT Common Codes
>
> http://wiki.goodrelations-vocabulary.org/Documentation/UN/CEFACT_Common_Codes
> https://www.unece.org/cefact/codesfortrade/codes_index.html
>
> This is a global standard, and was selected after a careful analysis of alternative standard representations for unit of measurement codes. A bit of background is on p. 20 and p. 26 of
>
> http://www.heppnetz.de/projects/goodrelations/GoodRelations-TR-final.pdf
>
> Computers benefit from unique codes for meanings, so the UN/CEFACT Common Code strings are better suited for schema.org. Some users of schema.org data may be able to handle other encodings for units of measurement, but it is better not to rely on that.
>
> > For fuelConsumption, the schema doesn't even try to account for the US, and only refers to a European measure (liters per 100 km). (The US measure is miles per gallon, abbreviated as MPG.)
>
> Please read the spec carefully:
>
> https://schema.org/fuelConsumption
>
> clearly says:
>
> "Note 2: There are two ways of indicating the fuel consumption, fuelConsumption (e.g. 8 liters per 100 km) and fuelEfficiency (e.g. 30 miles per gallon). They are reciprocal."
>
> So simply use
>
> https://schema.org/fuelEfficiency
>
> > 2. Which leads to a related observation. The schema is vividly Eurocentric, in that it seems designed around European norms rather than American ones. This is odd, since Schema.org is sponsored mostly by American search companies, and there are 325 million people in the US vs. 65 million in Britain, for example.
>
> Schema.org is a global, international standard. It serves 7.6 billion people on the planet.
>
> Also, as for the term "sponsored": Schema.org heavily relies on community contributions. People have dedicated a lot of time to this standard without being paid a single cent by the "American" search companies.
> Also, Yandex has been a schema.org sponsor (= endorser and user of respective data) from early on.
>
> > Since the schema is in the English language, and the user base will be overwhelmingly American, I think it's more appropriate that we use American English by default, unless there's a contextual reason to use Royalist English. Here are some examples:
>
> In general, schema.org tries to use a "global" form of English. Due to the many contributors, this may be neither perfect American nor perfect British English. As for spelling, we mostly try to use the American form. If there are spelling inconsistencies, the most effective approach is to file a pull request on GitHub.
>
> > CampingPitch is not a term Americans will be familiar with. It's a Royalist* term.
>
> Please propose a better term for the area you can rent temporarily on a campground to put your tent on.
>
> > Under Car, there are but two properties (from Car). I would expect these to be important, fundamental properties at the right level of ontological abstraction,
>
> The properties also relevant for other vehicles are one layer up the hierarchy, see
>
> https://schema.org/Vehicle
>
> > but rather they are:
> >
> > acrissCode – this is a coding system used by European car rental businesses. The description is so deeply Eurocentric that it doesn't even mention Europe, or the fact these codes are only used in Europe. It's as if the rest of the world doesn't exist. Any codes that are only used in certain countries or continents should be identified as so encumbered.
>
> ACRISS is used in Europe, the Middle East and Africa, and respective data can be valuable in rental car mark-up in those markets.
>
> It is a principle of schema.org to provide properties that may not be applicable in all contexts, i.e. the existence of this property does not imply that it must or should be used. It is simply an option.
>
> If there is another coding system for rental car categories in the US or other parts of the world, we could add additional properties or rename this one and expand the range of values.
>
> > roofLoad – this is the second of the two core properties from Car. And again it has unit errors, this time across the board. It claims that a kilogram is abbreviated KGM, and pounds as LBR. The SI abbreviation for kilogram is kg, and for American customary units, pounds are lbs. This can be confirmed in any appropriate reference, including Wikipedia.
>
> It may not be obvious from the documentation, but the range of this property is
>
> https://auto.schema.org/QuantitativeValue
>
> The unit is encoded using the property
>
> https://auto.schema.org/unitCode
>
> So you can use any UN/CEFACT Common Code unit code for a weight.
>
> The spec tries to explain that as follows:
>
> "The unit of measurement given using the UN/CEFACT Common Code (3 characters) or a URL. Other codes than the UN/CEFACT Common Code may be used with a prefix followed by a colon."
>
> > Note also that roofLoad is likely to be a European concern – Americans don't talk about it, and it's never advertised by carmakers here. In any case, we have properties for Car, and they are a European rental coding system and roofLoad.
>
> It seems that loading stuff on the roof is more popular in Europe than in North America, since cars in Europe tend to be smaller, so additional storage capacity is a more frequent consumer need. But the laws of physics also apply to roof loads in North America, and a quick Google search reveals that user manuals of American cars contain respective load limits.
>
> So just omit it when you have no such data or do not need the information.
>
> And as said, most of the properties you seem to miss are at the level of Vehicle, since they apply to motorbikes, coaches, and maybe boats and airplanes alike.
>
> > The Car schema is in pretty bad shape.
>
> I think that is a bold statement, and inappropriate.
>
> > There are a lot more errors in the Schema, including repetitions of the bizarre, incorrect unit abbreviations.
>
> The UN/CEFACT Common Codes are neither bizarre nor incorrect but instead widely used in e-commerce data exchange.
>
> Alternative unit coding schemes can be used with a prefix followed by a colon.
>
> > 3. The English-only instantiation of Schema.org also raises some important long-term questions. Do you plan to expand or mirror Schema.org into other major languages (French, Spanish, German, Russian, Simplified Chinese, Japanese, Korean, etc.)? Or is it meant to be mostly Western focused? That still implicates some European languages. Moreover, if we create Schema.org for country-specific languages like French and German, we'll need to be sure to avoid the mistakes in the current schema, and not have the French version littered with British assumptions, for example.
>
> There has been a lot of discussion about translations of global data schemas in general and schema.org in particular.
>
> First, keep in mind that IT systems should be able to process the data independent of the location and language of the respective Web sites. So we strive to find classes and properties that strike a compromise between familiarity (which is often bound to cultural contexts) and cross-cultural usefulness.
>
> Translations make it easier for developers to use the standard, but they are difficult to maintain and can introduce additional ambiguity.
>
> Most programming languages and most other standards in the history of computing have been maintained in English, and so far this has worked out well.
>
> Cultural bias can be a problem, but we can only minimize it, never avoid it. And there are tradeoffs. So the most effective way to contribute is through tangible change requests on GitHub.
>
> > In summary, I think there are lots of problems with the schema right now, from bizarrely incorrect units, sections that do not contemplate the existence of the United States, and messy structures and hierarchies that do not meet normal ontological – or just logical – standards.
>
> I have tried to explain that I do not share your assessment here.
>
> > Is the team too small? Is it perhaps based in Europe? I'm happy to help. I'm working on a metadata schema for scientific research right now, and a couple of other schemas that I might propose for inclusion in Schema.org. In any case, I'm happy to help. I can look for pull request opportunities, and you might want to just hire me – I'm a social scientist who specializes in intellectual and cultural diversity and how it helps science and teams. It would probably help to have someone on the team who knows mainstream American norms, with a rural background, who isn't white, who loves and knows cars very well, as well as ontology in general. Schema.org won't reach its full potential if it's run by a handful of urbanites in the Bay Area, Europe, etc. There would be too much cultural sameness and bias, and giving semantics to the web is a job for a culturally diverse team.
>
> Joining the team is easy, and you have already made the first step ;-)
>
> Just get a GitHub account, fork schema.org, create proposals for improvement, and submit a pull request to the main repository. My advice would be to start small, with commits that fix typos or wording, before investing a lot of time into major structural modifications. The latter are more challenging than they appear at first view, because there are many aspects to consider when designing globally shared data schemas.
>
> Except for a few people employed by the big search engine companies, all others are volunteers. So I think turning it into a paid occupation will be a rather thorny road.
>
> Best wishes
>
> Martin
>
> > Cheers,
> >
> > Joe Duarte, PhD
> > Phoenix, AZ, USA
> >
> > * By Royalist English, I mean that which is spoken in countries where they bend the knee to the underemployed fashion models of the House of Windsor.
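To illustrate the roofLoad point in the thread: the property's value is a QuantitativeValue, and its unitCode carries a machine-readable unit identifier (for example KGM), which is separate from whatever abbreviation a page shows to readers (kg). The Python sketch below builds such markup as a JSON-LD dictionary and maps the four codes discussed (FTQ, GLL, KGM, LBR) to display labels. It is a minimal illustration under stated assumptions: the label table is a hand-picked subset of the UN/CEFACT Common Codes, the car and all its values are hypothetical, and `classify_unit_code` is one reading of the unitCode description quoted in the thread (three characters, a URL, or a `prefix:code`), not an official validator.

```python
import json
import re

# Illustrative subset of UN/CEFACT Common Codes mapped to labels a page
# might display to humans. This is NOT the full code list.
UNIT_LABELS = {
    "FTQ": "ft³",  # cubic foot
    "GLL": "gal",  # gallon (US)
    "KGM": "kg",   # kilogram
    "LBR": "lb",   # pound
}


def classify_unit_code(code: str) -> str:
    """Classify a unitCode value (assumed reading of the spec wording).

    URLs are checked first because they also contain a colon.
    """
    if code.startswith(("http://", "https://")):
        return "url"
    if re.fullmatch(r"[A-Z0-9]{3}", code):
        return "uncefact"
    if ":" in code:
        return "prefixed"  # some other code system, written prefix:code
    return "unknown"


def quantitative_value(value: float, unit_code: str) -> dict:
    """Build a schema.org QuantitativeValue as a JSON-LD dict."""
    return {"@type": "QuantitativeValue", "value": value, "unitCode": unit_code}


def display_unit(unit_code: str) -> str:
    """Human-readable label for a unit code; fall back to the raw code."""
    return UNIT_LABELS.get(unit_code, unit_code)


# Hypothetical car; every value here is made up for illustration.
car = {
    "@context": "https://schema.org",
    "@type": "Car",
    "name": "Example hatchback",
    "roofLoad": quantitative_value(75, "KGM"),
    "cargoVolume": quantitative_value(12, "FTQ"),
    "fuelCapacity": quantitative_value(13.2, "GLL"),
}

if __name__ == "__main__":
    print(json.dumps(car, indent=2, ensure_ascii=False))
    roof = car["roofLoad"]
    print(roof["value"], display_unit(roof["unitCode"]))
```

Keeping the code and the display label apart is what lets the same data feed a US page ("lb") and a European page ("kg") without either audience's abbreviation becoming the identifier.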
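The spec note quoted in the thread says fuelConsumption (liters per 100 km) and fuelEfficiency (miles per gallon) are reciprocal. A minimal Python sketch of that relationship, assuming US gallons (3.785411784 L) and statute miles (1.609344 km); the function names are hypothetical helpers, not part of schema.org. With these factors the product of the two figures is a constant of about 235.2, so the spec's example of 8 liters per 100 km corresponds to roughly 29.4 US mpg.

```python
# Exact conversion factors: 1 US gallon = 3.785411784 L, 1 mile = 1.609344 km.
LITERS_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344

# (L/100 km) * mpg = 100 * liters_per_gallon / km_per_mile, about 235.21.
RECIPROCAL_CONSTANT = 100 * LITERS_PER_US_GALLON / KM_PER_MILE


def l_per_100km_to_mpg_us(l_per_100km: float) -> float:
    """Convert fuel consumption (L/100 km) to fuel efficiency (US mpg)."""
    if l_per_100km <= 0:
        raise ValueError("fuel consumption must be positive")
    return RECIPROCAL_CONSTANT / l_per_100km


def mpg_us_to_l_per_100km(mpg: float) -> float:
    """Convert fuel efficiency (US mpg) to fuel consumption (L/100 km)."""
    if mpg <= 0:
        raise ValueError("fuel efficiency must be positive")
    return RECIPROCAL_CONSTANT / mpg


if __name__ == "__main__":
    print(round(l_per_100km_to_mpg_us(8), 1))   # spec example: 8 L/100 km
    print(round(mpg_us_to_l_per_100km(30), 2))  # spec example: 30 mpg
```

For UK figures quoted in imperial gallons (4.54609 L) the constant becomes about 282.5, which is one reason to state the unit explicitly rather than rely on a bare "mpg".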
Received on Wednesday, 4 July 2018 15:26:40 UTC