- From: Greg Hullender <greg_hullender@hotmail.com>
- Date: Tue, 3 May 2016 19:22:44 +0000
- To: Jarno van Driel <jarnovandriel@gmail.com>, Hans Polak <info@polak.es>
- CC: schema.org Mailing List <public-schemaorg@w3.org>
- Message-ID: <BY2PR20MB03260C1670289E2B7331212CE27A0@BY2PR20MB0326.namprd20.prod.outlook.com>
Yes. Up until 2014, I was in charge of the “Item Authority” team at Amazon that did this. This “matching” is a very hard problem. I wonder if your idea of “product family” is similar to what we used to call “product identity.” Explaining what that means takes a little background.

The root of the problem is that product descriptions are something manufacturers think they can be sloppy about. They think in terms of physical stores, so they think their product id (or even UPC) is adequate. In a physical store, if a customer sees that the product is black, not white, then he/she will just decline to buy it, but online, the customer often doesn’t know until the product arrives.

But worse than getting the data wrong, manufacturers, as a rule, haven’t even thought the problem through. You need to select a set of “identity attributes” per product line: attributes sufficient to identify the product well enough that, if the actual values (for the product) match the displayed values (that the customer saw when placing the order), then he/she will not return the product on the grounds that it wasn’t what they ordered, BUT if ANY identity attribute is different, then the customer does have grounds for a return.

Obviously the UPC is not an identity attribute. (No one ever returned a product because it didn’t have the UPC code they wanted.) But for a book, for example, the title, author, binding (e.g. hard or soft cover), publication date, and condition (used or new) are a complete set for most purposes. (Again, no one ever returned a book because they thought it was a different publisher, unless it was a different edition, in which case the title should have contained that.)

For a fun exercise, go through Amazon’s flatware offerings and think about what sort of attributes you need to distinguish different sets and individual pieces.
http://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Dgarden&field-keywords=flatware

Shoes are another complicated problem. Would you believe there are over 1,000,000 unique shoe sizes? It’s not just that you’ve got gender, width (e.g. D), instep (e.g. 9.5), and system (US, UK, EU, JP), it’s that different shoes have different mappings between systems. That is, one shoe may be a 9.5 US and a 7.5 UK, but another shoe may be a 9.5 US and a 7 UK and a third might be 9-9.5 US and 7 UK. Shoe companies do supply that data, but there’s an awful lot of it.

Color in apparel is a world all its own. Sometimes a shirt will be “blue” and yet there’s barely any blue in it. When you see the other seven patterns, though, you abruptly realize that each is distinguished just by that little bit of color.

Shortly before I retired we had concluded that the most important identity attribute of them all was one that we had never tried to record for any product: product identity. That is “shoe, shirt, book, etc.” The effort to go through and backfill all the products (even with AI to help) was going to be colossal. It sure would have helped if there had been some sort of international standard, at least for the top tier. Something well-thought-through.

--Greg

Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

From: Jarno van Driel <jarnovandriel@gmail.com>
Sent: Tuesday, May 3, 2016 8:59 AM
To: Hans Polak <info@polak.es>
Cc: schema.org Mailing List <public-schemaorg@w3.org>
Subject: Re: How to encode a product family?

"Could you not refer to the manufacturer's page for asserting such a relationship?:"

Oh, one could definitely do so, but the problem I keep running into on large-scale e-commerce sites is that the product data they receive or have in 99% of the cases doesn't contain any 'product family' or 'manufacturer's page' information.
Which is a serious issue for sites containing more than 100k products, as on that scale it isn't feasible to manually find and add such information. What happens in these cases is that such parties write algorithms that compare products based on the information they do have ('name', 'brand' and some string information) to 'sort of' deduce what the product family is. But since the output of such algorithms carries a certain error rate, it's nearly impossible to state that these products fall under the same product model.

The end result more often than not is an approximated grouping that these businesses internally call product families, and which most of the time differs from what the product manufacturers state is part of a product model line. This has made me wonder for a long time whether we should have a ProductFamily type to accommodate this kind of grouping.

2016-05-03 17:37 GMT+02:00 Hans Polak <info@polak.es>:

Hi Alexandre,

Could you play around with it here https://generator-1260.appspot.com/ProductModel ? I'd love to get more feedback.

Cheers,
Hans
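
For readers who want something concrete, the “identity attributes” idea from Greg's message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Amazon's system worked: the attribute set for books is taken from the message, while the names `IDENTITY_ATTRIBUTES`, `normalize`, `identity_key` and `mismatched_attributes`, the normalization rule, and the sample records are all made up for the example.

```python
# Hypothetical table of identity attributes per product line.
# The "book" entry follows the example in the message; a real
# catalog would need one carefully chosen entry per product line.
IDENTITY_ATTRIBUTES = {
    "book": ("title", "author", "binding", "publication_date", "condition"),
}


def normalize(value):
    # Assumed normalization: collapse whitespace and ignore case.
    return " ".join(str(value).split()).casefold()


def identity_key(product_line, record):
    """Build a comparable key from the identity attributes only."""
    names = IDENTITY_ATTRIBUTES[product_line]
    missing = [name for name in names if name not in record]
    if missing:
        raise ValueError(f"missing identity attributes: {missing}")
    return (product_line,) + tuple(normalize(record[name]) for name in names)


def mismatched_attributes(product_line, displayed, actual):
    """List identity attributes whose displayed and actual values differ.

    By the definition in the message, a non-empty result means the
    customer has grounds for a return.
    """
    names = IDENTITY_ATTRIBUTES[product_line]
    return [n for n in names if normalize(displayed[n]) != normalize(actual[n])]


# Made-up sample records: what the customer saw, and two shipped items.
displayed = {
    "title": "Example Title", "author": "A. Author", "binding": "hardcover",
    "publication_date": "2003-01-01", "condition": "new",
    "upc": "000000000001", "publisher": "Publisher A",
}
# Different UPC and publisher, same identity attributes.
shipped_same = dict(displayed, upc="000000000002", publisher="Publisher B")
# Different binding, which is an identity attribute.
shipped_other = dict(displayed, binding="paperback")

print(identity_key("book", displayed) == identity_key("book", shipped_same))
print(mismatched_attributes("book", displayed, shipped_other))
```

The UPC and publisher are deliberately left out of the key, so two offers that differ only in those fields collapse to the same product, while a change in binding does not.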
Received on Tuesday, 3 May 2016 19:23:15 UTC