- From: Jarno van Driel <jarnovandriel@gmail.com>
- Date: Tue, 3 May 2016 22:09:10 +0200
- To: Greg Hullender <greg_hullender@hotmail.com>
- Cc: Hans Polak <info@polak.es>, "schema.org Mailing List" <public-schemaorg@w3.org>
- Message-ID: <CADK2AU2kER42ya+wd06d22PMm3RNM4FaW20ws5XQTq1FNyWRrg@mail.gmail.com>
> *"This “matching” is a very hard problem. I wonder if your idea of “product family” is similar to what we used to call “product identity.”"*

Interesting things I run into are the creative solutions we have to come up with to tell whether different entities in our data warehouse should be merged into one entity or not, because shops, resellers, etc. don't necessarily provide proper EAN numbers: (a) they still had some left over from another Product, (b) they add extra digits to identify that it's THEIR product being ordered through our site, (c) they imported products outside the regular channels and provide us with Middle Eastern GTIN numbers, etc. Which is especially fun when trying to figure out which products can or cannot have certain reviews.

That would be much easier if product model information was readily available, but since it rarely is, one indeed has to come up with all kinds of 'hacks' like you describe with *"You need to select a set of “identity attributes” per product line"*, Greg. :)

Though trying to come up with markup that expresses something meaningful in an environment that in many cases doesn't have one 'true' ProductModel or 'master' product surely is one of the toughest challenges so far, especially in combination with canonicalisation algos that don't always agree with what the structured data is saying. In that regard e-commerce environments really can be the cause of some serious headaches.

> *"one that we had never tried to record for any product: product identity. That is “shoe, shirt, book, etc.”"*

Sounds familiar. It's all about mixing the values you have, like EAN/GTIN, name (string), brand (string) and ProductModel (string, if you're lucky), combined with any property (with sufficient coverage in the data warehouse) you can throw into the mix. And then we still end up with tons of products that seem unique to our algos yet somehow keep providing job security to quite some data managers, hehe.
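To make the "mixing the values you have" idea concrete, here is a minimal sketch in Python. It is not the matching logic used by any of the sites or teams mentioned in this thread; the record layout (a plain dict with keys such as `gtin`, `brand`, `name`) and the function names are assumptions made for illustration. The only standard piece is the GS1 check-digit rule, which catches identifiers that had extra digits appended but says nothing about whether a valid number was reused for another product.

```python
import re
from typing import Optional


def gtin_check_digit_ok(code: str) -> bool:
    """True if `code` is an 8, 12, 13 or 14 digit string with a valid GS1 check digit."""
    if not re.fullmatch(r"[0-9]+", code) or len(code) not in (8, 12, 13, 14):
        return False
    digits = [int(c) for c in code]
    body, check = digits[:-1], digits[-1]
    # GS1 rule: weights alternate 3, 1, 3, ... starting at the digit left of the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def usable_gtin(record: dict) -> Optional[str]:
    """Return the GTIN zero-padded to 14 digits, or None if it is missing or malformed."""
    raw = str(record.get("gtin") or "").replace(" ", "").replace("-", "")
    return raw.zfill(14) if gtin_check_digit_ok(raw) else None


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so 'Trail-Runner' equals 'trail runner'."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_key(record: dict, identity_attributes: tuple) -> tuple:
    """Build a comparison key from the identity attributes chosen for a product line."""
    return tuple(normalize(str(record.get(attr) or "")) for attr in identity_attributes)


# Hypothetical records from two resellers describing the same shoe.
a = {"brand": "ACME", "name": "Trail Runner 2", "color": "Black", "gtin": "4006381333931"}
b = {"brand": "Acme ", "name": "trail-runner 2", "color": "black", "gtin": "400638133393199"}

shoe_identity = ("brand", "name", "color")  # assumed identity attributes for this line
same_product = match_key(a, shoe_identity) == match_key(b, shoe_identity)
```

In this sketch the two records compare equal on their identity attributes even though the second reseller's identifier fails validation, so the GTIN serves only as supporting evidence when it is usable. Exact matching on normalized strings is the simplest possible choice; real catalogs would still need fuzzy comparison and manual review for the leftovers.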
2016-05-03 21:22 GMT+02:00 Greg Hullender <greg_hullender@hotmail.com>:

> Yes. Up until 2014, I was in charge of the “Item Authority” team at Amazon that did this. This “matching” is a very hard problem. I wonder if your idea of “product family” is similar to what we used to call “product identity.” To explain what that means needs a little explanation.
>
> The root of the problem is that product descriptions are something manufacturers think they can be sloppy about. They think in terms of physical stores, so they think their product id (or even UPC) is adequate. In a physical store, if a customer sees that the product is black, not white, then he/she will just decline to buy it, but online, the customer often doesn’t know until the product arrives.
>
> But worse than getting the data right, manufacturers, as a rule, haven’t even thought the problem through. You need to select a set of “identity attributes” per product line, where those are attributes that are sufficient to identify the product well enough that if the actual values (for the product) match the displayed values (that the customer saw when placing the order) then he/she will not return the product on the grounds that it wasn’t what they ordered, BUT if ANY identity attribute is different, then the customer does have grounds for a return.
>
> Obviously the UPC is not an identity attribute. (No one ever returned a product because it didn’t have the UPC code they wanted.) But for a book, for example, the title, author, binding (e.g. hard or soft cover), publication date, and condition (used or new) are a complete set for most purposes. (Again, no one ever returned a book because they thought it was a different publisher—unless it was a different edition, in which case the title should have contained that.)
>
> For a fun exercise, go through Amazon’s flatware offerings and think about what sort of attributes you need to distinguish different sets and individual pieces.
>
> http://www.amazon.com/s/ref=nb_sb_noss_2?url=search-alias%3Dgarden&field-keywords=flatware
>
> Shoes are another complicated problem. Would you believe there are over 1,000,000 unique shoe sizes? It’s not just that you’ve got gender, width (e.g. D), instep (e.g. 9.5), and system (US, UK, EU, JP), it’s that different shoes have different mappings between systems. That is, one shoe may be a 9.5 US and a 7.5 UK, but another shoe may be a 9.5 US and a 7 UK and a third might be 9-9.5 US and 7 UK. Shoe companies do supply that data, but there’s an awful lot of it.
>
> Color in apparel is a world all its own. Sometimes a shirt will be “blue” and yet there’s barely any blue in it. When you see the other seven patterns, though, you abruptly realize that each is distinguished just by that little bit of color.
>
> Shortly before I retired we had concluded that the most important identity attribute of them all was one that we had never tried to record for any product: product identity. That is “shoe, shirt, book, etc.” The effort to go through and backfill all the products—even with AI to help—was going to be colossal.
>
> It sure would have helped if there had been some sort of international standard at least for the top tier. Something well-thought-through
>
> --Greg
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
> *From:* Jarno van Driel <jarnovandriel@gmail.com>
> *Sent:* Tuesday, May 3, 2016 8:59 AM
> *To:* Hans Polak <info@polak.es>
> *Cc:* schema.org Mailing List <public-schemaorg@w3.org>
> *Subject:* Re: How to encode a product family?
>
>> *"Could you not refer to the manufacturer's page for asserting such a relationship?:"*
>
> Oh, one could definitely do so, but the problem I keep running into on large-scale e-commerce sites is that the product data they receive/have in 99% of the cases doesn't contain any 'product family' nor any 'manufacturer's page' information. Which is a serious issue for sites containing more than 100k products, as on that scale it isn't feasible to manually find/add such information.
>
> What happens in these cases is that such parties write algos that compare products based on the information they do have ('name', 'brand' and some string information) to 'sort of' deduce what the product family is. But since the outcome of such algos often contains a certain error rate, it's nearly impossible to state these fall under the same product model.
>
> The end result more often than not is an approximated grouping these businesses internally call product families, and which most of the time differs from what the product manufacturers state is part of a product model line.
>
> Something that has made me wonder for a long time already whether we should have a ProductFamily type to accommodate this type of grouping.
>
> 2016-05-03 17:37 GMT+02:00 Hans Polak <info@polak.es>:
>
>> Hi Alexandre,
>>
>> Could you play around with it here https://generator-1260.appspot.com/ProductModel ?
>>
>> I'd love to get more feedback.
>>
>> Cheers,
>> Hans
Received on Tuesday, 3 May 2016 20:09:39 UTC