Re: [w3ctag/design-reviews] WICG Shape Detection API (#176)

@yellowdoge Apologies for sitting on this for so long.

I brought this up in a call quite a while ago - the follow-up took way too long. The raw minutes are here: https://pad.w3ctag.org/p/2017-05-30-minutes.md

We think this is a great addition to the platform. The open questions are really about how to ship it so that it is widely adoptable without requiring too much work from implementors, and how to think about extensibility (probably for level 2 of the standard) so that users can use these APIs as building blocks. We would be more than happy to discuss the next step after this ships. I believe the web would welcome building blocks for machine learning and computer vision, but that is a large undertaking, so let's leave that discussion outside the scope of this review.

As for the performance argument, WebAssembly should improve the situation, but most likely not to native level. The other issue is that matrix support in JS is missing, and it does not seem like something we will see shipping soon. On top of that, native implementations can delegate the operations to dedicated hardware such as DSPs or GPUs. So yes, it is unlikely that a pure JS implementation will ever beat native performance.

I understand your arguments about Viola-Jones. It is a more or less stable approach, and given relatively similar input data it should produce fairly similar results. Barcode and QR detection have fairly established methods too, so that shouldn't be a problem. QR codes with binary data could be a problem with the spec as it stands, as noted above. I will file a bug on this, along with some other minor editorial bits.
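To make the binary QR concern concrete, here is a minimal sketch of why a string-only result can lose data. It assumes an implementation decodes the payload bytes as UTF-8 before handing them back as a string; that decoding step, and the helper names `payloadToString` and `roundTrips`, are my assumptions for illustration and not something the spec text says.

```typescript
// Hypothetical helper: turn raw payload bytes into a string the way a
// detector might if it can only return a string (assumed: UTF-8 decode).
function payloadToString(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}

// Checks whether the original bytes can be recovered from the string.
function roundTrips(bytes: Uint8Array): boolean {
  const encoded = new TextEncoder().encode(payloadToString(bytes));
  return (
    encoded.length === bytes.length &&
    encoded.every((b, i) => b === bytes[i])
  );
}

// A plain-text payload survives the trip through a string.
const textPayload = new TextEncoder().encode("https://example.com/");
// Arbitrary bytes that are not valid UTF-8 are replaced with U+FFFD.
const binaryPayload = new Uint8Array([0xff, 0xfe, 0x00, 0x80]);

console.log(roundTrips(textPayload)); // true
console.log(roundTrips(binaryPayload)); // false
```

Under that assumption the caller has no way to get the original bytes back, which is the part I would like the bug to cover.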

Text is tricky. Since the API defines detected text as being available as a DOMString, this could be quite a bit of work to implement. I spent some time looking at the differences between the platform APIs across OSes, and it seems that iOS/macOS is missing the actual text detection bits, which the browser would then need to provide. (Even if support for this gets added later, it won't be available on older OS versions.)

Language support in text detection is another tricky topic. Different implementations will most likely have different capabilities and accuracy, not only for text detected in a natural scene, but also for complex scripts (e.g. CJK) and RTL languages. I haven't seen open source libraries that provide reasonable performance for multiple languages out of the box. I'm wondering if it would be better to tackle the two easy ones first, and discuss with other implementors what they are willing to ship for the harder one (text, in this case).
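If the detectors do end up shipping at different times, pages would have to feature-detect each one separately. A rough sketch of what that could look like follows; the interface names `FaceDetector`, `BarcodeDetector` and `TextDetector` are taken from the draft as I understand it, while `availableDetectors` and the `scope` parameter are hypothetical names used only for this example.

```typescript
// Interface names as I understand them from the draft (treat as assumption).
type DetectorName = "FaceDetector" | "BarcodeDetector" | "TextDetector";

const DETECTORS: DetectorName[] = [
  "FaceDetector",
  "BarcodeDetector",
  "TextDetector",
];

// Hypothetical helper: report which detector constructors a global scope
// exposes. `scope` defaults to the real global object, and can be replaced
// with a plain object for testing.
function availableDetectors(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>
): DetectorName[] {
  return DETECTORS.filter((name) => typeof scope[name] === "function");
}

// Simulated browser that ships faces and barcodes, but not text.
const partialSupport = {
  FaceDetector: class {},
  BarcodeDetector: class {},
};

console.log(availableDetectors(partialSupport));
```

A page written this way keeps working if text detection arrives later, or only on some platforms.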

-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/176#issuecomment-309654991

Received on Tuesday, 20 June 2017 06:14:23 UTC