- From: Miguel Casas-Sanchez <notifications@github.com>
- Date: Wed, 26 Jul 2017 02:38:48 +0000 (UTC)
- To: w3ctag/design-reviews <design-reviews@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <w3ctag/design-reviews/issues/176/317930362@github.com>
> We think this is a great addition to the platform; the question is really how to ship it so that it is widely adoptable without requiring too much work from the implementor's perspective, and how to think about extensibility (probably for level 2 of the standard) so users can use these APIs as building blocks. We would be more than happy to discuss the next step after this ships. I believe the web would welcome building blocks for machine learning and computer vision, but that is a large undertaking, so let's leave that discussion outside the scope of this review.

Acknowledged!

> As for the performance argument, WebAssembly should most likely improve the situation, but most likely not to native levels. The other bit is that JS lacks matrix support, and that does not seem likely to ship soon; moreover, native implementations can delegate the operations to dedicated hardware such as DSPs or GPUs. So yes, it is unlikely that a pure JS implementation will ever beat native performance.

Agreed.

> I understand your arguments about Viola-Jones. It is a fairly stable approach, and given relatively similar input data it should produce fairly similar results. Barcode and QR have fairly established methods too, so that shouldn't be a problem. QRs with binary data could be a problem with the spec as it stands, as noted above. Will file a bug on this, along with some other minor editorial bits.

Done, at least for the binary vs. text one: https://github.com/WICG/shape-detection-api/issues/35

> Text is tricky, especially since the API defines detected text as a DOMString; this could be quite a bit of work to implement. I spent some time looking at the differences between the platform APIs across OS implementations, and it seems that, for text, iOS/macOS is missing the actual text detection bits, which the browser would then need to provide. (Even if support for this gets added later, it won't be available on older OS versions.)

That's correct: Mac provides only the bounding boxes, not the result of any OCR inside them, whereas Android and Win10 do seem to support OCR (see links in the [Example of the Spec](https://wicg.github.io/shape-detection-api/#text-detection-api)). I guess in this case developers should rely on polyfills, probably using Tesseract; but you had some concerns about its performance beyond pure document-scanning use cases, right?

> Language support in text detection is another tricky topic: different implementations will most likely have different capabilities and accuracy, not only for text detected in a natural scene, but also for complex languages (e.g. CJK) and RTL languages. I haven't seen open source libraries that provide reasonable performance for multiple languages out of the box. I'm wondering if it would be better to tackle the two easy ones first, and discuss with other implementors what they are willing to ship for the harder one (text, in this case).

I've never had first-hand experience with text detection on non-Latin-based languages, but I know that the Android implementation doesn't work well with either Hanzi or Katakana. Are you proposing treating Face+Barcode and Text differently?

Aside from this last remark, I understand from this discussion that the Spec looks good TAG-wise? (Notwithstanding specific issues to be filed.)

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/176#issuecomment-317930362
Received on Wednesday, 26 July 2017 02:39:12 UTC