RE: machine readability (conversation starters) from Paul Topping on 2016-05-23 (public-mathonwebpages@w3.org from May 2016)

From: Paul Topping <pault@dessci.com>
Date: Mon, 23 May 2016 17:58:27 +0000
To: Peter Krautzberger <peter.krautzberger@mathjax.org>, "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com>
CC: "public-mathonwebpages@w3.org" <public-mathonwebpages@w3.org>
Message-ID: <B6C5B1ABA88AF446821B281774E6DB71B7240537@FERMAT.corp.dessci>
Partial results are helpful. If units can be made machine readable and accessible, that can make the rest of the math in which they are embedded more readable and accessible. In general, there are no absolutes when it comes to understanding, machine or human. When units are understood, then there’s less of the expression that is not understood. And, of course, units are just an example. There are many more constructs that need understanding.

This can also be thought of as a division of labor between machine/representation and the mind of the reader. At one end of the spectrum, the math can be understood as simply a series of characters. If a human reads “5 in” as “5”, “i”, “n”, many people will still make sense of it. My point is that even if the machine can understand “5 in”, the rest of the expression may still be understood by a human reader at this character-by-character level.

I believe that the MathML spec does have a recommended way to mark up units; I am not sure that using a class was it, though perhaps it was. Regardless, I believe inventing such things one by one will not get us where we need to be fast enough. Same for the microformats efforts. Instead, we need more general mechanisms for marking up semantic concepts and for using that embedded knowledge to guide rendering, search, accessibility, etc. Such mechanisms need to differentiate between (a) markup created by an author, in which case it can be assumed to be authoritative (reflecting the author’s intent), and (b) markup injected by some machine understanding process that attempts to infer high-level semantics from low-level markup.
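
To make the (a)/(b) distinction concrete, here is a minimal sketch in Python. Nothing in it comes from the MathML spec or any existing tool: the `Annotation` record, the `source` values "author" and "machine", the concept strings, and the confidence threshold are all invented for illustration. The only point is that a consumer treats author markup as authoritative and falls back to machine-inferred markup only when it is confident enough.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """One semantic annotation attached to a piece of math (hypothetical model)."""
    concept: str             # e.g. "unit:inch"; naming scheme is made up
    source: str              # "author" or "machine"
    confidence: float = 1.0  # only meaningful for machine annotations


def resolve(annotations: List[Annotation],
            min_confidence: float = 0.8) -> Optional[Annotation]:
    """Pick the annotation a consumer should trust, or None if there is none."""
    # (a) Author markup reflects the author's intent, so it always wins.
    for ann in annotations:
        if ann.source == "author":
            return ann
    # (b) Otherwise take the most confident machine inference above the threshold.
    inferred = [a for a in annotations
                if a.source == "machine" and a.confidence >= min_confidence]
    return max(inferred, key=lambda a: a.confidence, default=None)


# Two competing machine readings of "5 in": a unit, or the product i*n.
candidates = [
    Annotation("unit:inch", "machine", 0.9),
    Annotation("product:i*n", "machine", 0.4),
]
print(resolve(candidates).concept)  # unit:inch
```

If an author later adds their own annotation to the same list, `resolve` returns that one regardless of the machine confidences, and with no usable annotation it returns `None`, leaving the reader at the character-by-character level.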

We need to allow authors to do high-level semantic markup but we also know that most authors won’t do it. Marking up things at authoring time is hard work. Languages like TeX make it a little easier for things like units but aren’t a general solution. For high-value content (e.g., textbooks) where there are going to be a lot of readers, it is perhaps worth hiring an expert to mark things up properly but even that is becoming less and less economically viable. More and more we will want to rely on machine-generated semantic annotation embedded in the markup. This is basically caching the result of some kind of deep analysis. Some AI process that looks at context and does deep document analysis is likely not to perform well enough for such inference to be done on the fly. Also, doing it on the fly doesn’t allow human judgement to aid the process.

Paul

From: Peter Krautzberger [mailto:peter.krautzberger@mathjax.org]
Sent: Monday, May 23, 2016 9:12 AM
To: Siegman, Tzviya - Hoboken <tsiegman@wiley.com>
Cc: public-mathonwebpages@w3.org
Subject: machine readability (conversation starters)

Hi Tzviya,

For the present at least, I would disagree. I do dream that eventually (maybe 10 years from now?) we'll have a thorough a11y API mapping for mathematics. At the moment, I don't think mathematics (as a culture / language) is ready for this (though web technology probably would be).

Regarding general machine readability vs accessibility, one important difference I see is that machine readability can benefit from partial results whereas accessibility cannot.

A typical example for this might be units. If we can find a way to make units machine readable, I think we'd have a major improvement for STEM on the web. But it won't help accessibility (much) to know that there are units in an expression if it is otherwise unintelligible.

Of course, we currently don't have any standard or best practice for exposing units on the web. The MathWG had a very old note on units (from 2003) which suggested class='MathML-Unit' on MathML elements; I don't think that's a viable approach today. Perhaps schema is a better starting point considering how successfully search engines can leverage units in recipes (I could imagine lab protocols and engineering might benefit from similar methods).
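
For reference, a rough sketch of what consuming the 2003-style markup could look like, using only the Python standard library. The sample markup and the function name `find_units` are made up for illustration; only the class value `MathML-Unit` comes from the note as described above. It also shows the "partial results" point: the unit is picked out even though nothing else in the expression is interpreted.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "5 in + x", with the unit flagged via the class attribute.
SAMPLE = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mn>5</mn>
    <mi class="MathML-Unit">in</mi>
    <mo>+</mo>
    <mi>x</mi>
  </mrow>
</math>
"""


def find_units(mathml: str) -> list:
    """Return the text of every element whose class list contains MathML-Unit."""
    root = ET.fromstring(mathml.strip())
    units = []
    for element in root.iter():
        # The class attribute is a space-separated list, as in HTML.
        if "MathML-Unit" in element.get("class", "").split():
            units.append((element.text or "").strip())
    return units


print(find_units(SAMPLE))  # ['in']
```

The plain `<mi>x</mi>` is left alone: the machine learns that "in" is a unit and nothing more.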

For some tools it's extremely easy to generate markup for units, e.g., Jos de Jong's MathJS has a rich interface for handling units and could probably easily expose them in a visual output. TeX has a rich history with the physics and siunitx packages (which are, for example, partially available in MathJax as third party extensions) and heuristics seem feasible to enrich formats in general (again, MathJax can do some of that via the speechruleengine).

I think for humans we have to change our expectations. Otherwise, we'll just end up repeating the mistakes of the past. I'll post some thoughts on the accessibility thread later.

Best regards,
Peter.



On Wed, May 4, 2016 at 8:06 PM, Siegman, Tzviya - Hoboken <tsiegman@wiley.com> wrote:
Hi Peter,

There are details that are different. For example, ultimately there will be a detailed Accessibility API mapping for Math as there is for HTML, ARIA, and SVG, but I would love it if the world would come to understand that accessibility is a subset of machine readability. Accessibility APIs are a specialized kind of machine. If we are working on machine readable math, we need to make sure that those specialized machines can read the math too. Otherwise we will do the work twice.

Tzviya

Tzviya Siegman
Digital Book Standards & Capabilities Lead
Wiley
201-748-6884
tsiegman@wiley.com

From: Peter Krautzberger [mailto:peter.krautzberger@mathjax.org]
Sent: Wednesday, May 04, 2016 11:01 AM
To: public-mathonwebpages@w3.org
Cc: public-mathonw.
Subject: RE: conversation starters

Hi Tzviya,

Would you like to create a spin off thread?

(Obviously I saw a difference but I'd be interested to hear what you have in mind.)

Best regards,
Peter.

On May 4, 2016 4:38 PM, "Siegman, Tzviya - Hoboken" <tsiegman@wiley.com> wrote:
I’m interested in a11y and machine readability (and the intersection of the two, because really, why do we treat these like they are different issues?)

Tzviya Siegman
Digital Book Standards & Capabilities Lead
Wiley
201-748-6884
tsiegman@wiley.com

From: Peter Krautzberger [mailto:peter.krautzberger@mathjax.org]
Sent: Wednesday, May 04, 2016 6:17 AM
To: public-mathonw.
Subject: conversation starters

Hi math-on-webpages,

Here's a short list of topics to start a conversation. These four came up on the call, feel free to add.

If you're interested, just

* spin off a new thread (change subject in your reply)
* comment or +1 so that we can
* team up for some dedicated conversations

Best regards,
Peter.

**layout**

Whether CSS, SVG, or canvas, if you're interested in talking about math layout on the web, let's collect best practices, use cases, and analyze gaps in the OWP.

**accessibility**

If you're working on making math accessible -- ARIA, speech, braille, internationalization etc.

**editing**

Web-based editing touches virtually everything -- layout, interaction, accessibility -- but gathering information unique to the challenges of editing would be worthwhile.

**machine readability**

Exposing information to machines, I hear it's a thing -- JSON, microdata, RDFa etc.
Received on Monday, 23 May 2016 17:58:57 UTC