Re: More on SVG within HTML pages from Shelley Powers on 2009-09-09 (public-html@w3.org from September 2009)

From: Shelley Powers <shelleyp@burningbird.net>
Date: Wed, 09 Sep 2009 07:44:45 -0500
To: Henri Sivonen <hsivonen@iki.fi>
CC: Maciej Stachowiak <mjs@apple.com>, Jonas Sicking <jonas@sicking.cc>, "Tab Atkins Jr." <jackalmage@gmail.com>, public-html@w3.org, Lachlan Hunt <lachlan.hunt@lachy.id.au>
Message-ID: <4AA7A33D.4050907@burningbird.net>
Henri Sivonen wrote:
> On Sep 9, 2009, at 02:21, Maciej Stachowiak wrote:
>
>> On Sep 8, 2009, at 3:49 PM, Jonas Sicking wrote:
>>
>>> On Tue, Sep 8, 2009 at 3:15 PM, Maciej Stachowiak<mjs@apple.com> wrote:
>>>>
>>>>
>>>> I think in an earlier message, Henri outlined the possible options for
>>>> dealing with this (specifically in the context of <metadata>, but I
>>>> think it applies to SVG in general).[1] I think his option (3) is
>>>> probably the most reasonable, all things considered. The DOM
>>>> Consistency violation is only theoretical, because the content in
>>>> question is generally not meant to be processed by the client. I think
>>>> his options (1) or (3) could be implemented by errata to the SVG 1.1
>>>> spec to make clear how SVG conformance rules apply in text/html. His
>>>> option (2) would require changes to the HTML5 parsing algorithm, but
>>>> it doesn't seem to be anyone's preferred option.
>>>
>>> Surely this data is expected to be consumed by *some* client, right?
>
> When an HTML author copies and pastes some Inkscape-created or 
> Illustrator-created SVG clip art (e.g. from Wikimedia Commons), I'd 
> expect the HTML author to have no expectations of how either metadata 
> or product-specific state data be processed other than that it doesn't 
> interfere with vector graphic rendering.
>
> For this use case, it's entirely unhelpful for a validator to complain 
> either about what was metadata in SVG/XML or about what was 
> product-specific state in SVG/XML when pasted to text/html. In 
> particular, removing this cruft manually in a text editor is too 
> labor-intensive for too little gain. However, it would be fair to emit 
> one warning suggesting the use of a file size optimizer that zaps the 
> cruft without manual labor in a text editor. For this use case, it's 
> also unnecessary to define any non-validator processing that isn't 
> already defined.

It's not the HTML WG's place to warn people about things outside of the 
province of HTML5.

A single warning about the DOM differences for namespaced entities of 
SVG in HTML, as compared to SVG in XHTML, should be sufficient. People 
can then make their own decisions about what they do, and don't, want 
to do with the annotation.
>
> In the rare case where the HTML author actually tries to express 
> metadata and expects it to be processed as RDF/XML, it would be 
> reasonable to throw errors to signal that what looks like RDF/XML 
> metadata isn't, in fact, RDF/XML when parsed as text/html. Since this 
> case can be expected to be rare compared to the above case and since a 
> validator cannot know which case it is dealing with, on balance, a 
> single warning is probably the most useful compromise even though it's 
> technically wrong not to flag bogus markup as errors.
>
> On the other hand, even though RDF/XML is in principle supported in 
> SVG/XML, it's probably equally wasteful to spend time adding RDF/XML 
> metadata to SVG/XML on the Web (in terms of what one can expect about 
> someone processing the metadata as an RDF graph and deriving value 
> from it), so keeping that in perspective, it's not significantly more 
> wasteful to put RDF/XML-looking stuff in text/html. Can anyone show me 
> products that ingest RDF/XML from SVG/XML, build an RDF graph and do 
> something useful (other than merely displaying it) with it?
>

It is not bogus markup. There are applications that do look for RDF/XML 
in web pages, and there are applications that specifically look for 
Creative Commons. It is also not uncommon. In fact, for many of the 
images from Open Source Clip Art, RDF is quite common.

There's no need for special treatment of the RDF/XML when the other 
namespaced entities are left the same. To do so shows a bias 
specifically against RDF, a bias at odds with the W3C's past statements 
and current interest in semantic web markup.

It is not this group's place to police the web. It is certainly not this 
group's place to impose bias.

The one warning, given when the first namespaced entity is reached, is 
sufficient.
> As for the use case of a future version of Inkscape trying to 
> round-trip its state when reading its SVG/XML output that has been 
> pasted into text/html, what Maciej says below applies.
>
>>> And we'd need to define parsing rules that that client can use to
>>> consume the data, no?
>>
>> I think in most cases the data is expected to be consumed by the 
>> authoring tool that created the SVG in the first place. They are 
>> storing information that is of interest to further editing of the 
>> given SVG by the original tool, but not to mere display. If such 
>> tools are ever updated to directly save and load SVG-in-text/html, 
>> then indeed they will have a problem. They'll either need to store 
>> their authoring state metadata differently in HTML serialization, or 
>> at least expect that when parsing the HTML serialization, the 
>> metadata is expressed differently. So indeed, these authoring tools 
>> (but not UAs or Web content) would face a DOM Consistency problem.
>
> This bothers me, but I don't have a solution that isn't bad in some 
> way. It bothers me less than DOM Consistency problems in the parts of 
> the language that are meant to interoperate between multiple products, 
> though.
>
> It would be possible to grandfather a fixed set of Inkscape attribute 
> names into the pre-interned tables of well-known attribute names so 
> that those attributes would get the same magic as xlink:href, but I 
> can see how that could be seen as unfair favoritism towards one 
> product that would bloat everyone else's footprint and it still 
> wouldn't deal with the Inkscape elements. I'd oppose grandfathering 
> Inkscape elements, because they don't fit into a mechanism existing 
> for other reasons. The stuff Illustrator outputs is too varied and 
> crazy to grandfather, in my opinion.
>
That would be overkill, and outside the interests of this group. 
We're already facing extensibility issues, and hard-coding vocabularies 
has been, and continues to be, a problem. This is why extensibility was 
originally defined: so we don't have to fill up our page markup 
specifications with a mind-numbing list of things allowed, which will 
be dated as soon as the spec is released.

It will be rare for a client to want to process the metadata or tool 
annotation, but not entirely impossible. There's useful stuff in the 
RDF, such as annotation beyond that in title and desc. The tool 
information could also be of interest. It's little different than the 
metadata included in JPEGs -- most of it is ignored, but some 
applications, such as Gallery, pull the data, and display it for those 
interested (camera information, color model, and so on).

But the data is typically not pulled with JS. Still, in case it is, we 
need to inform JS developers of DOM differences. As for tools that parse 
the HTML, we'll need to make them aware of the differences as well.
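
To make the kind of difference concrete, here is a rough sketch in 
Python. It is only an approximation: Python's html.parser is not an 
HTML5 parser and knows nothing about foreign content, but like the 
HTML5 tokenizer it lowercases names and does no namespace processing. 
The markup snippet and the helper names (xml_names, html_names) are 
made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical snippet in the style of authoring-tool output; not from a real file.
SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape">'
    '<metadata><rdf:RDF></rdf:RDF></metadata>'
    '<g inkscape:label="Layer 1"></g>'
    '</svg>'
)


def xml_names(source):
    """Element and attribute names as a namespace-aware XML parser reports them."""
    names = []
    for el in ET.fromstring(source).iter():
        names.append(el.tag)            # "{namespace-uri}localname"
        names.extend(el.attrib.keys())  # namespaced attributes use the same form
    return names


class NameCollector(HTMLParser):
    """Collects tag and attribute names as a namespace-unaware tokenizer sees them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)  # already lowercased, prefix kept as plain text
        self.names.extend(name for name, _ in attrs)


def html_names(source):
    collector = NameCollector()
    collector.feed(source)
    collector.close()
    return collector.names


if __name__ == "__main__":
    print(xml_names(SVG))
    print(html_names(SVG))
```

On the XML side the RDF element comes back with the RDF namespace and 
the local name RDF; on the HTML side the name is just the literal 
string rdf:rdf, and inkscape:label is an ordinary attribute name with a 
colon in it. As I read the HTML5 parsing algorithm, a real HTML5 parser 
produces much the same names, with the element placed in the SVG 
namespace.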

We are mixing content that is marked up as XML into HTML. Yes, I am 
aware that it's no longer the same thing. But if application developers 
are aware of the differences between how the SVG appears to the web 
page in HTML and how it appears in XHTML, they can adjust their 
applications accordingly. After all, applications have been pulling 
RDF/XML for years for trackbacks, where the XML is inserted as HTML 
comments.
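
As an illustration of the trackback case, here is a minimal sketch in 
Python of how an application might pull the RDF/XML back out of an HTML 
comment. The page, the example.org URLs, and the trackback_pings helper 
are hypothetical, and the trackback namespace URI is as I recall it 
from the TrackBack specification.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace URI as recalled from the TrackBack specification (assumption).
TB_NS = "http://madskills.com/public/xml/rss/module/trackback/"

# Hypothetical page; the URLs are placeholders.
PAGE = """<html><body><p>A weblog entry.</p>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description
    rdf:about="http://example.org/entry/1"
    dc:identifier="http://example.org/entry/1"
    dc:title="An entry"
    trackback:ping="http://example.org/tb/1" />
</rdf:RDF>
-->
</body></html>"""


def trackback_pings(page):
    """Return the trackback:ping URLs found in RDF/XML blocks embedded in the page text."""
    pings = []
    # The HTML parser only sees a comment, so the RDF block is located as plain text
    # and then handed to an XML parser on its own.
    for block in re.findall(r"<rdf:RDF\b.*?</rdf:RDF>", page, flags=re.DOTALL):
        root = ET.fromstring(block)
        for desc in root.iter("{%s}Description" % RDF_NS):
            ping = desc.get("{%s}ping" % TB_NS)
            if ping:
                pings.append(ping)
    return pings


if __name__ == "__main__":
    print(trackback_pings(PAGE))
```

The point is only that the application knows where the XML lives and 
treats it as XML itself, regardless of how the surrounding page is 
parsed.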

Information such as this can be succinctly recorded in the HTML5 
specification, demonstrated in a Primer or tutorial or whatever, and 
that should be sufficient. When encountered in the validator, a simple 
warning about DOM differences, given when the first namespaced entity 
is uncovered, should be sufficient.

It is the path of least resistance, and the path of least harm.

However, the round trip is a different story. In my opinion, a round 
trip will be just that: SVG/XML will be pasted, as is, into HTML, and 
then copied as is, back out. No problems.

But in HTML5 served as HTML, we do allow crufty stuff in, such as 
unquoted attributes. I modified my test case at 
http://burningbird.net/newbook/testhtml5.php, incorporating crappy 
markup we would expect from HTML. The SVG is still rendered. I copied 
and pasted the SVG into a separate file, at 
http://burningbird.net/newbook/circle.svg. Needless to say, the crappy 
markup caused the FF nightly to barf.
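
The same effect can be sketched outside the browser. The following 
Python is an assumption-laden stand-in, not my actual test case: the 
SLOPPY markup is made up, xml.etree plays the part of the XML parser, 
and html.parser plays the part of a forgiving HTML tokenizer.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical markup with unquoted attribute values, as HTML permits.
SLOPPY = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<circle cx=50 cy=50 r=40 fill=red></circle>'
    '</svg>'
)

# The same markup with every attribute value quoted.
CLEAN = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<circle cx="50" cy="50" r="40" fill="red"></circle>'
    '</svg>'
)


def parses_as_xml(source):
    """True if an XML parser accepts the source as well-formed."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


class CircleAttrs(HTMLParser):
    """Records the attributes of the circle start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        if tag == "circle":
            self.attrs = dict(attrs)


def circle_attrs_as_html(source):
    collector = CircleAttrs()
    collector.feed(source)
    collector.close()
    return collector.attrs


if __name__ == "__main__":
    print(parses_as_xml(SLOPPY), parses_as_xml(CLEAN))
    print(circle_attrs_as_html(SLOPPY))
```

The forgiving tokenizer recovers the attribute values either way, 
while the XML parser rejects the unquoted version outright, which is 
why a copy out of HTML needs a cleanup step before it can stand alone 
as SVG/XML.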

I believe the HTML5 specification states that tools will need to provide 
a way to correct the markup, when crappy stuff is introduced. I believe 
that the current Opera supports this with the Dragonfly application, 
though a right click on the SVG, with an option to copy to clipboard, or 
open as separate image in separate tab would be wonderful. And would 
require cleanup of the crappy markup, of course.

Again, this is a separate topic area, though an important one. 
However, I believe this was already addressed at one time. If so, we 
shouldn't traverse this path again; we don't have the time.

What was new is the original topic of this thread: how the namespaced 
entities in SVG are to be treated.

Shelley
Received on Wednesday, 9 September 2009 12:45:37 UTC