
RE: html - xml transformation - imbricating html <p> tags in header tags

From: Bernhard Keil <Bernhard.Keil@soft4science.com>
Date: Wed, 20 Oct 2004 14:40:20 +0200
To: "'zn p'" <pzn_04@yahoo.fr>, <www-math@w3.org>
Cc: <DSSSList@lists.mulberrytech.com>
Message-ID: <E1CKFkJ-0002hS-15@frink.w3.org>

Hello,
you can use XSLT to transform XML into XML (or into non-XML output),
but the input source has to be well-formed XML.

In general you can't transform HTML with XSLT, because HTML is in general not well-formed XML
(it allows, for example, unclosed <p> tags and <br> without a closing slash).
But your source example is a well-formed XML document. So if all of your source documents
are well-formed XML like this example, then you can use
XSLT to transform them into some other XML format.
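One way to see the difference in practice is to run each document through an XML parser before handing it to an XSLT processor. The sketch below uses Python's standard-library `xml.etree.ElementTree`; the helper `is_well_formed` and the `loose_html` snippet are hypothetical illustrations, not part of the original exchange. It checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if `markup` parses as well-formed XML.

    Only well-formedness is checked (matching tags, a single root,
    declared entities); no DTD or schema validation is performed.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Shortened version of the sample from the question: every element is closed.
sample = """<html>
<body>
<h1>Document Title</h1>
<p>Content of paragraph 1</p>
</body>
</html>"""

# Typical HTML that browsers accept but an XML parser rejects:
# a <p> and a <br> that are never closed.
loose_html = "<html><body><p>First paragraph<br>line two</body></html>"

print(is_well_formed(sample))
print(is_well_formed(loose_html))
```

HTML entities such as `&nbsp;` are another common failure, since they are not predefined in XML.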

If your HTML source documents are not well-formed XML,
you can use a tool such as HTML Tidy to convert them to well-formed XML (XHTML) first.

regards,
Bernhard Keil
http://www.soft4science.com



-----Original Message-----
From: www-math-request@w3.org [mailto:www-math-request@w3.org] On Behalf Of zn p
Sent: Wednesday, October 20, 2004 2:17 PM
To: www-math@w3.org
Cc: DSSSList@lists.mulberrytech.com
Subject: html - xml transformation - imbricating html <p> tags in header tags


Hello,
 
I am trying to convert HTML files to XML files. I need to nest (copy) subsections inside their parent sections, and paragraphs inside
the corresponding sections.

Is it possible to do this with XSL?

Here is an example (the real files are more complex than this):

h1 is the document title, h2 a section title and h3 a subsection title. In the output XML, each <p> should go under its
corresponding section, and each subsection (h3) should go under its parent section (h2).
 
<html>
<body>
<h1>Document Title</h1>
<p>Content of paragraph 1</p>
<h2>Header 2</h2>
<p>First paragraph of section 1</p>
<p>Second paragraph of section 1</p>
<h3>Header 3</h3>
<p>First paragraph of subsection 1</p>
<h2>Header 2</h2>
<p>First paragraph of section 2</p>
</body>
</html>
 
To be converted into:
<document>
    <title>Document Title</title>
    <paragraph>Content of paragraph 1</paragraph>
    <section>
        <title>Header 2</title>
        <paragraph>First paragraph of section 1</paragraph>
        <paragraph>Second paragraph of section 1</paragraph>
        <section>
            <title>Header 3</title>
            <paragraph>First paragraph of subsection 1</paragraph>
        </section>
    </section>
    <section>
        <title>Header 2</title>
        <paragraph>First paragraph of section 2</paragraph>
    </section>
</document>
 
Thanks,
PZN
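XSLT can express this kind of grouping of flat sibling elements (in XSLT 2.0 with `xsl:for-each-group` and its `group-starting-with` attribute; in XSLT 1.0 typically with `xsl:key` over `preceding-sibling`), but the underlying logic is easier to see procedurally. The sketch below is not XSLT and not from the thread: it is a hypothetical Python illustration using only `xml.etree.ElementTree`. It assumes the input is well-formed, that headings and paragraphs are direct children of `<body>`, and that h1 is the title while h2 to h6 open nested sections. The function name `html_to_document` is invented for illustration, and the sample's h3 paragraph is worded as in the expected output.

```python
import xml.etree.ElementTree as ET

# Assumption: h2..h6 open nested sections; h1 is the document title.
SECTION_TAGS = {"h2", "h3", "h4", "h5", "h6"}


def html_to_document(html: str) -> ET.Element:
    """Hypothetical sketch: nest flat <hN>/<p> siblings into <section>s."""
    body = ET.fromstring(html).find("body")
    document = ET.Element("document")
    # Stack of (heading level, open container); <document> is level 0.
    stack = [(0, document)]

    for node in body:
        text = "".join(node.itertext()).strip()
        if node.tag == "h1":
            ET.SubElement(document, "title").text = text
        elif node.tag in SECTION_TAGS:
            level = int(node.tag[1])
            # Close every open section at this level or deeper ...
            while stack[-1][0] >= level:
                stack.pop()
            # ... then open a new section inside the nearest shallower one.
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "title").text = text
            stack.append((level, section))
        elif node.tag == "p":
            # Paragraphs belong to the innermost open container.
            ET.SubElement(stack[-1][1], "paragraph").text = text
    return document


html = """<html>
<body>
<h1>Document Title</h1>
<p>Content of paragraph 1</p>
<h2>Header 2</h2>
<p>First paragraph of section 1</p>
<p>Second paragraph of section 1</p>
<h3>Header 3</h3>
<p>First paragraph of subsection 1</p>
<h2>Header 2</h2>
<p>First paragraph of section 2</p>
</body>
</html>"""

doc = html_to_document(html)
ET.indent(doc)  # pretty-printing helper, available in Python 3.9+
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the heading level next to each open section means a skipped level (an h3 with no preceding h2) still nests correctly instead of corrupting the stack.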


Received on Wednesday, 20 October 2004 12:39:15 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 5 February 2014 23:39:49 UTC