Word to XML: Homepage of the project Wrocco, which deals with conversion from Word to XML.
https://www.purl.org/stefan_ram/pub/wrocco_en (permalink) is the canonical URI of this page.
Stefan Ram

Wrocco

Introduction to Wrocco

Wrocco converts the text of a Microsoft® Word document to an XML representation. Beyond the raw text, it writes the names of paragraph styles and character styles, as well as document properties and variables, to the XML file. Wrocco does not export direct formatting (formatting applied without named styles). Tables, pictures, and similar page elements are not supported.
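
To make the idea concrete, here is a minimal sketch in Python (not Wrocco's VBA code) of how text with named paragraph and character styles, plus document properties, might be serialized to XML. The element and attribute names (`document`, `properties`, `property`, `paragraph`, `run`, `style`) are assumptions chosen for illustration and are not taken from Wrocco's actual output format.

```python
import xml.etree.ElementTree as ET


def document_to_xml(properties, paragraphs):
    """Serialize document properties and styled paragraphs to an XML string.

    properties: dict mapping property names to string values.
    paragraphs: list of (paragraph_style, runs) pairs, where runs is a list
                of (character_style_or_None, text) pairs.
    """
    # All element and attribute names below are hypothetical.
    root = ET.Element("document")
    props = ET.SubElement(root, "properties")
    for name, value in properties.items():
        ET.SubElement(props, "property", name=name).text = value
    for paragraph_style, runs in paragraphs:
        para = ET.SubElement(root, "paragraph", style=paragraph_style)
        for character_style, text in runs:
            run = ET.SubElement(para, "run")
            # Only named styles are recorded; a run without a named
            # character style carries no style attribute at all.
            if character_style is not None:
                run.set("style", character_style)
            run.text = text
    return ET.tostring(root, encoding="unicode")


example = document_to_xml(
    {"Title": "Wrocco"},
    [
        ("Heading 1", [(None, "Introduction")]),
        ("Normal", [(None, "Wrocco converts "),
                    ("Emphasis", "styled"),
                    (None, " text.")]),
    ],
)
print(example)
```

In this sketch, text that has no named character style is written without any style information, which mirrors the statement above that direct formatting is not exported.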

Wrocco was written in VBA for Microsoft® Word 2000. It may still work with later or some earlier versions, but this has not been tested.

An Example for Wrocco

The page you are reading was written with Microsoft® Word. It was converted to XML with Wrocco and then processed further to produce this XHTML page. The example XML output that Wrocco generated from the Microsoft® Word document for this page can be viewed at the following URI. (The XML file might reflect a previous version of this page, and some elements were removed.)

https://www.purl.org/stefan_ram/utf-8/720979_doc.xml

Obtaining the Wrocco Source Code

Legal information: Wrocco is an experimental software project in a pre-alpha state. It is intended only for experienced programmers who can read and understand VBA source code and estimate the risks involved. By using Wrocco, the user agrees to use it entirely at his own risk and to back up all data before using Wrocco. Wrocco is not in the public domain. It is copyright 2002-2005 by Stefan Ram. Wrocco may be used for free, but Wrocco or any of its parts may not be redistributed, as is or as part of other software, by anyone other than Stefan Ram, and it may not be mirrored on any other server. It also may not be redistributed via software collections in any form.

Wrocco can be obtained via an HTTP GET request as a plain text file containing its VBA source code, using the following URI.

https://www.purl.org/stefan_ram/utf-8/wrocco

Installing Wrocco

In Word, press Alt-F11 to open the VBA editor. In the VBA editor, press Ctrl-R to show the project explorer window. In the project explorer window, open the context menu of your document template file and choose Insert/Module. A new, empty module window should open. Paste the complete Wrocco source code into this window. Save your document template.

Configuring Wrocco

In the source code, search for "Sub Main". In the line that follows, change the text "c:\" to the desired output directory.

Running Wrocco

To start Wrocco, move the cursor to the text "Sub Main" and press F5. The active document will be written out as an XML file. If you learn more about VBA, you will learn more about its features and find easier ways to start macros. But this is not the right place to teach VBA.

Comments?

I appreciate any bug reports or comments. Contact information is below.

See Also

Two VBA-macros
http://www.4haus.de/tips/wordtohtml.html
http://www.4haus.de/test/htmltags.txt
These (German-language) pages describe tools to remove Office-specific parts from HTML files created by Microsoft® Office.
http://office.microsoft.com/germany/downloads/2000/Msohtmf2.aspx
http://office.microsoft.com/germany/assistance/2000/wDosPeeler.aspx
SGML Author for Word
This Microsoft® product was created in 1994, supported Word 6.0 and Word 97, and was sold for $599. The product is no longer maintained.
Office 2003 XML Reference Schemas
http://www.microsoft.com/office/xml/default.mspx
FAQ-Entry (German language)
http://www.netandmore.de/faq/fom-serve/cache/857.html
MajiX transforms RTF to XML.
http://www.tetrasix.com
http://perso.wanadoo.fr/tetrasys/docs-1.2.2/default.html
R2Net converts RTF to HTML/XML
http://www.logictran.net/products/
Software (upCast and downCast) to convert from Word to XML
http://www.infinity-loop.de
Word as HTML/XML/SGML editor with the MarkupKit
http://www.schema.de/sitehtml/site-d/htmlexpo.htm
German-language report on a Microsoft patent related to WordML
heise "hps-23.01.04-000", heise "43948"
Patent application by Microsoft: Word-processing document stored in a single XML file
http://v3.espacenet.com/origdoc?DB=EPODOC&IDX=EP1376387&QPN=EP1376387
WordML viewer for Microsoft® Internet Explorer
http://www.microsoft.com/downloads/details.aspx?FamilyID=19676b18-1bcd-4852-93ba-0b5a203ea731&DisplayLang=en
Requires Microsoft® Word 2003.
Microsoft Office Assistance: Use Office HTML Filter to Create Web Pages that Download Faster
http://office.microsoft.com/assistance/preview.aspx?AssetID=HA010548651033

Microsoft® Office documents can be imported into the free word processor OpenOffice Writer, that is, converted by OpenOffice Writer to its own format. Software that converts from the OpenOffice Writer format to other formats can then be applied. This is yet another way to create XML from a Microsoft® Word document.

Writer2LaTeX, Writer2BibTeX, and Writer2xhtml
http://www.hj-gym.dk/~hj/writer2latex/

It is also possible to write a customized RTF translator in C. An RTF reader by Microsoft®, available as C source code, can be used as a starting point.

Other Problem Areas in RTF
http://msdn.microsoft.com/library/en-us/dnrtfspec/html/rtfspec_53.asp?frame=true
Appendix A. How to Write an RTF Reader
http://latex2rtf.sourceforge.net/rtfspec_45.html
Survey of Word to HTML conversion solutions
http://web.archive.org/web/20050316004851/http://www.e.govt.nz/web-guidelines/word-to-html-conversion.asp
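
As a rough illustration of what a customized RTF translator involves, the following sketch extracts plain text from a small RTF fragment. It is written in Python rather than C, is not derived from the Microsoft reader mentioned above, and covers only a small subset of RTF. The function name `rtf_to_text`, the list of skipped destinations, and the assumption that `\'hh` escapes use code page 1252 are choices made for this example only.

```python
import re

# One token per match: control word, hex escape, control symbol,
# group brace, raw line break (ignored), or a plain character.
_TOKEN = re.compile(
    r"\\([a-z]+)(?:-?\d+)? ?|\\'([0-9a-fA-F]{2})|\\(.)|([{}])|[\r\n]+|(.)",
    re.DOTALL,
)

# Destinations whose content is not document text (illustrative subset).
_SKIPPED = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}


def rtf_to_text(rtf):
    """Return the plain text of a simple RTF string (a sketch, not a full reader)."""
    out = []
    stack = []        # "ignoring" flags of the enclosing groups
    ignoring = False  # True while inside a skipped destination
    for match in _TOKEN.finditer(rtf):
        word, hexcode, symbol, brace, char = match.groups()
        if brace == "{":
            stack.append(ignoring)
        elif brace == "}":
            ignoring = stack.pop() if stack else False
        elif word:
            if word in _SKIPPED:
                ignoring = True
            elif not ignoring:
                if word in ("par", "line"):
                    out.append("\n")
                elif word == "tab":
                    out.append("\t")
        elif hexcode:
            if not ignoring:
                # Assumption: the document uses code page 1252.
                out.append(bytes.fromhex(hexcode).decode("cp1252", errors="replace"))
        elif symbol:
            if symbol == "*":
                ignoring = True  # "\*" marks an optional destination
            elif symbol in "\\{}" and not ignoring:
                out.append(symbol)
        elif char and not ignoring:
            out.append(char)
    return "".join(out)


sample = r"{\rtf1\ansi{\fonttbl{\f0 Times New Roman;}}\f0 Hello, \b World\b0 !\par Caf\'e9\par}"
print(rtf_to_text(sample))
```

A translator to XML would emit elements at the points where this sketch appends text, and it would track formatting control words such as `\b` instead of discarding them.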

"ram@zedat.fu-berlin.de" (without the quotation marks) is the email address of Stefan Ram.   |   Copyright 2004 Stefan Ram, Berlin. All rights reserved. This page is a publication by Stefan Ram.