Reading an XML file, general question
Hi all,
This time, I just have a quick question, and I can't seem to get a straight answer from anyone I know. I'm thinking about getting Microsoft Word 2007 soon, and from what I've heard, since it's XML formatted, you can read it right into Java as a Word file (that is, you don't have to bother converting it to text) with a FileReader and BufferedReader, then parse it just like you would a text file. Is that true?
Thanks,
Jezzica85
[467 byte] By [
jezzica85a] at [2007-11-27 7:13:31]

I've got Word 2003 and it has the option to save as XML. I created the document with the contents "Hello, world!".
Here is the XML produced (I was nice and indented it.)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2"
w:macrosPresent="no"
w:embeddedObjPresent="no"
w:ocxPresent="no"
xml:space="preserve">
<w:ignoreElements w:val="http://schemas.microsoft.com/office/word/2003/wordml/sp2"/>
Hello, world
Terrence Stamp
Terrence Stamp
1
1
2007-06-12T17:00:00Z
2007-06-12T17:01:00Z
1
2
12
Nerd in VWs
1
1
13
11.8134
<w:fonts>
<w:defaultFonts
w:ascii="Times New Roman"
w:fareast="Times New Roman"
w:h-ansi="Times New Roman"
w:cs="Times New Roman"/>
</w:fonts>
<w:styles>
<w:versionOfBuiltInStylenames w:val="4"/>
<w:latentStyles w:defLockedState="off" w:latentStyleCount="156"/>
<w:style w:type="paragraph" w:default="on" w:styleId="Normal">
<w:name w:val="Normal"/>
<w:rPr>
<wx:font wx:val="Times New Roman"/>
<w:sz w:val="24"/>
<w:sz-cs w:val="24"/>
<w:lang w:val="EN-US" w:fareast="EN-US" w:bidi="AR-SA"/>
</w:rPr>
</w:style>
<w:style w:type="character" w:default="on" w:styleId="DefaultParagraphFont">
<w:name w:val="Default Paragraph Font"/>
<w:semiHidden/>
</w:style>
<w:style w:type="table" w:default="on" w:styleId="TableNormal">
<w:name w:val="Normal Table"/>
<wx:uiName wx:val="Table Normal"/>
<w:semiHidden/>
<w:rPr>
<wx:font wx:val="Times New Roman"/>
</w:rPr>
<w:tblPr>
<w:tblInd w:w="0" w:type="dxa"/>
<w:tblCellMar>
<w:top w:w="0" w:type="dxa"/>
<w:left w:w="108" w:type="dxa"/>
<w:bottom w:w="0" w:type="dxa"/>
<w:right w:w="108" w:type="dxa"/>
</w:tblCellMar>
</w:tblPr>
</w:style>
<w:style w:type="list" w:default="on" w:styleId="NoList">
<w:name w:val="No List"/>
<w:semiHidden/>
</w:style>
</w:styles>
<w:docPr>
<w:view w:val="print"/>
<w:zoom w:percent="100"/>
<w:doNotEmbedSystemFonts/>
<w:proofState w:spelling="clean" w:grammar="clean"/>
<w:attachedTemplate w:val=""/>
<w:defaultTabStop w:val="720"/>
<w:punctuationKerning/>
<w:characterSpacingControl w:val="DontCompress"/>
<w:optimizeForBrowser/>
<w:validateAgainstSchema/>
<w:saveInvalidXML w:val="off"/>
<w:ignoreMixedContent w:val="off"/>
<w:alwaysShowPlaceholderText w:val="off"/>
<w:compat>
<w:breakWrappedTables/>
<w:snapToGridInCell/>
<w:wrapTextWithPunct/>
<w:useAsianBreakRules/>
<w:dontGrowAutofit/>
</w:compat>
<wsp:rsids>
<wsp:rsidRoot wsp:val="008866C9"/>
<wsp:rsid wsp:val="008866C9"/>
</wsp:rsids>
</w:docPr>
<w:body>
<wx:sect>
<w:p wsp:rsidR="008866C9" wsp:rsidRDefault="008866C9">
<w:r>
<w:t>Hello, world!</w:t>
</w:r>
</w:p>
<w:sectPr wsp:rsidR="008866C9">
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar
w:top="1440"
w:right="1800"
w:bottom="1440"
w:left="1800"
w:header="708"
w:footer="708"
w:gutter="0"/>
<w:cols w:space="708"/>
<w:docGrid w:line-pitch="360"/>
</w:sectPr>
</wx:sect>
</w:body>
</w:wordDocument>
Arg! the filter is stripping out some stuff, but you get the idea.
Message was edited by:
Hippolyte