Step 6 - The DOM Tree
The HTML DOM fixer ensures that the application builds a technically correct DOM even if the incoming HTML is incorrect. This step prints the DOM tree to a file after it has been fixed.
The application must listen for the contentLoading end event to make sure the tree has been built. When the event fires, the code creates a new thread to print the tree.
The following is part of the new propertyChange method in BrowserFrame:

    else if (newValue.equals(PropertyConstants.BEGIN)) {
        backButton.setEnabled(
            stormBase.getHistoryManager().canGoBack(viewportId));
        forwardButton.setEnabled(
            stormBase.getHistoryManager().canGoForward(viewportId));
    } else if ((newValue.equals(PropertyConstants.END))
            && (viewport.getId().equals(viewportId))) {
        if (viewport.getPilot() instanceof ThePilot) {
            // Loading has ended for this viewport, so the tree is complete.
            ThePilot pilot = (ThePilot)viewport.getPilot();
            // Print the tree on its own thread.
            Thread domTreeThread = new Thread(new DomTree(pilot.getDDocument()));
            domTreeThread.start();
        }
    }

The code passes the pilot's ice.pilots.html4.DDocument to the DomTree class.
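The listen-then-spawn pattern above can be tried without ICEbrowser on the classpath. The following is only a sketch under stated assumptions: it uses the JDK's java.beans classes, and the class LoadEndListenerSketch and the string values "contentLoading" and "end" are hypothetical stand-ins for BrowserFrame and the real PropertyConstants values.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the END branch of BrowserFrame.propertyChange.
class LoadEndListenerSketch implements PropertyChangeListener {
    // Assumed names; the real values live in ICEbrowser's PropertyConstants.
    static final String CONTENT_LOADING = "contentLoading";
    static final String END = "end";

    private final Runnable work;
    private final CountDownLatch finished = new CountDownLatch(1);

    LoadEndListenerSketch(Runnable work) {
        this.work = work;
    }

    @Override
    public void propertyChange(PropertyChangeEvent event) {
        // React only to the end of loading, then hand the work to a new
        // thread, as the tutorial code does with DomTree.
        if (CONTENT_LOADING.equals(event.getPropertyName())
                && END.equals(event.getNewValue())) {
            Thread worker = new Thread(() -> {
                try {
                    work.run();
                } finally {
                    finished.countDown();
                }
            });
            worker.start();
        }
    }

    // Waits up to the given number of milliseconds for the worker to finish.
    boolean awaitFinished(long millis) {
        try {
            return finished.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        LoadEndListenerSketch listener =
                new LoadEndListenerSketch(() -> System.out.println("printing tree"));
        PropertyChangeSupport source = new PropertyChangeSupport(new Object());
        source.addPropertyChangeListener(listener);
        // Simulate the browser reporting that loading has ended.
        source.firePropertyChange(CONTENT_LOADING, "begin", END);
        listener.awaitFinished(1000);
    }
}
```

The latch exists only so that the demonstration can wait for the worker thread; the tutorial code itself starts the thread and returns.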
The code tries to produce output in the same form as the W3C HTML validator, including verification of attributes. The validator also checks whether an input HTML document complies with the W3C HTML DTD, but it is much stricter, whereas an ICEbrowser application forgives many common errors.
The following sample code accomplishes this:

    class DomTree implements Runnable {
        private DDocument dDocument;
        private OutputStream outputStream;

        DomTree(DDocument dDocument) {
            this.dDocument = dDocument;
            try {
                // Write domtree.txt into the current working directory.
                File userDirectory = new File(System.getProperty("user.dir"));
                outputStream = new FileOutputStream(new File(userDirectory, "domtree.txt"));
            } catch (FileNotFoundException exception) {
                System.err.println(
                    "Unable to create file for parse tree output...aborting");
            }
        }

        public void run() {
            if (outputStream != null) {
                // Header in HTML comment form, naming the document source.
                writeln("");
                writeln("<!--");
                writeln("Document source is " + dDocument.getURL());
                writeln("-->");
                writeln("");
                printAll((DElement)dDocument.getDocumentElement(), 0);
                try {
                    // Release the file once the whole tree has been written.
                    outputStream.close();
                } catch (IOException exception) {
                    // Nothing more can be done if the close fails.
                }
            }
        }

        private void printAll(DElement dElement, int numberOfTabs) {
            // Build the indentation for this depth.
            StringBuffer stringBuffer = new StringBuffer();
            for (int i = 0; i < numberOfTabs; i++) {
                stringBuffer.append(" ");
            }
            String indentation = stringBuffer.toString();
            String nodeName = dElement.getNodeName().toUpperCase();

            // Collect the attributes as name="value" pairs.
            String attributes = "";
            NamedNodeMap namedNodeMap = dElement.getAttributes();
            for (int i = 0; i < namedNodeMap.getLength(); i++) {
                Attr attribute = (Attr)namedNodeMap.item(i);
                attributes += " " + attribute.getName() + "=\"" + attribute.getValue() + "\"";
            }
            writeln(indentation + "<" + nodeName + attributes + ">");

            // Recurse into child elements; print text nodes directly.
            DNode currentDNode = (DNode)dElement.getFirstChild();
            while (currentDNode != null) {
                if (currentDNode instanceof DElement) {
                    printAll((DElement)currentDNode, numberOfTabs + 1);
                } else if (currentDNode instanceof DTextNode) {
                    writeln(indentation + " " + currentDNode);
                }
                currentDNode = (DNode)currentDNode.getNextSibling();
            }
            writeln(indentation + "</" + nodeName + ">");
        }

        private void writeln(String line) {
            try {
                outputStream.write((line + "\n").getBytes());
            } catch (IOException exception) {
                // do nothing.
            }
        }
    }

You also need the following import statements:

    import java.io.*;
    import org.w3c.dom.Attr;
    import org.w3c.dom.NamedNodeMap;

The DomTree class does the following:
- On creation, saves the DDocument as an instance variable and creates a text file to write to, called domtree.txt, in the current directory.
- In the run( ) method, writes a header in HTML comment format before calling the printAll( ) method with the documentElement, in this case the HTML element, as an argument.
- The printAll( ) method is recursive, with HTML as a starting point. It writes the name of the node as well as all the attributes (with values) to the output, using the writeln( ) method. This is done in HTML format to ensure similarity with the source HTML document.
- Calls the getNodeName( ) method on the DElement, which returns the HTML tag name; the getAttributes( ) method, which returns an org.w3c.dom.NamedNodeMap of org.w3c.dom.Attr objects; and finally the getName( ) and getValue( ) methods on each Attr object.
- For the children of the Element, calls the printAll( ) recursively, except for instances of DTextNode. For these, their text value is printed using the toString( ) method.
- Traverses all the children using a while loop, ending when the getNextSibling( ) method returns null.
- Writes the end tag to the file.
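
The traversal described in the list above can be sketched with only the standard org.w3c.dom interfaces that ship with the JDK. This is an illustration rather than the ICEbrowser implementation: the class DomTreeSketch and its render method are hypothetical, the input must be well-formed XML because the JDK parser does not repair broken HTML the way the DOM fixer does, the output goes to a string instead of domtree.txt, and whitespace-only text nodes are skipped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical stand-in for DomTree: same traversal, standard DOM types.
class DomTreeSketch {

    // Parses well-formed markup and returns the indented tree as a string.
    static String render(String markup) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            printAll(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    private static void printAll(Element element, int depth, StringBuilder out) {
        // Two spaces of indentation per level (an arbitrary choice here).
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        String indentation = indent.toString();
        String nodeName = element.getNodeName().toUpperCase();

        // Start tag with every attribute as name="value".
        StringBuilder attributes = new StringBuilder();
        NamedNodeMap map = element.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            Attr attribute = (Attr) map.item(i);
            attributes.append(' ').append(attribute.getName())
                    .append("=\"").append(attribute.getValue()).append('"');
        }
        out.append(indentation).append('<').append(nodeName)
                .append(attributes).append(">\n");

        // Walk the children until getNextSibling() returns null.
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child instanceof Element) {
                printAll((Element) child, depth + 1, out);
            } else if (child instanceof Text) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.append(indentation).append("  ").append(text).append('\n');
                }
            }
        }

        // End tag at the same indentation as the start tag.
        out.append(indentation).append("</").append(nodeName).append(">\n");
    }

    public static void main(String[] args) {
        System.out.print(render(
                "<html lang=\"en\"><body><p class=\"x\">Hello</p></body></html>"));
    }
}
```

For the input in main, the sketch prints:

    <HTML lang="en">
      <BODY>
        <P class="x">
          Hello
        </P>
      </BODY>
    </HTML>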
You could make this method more sophisticated, for example, by making the printing of the tree optional with buttons or system properties. You could also save the entire document in HTML form, with images. The steps demonstrated in the tutorial code are just an introduction.
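
As one way to make the printing optional, the thread could be started only when a system property is set. The sketch below assumes a property named domtree.print, which is not defined by ICEbrowser, and a hypothetical helper class DomTreeToggleSketch; the Runnable stands in for a DomTree instance.

```java
// Hypothetical helper: gates the tree printer behind a system property.
class DomTreeToggleSketch {
    // Assumed property name; it is not defined by ICEbrowser.
    static final String PRINT_PROPERTY = "domtree.print";

    // True only when the property is set to "true" (case-insensitive).
    static boolean printingEnabled() {
        return Boolean.getBoolean(PRINT_PROPERTY);
    }

    // Starts the printer on its own thread when enabled.
    // Returns the started thread, or null when printing is disabled.
    static Thread startIfEnabled(Runnable printer) {
        if (!printingEnabled()) {
            return null;
        }
        Thread thread = new Thread(printer);
        thread.start();
        return thread;
    }

    public static void main(String[] args) {
        Thread thread = startIfEnabled(() -> System.out.println("printing tree"));
        System.out.println(thread == null ? "printing disabled" : "printing started");
    }
}
```

Running it with `java -Ddomtree.print=true DomTreeToggleSketch` enables the printer; without the flag, nothing is printed except the "printing disabled" message.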
Copyright 2005. ICEsoft Technologies, Inc. http://www.icesoft.com