pvillega’s posterous


Pere Villega  //  Born in Barcelona, living in Dublin, and tagged as a geek since youth. Developer on the path to becoming a software architect. I swear this is not a proper blog :)

Feb 16 / 6:18am

XML and Java

According to Wikipedia, XML (Extensible Markup Language) is a general-purpose specification for creating custom markup languages. It is classified as an extensible language because it allows the user to define the markup elements. XML's purpose is to aid information systems in sharing structured data, especially via the Internet, to encode documents, and to serialize data. XML is heavily used in Java, especially as the language for configuration files. Sometimes this usage becomes (in my opinion) an abuse, which is one of the reasons annotations were introduced in Java 5. But beyond that, XML is a good format for transmitting data and, like it or not, you will probably need to work with it. In this page I'll try to sum up the standard ways to work with XML using Java. To refresh your memory, JAXP (Java API for XML Processing) provides two leading methods:

  • SAX - Simple API for XML - an event-driven method where you write a handler that receives events while the XML is being read. This is also known as a "stream parser". Events include Start Document, Start Element, End Element, etc.
  • DOM - Document Object Model - the XML is modelled as a tree of nodes that code can traverse with methods like Get Children, Get Parent, etc.
  • There's also an alternative called StAX (Streaming API for XML), conceived as something halfway between SAX and DOM, which can also write documents. It is similar to SAX in that it streams the content, so it can't use XPath either, which (to me) is a big inconvenience.
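Since StAX gets no code later on, here is a minimal sketch of it, assuming only the javax.xml.stream package that ships with the JDK since Java 6. It writes a small document with XMLStreamWriter and then pulls it back with XMLStreamReader; the class name, element names and ids are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: StAX writing and reading on an in-memory string.
public class StaxSketch {

    // Writes <publications><book id="..."/>...</publications> with the StAX writer API.
    static String write(String... ids) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("publications");
        for (String id : ids) {
            w.writeEmptyElement("book");
            w.writeAttribute("id", id); // attaches to the empty element just written
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    // Pulls events one at a time: the caller drives the loop, unlike SAX callbacks.
    static int countBooks(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "book".equals(r.getLocalName())) {
                count++;
            }
        }
        r.close();
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = write("1", "2", "3");
        System.out.println(xml);
        System.out.println("Books: " + countBooks(xml));
    }
}
```

The reading loop is what "pull" means in practice: nothing happens until you ask for the next event, which makes it easy to stop early, but like SAX you only ever see the current position in the stream.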

The SAX approach is considered very fast and memory efficient, while DOM is usually easier to handle in code, especially if the processing requires information from multiple nodes. There's another alternative outside JAXP, called JAXB, that lets you access the content as objects, much as you do when working with web services, and it is gaining popularity thanks to the benefits of this approach. Personally I favour DOM, as it lets you use XPath queries to retrieve data efficiently and, to be honest, you will rarely have memory requirements strict enough, or files big enough, to rule out DOM. I've never used JAXB, but it seems an interesting tool to consider in the future.

SAX

SAX is the way to go for truly efficient processing of XML files, but at the cost of lots of extra code and a harder-to-understand model. You can see an example here, but I have to say I've never used SAX in my code. As mentioned in the comments on that example, SAX makes it easy to hardcode the structure of the XML file (adding coupling) and it can't write documents. For SAX you need two classes, a parser and the event handler. The event handler might be:

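import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Prints book details as SAX events arrive. localName is only filled in
// when the parser factory is namespace-aware.
public class SimpleHandler extends DefaultHandler {

    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts)
            throws SAXException {
        if ("book".equals(localName)) {
            System.out.print("Book details: Book ID: " + atts.getValue("id"));
        } else {
            System.out.print(localName + ": ");
        }
    }

    // Receives element text; the parser may deliver one text node in several chunks.
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        System.out.print(new String(ch, start, length));
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName)
            throws SAXException {
        if ("book".equals(localName)) {
            System.out.println("=================================");
        }
    }
}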

while the parser could be:

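import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// The wrapper class name is arbitrary; it just makes the snippet runnable.
public class SimpleParser {
    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        SAXParser saxParser = factory.newSAXParser();
        // Validate against the W3C XML Schema referenced by the document
        saxParser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                "http://www.w3.org/2001/XMLSchema");
        XMLReader reader = saxParser.getXMLReader();
        // SimpleErrorHandler is an org.xml.sax.ErrorHandler from the linked example (not shown here)
        reader.setErrorHandler(new SimpleErrorHandler());
        reader.setContentHandler(new SimpleHandler());
        reader.parse("src/books.xml");
    }
}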
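One detail worth knowing: a SAX parser may split the text of a single element across several characters() calls, so printing each chunk (as the handler above does) works, but collecting values needs a buffer. A minimal self-contained sketch of that, assuming a made-up inline document instead of src/books.xml and hypothetical class names; it parses from a String and collects the book titles.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects <title> text with a SAX handler.
public class SaxTitles {

    // The expected document structure is hardcoded here, which is the coupling SAX invites.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <title> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(localName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length); // may be called several times per element
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(localName)) {
                titles.add(text.toString().trim());
                text = null;
            }
        }
    }

    static List<String> parseTitles(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for localName to be populated
        TitleHandler handler = new TitleHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data
        String xml = "<publications>"
                + "<book id=\"1\"><title>XML Basics</title></book>"
                + "<book id=\"2\"><title>SAX Notes</title></book>"
                + "</publications>";
        System.out.println(parseTitles(xml));
    }
}
```

The handler has to track "am I inside a title?" by itself, which is the extra bookkeeping that makes SAX code longer than the equivalent XPath query.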

DOM

This is the method I've used the most, and probably the easiest to use. You just load the document into memory and query it using XPath. You can add or remove nodes from the document and write the result into a file. This code shows a DOM parser that uses XPath to query the content:

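import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// The wrapper class name is arbitrary; it just makes the snippet runnable.
public class DomXPathExample {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbFactory.newDocumentBuilder();
        // Load the whole document into memory as a DOM tree
        Document xmlDocument = builder.parse("src/books.xml");
        XPathFactory factory = XPathFactory.newInstance();
        XPath xPath = factory.newXPath();
        // Text of the copyright element of the first book published by Wrox
        String copyright = xPath.evaluate("/publications/book[publisher='Wrox']/copyright", xmlDocument);
        System.out.println("Copyright: " + copyright);
        // All book elements anywhere in the document
        NodeList nodes = (NodeList) xPath.evaluate("//book", xmlDocument, XPathConstants.NODESET);
        System.out.println("Books: " + nodes.getLength());
        // id of a book that is the 3rd book among its siblings AND has "XML" in its title;
        // for the 3rd matching book use (/publications//book[contains(title,'XML')])[3]/@id
        String bookid = xPath.evaluate("/publications//book[contains(title,'XML') and position()=3]/@id", xmlDocument);
        System.out.println("Book ID: " + bookid);
    }
}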
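The text above also mentions adding or removing nodes and writing the result out. A minimal sketch of that round trip with the standard org.w3c.dom and javax.xml.transform APIs, assuming a made-up inline document, hypothetical ids and title, and writing to a String rather than a file (a StreamResult built from a File works the same way):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class: edit a DOM tree and serialize it again.
public class DomEdit {

    static String edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();

        // Remove the book with id="2", located with an XPath query.
        Node old = (Node) xPath.evaluate("//book[@id='2']", doc, XPathConstants.NODE);
        if (old != null) {
            old.getParentNode().removeChild(old);
        }

        // Append <book id="3"><title>New Title</title></book> under the root element.
        Element book = doc.createElement("book");
        book.setAttribute("id", "3");
        Element title = doc.createElement("title");
        title.setTextContent("New Title");
        book.appendChild(title);
        doc.getDocumentElement().appendChild(book);

        // Serialize the modified tree with an identity transformer.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<publications><book id=\"1\"/><book id=\"2\"/></publications>";
        System.out.println(edit(xml));
    }
}
```

Finding the node with XPath and then changing it through the DOM methods is the combination that makes DOM convenient: the query language locates, the object model edits.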

JAXB

JAXB allows you to marshal objects into XML and unmarshal XML back into objects. The job is transparent to the user, who only provides the schema to the library and then works with the corresponding objects. JAXB is particularly useful when the specification is complex and changing. In such a case, keeping the XML Schema definitions and the Java definitions synchronised by hand can be time-consuming and error-prone. To bind the schema you use a tool called xjc, which reads the schema and generates the matching classes, similar to the wsdl2java tool used with web services. The output is a set of classes to be used in your application. To read a document, you create a JAXBContext and an Unmarshaller from it:

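import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import test.jaxb.Collection; // root class generated by xjc from the schema

// "test.jaxb" is the package holding the xjc-generated classes
JAXBContext jc = JAXBContext.newInstance("test.jaxb");
Unmarshaller unmarshaller = jc.createUnmarshaller();
Collection collection = (Collection) unmarshaller.unmarshal(new File("books.xml"));
// now you can retrieve data through the generated getters
collection.getBooks();
...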

test.jaxb is the package where you stored the generated classes. This process allows you to validate the data against the schema, reporting errors. It can also be used to generate XML files from the objects. You can find more detail on how it works in this tutorial from Sun.
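Schema validation itself is not specific to JAXB: the standard javax.xml.validation API compiles an XML Schema into a Schema object, and in JAXB 2 that same object can be handed to the unmarshaller with Unmarshaller.setSchema. A minimal sketch using only the JDK, assuming a tiny made-up schema and documents in place of the tutorial's files and a hypothetical class name:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical example class: validate XML strings against an inline XML Schema.
public class SchemaCheck {

    // Made-up schema: a <book> with a <title> child and a required "id" attribute.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"book\"><xs:complexType><xs:sequence>"
          + "<xs:element name=\"title\" type=\"xs:string\"/>"
          + "</xs:sequence>"
          + "<xs:attribute name=\"id\" type=\"xs:string\" use=\"required\"/>"
          + "</xs:complexType></xs:element>"
          + "</xs:schema>";

    // Returns null if the document is valid, otherwise the first validation error message.
    static String validate(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first error is thrown as a SAXParseException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<book id=\"1\"><title>XML</title></book>"));
        System.out.println(validate("<book><title>XML</title></book>")); // missing id
    }
}
```

Compiling the Schema once and reusing it (for a Validator here, or an Unmarshaller in JAXB 2) avoids re-parsing the .xsd for every document.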


Filed under // dom java jaxb sax xml
