Automatically generate XPath Expressions in Java
Anyone who’s worked with J2EE application servers has more than a passing familiarity with XML. It’s the weapon of choice for configuration in the 3 platforms I’ve spent most of my time working with; JBoss, WebLogic, and WebSphere. When I first started working at Phurnace, I would verify these files by hand using a diff tool that could handle single documents or a directory tree. I would compare a ‘known good’ configuration to the configuration produced by Phurnace Deliver. This worked great while I was getting my feet wet and finding my way around each platform’s configuration peculiarities.


For obvious reasons, this method sucked a lot after I got a handle on the different configuration conventions used by each platform. The process was incredibly monotonous and error prone. My mind would start to wander as I robotically clicked through the differences in dozens of XML files. Most of the time the differences were expected due to system generated unique IDs or environmental differences. I decided I needed to automate this process simply because I dreaded doing it manually. Because we were dealing with XML, XPath seemed to be the obvious choice for programmatic validation. So I just needed to whip up some XPath expressions and a simple Java class to resolve them and I was done. No bubbles, no troubles. Right?


Well I was forgetting that a WebSphere profile with one node and one server contains more than 100 individual XML files ranging in size from <1K to 600K+. If I thought using the diff tools was tedious, it was nothing compared to generating thousands of XPath expressions by hand. Plus, every time the product expanded into new areas of configuration, I was looking at hours of updating the automation data to keep it current. I needed a tool that would generate these XPath expressions for me. I had some specific needs that weren’t addressed by the sample classes and open source solutions I was finding on the web so I decided to write my own. Keep in mind:

 

  • Coding isn’t my forte. You may find bone-headed logic or hacky work-arounds. But for my purposes it works and it’s quick. Consider it your chance to test the tester and file a bug by leaving a comment. I’d love to get feedback and improve it!
  • This is a simplified version of the class I use and it works fine on simple XML. This sample class has some limitations. It doesn’t handle namespace designations and I haven’t added handling for special characters. I encourage you to play around with it and add methods that suit your purposes.

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.apache.commons.lang.StringUtils;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XPathGenerator {

/*
* fileList - a list of files to generate XPath expressions from xpath - an
* xpath instance used to verify generated xpaths
*/
private static List fileList = new ArrayList();
private static XPath xpath = XPathFactory.newInstance().newXPath();

/*
* Simple utility method that checks for whitespace at the beginning a text
* node
*/
private static boolean isWhiteSpace(String nodeText) {
if (nodeText.startsWith("\r") || nodeText.startsWith("\t")
|| nodeText.startsWith("\n") || nodeText.startsWith(" "))
return true;
else
return false;
}

/*
* Simple utility method to verify brutishly assembled xpath expressions
*/
private static void checkXPath(String xpathExpression, Node node) {
Object xpathCheck;
try {
xpathCheck = xpath.evaluate(xpathExpression, node
.getOwnerDocument(), XPathConstants.BOOLEAN);
if (xpathCheck.toString() == "true") {
/*
* print the file/xpath combo to use for future verification For
* Example:
* file:/C:/tmp/sample.xml=/rootNode/firstChild/firstGrandChild
* [@attrib="value"][text()="some text"]
*/
System.out.println(node.getOwnerDocument().getDocumentURI()
+ "=" + xpathExpression);
}
} catch (XPathExpressionException xpe) {
System.out.println("\n\n" + xpathExpression);
xpe.printStackTrace();
}
}

/*
* Simple utility method to check for a text node on the currenrt XML
* element
*/
private static boolean hasValidText(Node node) {
String textValue = node.getTextContent();

return (textValue != null && textValue != ""
&& isWhiteSpace(textValue) == false
&& !StringUtils.isWhitespace(textValue) && node.hasChildNodes());
}

/*
* Simple utility to check for attributes on the current element
*/
private static boolean hasValidAttributes(Node node) {
return (node.getAttributes().getLength() > 0);

}

/*
* Build XPath attribute list for individual XML Element Nodes Iterate
* through all attribute nodes and build a string that can be included in an
* XPath expression to validate an XML document. Resulting string will look
* like: [@attrib1="value1" and @attrib2="value2"] Skip this if the
* attribute list is empty.
*/

private static String buildAttribString(Node node, String pathExpr) {
NamedNodeMap nnlist = node.getAttributes();
pathExpr = pathExpr + "[";

int attribCount = 0;
// iterate over attributes
for (int i = 0; i < nnlist.getLength(); i++) {
// grab attribute name and value
String attribName = nnlist.item(i).getNodeName();
String attribValue = nnlist.item(i).getNodeValue();

// if we've already added attributes to the path expression append
// the current one
if (attribCount > 0) {
pathExpr = pathExpr + " and ";
}

pathExpr = pathExpr + "@" + attribName + "=\"" + attribValue + "\"";
attribCount++;
}

pathExpr = pathExpr + "]";

return pathExpr;
}

/*
* processNode checks for attributes and text nodes on an xml node and
* process them accordingly
*/
public static NodeList processNode(Node node) {

if (hasValidAttributes(node) || hasValidText(node)) {
String pathExpr = "/" + node.getNodeName();

// check for attributes
if (hasValidAttributes(node)) {
pathExpr = buildAttribString(node, pathExpr);
}

// Make a copy of node to preserve it's state
Node tmpNode = node;

// Build pathExpr for XPath Expression by working backward through
// the XML until we hit the document node.
while (tmpNode.getParentNode() != null
&& tmpNode.getParentNode().getNodeType() != Node.DOCUMENT_NODE) {

tmpNode = tmpNode.getParentNode();
String nodeName = tmpNode.getNodeName();

if (hasValidAttributes(tmpNode)) {
String attribString = buildAttribString(tmpNode, nodeName);
pathExpr = "/" + attribString + pathExpr;
} else {
pathExpr = "/" + nodeName + pathExpr;
}
}

if (hasValidText(node)) {

/*
* Build XPath text value expression to verify text values for
* node. This is a little tricky because linebreaks, tabs and
* spaces included in the XML document are considered text
* nodes.
*/

// Copy original node to a textNode to preserve state of
// original node for future operations
Node textNode = node;

/*
* This check iterates through child nodes of the original node
* until it reaches the next element node. As long as the node
* is not null, starts with white space and is not an element
* node, move on to the next node
*/
while (textNode != null
&& isWhiteSpace(textNode.getTextContent()) == true
&& StringUtils.isWhitespace(textNode.getTextContent())
&& textNode.getNodeType() != Node.ELEMENT_NODE) {
textNode = textNode.getFirstChild();
}

/*
* If the text content of the current node is not null, doesn't
* start with whitespace and has child nodes, we grab the text
* and frame it into an XPath text argument. A sample element
* and it's resulting text argument:
*
* some text here YIELDS
* element1[text()="some text here"]
*
* NOTE: The check for child nodes prevents us from creating a
* blank text argument for self contained elements (example:
* )
*/
pathExpr = pathExpr + "[text()=\""
+ StringUtils.strip(textNode.getTextContent()) + "\"]";

}

checkXPath(pathExpr, node);
}
// gather children and return
return node.getChildNodes();
}

// This function takes a group of nodes and generates XPath expressions for
// them until it runs out of nodes
public static void processNodeList(NodeList nodelist) {
for (int i = 0; i < nodelist.getLength(); i++) {
if (nodelist.item(i).getNodeType() == Node.ELEMENT_NODE
&& (hasValidAttributes(nodelist.item(i)) || hasValidText(nodelist
.item(i)))) {
processNode(nodelist.item(i));
}
processNodeList(nodelist.item(i).getChildNodes());
}
}

// This function takes the cmd line argument and adds to fileList all files
// in the directory and it's subdirectory
private static void processDir(File dir) {
File[] fileArray = dir.listFiles();

for (int i = 0; i < fileArray.length; i++) {
if (fileArray[i].isFile()) {
if (fileArray[i].toString().endsWith(".xml")) {
fileList.add(fileArray[i]);
}
} else if (fileArray[i].isDirectory()) {
processDir(fileArray[i]);
}
}
}

// This starts the process of an individual XML file
private static void processXML(File xmlFile) throws SAXException,
IOException, ParserConfigurationException, XPathExpressionException {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
factory.setIgnoringComments(true);

DocumentBuilder parser = factory.newDocumentBuilder();
Document doc = parser.parse(xmlFile);

NodeList startlist = doc.getChildNodes();

processNodeList(startlist);
} catch (Exception e) {

}
}

/*
* Entry point. Pass in the xml file or top level directory of a group of
* files to generate XPath expressions for them.
*/

public static void main(String[] args) {
File startDir = new File(args[0]);
try {
if (startDir.isFile()) {
processXML(startDir);
} else if (startDir.isDirectory()) {
processDir(startDir);
for (int i = 0; i < fileList.size(); i++) {
processXML(fileList.get(i));
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

Comments (4)Add Comment
Suggestion to work with schema aware documents
written by Adi B, December 23, 2008
Change getNodeName() to use getLocalName() for documents with name space elements and adding factory.setNamespaceAware(true).

Adi.
report abuse
vote down
vote up
Votes: +1
XML in Java makes me crave Groovy
written by James E. Ervin, March 15, 2009
Every time I have to write code to parse and/or generate XML documents makes me wish to do more work in Groovy. I know that once you try it, you don't like going back to Java, I sure don't.

http://groovy.codehaus.org/Reading XML with Groovy and XPath
report abuse
vote down
vote up
Votes: +1
...
written by edwin, February 25, 2010
There are some constructs that are specifically string expressions, but in addition any other kind of expression can be used in a context where a string expression is required:

* A numeric expression is converted to a string by giving its conventional decimal representation, for example the value -3.5 is displayed as "-3.5", and 2.0 is displayed as "2".
* A boolean expression is displayed as one of the strings "true" or "false".
* When a node-set expression is used in a string context, only the first node of the node-set (in document order) is used: the value of this node is converted to a string dedicated server . The value of a text node is the character content of the node; the value of any other node is the concatenation of all its descendant text nodes.
* A result tree fragment is technically converted to a string in the same way as a node-set; but since the corresponding node-set will always contain a single node, the effect is to generate all the descendant text nodes ignoring all element tags
report abuse
vote down
vote up
Votes: +0
XPath Lister
written by Edison Kay, July 16, 2010
Thanks for the code. There is a executable tool that takes a directory full of XML files and dumps all the unique XPath's found in it in an xpath like syntax:

http://www.bestcode.com/html/x...lizer.html

It helps when you don't know the structure of documents you got but want to type xpath expressions right away.
report abuse
vote down
vote up
Votes: +0

Write comment
quote
bold
italicize
underline
strike
url
image
quote
quote
smile
wink
laugh
grin
angry
sad
shocked
cool
tongue
kiss
cry
smaller | bigger

busy