Posted by: Pete Pickerill on Aug 07, 2008
For obvious reasons, this method sucked once I got a handle on the different configuration conventions used by each platform. The process was incredibly monotonous and error-prone. My mind would start to wander as I robotically clicked through the differences in dozens of XML files. Most of the time the differences were expected, caused by system-generated unique IDs or environmental differences. I decided I needed to automate this process simply because I dreaded doing it manually. Because we were dealing with XML, XPath seemed to be the obvious choice for programmatic validation. So I just needed to whip up some XPath expressions and a simple Java class to resolve them, and I'd be done. No bubbles, no troubles. Right?
Well, I was forgetting that a WebSphere profile with one node and one server contains more than 100 individual XML files ranging in size from under 1K to over 600K. If I thought using the diff tools was tedious, it was nothing compared to generating thousands of XPath expressions by hand. Plus, every time the product expanded into new areas of configuration, I was looking at hours of updating the automation data to keep it current. I needed a tool that would generate these XPath expressions for me. I had some specific needs that weren’t addressed by the sample classes and open-source solutions I was finding on the web, so I decided to write my own. Keep in mind:
- Coding isn’t my forte. You may find bone-headed logic or hacky work-arounds. But for my purposes it works and it’s quick. Consider it your chance to test the tester and file a bug by leaving a comment. I’d love to get feedback and improve it!
- This is a simplified version of the class I use and it works fine on simple XML. This sample class has some limitations. It doesn’t handle namespace designations and I haven’t added handling for special characters. I encourage you to play around with it and add methods that suit your purposes.
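On the special-character limitation mentioned above: XPath 1.0 has no escape character inside string literals, so a value that contains a double quote breaks an expression like [@attrib="value"]. One possible workaround, sketched here and not part of the class below, is a small quoting helper. The class name XPathLiteral and the method quote are made up for illustration.

```java
// Hypothetical helper, not part of the original class: builds a safe
// XPath 1.0 string literal for an arbitrary attribute or text value.
final class XPathLiteral {

    /**
     * Returns an XPath 1.0 string literal for the given value.
     * XPath 1.0 has no escape character, so a value containing both
     * quote kinds has to be assembled with concat().
     */
    static String quote(String value) {
        if (!value.contains("\"")) {
            return "\"" + value + "\"";
        }
        if (!value.contains("'")) {
            return "'" + value + "'";
        }
        // Both quote kinds present: split on the double quotes and glue
        // the pieces back together with a single-quoted '"' in between.
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = value.split("\"", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", '\"', ");
            }
            sb.append("\"").append(parts[i]).append("\"");
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(quote("plain"));
        System.out.println(quote("say \"hi\""));
        System.out.println(quote("it's \"quoted\""));
    }
}
```

If you wanted to try it, the place to call such a helper would be wherever an attribute value or text value is concatenated into the path expression.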
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.apache.commons.lang.StringUtils;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class XPathGenerator {
/*
* fileList - a list of files to generate XPath expressions from
* xpath - an XPath instance used to verify generated expressions
*/
private static List<File> fileList = new ArrayList<File>();
private static XPath xpath = XPathFactory.newInstance().newXPath();
/*
* Simple utility method that checks for whitespace at the beginning of a
* text node
*/
private static boolean isWhiteSpace(String nodeText) {
return nodeText.startsWith("\r") || nodeText.startsWith("\t")
|| nodeText.startsWith("\n") || nodeText.startsWith(" ");
}
/*
* Simple utility method to verify brutishly assembled xpath expressions
*/
private static void checkXPath(String xpathExpression, Node node) {
Object xpathCheck;
try {
xpathCheck = xpath.evaluate(xpathExpression, node
.getOwnerDocument(), XPathConstants.BOOLEAN);
if (Boolean.TRUE.equals(xpathCheck)) {
/*
* print the file/xpath combo to use for future verification For
* Example:
* file:/C:/tmp/sample.xml=/rootNode/firstChild/firstGrandChild
* [@attrib="value"][text()="some text"]
*/
System.out.println(node.getOwnerDocument().getDocumentURI()
+ "=" + xpathExpression);
}
} catch (XPathExpressionException xpe) {
System.out.println("\n\n" + xpathExpression);
xpe.printStackTrace();
}
}
/*
* Simple utility method to check for a text node on the current XML
* element
*/
private static boolean hasValidText(Node node) {
String textValue = node.getTextContent();
return (textValue != null && !textValue.isEmpty()
&& !isWhiteSpace(textValue)
&& !StringUtils.isWhitespace(textValue) && node.hasChildNodes());
}
/*
* Simple utility to check for attributes on the current element
*/
private static boolean hasValidAttributes(Node node) {
return (node.getAttributes().getLength() > 0);
}
/*
* Build XPath attribute list for individual XML Element Nodes Iterate
* through all attribute nodes and build a string that can be included in an
* XPath expression to validate an XML document. Resulting string will look
* like: [@attrib1="value1" and @attrib2="value2"] Skip this if the
* attribute list is empty.
*/
private static String buildAttribString(Node node, String pathExpr) {
NamedNodeMap nnlist = node.getAttributes();
pathExpr = pathExpr + "[";
int attribCount = 0;
// iterate over attributes
for (int i = 0; i < nnlist.getLength(); i++) {
// grab attribute name and value
String attribName = nnlist.item(i).getNodeName();
String attribValue = nnlist.item(i).getNodeValue();
// if we've already added attributes to the path expression append
// the current one
if (attribCount > 0) {
pathExpr = pathExpr + " and ";
}
pathExpr = pathExpr + "@" + attribName + "=\"" + attribValue + "\"";
attribCount++;
}
pathExpr = pathExpr + "]";
return pathExpr;
}
/*
* processNode checks for attributes and text nodes on an XML node and
* processes them accordingly
*/
public static NodeList processNode(Node node) {
if (hasValidAttributes(node) || hasValidText(node)) {
String pathExpr = "/" + node.getNodeName();
// check for attributes
if (hasValidAttributes(node)) {
pathExpr = buildAttribString(node, pathExpr);
}
// Use a second reference to walk up the tree so that node itself keeps
// pointing at the current element
Node tmpNode = node;
// Build pathExpr for XPath Expression by working backward through
// the XML until we hit the document node.
while (tmpNode.getParentNode() != null
&& tmpNode.getParentNode().getNodeType() != Node.DOCUMENT_NODE) {
tmpNode = tmpNode.getParentNode();
String nodeName = tmpNode.getNodeName();
if (hasValidAttributes(tmpNode)) {
String attribString = buildAttribString(tmpNode, nodeName);
pathExpr = "/" + attribString + pathExpr;
} else {
pathExpr = "/" + nodeName + pathExpr;
}
}
if (hasValidText(node)) {
/*
* Build XPath text value expression to verify text values for
* node. This is a little tricky because linebreaks, tabs and
* spaces included in the XML document are considered text
* nodes.
*/
// Copy original node to a textNode to preserve state of
// original node for future operations
Node textNode = node;
/*
* This check iterates through child nodes of the original node
* until it reaches the next element node. As long as the node
* is not null, starts with white space and is not an element
* node, move on to the next node
*/
while (textNode != null
&& isWhiteSpace(textNode.getTextContent()) == true
&& StringUtils.isWhitespace(textNode.getTextContent())
&& textNode.getNodeType() != Node.ELEMENT_NODE) {
textNode = textNode.getFirstChild();
}
/*
* If the text content of the current node is not null, doesn't
* start with whitespace and has child nodes, we grab the text
* and frame it into an XPath text argument. A sample element
* and its resulting text argument:
*
* <element1>some text here</element1> YIELDS
* element1[text()="some text here"]
*
* NOTE: The check for child nodes prevents us from creating a
* blank text argument for self-contained elements (example:
* <element1/>)
*/
pathExpr = pathExpr + "[text()=\""
+ StringUtils.strip(textNode.getTextContent()) + "\"]";
}
checkXPath(pathExpr, node);
}
// gather children and return
return node.getChildNodes();
}
// This function takes a group of nodes and generates XPath expressions for
// them until it runs out of nodes
public static void processNodeList(NodeList nodelist) {
for (int i = 0; i < nodelist.getLength(); i++) {
if (nodelist.item(i).getNodeType() == Node.ELEMENT_NODE
&& (hasValidAttributes(nodelist.item(i)) || hasValidText(nodelist
.item(i)))) {
processNode(nodelist.item(i));
}
processNodeList(nodelist.item(i).getChildNodes());
}
}
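// This function takes the cmd line argument and adds to fileList all XML
// files in the directory and its subdirectories
private static void processDir(File dir) {
File[] fileArray = dir.listFiles();
// listFiles() returns null when the directory cannot be read
if (fileArray == null) {
return;
}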
for (int i = 0; i < fileArray.length; i++) {
if (fileArray[i].isFile()) {
if (fileArray[i].toString().endsWith(".xml")) {
fileList.add(fileArray[i]);
}
} else if (fileArray[i].isDirectory()) {
processDir(fileArray[i]);
}
}
}
// This starts the process of an individual XML file
private static void processXML(File xmlFile) throws SAXException,
IOException, ParserConfigurationException, XPathExpressionException {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
factory.setIgnoringComments(true);
DocumentBuilder parser = factory.newDocumentBuilder();
Document doc = parser.parse(xmlFile);
NodeList startlist = doc.getChildNodes();
processNodeList(startlist);
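} catch (Exception e) {
// Report the failure instead of silently skipping the file
System.err.println("Failed to process " + xmlFile + ": " + e);
}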
}
/*
* Entry point. Pass in the xml file or top level directory of a group of
* files to generate XPath expressions for them.
*/
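public static void main(String[] args) {
if (args.length != 1) {
System.err.println("Usage: java XPathGenerator <xml file or directory>");
return;
}
File startDir = new File(args[0]);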
try {
if (startDir.isFile()) {
processXML(startDir);
} else if (startDir.isDirectory()) {
processDir(startDir);
for (int i = 0; i < fileList.size(); i++) {
processXML(fileList.get(i));
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
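The generator prints one documentURI=xpath line per expression "for future verification". What that verification step might look like is sketched below. This is a guess at a minimal consumer, not the class I actually use: the name XPathVerifier, the one-line-per-expression input file, and the split at the first '=' (which assumes the document URI itself contains no '=') are all assumptions.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical companion to XPathGenerator: re-checks saved expressions.
final class XPathVerifier {
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    /** Splits a "documentURI=xpath" line at the first '=' only. */
    static String[] splitLine(String line) {
        int idx = line.indexOf('=');
        if (idx < 0) {
            throw new IllegalArgumentException("No '=' in line: " + line);
        }
        return new String[] { line.substring(0, idx), line.substring(idx + 1) };
    }

    /** True if the expression selects at least one node in the document. */
    static boolean matches(Document doc, String expression)
            throws XPathExpressionException {
        return (Boolean) XPATH.evaluate(expression, doc, XPathConstants.BOOLEAN);
    }

    /** Returns the lines whose expression no longer matches its document. */
    static List<String> verify(List<String> lines) throws Exception {
        DocumentBuilder parser =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Parse each document once, however many expressions refer to it
        Map<String, Document> cache = new HashMap<String, Document>();
        List<String> failures = new ArrayList<String>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] parts = splitLine(line);
            Document doc = cache.get(parts[0]);
            if (doc == null) {
                doc = parser.parse(parts[0]); // parse(String) takes a URI
                cache.put(parts[0], doc);
            }
            if (!matches(doc, parts[1])) {
                failures.add(line);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: a text file holding the generator's saved output
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String failure : verify(lines)) {
            System.out.println("MISMATCH: " + failure);
        }
    }
}
```

To try it, redirect the generator's output to a file and pass that file as the only argument. Expressions that match expected differences, such as system-generated IDs, would have to be pruned from the file first.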

written by James E. Ervin, March 15, 2009
http://groovy.codehaus.org/Reading XML with Groovy and XPath
written by edwin, February 25, 2010
* A numeric expression is converted to a string by giving its conventional decimal representation, for example the value -3.5 is displayed as "-3.5", and 2.0 is displayed as "2".
* A boolean expression is displayed as one of the strings "true" or "false".
* When a node-set expression is used in a string context, only the first node of the node-set (in document order) is used: the value of this node is converted to a string. The value of a text node is the character content of the node; the value of any other node is the concatenation of all its descendant text nodes.
* A result tree fragment is technically converted to a string in the same way as a node-set; but since the corresponding node-set will always contain a single node, the effect is to generate all the descendant text nodes, ignoring all element tags.
Adi.
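The first three conversion rules in the comment above can be tried out with the same javax.xml.xpath API the generator uses. The sketch below is only an illustration: the class name and the sample document are made up, and the result tree fragment rule is left out because it only applies inside XSLT.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not taken from the post or the comments.
final class XPathStringConversionDemo {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates an expression and returns its XPath string value. */
    static String asString(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The two-argument evaluate() converts the result to a string
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document with two <a> elements
        Document doc = parse("<r><a>first<b>x</b></a><a>second</a></r>");
        System.out.println(asString(doc, "-3.5"));  // numeric expression
        System.out.println(asString(doc, "2.0"));   // numeric, no fraction
        System.out.println(asString(doc, "1 = 1")); // boolean expression
        System.out.println(asString(doc, "/r/a"));  // node-set, first node only
    }
}
```

If the rules hold, the four printed lines should be -3.5, 2, true and firstx. The last one shows that only the first a element is used and that its descendant text nodes are concatenated.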