Transform XML with Groovy and XMLTask

(originally created 15 July 2007)

Abstract

Presented is an example using Groovy’s AntBuilder to invoke XMLTask to transform an XML file.  Also shown is how to unit test using the XMLUnit framework.

Background

This is my third program written while learning Groovy. How did it come about? I had to do an XML transform: change a flag value in an element. After looking at some very cryptic sed, grep, awk, and bash approaches, I decided that a naive Java program, a simple state machine to traverse the file, would be good enough to get this out the door ASAP. This is a Java shop, so if I get run over by a truck, anyone could maintain it. So I coded a Java class that performed the string replace. Nothing unusual, of course; plenty of applications do this kind of thing, such as parsing RSS feeds.

XML transform

The problem with a programmatic string replace of XML is that it is not semantically coherent. One is changing a tree-structured data structure using a flat, text-based approach. Sure, it works now, but changes to the data structure may break it. Plus, XML-aware tools would provide better coding and testing. Thus, though we shipped my simple transformer, I was still thinking about it; maybe in the next maintenance cycle I could replace it with something more robust. Should I have used XSLT or some other XML-based approach: SAX, DOM, StAX, XQuery, JDOM, XOM? I forgot to mention that other preexisting scripts were already manipulating the XML files as text, so a transform that changes the ‘text’ layout of the XML output could have broken existing processes.

After thinking about it I finally felt that XMLTask would offer the most direct approach. Essentially, this is just a single line, which in Ant would be:

<replace path="//UNIT[@name='widgetB']/VALUE/@value" withText="${newValue}"/>
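To make that XPath concrete, here is a minimal sketch that evaluates it with the JDK's own javax.xml.xpath API and changes the selected attribute. It is not part of the original project and does not use XMLTask. The sample document is an assumption: its layout is only inferred from the XPath and the STORE DOCTYPE, since the real storage.xml is not shown.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.XPathConstants
import javax.xml.xpath.XPathFactory

// Hypothetical sample data; the real storage.xml layout is an assumption.
def xml = '''<STORE>
  <UNIT name="widgetA"><VALUE value="N"/></UNIT>
  <UNIT name="widgetB"><VALUE value="N"/></UNIT>
</STORE>'''

def doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))
def xpath = XPathFactory.newInstance().newXPath()

// Select the 'value' attribute of VALUE under the UNIT named widgetB.
def nodes = xpath.evaluate("//UNIT[@name='widgetB']/VALUE/@value",
        doc, XPathConstants.NODESET)
assert nodes.getLength() == 1

// Change only that attribute, which is the effect the <replace> line is after.
def attr = nodes.item(0)
assert attr.getNodeValue() == 'N'
attr.setNodeValue('Y')

// The other UNIT is untouched.
assert xpath.evaluate("//UNIT[@name='widgetA']/VALUE/@value", doc) == 'N'
assert xpath.evaluate("//UNIT[@name='widgetB']/VALUE/@value", doc) == 'Y'
```

Because the expression addresses a node in the tree rather than a run of characters, it keeps working if unrelated elements or whitespace are added around it.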

Software used

  • Groovy 1.5
  • XMLTask 1.15.1
  • XMLUnit 1.1
  • JUnit 3.8.1
  • Ant 1.7
  • Java 1.5
  • GroovyIDE plugin for Eclipse 1.0.1

Requirements

Some requirements that illustrate why the Groovy AntBuilder was chosen:

  • Not change the XML text file except for the specific target node.  (don’t remember if this was true.  7/21/10)
  • Easy to write.
    • Uses mainstream language (Java, but…)
    • Compact, using scripting flavor
    • Plenty of docs on the language
    • Easy to test and debug
  • Command line driven
  • Cross platform. Used on Windows and Linux, so no Bash or PowerShell scripting
  • No cygwin, just because it is not on each machine
  • Easy to maintain. That ruled out a one-line Perl script or monstrous Bash script with sed, awk, here documents, etc.
  • Reusable
    • Can be copied and used for other tasks. (Note, don’t worry about extensibility).
  • Performance.  Not a concern in this case.

Replace Script

As shown in listing 2, the script is amazingly small. Sure, there are other frameworks and libraries that are even more powerful, but this lives within an existing framework, Ant, so it’s available as part of larger processes.

Not shown here are the hours I wasted trying to get XMLCatalog support to work so that the DOCTYPE could be handled properly. I’m sure that if other XML technologies such as namespaces or entities were being used, they would have caused aggravation too. Fortunately, in this case, XMLTask’s ‘entity’ element got around the custom ‘classpath’ path being used here. I left it in this example in case someone has the same issue; I saw a bunch of forum pleas for help with this.

Listing 2 script

/**
 * Example of using Groovy AntBuilder to invoke XMLTask via Ant API.
 * @author Josef Betancourt
 * @date 20071205T23:14
 */

 def SOURCE='data/storage.xml'
 def DEST='target/storage-new.xml'
 // double quotes outside so the single quotes inside the XPath are legal
 def XPATH="//UNIT[@name='widgetB']/VALUE/@value"
 def VALUE='Y'
 def SYSTEM = 'classpath://some/path/to/dtd/narly.dtd'
 def REMOTE = SYSTEM

 def ant = new AntBuilder()
 ant.sequential {
     path(id: "path") {
         fileset(dir: 'lib') {
             include(name: "**/*.jar")
         }
     }

     ant.taskdef(name: 'xmltask',
         classname: 'com.oopsconsultancy.xmltask.ant.XmlTask',
         classpathref: 'path')

     ant.xmltask(source: SOURCE, dest: DEST,
             expandEntityReferences: false,
             report: false, system: SYSTEM) {
         // don't use DTD spec'd in DOCTYPE
         entity(remote: REMOTE, local: '')
         replace(path: XPATH, withText: VALUE)
     }
 }
// end Replace.groovy script

What would normal Ant have looked like? Not bad. In this case, the Ant script is just as small. The only advantage of the Groovy approach, other than the avoidance of pointy brackets, is the potential for more programmatic manipulation. Of course, I had problems getting XML Catalogs to work in Ant too. Here is my plea to the open source movement: if you’re not going to document it well, don’t bother. Minimally, there should be examples for all the use cases.

Listing 3 conventional Ant use

<taskdef name="xmltask" classname="com.oopsconsultancy.xmltask.ant.XmlTask"/>

<target name="transform" depends="init">
 <xmltask source="${inputFile}" dest="${outputFile}"
    expandEntityReferences="false" report="false"
      system="classpath://some/path/to/dtd/narly.dtd">
    <entity remote="classpath://some/path/to/dtd/narly.dtd"
         local=""/>
    <replace path="//UNIT[@name='widgetB']/VALUE/@value"
         withText="${newValue}"/>
 </xmltask>
</target>

Using command line arguments

Instead of hard coding the values, you can get them from the command line with something like the following, which uses Groovy’s CliBuilder, a wrapper around Apache Commons CLI.

Listing 4 command line arg parsing

 // === command line options handling
def cli = new CliBuilder()
cli.s(longOpt: 'source',"source file path",args:1,required:true)
cli.d(longOpt: 'dest',"destination file path",args:1,required:true)
cli.x(longOpt: 'xpath',"xpath expression",args:1,required:true)
cli.v(longOpt: 'value',"replacement string",args:1,required:true)

def options = cli.parse(args)
if(!options){ return }  // parse() prints usage and returns null on a bad command line
if(options.s){SOURCE = options.getOptionValue('s')}
if(options.d){ DEST = options.getOptionValue('d')}
if(options.x){ XPATH = options.getOptionValue('x')}
if(options.v){ VALUE = options.getOptionValue('v')}
//=== end command line options handling

Unit Testing

Ok the transform works. How do you know? Eyeballing the resulting XML files? If there are structural changes made to the XML file, will it still work?

Eyeballing the files is not reliable and cannot be automated. One way of testing the changes is to run a tool such as ‘diff’ on the output. I tried that, and it worked great until the actual QA testing. There were end-of-line differences in the files, depending on where the transform was run and where the initial source file came from. That would have required dos2unix or unix2dos to be part of the pipeline. Perhaps there is a switch to the diff command to get around this, but I did not find it.
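For a plain text comparison, the end-of-line problem could also be sidestepped by normalizing line endings before comparing, without dos2unix in the pipeline. The following is only a sketch of that idea, not what the project did; normalizeEol is a hypothetical helper name and the sample strings are made up.

```groovy
// Hypothetical helper: convert CRLF (Windows) and lone CR to LF.
def normalizeEol = { String text -> text.replaceAll(/\r\n?/, '\n') }

// The same content as it might be written on Windows and on Linux.
def fromWindows = "<a>\r\n  <b/>\r\n</a>\r\n"
def fromLinux   = "<a>\n  <b/>\n</a>\n"

// A raw comparison sees a difference on every line...
assert fromWindows != fromLinux
// ...but after normalization the two are identical.
assert normalizeEol(fromWindows) == normalizeEol(fromLinux)
```

This only fixes line endings. Any other layout difference, such as indentation or attribute order, would still show up in a text diff, which is where an XML-aware comparison earns its keep.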

For testing I used JUnit and XMLUnit. I just subclassed GroovyTestCase, Groovy’s own subclass of JUnit’s TestCase.

The test data file is similar to the production data, but much smaller, of course.

Writing the tests was harder than writing the actual script. Fortunately, XMLUnit has a very easy-to-use API.

As usual there were complications. Again, the DOCTYPE was killing the test and I could not get the XMLCatalog support working. My hack was to preprocess the source and output files and filter the DOCTYPE. Notice how small this method is. Straight Java would be pretty wordy.

/** read file, stripping doctype line */
 String filterDocType(path, docString) throws Exception{
           def input = new File(path)
           def writer = new StringWriter()
           // keep only the lines that do not match the DOCTYPE string
           input.filterLine(writer){ !(it =~ docString) }
           return writer.toString()
 }
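To try the filter in isolation, here is a small standalone sketch. It uses the line-matching form of filterDocType that appears in the unit test listing. The temporary file and its contents are hypothetical stand-ins, since the real test data file is not shown.

```groovy
// Line-matching form of the filter: drop every line that matches docString.
String filterDocType(path, docString) {
    def writer = new StringWriter()
    new File(path).filterLine(writer) { !(it =~ docString) }
    return writer.toString()
}

def docString = '<!DOCTYPE STORE SYSTEM "classpath://some/path/to/dtd/narly.dtd">'

// Hypothetical throwaway input file standing in for data/storage.xml.
def tmp = File.createTempFile('storage', '.xml')
tmp.deleteOnExit()
tmp.text = """<?xml version="1.0"?>
${docString}
<STORE><UNIT name="widgetB"><VALUE value="N"/></UNIT></STORE>
"""

def filtered = filterDocType(tmp.path, docString)

// The DOCTYPE line is gone; everything else survives.
assert !filtered.contains('<!DOCTYPE')
assert filtered.contains('<?xml version="1.0"?>')
assert filtered.contains('<STORE>')
```

Note that docString is used as a regular expression here, so the dots in it match any character. That is harmless for this DOCTYPE, but a string containing other regex metacharacters would need quoting.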

The code below uses command line arguments, whereas the Replace.groovy code above did not. I left them in because my original code used args, and the test shows how to create a command line arg array and invoke a Groovy script.

Listing 5 Replace Unit Test

/**
 * Unit testing the example of using Groovy AntBuilder to invoke XMLTask via Ant API.
 * @author Josef Betancourt
 * @date 20071205T23:14
 *
 * @see http://www.bytemycode.com/snippets/snippet/475/
 * @see http://www.oopsconsultancy.com/software/xmltask
 * @see http://groovy.codehaus.org/Using+Ant+from+Groovy
 *
 */

import org.custommonkey.xmlunit.*

/**
Run the Replace script and make sure only one change is made.
The file paths are relative to TEST_ROOT_PATH passed in
with -DTEST_ROOT_PATH=xxxxx
*/
class ReplaceTest extends GroovyTestCase {
 def DOCSTRING =
 '<!DOCTYPE STORE SYSTEM "classpath://some/path/to/dtd/narly.dtd">'
 def main
 def root
 def lineSeparator = System.getProperty("line.separator")
 def source='data/storage.xml'
 def dest='target/storage-new.xml'
 def args

 void setUp() throws Exception {
         super.setUp()
         main = new Replace()
         root = System.getProperty("TEST_ROOT_PATH")+
                            "/GroovyXMLTaskXMLUnit/"
         XMLUnit.setIgnoreWhitespace(true)
         args = ['-s', source,  '-d', dest, '-x',
             "//UNIT[@name='widgetB']/VALUE/@value", '-v', 'Y'] as String[]
 }

 void testReplaceScript() throws Exception {
          main.main(args);
          def input =root + source;
          def output = root + dest;
          validateSingleChange(input, output, "N","Y");
 }

 /** Ensure only one change in XML at XPath */
 void validateSingleChange(final inFile, final outFile,
           value, newValue) throws Exception {
         def diff =
              new DetailedDiff(
                      new Diff(filterDocType(inFile,DOCSTRING),
                               filterDocType(outFile,DOCSTRING)
              )
         )

         def list = diff.getAllDifferences();
         if(list.size()==1){
                  for (dif in list) {
                   def v1 = dif.getControlNodeDetail().getValue();
                   def v2 = dif.getTestNodeDetail().getValue();
                   def xpath = dif.getControlNodeDetail().getXpathLocation();
                   assertTrue("Failed to change '${value}' to '${newValue}' " +
                     "using XPath: '${xpath}' " +
                     "control value is: $v1 test value is: $v2",
                     (v1.equals(value) && v2.equals(newValue)));
                  }
         }else{
             fail("Expected 1 change, but had ${list.size()} changes. " +
                  "diff is: " + diff)
         }
 }

 /** Not really a unit test, but provides confidence in XMLUnit tool use */
 void testNoReplacement() throws Exception {
         main.main(args);
         def input =root + source;
         assertTrue("Should have been the same",
                new DetailedDiff(
                      new Diff(filterDocType(input,DOCSTRING),
                       filterDocType(input,DOCSTRING)
                )
         ).getAllDifferences().size()==0
      );
 }

 /** Another non-unit test, but provides confidence in XMLUnit tool use */
 void testTooManyChanges() throws Exception {
         main.main(args);
         def t = '(<UNIT name=".*?">)'
         def input =filterDocType(root+source,DOCSTRING)
         def output = input.replaceAll(t,{Object[]it ->
                        it[1]+'<extra>1</extra>'})
         def diff =new DetailedDiff(new Diff(input,output))

         assertTrue("Should have been more than one change",
            diff.getAllDifferences().size()>1)
 }

 /** read file, stripping doctype line */
 String filterDocType(path, docString) throws Exception{
           def input = new File(path);
           def writer = new StringWriter()
           input.filterLine(writer){!(it =~ docString)}
           return writer
 }
}

When run and no failures:


C:\home\projects\dev\GroovyXMLTaskXMLUnit>replacetest.cmd ... Time: 2.578
OK (3 tests)

Links

Groovy
XMLTask
XMLUnit
AntBuilder
Ant

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
