Fun with Java RegEx String Replacement

In the Pachyderm presentation publishing code, there is a step where it compiles the various versions of images for use in the final product – resizing as needed, wrapping in the .swf file and burning in the metadata. It uses an XML format, provided by JSwiff, to replace the freeze-dried content of a templated flash file wrapper with the dynamically defined data (image and text).

We just do a simple find-and-replace. We look for special tags that we’ve placed in the templated .swf XML version, things like “{tombstoneTitleShort}”, and replace them with values like “My Most Excellent Photo”. Seems simple. But I just came across a case where it failed. The extended text for an image included a $, which would be fine anywhere else, but $ is a Magic Regex Character. In a pattern it anchors the end of a line (likewise, ^ anchors the beginning). In the replacement string handed to String.replaceAll() it marks a group reference such as $1. This $ was unescaped and not followed by a group number, so the String.replaceAll() method was barfing appropriately. I think this is what was happening… Looked like it from the debug output, anyway…
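
A minimal sketch of the failure. It assumes the tag is matched with Pattern.quote(), since the post doesn't show how the original code built its pattern, and it uses a made-up price string as the offending text. The class and method names are hypothetical. Only the replacement value changes between the two calls.

```java
import java.util.regex.Pattern;

public class DollarInReplacement {
    // Quote the tag so its braces are matched literally
    // instead of being parsed as a {n,m} repetition.
    static final String TAG = Pattern.quote("{tombstoneTitleShort}");

    // Naive fill (hypothetical): the value goes straight in as the replacement string.
    static String fill(String template, String value) {
        return template.replaceAll(TAG, value);
    }

    public static void main(String[] args) {
        String template = "<text>{tombstoneTitleShort}</text>";

        // Plain text works fine as a replacement.
        System.out.println(fill(template, "My Most Excellent Photo"));

        try {
            // In a replacement string, "$" starts a group reference ($1, $2, ...),
            // so "$ " (dollar followed by a space) is rejected.
            System.out.println(fill(template, "Sold for $ 500"));
        } catch (IllegalArgumentException e) {
            System.out.println("replaceAll failed: " + e.getMessage());
        }
    }
}
```

On the second call, replaceAll() throws an IllegalArgumentException about an illegal group reference, which matches the barfing described above.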

I just switched that bit of string replacement to use the Apache Commons Lang StringUtils replacement method (org.apache.commons.lang.StringUtils.replace(String, String, String)), which treats both the search text and the replacement text literally, and all appears fine now.
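
StringUtils.replace(text, searchString, replacement) replaces every occurrence, with no regex involved. So that the sketch runs without Commons Lang on the classpath, it uses the JDK's String.replace(CharSequence, CharSequence) (Java 5 and later), which also does a literal replace-all. The LiteralFill class and its fill() helper are hypothetical.

```java
public class LiteralFill {
    // Hypothetical helper: substitute every occurrence of a template tag with a value,
    // treating both as plain text (no regex metacharacters, no "$" group references).
    static String fill(String template, String tag, String value) {
        return template.replace(tag, value);
    }

    public static void main(String[] args) {
        String template = "<text>{tombstoneTitleShort}</text>";
        System.out.println(fill(template, "{tombstoneTitleShort}", "Sold for $ 500"));
        // With Commons Lang on the classpath, the equivalent call is:
        // org.apache.commons.lang.StringUtils.replace(template, "{tombstoneTitleShort}", "Sold for $ 500")
    }
}
```

Because nothing is interpreted, the $ comes through untouched.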

As an aside, there are a lot of handy little goodnesses in the Apache Commons Lang library (as in the other Commons libraries). I need to make sure I’m taking advantage of it more, rather than writing my own utility code…

Update: Nope. That didn’t work as cleanly as I’d hoped. Now it complained that some of the replaced characters were invalid UTF-8. So I’m now escaping the special characters in my replacement string before feeding it to String.replaceAll(pattern, value), using the “oldReplace” method from this tip.
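
The “oldReplace” method from the tip isn't reproduced in this post, so literalReplace() below is a hypothetical stand-in for it: a plain indexOf()/StringBuilder substitution with no regex involved. This is a minimal sketch, assuming the only characters that need escaping are the two that replaceAll() treats specially in a replacement string: backslash and $. It does not address the UTF-8 complaint itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedFill {
    // Hypothetical stand-in for a manual literal replace such as "oldReplace":
    // walks the input with indexOf() and copies pieces into a StringBuilder.
    static String literalReplace(String input, String oldText, String newText) {
        if (oldText.isEmpty()) {
            return input; // nothing sensible to search for
        }
        StringBuilder out = new StringBuilder();
        int start = 0;
        int hit;
        while ((hit = input.indexOf(oldText, start)) >= 0) {
            out.append(input, start, hit).append(newText);
            start = hit + oldText.length();
        }
        return out.append(input.substring(start)).toString();
    }

    // Escape backslashes first, so the backslashes added in front of "$"
    // are not themselves doubled, then escape "$".
    static String escapeReplacement(String value) {
        String escapedBackslashes = literalReplace(value, "\\", "\\\\");
        return literalReplace(escapedBackslashes, "$", "\\$");
    }

    // Quote the tag for the pattern side, escape the value for the replacement side.
    static String fill(String template, String tag, String value) {
        return template.replaceAll(Pattern.quote(tag), escapeReplacement(value));
    }

    public static void main(String[] args) {
        String template = "<text>{tombstoneTitleShort}</text>";
        System.out.println(fill(template, "{tombstoneTitleShort}", "Sold for $ 500 \\ each"));
        // The JDK's built-in equivalent of escapeReplacement():
        System.out.println(Matcher.quoteReplacement("Sold for $ 500"));
    }
}
```

Since Java 5, Matcher.quoteReplacement() does the same escaping in a single call, so escapeReplacement() could be swapped for it.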

3 thoughts on “Fun with Java RegEx String Replacement”

  1. Oh yeah, I think replacing stuff with RegEx is nearly as complex as writing the complete search-and-replace algorithm yourself. Do you know a good source for an “idiot’s guide to RegEx”? Something foolproof that explains how this stuff works in human speak, not tech-speak? I mean, hey, RegEx only needs one small little string to work, but figuring out which string you need costs so much time. You have to wrap your head around machine-thinking, and I really dislike twisting my head into machine-thinking like this.

    I tried making sense of RegEx several times without getting what I wanted. So I wrote my own recursive (computing-time-saving) search-and-replace engine, which I can feed with human-readable, understandable rules.

  2. There’s a pretty good doc on regex built into BBEdit’s Help section. Other than that, I just keep hitting The Goog (or del.icio.us for “regex”). I wonder if there’s a decent Wikipedia article?
