Regex group capture with optional delimiter

Ever had one of those days where the simplest thing seems complex? A simple regex is doing that to me now.

Here is the example, simplified. You have a string “BEFOREAFTER”. If this string has an ‘x’ in it, you want to capture everything before the ‘x’; otherwise you want to capture the whole string. So,
  “BEFOREAFTER” gives “BEFOREAFTER”
  “BEFORExAFTER” gives “BEFORE”

Yes, I know this can be done programmatically or just by using String.split(….). Don’t you hate it when people on Stack Overflow and elsewhere answer “don’t do it that way” when you still want to know: fine, but if I do choose this way, how would it be done? Treat this as a learning exercise.

This should be easy, regex 101, right? I thought so too. I thought /^(.*)x?.*/ would work. Nope. I tried captures with non-greedy quantifiers, zero-width this or that, etc. I even went back to reading Mastering Regular Expressions by Jeffrey E. F. Friedl (O’Reilly Media, Inc.). Great book, by the way.
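
To see why that first attempt fails: `.*` is greedy, and since both `x?` and the trailing `.*` are allowed to match nothing, the engine never has a reason to give any characters back, so group 1 swallows the entire string. Here is a minimal sketch in plain Java (Groovy sits on the same java.util.regex engine). The class name GreedyAttempt and the method firstGroup are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyAttempt {
    // First attempt from the text: greedy capture, optional 'x', then anything.
    static final Pattern ATTEMPT = Pattern.compile("^(.*)x?.*");

    // Returns group 1 of the first match, or null if the pattern does not match.
    static String firstGroup(String input) {
        Matcher m = ATTEMPT.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroup("BEFOREAFTER"));  // BEFOREAFTER
        System.out.println(firstGroup("BEFORExAFTER")); // BEFORExAFTER, not BEFORE
    }
}
```

Making the ‘x’ optional is exactly what lets the greedy group run past it.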

My solution works, but it just has to be the wrong way to do it. Here it is, written in Groovy.

def re = /^(.*)x|^(.*)/

getCaptureGroup(re,"BEFORExAFTER")
getCaptureGroup(re,"BEFOREAFTER")
getCaptureGroup(re,"")
getCaptureGroup(re,null)
getCaptureGroup(null,"x")

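def getCaptureGroup(regex, inString){
  def found = ""

  def m = (inString =~ regex)
  if(m){
    // m[0] is a List: the full match followed by each capture group
    List matches = (List)m[0]
    // only one branch of the alternation matches, so the other group is null
    found = matches[1] ?: (matches[2] ?: "")
  }

  // a plain double-quoted Groovy string cannot span lines, so join two pieces
  println "Using pattern $regex in String '$inString', " +
    "found group: '($found)' of type: ${found.getClass().getSimpleName()}"
}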

The output of this script is:

Using pattern ^(.*)x|^(.*) in String 'BEFORExAFTER', found group: '(BEFORE)' of type: String
Using pattern ^(.*)x|^(.*) in String 'BEFOREAFTER', found group: '(BEFOREAFTER)' of type: String
Using pattern ^(.*)x|^(.*) in String '', found group: '()' of type: String
Using pattern ^(.*)x|^(.*) in String 'null', found group: '(null)' of type: String
Using pattern null in String 'x', found group: '()' of type: String
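
One detail worth spelling out: in an alternation like this only one branch takes part in the match, so the group in the other branch comes back as null. That is why the helper has to look at group 1 and then group 2. Also, because `(.*)` is greedy, the first branch stops at the last ‘x’, not the first. A rough Java sketch of the same pattern follows. The class and method names are hypothetical, and the "AxBxC" input is my own addition, not part of the script above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationCapture {
    // Same two-branch pattern as the Groovy script.
    static final Pattern RE = Pattern.compile("^(.*)x|^(.*)");

    // Returns whichever group took part in the match, or "" if nothing matched.
    static String capture(String input) {
        Matcher m = RE.matcher(input);
        if (!m.find()) {
            return "";
        }
        // The group belonging to the branch that was not taken is null.
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(capture("BEFORExAFTER")); // BEFORE
        System.out.println(capture("BEFOREAFTER"));  // BEFOREAFTER
        System.out.println(capture("AxBxC"));        // AxB (greedy: up to the last 'x')
    }
}
```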


Is there another way to design the regex for the Java regex engine?

Solved
I found a Stack Overflow entry that answered a similar question.

So the solution is now:

def re = /^(.*?)(?:x.*|$)/

getCaptureGroup(re,"BEFOREAFTER")
getCaptureGroup(re,"BEFORExAFTER")
getCaptureGroup(re,null)
getCaptureGroup(null,"x")
println ""

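def getCaptureGroup(regex, inString){
  def found = ""

  def m = (inString =~ regex)
  if(m){
    // m[0] is a List: the full match followed by the single capture group
    List matches = (List)m[0]
    found = matches[1]
  }

  // a plain double-quoted Groovy string cannot span lines, so join two pieces
  println "Using pattern $regex in String '$inString', " +
    "found group: '($found)' of type: ${found.getClass().getSimpleName()}"
}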

Or as one line:
println(("BEFORExAFTER" =~ /^(.*?)(?:x.*|$)/)[0][1])
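
How the final pattern works, as I understand it: the reluctant `(.*?)` starts by matching nothing and grows one character at a time, and after each step the engine checks whether `(?:x.*|$)` can match, i.e. whether an ‘x’ comes next or the string has ended. So the capture stops right before the first ‘x’, or runs to the end if there is none. The Java sketch below checks that. It also tries `^([^x]*)`, a negated character class that is my own suggestion rather than something taken from the Stack Overflow answer, and which only works when the delimiter is a single character. Class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyCapture {
    // The solution pattern from the post: reluctant capture, then 'x...' or end.
    static final Pattern LAZY = Pattern.compile("^(.*?)(?:x.*|$)");
    // Assumed alternative (not from the post): everything up to the first 'x'.
    static final Pattern NEGATED = Pattern.compile("^([^x]*)");

    // Returns group 1 of the first match, or "" if the pattern does not match.
    static String capture(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"BEFOREAFTER", "BEFORExAFTER", "AxBxC"}) {
            System.out.println(s + " -> " + capture(LAZY, s) + " / " + capture(NEGATED, s));
        }
    }
}
```

Both patterns give BEFOREAFTER, BEFORE and A for the three inputs. Unlike the greedy alternation, they cut at the first ‘x’ when there is more than one.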

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.