
Regex For Multiple Group Not Working - Java 1.7 To 1.8 - Scriptengine Rhino To Nashorn

I had a regex problem as described in this question. The string is the following: 06-24-2015 11:28AM 0250 01 90775 05342. The JS code on Java 1.7 is working f

Solution 1:

As far as I could check, all you need to do is add [^<]*? before the closing <\\/span\\>. Also, you don't need to escape the starting < of the last span. The script then looks like this:

ScriptEngineManager mgr = new ScriptEngineManager();
ScriptEngine engine = mgr.getEngineByName("JavaScript");
String js = "var fileSrc = '<SPAN>06-24-2015  11:28AM  0250 01 90775 05342</SPAN>';"
    + "var trans_regex = /\\<span\\>(\\d{2}-\\d{2}-\\d{4})\\s*?(\\d{1,2}:\\d{2}\\s*?(?:am|pm))\\s*?(?:<\\/SPAN><BR\\/?><SPAN>)?\\s*?((\\d[ -]*?){13,17})\\s*?[^<]*?\\<\\/span\\>/i ;"
    + "print('executed regex result : ' + trans_regex.exec(fileSrc) ) ; "
    + "var t_time = trans_regex.exec(fileSrc)[2];"
    + "var t_cc = trans_regex.exec(fileSrc)[3];"
    + "print(\" time \" + t_time)";

Object result = engine.eval(js);

This yields:

executed regex result : <SPAN>06-24-2015  11:28AM  0250 01 90775 05342</SPAN>,06-24-2015,11:28AM,02500190775,5
 time 11:28AM

Update - explanation and alternatives

This is the last group of the original regex: ((\\d[ -]*?){13,17}). It looks tricky to me, although I don't know the intention behind it. What it does:

  1. Match one digit.
  2. Match zero or more spaces or hyphens, as few as possible.
  3. Repeat steps 1 and 2 at least 13 and at most 17 times.

This is really tricky because the zero-or-more spaces or hyphens can match anywhere, including nowhere at all. I believe the intention was something like this:

Consider the string 0250 01. The leading '0' would match one digit and zero other characters. The same goes for the '2' and the '5'. Then '0 ' would match one digit and one space. And so on, until 13 to 17 digits have been consumed.
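As a reference for what the walkthrough above expects, here is a minimal sketch that runs the original pattern (without the [^<]*? addition) through java.util.regex instead of a script engine. This is a swapped-in engine, not the Rhino or Nashorn code path, and the class and method names are hypothetical. It only illustrates which text the last group should capture when backtracking works as described.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyGroupReference {
    // The original pattern, rewritten for java.util.regex (assumption: the
    // slash no longer needs escaping and /i becomes CASE_INSENSITIVE).
    static final Pattern TRANS = Pattern.compile(
        "<span>(\\d{2}-\\d{2}-\\d{4})\\s*?(\\d{1,2}:\\d{2}\\s*?(?:am|pm))\\s*?"
        + "(?:</SPAN><BR/?><SPAN>)?\\s*?((\\d[ -]*?){13,17})\\s*?</span>",
        Pattern.CASE_INSENSITIVE);

    // Returns the text captured by group 3, or null when nothing matches.
    static String captureLastGroup(String src) {
        Matcher m = TRANS.matcher(src);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String fileSrc = "<SPAN>06-24-2015  11:28AM  0250 01 90775 05342</SPAN>";
        // With java.util.regex, group 3 spans all 16 digits of the sample.
        System.out.println(captureLastGroup(fileSrc));
    }
}
```

Here the lazy [ -]*? only grows when the next digit cannot be matched otherwise, so each space ends up attached to the digit before it, as in the '0 ' step above.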

Apparently, the Nashorn engine cannot handle this construct. If I add this print statement:

+ "print (trans_regex.exec(fileSrc));"

then I'll get this result:

<SPAN>06-24-2015  11:28AM  0250 01 90775 05342</SPAN>,06-24-2015,11:28AM,0250 01 90775,5

This tells me that ((\\d[ -]*?){13,17}) matched 0250 01 90775, which is only 11 digits. So my 'fix' just consumes the rest of the characters up to the start of the closing </span>.
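Because the engine can apparently return a group with fewer digits than the quantifier demands, it may be worth checking the captured text on the Java side. The following is a small sketch under that assumption. The class and method names are hypothetical, and the 13 to 17 bounds are simply taken from the regex.

```java
class DigitCountCheck {
    // Counts the ASCII digits in a captured group, ignoring spaces and hyphens.
    static int countDigits(String captured) {
        int n = 0;
        for (int i = 0; i < captured.length(); i++) {
            char c = captured.charAt(i);
            if (c >= '0' && c <= '9') {
                n++;
            }
        }
        return n;
    }

    // True when the group holds between 13 and 17 digits, as {13,17} requires.
    static boolean hasExpectedDigits(String captured) {
        int n = countDigits(captured);
        return n >= 13 && n <= 17;
    }

    public static void main(String[] args) {
        // The short capture reported above, and the full number from the sample.
        System.out.println(countDigits("0250 01 90775"));             // 11
        System.out.println(hasExpectedDigits("0250 01 90775"));       // false
        System.out.println(countDigits("0250 01 90775 05342"));       // 16
        System.out.println(hasExpectedDigits("0250 01 90775 05342")); // true
    }
}
```

A check like this would flag the 11-digit capture instead of silently passing a truncated value on.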

According to this regex demo, your logic should match.

Workaround 1

You can make the hyphen-or-space part greedy. I.e. remove the ? after the *:

((\\d[ -]*){13,17})

I would go with this one.
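To see that the greedy form does not change what is captured for the sample string, here is a small comparison sketch using java.util.regex. Again this is a different engine from Nashorn, the helper names are hypothetical, and the pattern is reduced to the last group plus the closing tag for brevity. It makes no claim about inputs other than the sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVariantCheck {
    // Compiles the given group followed by the closing tag and returns
    // the text of group 1, or null when there is no match.
    static String capture(String groupPattern, String src) {
        Pattern p = Pattern.compile(groupPattern + "\\s*?</span>",
                                    Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(src);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String fileSrc = "<SPAN>06-24-2015  11:28AM  0250 01 90775 05342</SPAN>";
        String lazy = capture("((\\d[ -]*?){13,17})", fileSrc);
        String greedy = capture("((\\d[ -]*){13,17})", fileSrc);
        // On this input both variants capture the same text.
        System.out.println(lazy.equals(greedy));
    }
}
```

One difference to keep in mind: with the greedy form, any spaces or hyphens directly after the last digit would become part of the group rather than being left to the following \\s*?.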

Workaround 2

You can specify some more repetitions, 19 in this case:

((\\d[ -]*?){13,19})

I'm afraid in this case you'll have to change the lower bound too.
