Regex For Multiple Group Not Working - Java 1.7 To 1.8 - Scriptengine Rhino To Nashorn
Solution 1:
As far as I could check, all you need to do is add [^<]*? before the closing <\\/span\\>. Also, you don't need to escape the starting < of the last span. So this is the script:
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

ScriptEngineManager mgr = new ScriptEngineManager();
ScriptEngine engine = mgr.getEngineByName("JavaScript");
// The JavaScript source, one statement per concatenated string.
String js = "var fileSrc = '<SPAN>06-24-2015 11:28AM 0250 01 90775 05342</SPAN>';"
        + "var trans_regex = /\\<span\\>(\\d{2}-\\d{2}-\\d{4})\\s*?(\\d{1,2}:\\d{2}\\s*?(?:am|pm))\\s*?(?:<\\/SPAN><BR\\/?><SPAN>)?\\s*?((\\d[ -]*?){13,17})\\s*?[^<]*?\\<\\/span\\>/i ;"
        + "print('executed regex result : ' + trans_regex.exec(fileSrc) ) ; "
        + "var t_time = trans_regex.exec(fileSrc)[2];"
        + "var t_cc = trans_regex.exec(fileSrc)[3];"
        + "print(\" time \" + t_time)";
// eval throws ScriptException, which the enclosing method must handle or declare.
Object result = engine.eval(js);
This yields:
executed regex result : <SPAN>06-24-2015 11:28AM 0250 01 90775 05342</SPAN>,06-24-2015,11:28AM,02500190775,5
time 11:28AM
Update - explanation and alternatives
This is the last group of the original regex: ((\\d[ -]*?){13,17}). It looks tricky to me, although I don't know the intention behind it. What it does:
- Match one digit.
- Match zero or more spaces or hyphens, as few as possible.
- Repeat steps 1 and 2 at least 13 and at most 17 times.
This is really tricky because the zero-or-more spaces or hyphens can match anywhere. I believe the intention was something like this:
Consider the string 0250 01. The leading '0' matches one digit and zero other characters. The same goes for the '2' and the '5'. Then '0 ' matches one digit and one space. And so on, up to 13-17 digits.
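To make the intended iteration visible, here is a minimal sketch that splits the sample number into the "one digit plus trailing separators" pieces described above. It uses java.util.regex rather than the Nashorn engine, and the class and method names (IterationTokens, tokens) are made up for illustration. The greedy form \d[ -]* is used only so that each piece shows its trailing space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original script.
class IterationTokens {
    // One iteration of the repeated group: a digit followed by any spaces or hyphens.
    private static final Pattern ONE_ITERATION = Pattern.compile("\\d[ -]*");

    // Returns the successive "digit + separators" pieces found in the input.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ONE_ITERATION.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> t = tokens("0250 01 90775 05342");
        // The number of pieces is the number of iterations the {13,17} quantifier has to cover.
        System.out.println(t.size() + " iterations: " + t);
    }
}
```

For the sample string this gives 16 pieces, which is inside the 13-17 range, so the whole number should be matchable by the group.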
Apparently, the Nashorn engine cannot handle this construct. If I add this print statement:
+ "print (trans_regex.exec(fileSrc));"
then I'll get this result:
<SPAN>06-24-2015 11:28AM 0250 01 90775 05342</SPAN>,06-24-2015,11:28AM,0250 01 90775,5
This tells me that ((\\d[ -]*?){13,17}) matched 0250 01 90775, which is only 11 digits. So my 'fix' just catches the rest of the characters up to the start of the closing </span>.
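The digit counts can be checked with a small sketch. DigitCount and countDigits are hypothetical names, and the strings are copied from the output above.

```java
// Hypothetical helper for checking the digit counts quoted in the answer.
class DigitCount {
    // Counts decimal digits by stripping every non-digit character.
    static int countDigits(String s) {
        return s.replaceAll("\\D", "").length();
    }

    public static void main(String[] args) {
        // What the group captured under Nashorn, according to the printed result.
        System.out.println(countDigits("0250 01 90775"));
        // The full number in the sample input.
        System.out.println(countDigits("0250 01 90775 05342"));
    }
}
```

The captured text has 11 digits, below the lower bound of 13, while the full number has 16, which is within the bounds.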
According to this regex demo, your logic should match.
Workaround 1
You can make the hyphen-or-space part greedy, i.e. remove the ? after the *:
((\\d[ -]*){13,17})
I would go with this one.
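As a sanity check that dropping the ? should not change what is captured for this input, the sketch below runs both variants through java.util.regex. This is an assumption-laden comparison: java.util.regex is a different engine from the one Nashorn uses, so it only shows that the two patterns agree in an ordinary backtracking engine, not how Nashorn behaves. The pattern is trimmed to the last group plus the closing tag, and LazyVsGreedy and capture are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison harness, not part of the original script.
class LazyVsGreedy {
    // Appends the closing-tag part of the regex and returns group 1, or null if there is no match.
    static String capture(String groupPattern, String input) {
        Pattern p = Pattern.compile(groupPattern + "\\s*?</span>", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "0250 01 90775 05342</SPAN>";
        // Original (lazy) separator quantifier.
        System.out.println(capture("((\\d[ -]*?){13,17})", input));
        // Workaround 1 (greedy) separator quantifier.
        System.out.println(capture("((\\d[ -]*){13,17})", input));
    }
}
```

Both variants capture the whole number here. They could differ when separators directly precede the closing tag, since the greedy form would pull them into the group.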
Workaround 2
You can specify some more repetitions, 19 in this case:
((\\d[ -]*?){13,19})
I'm afraid in this case you'll have to change the lower bound too.