Can A Javascript Regexp Match At An Exact String Offset Without Searching?
Solution 1:
I've thoroughly tested possibly efficient methods, see JSPerf: ~20000 chars, ~1000000 chars. I've created a function to generate a random string consisting of alphanumeric characters. After running this function once, a RegExp pattern is created, to match a string with length 10 at a given offset.
Tested cases (when the condition in if(..)
is true, the pattern is found at offset index
):
varstring = "...about 20000 characters: A-Z a-z 0-9...";
var regexp = /abcdef1324/g;
var regexpSubstr = /^abcdefg1234/;
var index = 10000;
/*1*/if ( regexpSubstr.exec(string.substr(index,10)) ) ;
/*2*/ regexp.lastIndex = index;
var found = regexp.exec(string);
if (found && found.length + index == regexp.lastIndex ) ;
/*3*/if ( regexpSubstr.test(string.substr(index,10)) ) ;
/*4*/// Only when the RegExp matches a fixed number of characters
regexp.lastIndex = index;
if ( regexp.test(string) && regexp.lastIndex == index + 10 ) ;
Case 1 and Case 3 are equivalent, because they're checking whether a substring matches the /^abcdef1234/
pattern (does the selected substring start with "abc..etc"?).
Case 2 and Case 4 use the .lastIndex
method:
1. Set the .lastIndex
property of the RegExp to the desired offset
2. Check whether the pattern is found.
3. Check whether a found pattern is positioned at offset index
.
These methods require a Regular expression to have enabled the global flag.
At very large strings, method 4 (lastIndex
+ test
) is proved to be most efficient whn a match at the offset occurs. Method 4, however, requires the matched pattern to have a pre-defined, fixed size.
Method 3 (substr
+ test
) is slightly slower than 4, when a match occurs at the given position. When no match is found at a large string, however, method 3 is significantly faster than method 4. Method 1 and method 3 seems to be equally fast when no match is found.
RegExp methods
The .exec
seems to not be more efficient than .test
. The match
method is not suitable for this case, since it attempts to find all matches, regardless of the .lastIndex
property. Another possible RegExp function is the .search
function, which is significantly slower for large strings, compared to previously shown methods.
Solution 2:
Perhaps new RegExp("^.{" + offset + "}whatever")
, i.e. any character exactly offset times, followed by whatever? Needs benchmarking, but one would hope RegExp would optimize the "any character exactly offset times" part?
Post a Comment for "Can A Javascript Regexp Match At An Exact String Offset Without Searching?"