Match "//" Comments With Regex But Not Inside A Quote

I need to match and replace some comments. For example:

$test = 'the url is http://www.google.com';// comment '<-- that quote needs to be matched

I want to match the comments o

Solution 1:

You can have a regexp to match all strings and comments at the same time. If it's a string, you can replace it with itself, unchanged, and then handle a special case for comments.

I came up with this regex:

"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)

There are 3 parts:

  • "(\\[\s\S]|[^"])*" for matching double quoted strings.
  • '(\\[\s\S]|[^'])*' for matching single quoted strings.
  • (\/\/.*|\/\*[\s\S]*?\*\/) for matching both single line and multiline comments.
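To see which of the three parts fires on a given input, here is a minimal sketch that labels every match. The helper name classifyTokens is hypothetical and not part of the original answer; it relies on the fact that the third capture group is only set when the comment alternative matched.

```javascript
// The three-part pattern from above, unchanged.
const re = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;

// classifyTokens is a hypothetical helper, used only for illustration.
function classifyTokens(source) {
    const tokens = [];
    for (const m of source.matchAll(re)) {
        // Group 3 is defined only when the comment alternative matched.
        const kind = m[3] !== undefined ? 'comment' : 'string';
        tokens.push({ kind: kind, text: m[0] });
    }
    return tokens;
}

const sample = "$test = 'the url is http://www.google.com';// comment '<-- quote";
console.log(classifyTokens(sample));
```

For this sample the result should contain two tokens: the single quoted string (including the // inside the URL) and then the trailing comment. The string alternative wins because the regex engine reaches the opening quote before it reaches the // inside the URL.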

The replace function checks whether the match is a comment. If it is not, the match is returned unchanged. If it is, every " and ' inside it is replaced with &quot; and &apos;.

function t_replace(data) {
    var re = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;
    return data.replace(re, function(all, strDouble, strSingle, comment) {
        if (comment) {
            return all.replace(/"/g, '&quot;').replace(/'/g, '&apos;');
        }
        return all;
    });
}

Test run:

Input: $test = "the url is http://www.google.com";// c'o"mment """<-- that quote needs to be matched
Output: $test = "the url is http://www.google.com";// c&apos;o&quot;mment &quot;&quot;&quot;<-- that quote needs to be matched

Input: $test = 'the url is http://www.google.com';# c'o"mment """<-- that quote needs to be matched
Output: $test = 'the url is http://www.google.com';# c'o"mment """<-- that quote needs to be matched

Input: $test//= "the url is http://www.google.com"; //c'o"mment """<-- that quote needs to be matched
Output: $test//= &quot;the url is http://www.google.com&quot;; //c&apos;o&quot;mment &quot;&quot;&quot;<-- that quote needs to be matched

Solution 2:

I have to admit, this regex took me a while to generate...but I'm pretty sure this will do what you are looking for:

<script>
var str = "$test = \"the url is http://www.google.com\";// comment \"\"\"<-- that quote needs to be matched";
// [^"'\/]* after the optional string lets characters such as ";" sit between the string and the comment
var reg = /^(?:(([^"'\/]*(("[^"]*")|('[^']*'))?[^"'\/]*)?\/\/[^"]*))"/g;

// Each pass replaces one double quote inside the comment; stop when a pass changes nothing
while (str !== (str = str.replace(reg, "$1&quot;")));

console.log(str);
</script>

Here's what's going on in the regex:

^                      # start at the beginning of the line
(?:                    # non-capturing wrapper
  (                    # group 1: everything before the quote to replace
    (                  # group 2: optional code before the comment
      [^"'\/]*         # any characters that are not a quote or a slash
      (
        ("[^"]*")      # grab a double quoted string
        |              # OR
        ('[^']*')      # grab a single quoted string
      )?               # but...we don't HAVE to match a string
      [^"'\/]*         # allow for anything after the string, such as ";" and whitespace
    )?                 # but...we don't HAVE to have any characters before the comment begins
    \/\/               # match the start of a comment
    [^"]*              # match any number of characters that aren't a double quote
  )                    # end of group 1
)                      # end of the non-capturing wrapper
"                      # match your commented double quote

The while loop in JavaScript simply repeats the find/replace until no more matches are found in the line.
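The same repeat-until-unchanged idea can be written with an explicit previous value, which may be easier to read. This is only a sketch: the function name replaceCommentQuotes is hypothetical, and the pattern is a variant that allows [^"'\/]* after the optional string (an assumption on my part, so that a ";" can sit between the string and the comment).

```javascript
// Variant pattern: group 1 is everything up to the double quote to replace.
// Allowing [^"'\/]* after the optional string is an assumption of this sketch.
const reg = /^((?:[^"'\/]*(?:"[^"]*"|'[^']*')?[^"'\/]*)?\/\/[^"]*)"/;

// replaceCommentQuotes is a hypothetical name, used only for illustration.
function replaceCommentQuotes(line) {
    let previous;
    do {
        previous = line;
        // Each pass rewrites only the first remaining double quote in the comment.
        line = line.replace(reg, '$1&quot;');
    } while (line !== previous); // stop once a pass changes nothing
    return line;
}

console.log(replaceCommentQuotes(
    '$test = "the url is http://www.google.com";// comment """<-- quote'
));
```

Because the pattern is anchored with ^ and consumes at most one quoted string before the comment, this sketch handles one line at a time and is not meant for lines with several strings before the //.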

Solution 3:

Don't forget that PHP comments can also take the form of /* this is a comment */, which can span multiple lines.
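As a quick check of the multi-line case, here is a sketch that reuses the pattern from Solution 1, whose last alternative already covers /* ... */. The function name escapeQuotesInComments and the sample input are made up for illustration.

```javascript
// Pattern from Solution 1; the last alternative matches // and /* ... */ comments.
const re = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)/g;

// escapeQuotesInComments is a hypothetical name, used only for illustration.
function escapeQuotesInComments(source) {
    return source.replace(re, function (all, strDouble, strSingle, comment) {
        // Strings are returned untouched; only comments get their quotes escaped.
        return comment
            ? all.replace(/"/g, '&quot;').replace(/'/g, '&apos;')
            : all;
    });
}

const php = [
    "$a = 'keep'; /* don't",
    '   "quote" me */',
    '$b = "x";'
].join('\n');

console.log(escapeQuotesInComments(php));
```

The block comment spans two lines here, and [\s\S]*? is what lets the match cross the line break, since . does not match newlines by default.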

This site may be of interest to you:

http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript

JavaScript did not have native lookbehind support in its regular expression engine until ES2018, so older engines need a workaround. What you may be able to do is start at the end of a line and look backward to capture any characters that follow a semicolon + optional whitespace + //. So something like:

;\s*\/\/(.+)$

This may not capture everything.
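To make the limits concrete, here is a small sketch of the end-anchored idea. It assumes \s* (optional whitespace) between the semicolon and the slashes, as the description says, and the helper name trailingCommentText is hypothetical.

```javascript
// Assumption: \s* (optional whitespace) sits between the semicolon and the slashes.
const trailingComment = /;\s*\/\/(.+)$/;

// trailingCommentText is a hypothetical helper, used only for illustration.
function trailingCommentText(line) {
    const m = trailingComment.exec(line);
    return m ? m[1] : null;
}

console.log(trailingCommentText("$x = 1; // set x"));        // " set x"
console.log(trailingCommentText("// no semicolon before"));  // null
console.log(trailingCommentText("$u = 'a;//b';"));           // "b';"
```

The second call shows a comment that is missed because no semicolon precedes it, and the third shows a false positive where ;// appears inside a string. Both are reasons to prefer an approach that matches strings and comments together.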

You may also want to look for a PHP syntax checker written in JavaScript (or another language). I think Komodo Edit's PHP syntax checker may be written in JavaScript. If so, it may give you insight into how to strip out everything but the comments, since a syntax checker has to parse the whole PHP source, comments and all. The same can be said about syntax highlighters. Here are two other links:

http://ecoder.quintalinda.com/

http://www.webdesignbooth.com/9-useful-javascript-syntax-highlighting-scripts/

Solution 4:

To complement @Thai's answer, which I found very good, I would like to add a bit more:

With the original regex, the capture groups keep only the last character of each quoted string, as this example shows: https://regex101.com/r/CoxFvJ/2

So I modified it slightly to capture the full content of each quoted string, and wrote a more verbose and generic example: https://regex101.com/r/CoxFvJ/3

So the final regex is:

/"((?:\\"|[^"])*)"|'((?:\\'|[^'])*)'|(\/\/.*|\/\*[\s\S]*?\*\/)/g
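The difference between the two patterns is easiest to see by printing the first capture group for the same input. This is a minimal sketch; the variable names and the sample input are mine, not from the linked examples.

```javascript
// Original pattern: the group is repeated, so it keeps only its last iteration.
const original = /"(\\[\s\S]|[^"])*"|'(\\[\s\S]|[^'])*'|(\/\/.*|\/\*[\s\S]*?\*\/)/;
// Modified pattern: the repetition sits inside the group, so it keeps the whole content.
const modified = /"((?:\\"|[^"])*)"|'((?:\\'|[^'])*)'|(\/\/.*|\/\*[\s\S]*?\*\/)/;

const input = '$s = "abc";';

console.log(original.exec(input)[1]); // c
console.log(modified.exec(input)[1]); // abc
```

One difference worth checking against your own input: the modified pattern only treats \" and \' as escapes, so a string that ends in an escaped backslash may be matched differently than with the original \\[\s\S] form.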

Big thanks to Thai for unblocking me.
