
Regex For Todo Keyword When Passing Through A List Of Directories To Get A List Of Files With Todo Keyword (eg. //todo) But Not As Variable / String

I'm trying to write an application that looks through a directory and flags all files (in the directory or its subdirectories) that have the TODO keyword (the one that flashes/high

Solution 1:

I don't think there is a reliable way to exclude TODO in variable names or string values across languages. You'd need to parse each language properly, and scan for TODO in comments.
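To illustrate what "scan for TODO in comments" involves, here is a minimal sketch of a hand-written scanner. It assumes C-style syntax only (`//` and `/* */` comments; `'`, `"` and backtick strings with backslash escapes). The function name `findKeywordInComments` is hypothetical, and the sketch ignores regex literals and `${}` interpolation, so it is not a real parser.

```javascript
// Hypothetical helper: returns the 1-based line numbers where `keyword`
// appears inside a // or /* */ comment. Assumes C-style syntax only.
function findKeywordInComments(source, keyword = 'TODO') {
  const word = new RegExp('\\b' + keyword + '\\b');
  const hits = new Set();
  let state = 'code'; // 'code' | 'line' | 'block' | 'string'
  let quote = '';     // quote character that opened the current string
  let comment = '';   // comment text collected on the current line
  let lineNo = 1;

  // Record a hit if the comment text on this line contains the keyword.
  const flush = () => {
    if (word.test(comment)) hits.add(lineNo);
    comment = '';
  };

  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === '\n') {
      if (state === 'line' || state === 'block') flush();
      if (state === 'line') state = 'code';
      // ' and " strings end at a newline; backtick strings may continue
      if (state === 'string' && quote !== '`') state = 'code';
      lineNo++;
    } else if (state === 'code') {
      if (ch === '/' && next === '/') { state = 'line'; i++; }
      else if (ch === '/' && next === '*') { state = 'block'; i++; }
      else if (ch === '"' || ch === "'" || ch === '`') { state = 'string'; quote = ch; }
    } else if (state === 'string') {
      if (ch === '\\') {
        if (next === '\n') lineNo++; // escaped newline still counts as a line
        i++;                         // skip the escaped character
      } else if (ch === quote) {
        state = 'code';
      }
    } else if (state === 'block' && ch === '*' && next === '/') {
      flush();
      state = 'code';
      i++;
    } else {
      comment += ch; // inside a line comment or a block comment
    }
  }
  if (state === 'line' || state === 'block') flush();
  return [...hits];
}

// Reports lines 1 and 2: the string and the variable are ignored.
console.log(findKeywordInComments('let a = "TODO"; // TODO fix\n/* TODO */ let TODO = 1;'));
```

Every language with different comment or string syntax would need its own scanner like this, which is why a regex approximation can be a reasonable trade-off.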

You can do an approximation that you can tweak over time:

  • for variable names, you'd need to exclude TODO = assignments and any other kind of use, such as TODO.length
  • for string values, you could exclude 'TODO' and "TODO", and even "Something TODO today", by looking for matching quotes. Multi-line strings with backticks remain an open problem.

This is a start using a bunch of negative lookaheads:

const input = `Test Case:
// TODO blah
// TODO do "stuff"
/* stuff
 * TODO
 */
let a = 'TODO';
let b = 'Something TODO today';
let c = "TODO";
let d = "More stuff TODO today";
let TODO = 'stuff';
let l = TODO.length;
let e = "Even more " + TODO + " to do today";
let f = 'Nothing to do';
`;
let keyword = 'TODO';
const regex = new RegExp(
  // exclude TODO in string value with matching quotes:
  '^(?!.*([\'"]).*\\b' + keyword + '\\b.*\\1)' +
  // exclude TODO.property access:
  '(?!.*\\b' + keyword + '\\.\\w)' +
  // exclude TODO = assignment:
  '(?!.*\\b' + keyword + '\\s*=)' +
  // final TODO match:
  '.*\\b' + keyword + '\\b'
);
input.split('\n').forEach((line) => {
  let m = regex.test(line);
  console.log(m + ': ' + line);
});

Output:

false: Test Case:
true: // TODO blah
true: // TODO do "stuff"
false: /* stuff
true:  * TODO
false:  */
false: let a = 'TODO';
false: let b = 'Something TODO today';
false: let c = "TODO";
false: let d = "More stuff TODO today";
false: let TODO = 'stuff';
false: let l = TODO.length;
false: let e = "Even more " + TODO + " to do today";
false: let f = 'Nothing to do';
false: 
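Since the question is about flagging files in a directory tree, here is a minimal sketch of how a line regex like this might be applied to every file under a folder in Node.js. The function name `findFilesWithTodo` is hypothetical, the regex literal is the same pattern with the keyword fixed to TODO, and the sketch assumes every file is UTF-8 text and small enough to read into memory.

```javascript
const fs = require('fs');
const path = require('path');

// Same pattern as the regex built above, written as a literal for TODO.
const todoLine = /^(?!.*(['"]).*\bTODO\b.*\1)(?!.*\bTODO\.\w)(?!.*\bTODO\s*=).*\bTODO\b/;

// Hypothetical helper: recursively collects the paths of files under `dir`
// that contain at least one line matching `lineRegex`.
function findFilesWithTodo(dir, lineRegex = todoLine) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // descend into subdirectories
      found.push(...findFilesWithTodo(fullPath, lineRegex));
    } else if (entry.isFile()) {
      // test each line separately, as in the snippet above
      const lines = fs.readFileSync(fullPath, 'utf8').split(/\r?\n/);
      if (lines.some((line) => lineRegex.test(line))) found.push(fullPath);
    }
  }
  return found;
}
```

Calling `findFilesWithTodo('./src')` would return an array of matching file paths. In practice you would probably want to skip folders such as `node_modules` and filter by file extension before reading.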

Explanation of composition of regular expression:

  • ^ - start of string (in our case start of line due to split)
  • exclude TODO in string value with matching quotes:
    • (?! - negative lookahead start
    • .* - greedy scan (scan over all chars, but still match what follows)
    • (['"]) - capture group for either a single quote or a double quote
    • .* - greedy scan
    • \b - word boundary before keyword (expect keyword enclosed in non-word chars)
    • add keyword here
    • \b - word boundary after keyword
    • .* - greedy scan
    • \1 - back reference to capture group (either a single quote or a double quote, but the one captured above)
    • ) - negative lookahead end
  • exclude TODO.property access:
    • (?! - negative lookahead start
    • .* - greedy scan
    • \b - word boundary before keyword
    • add keyword here
    • \.\w - a dot followed by a word char, such as .x
    • ) - negative lookahead end
  • exclude TODO = assignment
    • (?! - negative lookahead start
    • .* - greedy scan
    • \b - word boundary before keyword
    • add keyword here
    • \s*= - optional spaces followed by =
    • ) - negative lookahead end
  • final TODO match
    • .* - greedy scan
    • \b - word boundary (expect keyword enclosed in non-word chars)
    • add keyword here
    • \b - word boundary
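The composition above can be wrapped in a small function so the keyword becomes a parameter. This is only a sketch: `buildKeywordRegex` and `escapeRegExp` are hypothetical names, and escaping the keyword and accepting a `flags` argument (for example `'i'` to also match `// todo`) are additions that go beyond the original snippet.

```javascript
// Hypothetical helper: escapes regex metacharacters so that a keyword
// containing characters like "." is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds the line regex from the four parts explained above.
function buildKeywordRegex(keyword, flags = '') {
  const kw = escapeRegExp(keyword);
  return new RegExp(
    // exclude keyword in string value with matching quotes:
    '^(?!.*([\'"]).*\\b' + kw + '\\b.*\\1)' +
    // exclude keyword.property access:
    '(?!.*\\b' + kw + '\\.\\w)' +
    // exclude keyword = assignment:
    '(?!.*\\b' + kw + '\\s*=)' +
    // final keyword match:
    '.*\\b' + kw + '\\b',
    flags
  );
}

const strict = buildKeywordRegex('TODO');
const relaxed = buildKeywordRegex('TODO', 'i');
console.log(strict.test('// TODO blah'));   // true
console.log(strict.test('// todo later'));  // false
console.log(relaxed.test('// todo later')); // true
```

Note that the `\b` word boundaries assume the keyword starts and ends with a word character; a keyword such as `TODO:` would need a different boundary check.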

Learn more about regular expressions: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
