Topic: Regular Expressions with JavaScript
Flags
Technologies: JavaScript
Subject: Regular Expressions
Use Case:
search patterns for strings
First Principles:
- the term regular expressions is meant to indicate a standard textual syntax for representing patterns for matching text
- regular expressions, or regexes, can match string literals, and can use metacharacters that add other rules for matching string literals
- regular expressions can be formally defined using set theory
- see https://en.wikipedia.org/wiki/Regular_expression, Formal language theory
- the syntax for regular expressions is
/\<pattern>/\<modifiers> - regular expressions in JavaScript are objects, and can be also created with a class constructor using the
new RegExp('<pattern>', '<modifier>')expression, ornew RegExp('/<pattern>/', '<modifier>'), as well as assigning/<pattern>/<modifiers>to a variable - the resulting RegExp object has methods that we can use to test our strings with
- strings that we want to test against regular expressions using JavaScript are piped into the test via these methods
let string = 'some string'; let regEx = /<pattern>/<modifiers>; let otherRegex = new RegExp('<pattern>', '<modifier>'); let test = regEx.test(string); - other methods take the regular expressions as an input and are called on the string itself
let string = 'some string'; let regEx = /\<pattern>/\<modifiers>; let otherRegex = new RegExp('\<pattern>', '\<modifier>'); let match = string.match(regEx);Intro to Regular Expressions
"Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems." -Jamie Zawinski
This quote appropriately found at the start of chapter 9 of Eloquent JavaScript by Marijn Haverbeke.
Lazy matching
Regex is greedy by default. It finds the longest possible part of a string that fits the regex pattern and returns that.
Finds the smallest possible part of the string that satisfies the regex pattern. Uses the *? character.
Character Classes
-
placed within square brackets
^- not; exclude all the characters that follow the caretMetacharacters
.- wildcard; match any one character-- range; match any character withing range [<start> - <end>] (inclusive)^- beginning of string | not; outside of character class: find regex at beginning of a string (/^<pattern>/) | inside of a character class: exclude all the characters that follow the caret (/[^<pattern>]/)$- end of string; search end of string for regex pattern (/<pattern>$/)+- one or more consecutive; character appears at least once, but may be repeated*- zero or more consecutive; character appears any number of times (or not at all)*?- lazy match; return the smallest possible part of a string that matches the pattern provided\w- match alphanumeric characters and underscore\W- match all characters other than alphanumeric characters and underscore\d- match all numeric characters\D- match all characters other than numeric characters\s- match all ‘white space characters’; matches carriage return (\r), horizontal tab (\t), form feed (\f), new line (\n), and vertical tab (\v){min, max}- return number of characters that match pattern between min and max number of characters (inclusive){num}- return exact number of matches?- zero plus match; check for possible existence of a match<string>(?=<following string>)- positive lookahead; match <string> only if followed by <following string><string>(?!<following string>)- negative lookahead; match <string> only if not followed by <following string>(<expression 1>),(<expression 2>),(<expression 3>),...,(<expression n>) = $1,$2,$3,...,$n- capture group; For each capture group (<expression n>), a stand-in representation \n can be placed within a larger expression to represent the parenthetic expression so that the parenthetic expression does not need to be written out again. A replacement string can use the $n variable to save the capture group and re-arrange it
Methods
<regex>.test(<string>)- returnstrueorfalse; test for match in a string-
<string>.match(<regex>)- returns an array or object containing matches (different output depending on whether or not the ‘g’ flag is used) ornull;- when
gflag is used, outputs array with all matched string literals - when no flag is used, outputs an object with length property of
1and with structure:[ <first complete match>, index: <index of input string where first complete match starts>, input: <complete input string> ]- <output>[0] returns <first complete match>
- <output>.index returns <index of input string where first complete match starts>
- <output>.input returns <complete input string>
- when
<string>.replace(<regex>, <replacement string>)Flags
i- “ignore case” (RegExp.prototype.ignoreCase)g- “global search (find more than one instance)” (RegExp.prototype.global)
Why I had to learn this
I am currently working through the Learn Regular Expressions section of the JavaScript Data Structures and Algorithms certification