Topic: Regular Expressions with JavaScript
Flags
Technologies: JavaScript
Subject: Regular Expressions
Use Case:
search patterns for strings
First Principles:
- the term regular expressions is meant to indicate a standard textual syntax for representing patterns for matching text
- regular expressions, or regexes, can match string literals, and can use metacharacters that add other rules for matching string literals
- regular expressions can be formally defined using set theory
- see https://en.wikipedia.org/wiki/Regular_expression, Formal language theory
- the syntax for regular expressions is
/<pattern>/<modifiers>
- regular expressions in JavaScript are objects; they can be created with the RegExp constructor, as new RegExp('<pattern>', '<modifiers>') or new RegExp(/<pattern>/, '<modifiers>'), or by assigning the literal /<pattern>/<modifiers> to a variable
- the resulting RegExp object has methods that we can use to test our strings
- strings that we want to test against regular expressions using JavaScript are passed as arguments to these methods
let string = 'some string';
let regEx = /<pattern>/<modifiers>;
let otherRegex = new RegExp('<pattern>', '<modifiers>');
let test = regEx.test(string);
- other methods take the regular expressions as an input and are called on the string itself
let string = 'some string';
let regEx = /<pattern>/<modifiers>;
let otherRegex = new RegExp('<pattern>', '<modifiers>');
let match = string.match(regEx);
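To make the two templates above concrete, here is a minimal sketch. The pattern cat, the i modifier, and the sample sentence are assumptions chosen only for illustration.

```javascript
// Hypothetical sample data; the pattern and sentence are made up for illustration.
const sentence = 'The Cat sat on the mat';

// Three ways to build the same case-insensitive regex.
const literal = /cat/i;
const fromString = new RegExp('cat', 'i');
const fromLiteral = new RegExp(/cat/, 'i');

// test() is called on the regex and returns a boolean.
const found = literal.test(sentence);

// match() is called on the string and returns an array (or null).
const match = sentence.match(fromString);

console.log(found);                                  // true
console.log(match[0]);                               // 'Cat'
console.log(match.index);                            // 4
console.log(fromLiteral.source, fromLiteral.flags);  // cat i
```

Note that the string form of the constructor takes the pattern without the surrounding slashes; slashes inside the string would be treated as literal characters to match.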
Intro to Regular Expressions
"Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems." -Jamie Zawinski
This quote is appropriately found at the start of chapter 9 of Eloquent JavaScript by Marijn Haverbeke.
Lazy matching
Regexes are greedy by default: a quantified pattern finds the longest possible part of a string that fits the regex pattern and returns that.
Lazy matching finds the smallest possible part of the string that satisfies the regex pattern. It is requested by adding ? after a quantifier, as in *?.
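The difference is easiest to see side by side. This is a small sketch; the HTML-like sample string is an assumption made up for the example.

```javascript
// Hypothetical sample string, made up for illustration.
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .* grabs as much as it can, so the match runs to the last ">".
const greedy = html.match(/<.*>/)[0];

// Lazy: .*? grabs as little as it can, so the match stops at the first ">".
const lazy = html.match(/<.*?>/)[0];

console.log(greedy);  // '<b>bold</b> and <i>italic</i>'
console.log(lazy);    // '<b>'
```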
Character Classes
- placed within square brackets
^
- not; when it is the first character inside the brackets, exclude all the characters that follow the caret
Metacharacters
.
- wildcard; match any one character (except line terminators by default)
-
- range; match any character within the range [<start>-<end>] (inclusive); only has this meaning inside a character class
^
- beginning of string | not; outside of a character class: find regex at beginning of a string (/^<pattern>/) | inside of a character class: exclude all the characters that follow the caret (/[^<pattern>]/)
$
- end of string; search end of string for regex pattern (/<pattern>$/)
+
- one or more consecutive; character appears at least once, but may be repeated
*
- zero or more consecutive; character appears any number of times (or not at all)
*?
- lazy match; return the smallest possible part of a string that matches the pattern provided
\w
- match alphanumeric characters and underscore
\W
- match all characters other than alphanumeric characters and underscore
\d
- match all numeric characters
\D
- match all characters other than numeric characters
\s
- match all ‘white space characters’; these include the space character, carriage return (\r), horizontal tab (\t), form feed (\f), new line (\n), and vertical tab (\v)
{min,max}
- quantity range; match the preceding element at least min and at most max times (inclusive)
{num}
- exact quantity; match the preceding element exactly num times
?
- zero or one; the preceding element is optional
<string>(?=<following string>)
- positive lookahead; match <string> only if followed by <following string>
<string>(?!<following string>)
- negative lookahead; match <string> only if not followed by <following string>
(<expression 1>),(<expression 2>),(<expression 3>),...,(<expression n>) = $1,$2,$3,...,$n
- capture group; for each capture group (<expression n>), a backreference (\1, \2, ..., \n) can be placed later in the same expression to match the same text that the group matched, so that the parenthetic expression does not need to be written out again. A replacement string can use $n to insert the text captured by group n, which makes it possible to re-arrange the captured parts
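A few of the entries above are easier to follow with concrete input. The following is a minimal sketch; every sample string and variable name is an assumption made up for illustration.

```javascript
// All sample strings below are made up for illustration.

// {min,max}: runs of 2 to 4 consecutive digits.
const digitRuns = 'a1 b22 c55555'.match(/\d{2,4}/g);

// ?: the preceding element ("u") is optional.
const optionalU = /^colou?r$/;
const bothSpellings = optionalU.test('color') && optionalU.test('colour');

// Positive lookahead: digits only when followed by "USD" (which is not consumed).
const usdAmount = '200EUR, 100USD'.match(/\d+(?=USD)/)[0];

// Negative lookahead: a "q" that is not followed by "u".
const qWithoutU = /q(?!u)/;
const iraq = qWithoutU.test('Iraq');
const quit = qWithoutU.test('quit');

// Backreference: \1 must repeat the exact text captured by group 1.
const doubled = /(\w+) \1/;
const repeated = doubled.test('hello hello');
const different = doubled.test('hello world');

// $n in a replacement string re-arranges the captured text.
const swapped = 'Doe, Jane'.replace(/(\w+), (\w+)/, '$2 $1');

console.log(digitRuns);            // [ '22', '5555' ]
console.log(bothSpellings);        // true
console.log(usdAmount);            // '100'
console.log(iraq, quit);           // true false
console.log(repeated, different);  // true false
console.log(swapped);              // 'Jane Doe'
```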
Methods
<regex>.test(<string>)
- returns true or false; test for match in a string
<string>.match(<regex>)
- returns an array containing matches, or null when there is no match; the contents of the array depend on whether or not the ‘g’ flag is used
- when the g flag is used, outputs an array with all matched substrings
- when the g flag is not used, outputs an array holding the first complete match followed by any capture groups (so its length is 1 when the pattern has no capture groups), with extra index and input properties: [ <first complete match>, index: <index of input string where first complete match starts>, input: <complete input string> ]
- <output>[0] returns <first complete match>
- <output>.index returns <index of input string where first complete match starts>
- <output>.input returns <complete input string>
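The two output shapes of match can be compared directly. This is a small sketch; the sample text and patterns are assumptions made up for the example.

```javascript
// Hypothetical sample text, made up for illustration.
const text = 'one fish two fish';

// Without the g flag: first match plus index and input properties.
const first = text.match(/fish/);

// With the g flag: every matched substring, no extra properties.
const all = text.match(/fish/g);

// No match at all: null rather than an empty array.
const none = text.match(/bird/);

// A capture group adds an element after the complete match.
const grouped = text.match(/(\w+) fish/);

console.log(first[0], first.index, first.length);  // fish 4 1
console.log(first.input === text);                 // true
console.log(all);                                  // [ 'fish', 'fish' ]
console.log(none);                                 // null
console.log(grouped[0], grouped[1], grouped.length);  // 'one fish' 'one' 2
```

Because the no-match result is null, code that indexes into the result usually checks for null first.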
<string>.replace(<regex>, <replacement string>)
Flags
i
- “ignore case” (RegExp.prototype.ignoreCase)
g
- “global search (find more than one instance)” (RegExp.prototype.global)
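As a quick check of what the two flags change, here is a minimal sketch; the sample phrase is an assumption made up for illustration.

```javascript
// Hypothetical sample phrase, made up for illustration.
const phrase = 'Repeat, repeat, REPEAT';

// g alone: every match, but only the exact lowercase spelling.
const caseSensitive = phrase.match(/repeat/g);

// g and i together: every match, regardless of case.
const anyCase = phrase.match(/repeat/gi);

// The flags are exposed as boolean properties on the RegExp object.
const re = /repeat/gi;

console.log(caseSensitive);             // [ 'repeat' ]
console.log(anyCase);                   // [ 'Repeat', 'repeat', 'REPEAT' ]
console.log(re.ignoreCase, re.global);  // true true
```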
Why I had to learn this
I am currently working through the Learn Regular Expressions section of the JavaScript Data Structures and Algorithms certification