A brief introduction to regular expressions
Awk matches text using regular expressions, which are enclosed
in forward slashes: /pattern/.
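As a minimal sketch of what this looks like on the command line, the snippet below pipes three made-up lines into awk and keeps only those matching /error/. The sample text and the variable name matches are assumptions for illustration. A pattern with no action prints each matching line.

```sh
# Made-up sample input: three lines, two of which contain "error".
matches=$(printf '%s\n' 'error: disk full' 'all good' 'no errors here' |
  awk '/error/')

# A pattern with no action prints every line that matches it.
printf '%s\n' "$matches"
# error: disk full
# no errors here
```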
Here is a brief synopsis of some of the basics:
- An ordinary character matches itself. Ordinary characters include
letters, digits, and spaces. For example, /banana/ matches
lines containing banana and bananas.
- A metacharacter has a special meaning rather than matching itself. Some metacharacters
include:
- The period (.), which matches any single character.
For example, the pattern /h.t/ matches lines containing
hat, hotter, and huts.
- The asterisk (*), which matches zero or more occurrences of
the character preceding it. For example, the pattern
/met*/ matches lines containing meteorology, mean,
and mettle (mean matches because the t may occur zero times).
- The circumflex (^), which matches lines that begin with the
indicated pattern. For example, /^#/ matches lines
beginning with the # character.
- The dollar sign ($), which matches lines that end with the
indicated pattern. For example, /xyz$/ matches lines whose last three
characters are xyz.
- Characters in square brackets match a single character from the
indicated selection. For example:
- /[Aa]rmy/ matches lines containing army and Army.
- /[A-Z]/ matches lines containing any capital letter.
- /xy[0-9]/ matches lines containing any of xy0,
xy1, ..., xy9.
- All these elements can be combined in reasonable ways. For example:
- /^[0-9][0-9]/ matches lines that begin with
two consecutive digits.
- /[0-9][0-9]*/ matches lines containing one or more
consecutive digits.
- /^[A-Z].*[0-9]$/ matches any line that begins with a capital
letter and ends with a numeric digit. (The .* indicates that
any number of other characters may intervene.)
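The patterns above can be tried against a small input. The following is a sketch only: the sample lines and the helper name show are invented for illustration, and LC_ALL=C is set on the assumption that bracket ranges such as [A-Z] should be interpreted by plain byte order.

```sh
# Made-up sample lines, one per line.
sample='hat
hotter
mean
mettle
# a note
xy7
42 apples
Room 9'

# Hypothetical helper: print the sample lines matched by the awk program in $1.
# LC_ALL=C keeps ranges like [A-Z] independent of the user's locale.
show() { printf '%s\n' "$sample" | LC_ALL=C awk "$1"; }

show '/h.t/'              # hat, hotter
show '/met*/'             # mean, mettle
show '/^#/'               # "# a note"
show '/s$/'               # 42 apples
show '/xy[0-9]/'          # xy7
show '/^[0-9][0-9]/'      # 42 apples
show '/[0-9][0-9]*/'      # xy7, 42 apples, Room 9
show '/^[A-Z].*[0-9]$/'   # Room 9
```

Each call prints its matching lines one per line; the trailing comments list them in order.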
For more details, consult the section on regular expressions
in the GNU Awk manual.