Regular Expression syntax for TFStats

TFStats v2.0 allows the use of regular expressions in match patterns. This document describes the syntax used for those regular expressions. The regular expression searching/matching tool that TFStats uses is called Regex++.

Regex++, regular expression library.
(version 2.12, 02 August 1999)
Copyright (c) 1998-9
Dr John Maddock

Note: To preserve a modicum of simplicity, this document is heavily abridged. For the FULL specification of the regular expression syntax that can be used, please see http://ourworld.compuserve.com/homepages/John_Maddock/regex.htm#syntax

Regular expression syntax

This section covers the regular expression syntax used by this library. It is a programmer's guide; the actual syntax presented to your program's users will depend upon the flags used during expression compilation.

Literals

All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\". These characters are literals when preceded by a "\". A literal is a character that matches itself, or matches the result of traits_type::translate(), where traits_type is the traits template parameter to class reg_expression.
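
As a rough illustration of the escaping rule, the sketch below uses Python's re module as a stand-in for Regex++. That substitution is an assumption made for demonstration only: TFStats does not use Python, and the two engines differ in details. The pattern strings and test strings are made up for the example.

```python
import re

# An unescaped "." is a special character, so it matches characters
# other than a dot as well.
unescaped = re.fullmatch(r"a.c", "abc") is not None

# Preceded by "\", the "." becomes a literal and matches only itself.
escaped_dot = re.fullmatch(r"a\.c", "a.c") is not None
escaped_other = re.fullmatch(r"a\.c", "abc") is not None

print(unescaped, escaped_dot, escaped_other)  # True True False
```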

Wildcard

The dot character "." matches any single character except a newline character.
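
The wildcard can be tried out with the short sketch below. It uses Python's re module as a stand-in for Regex++ (an assumption for illustration; the engines are not identical), and the pattern "b.t" is a made-up example.

```python
import re

# "." matches any single character other than a newline...
matches_letter = re.fullmatch(r"b.t", "bat") is not None
matches_space = re.fullmatch(r"b.t", "b t") is not None

# ...but it does not match a newline, and it must consume exactly
# one character.
matches_newline = re.fullmatch(r"b.t", "b\nt") is not None
matches_nothing = re.fullmatch(r"b.t", "bt") is not None

print(matches_letter, matches_space, matches_newline, matches_nothing)
# True True False False
```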

Repeats

A repeat is an expression that is repeated an arbitrary number of times. An expression followed by "*" can be repeated any number of times, including zero. An expression followed by "+" can be repeated any number of times, but at least once. An expression followed by "?" may be repeated zero or one times only.

When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" is the letter "a" repeated between 2 and 4 times, and "a{2,}" is the letter "a" repeated at least twice with no upper limit. Note that there must be no whitespace inside the "{}", and there is no upper limit on the values of the lower and upper bounds.

All repeat expressions refer to the shortest possible previous sub-expression: a single character, a character set, or a sub-expression grouped with "()", for example.

Examples:

"ba*" will match all of "b", "ba", "baaa" etc.

"ba+" will match "ba" or "baaaa" for example but not "b".

"ba?" will match "b" or "ba".

"ba{2,4}" will match "baa", "baaa" and "baaaa".

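The repeat examples above can be checked with a small sketch. It uses Python's re module as a stand-in for Regex++ (an assumption; the engines may differ in details), and matches_whole is a hypothetical helper that requires the pattern to match the entire string.

```python
import re

def matches_whole(pattern, text):
    """Return True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# "*": zero or more repeats of the preceding "a".
star = [matches_whole(r"ba*", s) for s in ("b", "ba", "baaa")]

# "+": one or more repeats, so a bare "b" is rejected.
plus = [matches_whole(r"ba+", s) for s in ("ba", "baaaa", "b")]

# "?": zero or one repeat, so "baa" is rejected.
optional = [matches_whole(r"ba?", s) for s in ("b", "ba", "baa")]

# "{2,4}": between two and four repeats.
bounded = [matches_whole(r"ba{2,4}", s)
           for s in ("ba", "baa", "baaaa", "baaaaa")]

# A repeat binds to the shortest previous sub-expression: only the "a"
# in "ba*", but the whole group in "(ba)*".
single_char = matches_whole(r"ba*", "baba")
grouped = matches_whole(r"(ba)*", "baba")

print(star, plus, optional, bounded, single_char, grouped)
```
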
Parentheses

Parentheses serve two purposes: to group items together into a sub-expression, and to mark what generated the match. For example, the expression "(ab)*" would match all of the string "ababab".
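
Both purposes can be seen in the sketch below, which uses Python's re module as a stand-in for Regex++ (an assumption for illustration). How TFStats exposes marked sub-expressions is not covered here; group() is Python's interface, and the second pattern is a made-up example.

```python
import re

# Grouping: "*" applies to the whole sub-expression "ab".
repeated = re.fullmatch(r"(ab)*", "ababab")
whole_match = repeated.group(0)

# Marking: each pair of parentheses records the text it matched.
marked = re.fullmatch(r"(a+)(b+)", "aabbb")
first, second = marked.group(1), marked.group(2)

print(whole_match, first, second)  # ababab aa bbb
```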

Alternatives

Alternatives occur when the expression can match either one sub-expression or another; each alternative is separated by a "|". Each alternative is the largest possible previous sub-expression; this is the opposite behaviour from repetition operators.

Examples:

"a(b|c)" could match "ab" or "ac".

"abc|def" could match "abc" or "def".

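The difference in scope between "|" and the repeat operators can be sketched as follows. Python's re module is used as a stand-in for Regex++ (an assumption), and matches_whole is a hypothetical helper that requires the pattern to match the entire string.

```python
import re

def matches_whole(pattern, text):
    """Return True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Inside parentheses, "|" chooses between "b" and "c" only.
grouped = [matches_whole(r"a(b|c)", s) for s in ("ab", "ac", "ad")]

# Without parentheses, "|" splits the whole expression into "abc" and
# "def"; it does not mean "ab", then "c" or "d", then "ef".
ungrouped = [matches_whole(r"abc|def", s) for s in ("abc", "def", "abcef")]

# Parentheses are needed to narrow the choice to "c" or "d".
narrowed = matches_whole(r"ab(c|d)ef", "abcef")

print(grouped, ungrouped, narrowed)
```
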
Sets

A set is a group of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the complement of the elements that follow.

Examples:

Character literals:

"[abc]" will match either of "a", "b", or "c".

"[^abc]" will match any character other than "a", "b", or "c".

Character ranges:

"[a-z]" will match any character in the range "a" to "z".

"[^A-Z]" will match any character other than those in the range "A" to "Z".

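The literal and range examples above can be checked with a short sketch. It uses Python's re module as a stand-in for Regex++ (an assumption; collating elements and equivalence classes are not covered), and matches_whole is a hypothetical helper.

```python
import re

def matches_whole(pattern, text):
    """Return True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# A set of literals matches exactly one of its members.
literal_set = [matches_whole(r"[abc]", s) for s in ("a", "b", "c", "d")]

# A leading "^" inverts the set.
negated_set = [matches_whole(r"[^abc]", s) for s in ("d", "a")]

# Ranges cover every character between the two end points.
lower_range = [matches_whole(r"[a-z]", s) for s in ("m", "M")]
not_upper = [matches_whole(r"[^A-Z]", s) for s in ("m", "5", "M")]

print(literal_set, negated_set, lower_range, not_upper)
```
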
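Character classes:

Character classes are denoted using the syntax "[:classname:]" within a set declaration; for example, "[[:space:]]" is the set of all whitespace characters. The available character classes are:

[:alnum:]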

Any alphanumeric character.

[:alpha:]

Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale.

[:blank:]

Any blank character, either a space or a tab.

[:cntrl:]

Any control character.

[:digit:]

Any digit 0-9.

[:graph:]

Any graphical character.

[:lower:]

Any lower case character a-z. Other characters may also be included depending upon the locale.

[:print:]

Any printable character.

[:punct:]

Any punctuation character.

[:space:]

Any whitespace character.

[:upper:]

Any upper case character A-Z. Other characters may also be included depending upon the locale.

[:xdigit:]

Any hexadecimal digit character, 0-9, a-f and A-F.

[:word:]

Any word character - all alphanumeric characters plus the underscore.

[:unicode:]

Any character whose code is greater than 255; this applies to the wide character traits classes only.

There are some shortcuts that can be used in place of the character classes:

\w in place of [:word:]

\s in place of [:space:]

\d in place of [:digit:]

\l in place of [:lower:]

\u in place of [:upper:]
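
Three of these shortcuts can be tried with the sketch below. It uses Python's re module as a stand-in for Regex++, which is an assumption with a real limitation: Python's re does not provide \l, \u or the "[:classname:]" forms, so only \w, \s and \d are shown. The re.ASCII flag is used to keep \w and \d to plain ASCII characters, and the sample line is made up.

```python
import re

line = "red_team  scored 42\tpoints"

# \w: word characters (alphanumerics plus the underscore).
words = re.findall(r"\w+", line, flags=re.ASCII)

# \d: digits.
numbers = re.findall(r"\d+", line, flags=re.ASCII)

# \s: whitespace, here used to split on runs of spaces and tabs.
fields = re.split(r"\s+", line)

print(words)    # ['red_team', 'scored', '42', 'points']
print(numbers)  # ['42']
print(fields)   # ['red_team', 'scored', '42', 'points']
```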
