TCL-based Regular Expressions


This section provides basic help for creating Regular Expressions based on the TCL "engine".

A Regular Expression is a pattern description using a "meta language", a language that you use to describe particular patterns of interest. The characters used in this "meta language" are part of the standard ASCII character set used in UNIX and MS-DOS, which can sometimes lead to confusion. The characters that form regular expressions are:

. Matches any single character.
[], ^, - A character class which matches any character within the brackets. If the first character is a circumflex (" ^ ") it changes the meaning to match any character except those within the brackets. A dash (" - ") inside the square brackets indicates a character range, e.g., " [0-9] " means the same thing as " [0123456789] ", " [^A-Z] " matches any single character except A to Z upper case letters.
\ Except inside character classes (" [...] "), this makes the next character lose its special meaning, e.g., " \* " is a literal asterisk.
When this appears just before the letter r, n, or t, the sequences \r, \n, and \t match a carriage return, a line feed, and a horizontal tab respectively.
Only in Universal Analyzer description files (named xxxLanguagePattern.xml in folder $CAST_INSTALL\configuration\universal\xxx):
  • \x followed by one or two hexadecimal digits matches the character whose code is that hexadecimal value, e.g. " \x41 " matches an "A", " \xA " matches a line feed. Such characters never have a special meaning, e.g. " \x2E " matches a literal dot (same as " \. ").
  • Inside character classes, a \ makes the next character lose its special meaning, e.g. " [\^\\A\-\]\r\n\t] " matches one of the following characters: "^", "\", "A", "-", "]", a carriage return, a line feed, or a horizontal tab.
[\r\n]+

Use the expression [\r\n]+ to match one or many carriage returns (\r) and/or line feeds (\n).
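
The character classes and escapes described above can be tried out with a short sketch. It uses Python's re module as a stand-in for the TCL engine, on the assumption that these basic constructs behave the same way in both; the variable names are illustrative only.

```python
import re

# Assumption: Python's re module is used here in place of the TCL engine.
digit = re.compile(r"[0-9]")          # same as [0123456789]
not_upper = re.compile(r"[^A-Z]")     # any single character except A to Z
literal_star = re.compile(r"\*")      # the backslash removes the special meaning
line_breaks = re.compile(r"[\r\n]+")  # one or more carriage returns and/or line feeds

# Split a text on any run of line-break characters.
text = "first\r\nsecond\n\nthird"
lines = line_breaks.split(text)
print(lines)
```

Because the + applies to the whole class, the mixed sequence "\r\n" and the doubled "\n\n" are each treated as a single separator.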

+ Matches one or more occurrences of the preceding regular expression.

For example: [0-9]+ matches " 1 ", " 111 ", or " 123456 " but not an empty string (if the plus sign were an asterisk, it would also match the empty string).

* Matches zero or more occurrences of the preceding regular expression, so it also matches an empty string.

For example: [0-9]* matches " 1 ", " 111 ", " 123456 " or an empty string.

? Matches zero or one occurrence of the preceding regular expression.

For example: -?[0-9]+ matches a signed number including an optional leading minus.
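
The difference between the three repetition operators is easiest to see side by side. The following is a minimal sketch, again assuming Python's re module as a stand-in for the TCL engine; the helper name whole is hypothetical and simply checks that a pattern matches an entire string.

```python
import re

def whole(pattern, text):
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

samples = ["1", "111", "123456", ""]

# + needs at least one digit, so the empty string is rejected.
plus_results = [whole(r"[0-9]+", s) for s in samples]

# * accepts zero digits, so the empty string is accepted too.
star_results = [whole(r"[0-9]*", s) for s in samples]

# ? makes the leading minus optional, but allows it only once.
sign_results = [whole(r"-?[0-9]+", s) for s in ["42", "-42", "--42"]]

print(plus_results, star_results, sign_results)
```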

| Matches either the preceding regular expression or the following regular expression.

For example: Cow|pig|sheep matches any of the three words.

Note: Empty alternatives are disallowed.

() Groups a series of regular expressions together into a new regular expression.

For example: (01) represents the character sequence 01. Parentheses are useful when building up complex patterns with *, +, ?, and |.
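
A short sketch of alternation and grouping, assuming Python's re module as a stand-in for the TCL engine. Note that Python accepts empty alternatives, so that restriction is not illustrated here.

```python
import re

animals = re.compile(r"Cow|pig|sheep")  # any one of the three words
grouped = re.compile(r"(01)+")          # + repeats the whole sequence 01
ungrouped = re.compile(r"01+")          # + repeats only the preceding 1

# Matching is case sensitive, so "cow" is not accepted.
animal_hits = [animals.fullmatch(w) is not None
               for w in ["Cow", "pig", "sheep", "cow"]]

grouped_hits = [grouped.fullmatch(s) is not None for s in ["010101", "0111"]]
ungrouped_hits = [ungrouped.fullmatch(s) is not None for s in ["010101", "0111"]]

print(animal_hits, grouped_hits, ungrouped_hits)
```

Without the parentheses, a repetition operator applies only to the single character or class immediately before it.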

Note that some of these operators operate on single characters (e.g., []) while others operate on regular expressions. Usually, complex regular expressions are built up from simple regular expressions.
Examples of how to use Regular Expressions follow:

First, the regular expression for a " digit " is:
[0-9]

This can be used to build a regular expression for an integer:
[0-9]+
... for which at least one digit is required (this would have allowed no digits at all: [0-9]*)

Let's add an optional unary minus:
-?[0-9]+

This can then be expanded to allow decimal numbers. First, a decimal number can be specified (for the time being the last character will always be a digit):
[0-9]*\.[0-9]+

Note that the " \ " before the period makes it a literal period rather than a wildcard character. This pattern matches " 0.0 ", " 4.5 ", or " .31415 ". However, it does not match " 0 " or " 2 ". To match those as well, combine it with the integer pattern, leaving out the unary minus for now:
([0-9]+)|([0-9]*\.[0-9]+)

In this example, the grouping symbols " () " are used to delimit the two regular expressions that the " | " operator chooses between. Adding the unary minus back gives:
-?(([0-9]+)|([0-9]*\.[0-9]+))
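
The step-by-step construction above can be mirrored in code by assembling the pattern from named pieces. This is a sketch only, assuming Python's re module as a stand-in for the TCL engine; the names INTEGER, DECIMAL, UNSIGNED, SIGNED and is_number are hypothetical.

```python
import re

INTEGER = r"[0-9]+"                    # at least one digit
DECIMAL = r"[0-9]*\.[0-9]+"            # optional digits, a literal period, digits
UNSIGNED = f"({INTEGER})|({DECIMAL})"  # either form
SIGNED = f"-?({UNSIGNED})"             # optional unary minus in front

def is_number(text):
    """Return True when the whole text is a signed integer or decimal."""
    return re.fullmatch(SIGNED, text) is not None

accepted = [is_number(s) for s in ["0", "2", "4.5", ".31415", "-0.0"]]
rejected = [is_number(s) for s in ["4.", "-", ""]]
print(SIGNED, accepted, rejected)
```

The outer parentheses around UNSIGNED matter: without them the minus sign would belong to the integer alternative only.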

This can be extended by allowing a float-style exponent to be specified as well. First, here's an example of a regular expression for an exponent:
[eE][-+]?[0-9]+

This matches an uppercase or lowercase letter E, then an optional plus or minus sign, then a string of digits. For instance, this will match " e12 " or " E-3 ". This expression can then be used to build our final expression, one that specifies a real number:
-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)

Valid number: .65e12
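
The final expression can be checked against a few inputs. One point worth noticing: because concatenation binds more tightly than " | ", the exponent in this expression attaches only to the decimal alternative, so an integer followed by an exponent is not matched as a whole. The sketch below assumes Python's re module as a stand-in for the TCL engine; REAL_ANY_EXP is a hypothetical variant, not part of the expression above, that lets the exponent follow either form.

```python
import re

# The expression exactly as given in the text.
REAL = re.compile(r"-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)")

# Hypothetical variant: group both numeric forms first, then the exponent.
REAL_ANY_EXP = re.compile(r"-?(([0-9]+)|([0-9]*\.[0-9]+))([eE][-+]?[0-9]+)?")

def matches(pattern, text):
    """Return True when the compiled pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

real_hits = [matches(REAL, s) for s in ["42", "-4.5E-3", ".65e12", "1e5"]]
variant_hits = [matches(REAL_ANY_EXP, s) for s in ["42", ".65e12", "1e5", "e12"]]
print(real_hits, variant_hits)
```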

Specific example for the CAST Snapshot Preparation Assistant when creating a Module

When using the Snapshot Generation Assistant, it is possible to use Regular Expressions to automatically create your Modules (using the Match option against a Regular Expression).

In some cases it can be particularly useful to exclude certain objects from the Module. As a simple example, to include all objects in the Module except those that match "Stoc<something>", use the following Regular Expression:

([^S]*)|(S[^t]*)|(St[^o]*)|(Sto[^c]*)
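
The behaviour of this expression can be explored outside the tool. The sketch below assumes that the Match option compares the expression against the whole object name, and it uses Python's re module as a stand-in for the TCL engine; both are assumptions, and the object names are invented for illustration.

```python
import re

# The expression exactly as given in the text.
MODULE_FILTER = re.compile(r"([^S]*)|(S[^t]*)|(St[^o]*)|(Sto[^c]*)")

def included(name):
    """Return True when the whole object name matches the expression."""
    return MODULE_FILTER.fullmatch(name) is not None

# Hypothetical object names.
names = ["Customer", "Sales", "Street", "Store", "Stock", "StockItem", "Salt"]
result = {name: included(name) for name in names}
print(result)
```

Under these assumptions "Stock" and "StockItem" are excluded as intended, but so is a name such as "Salt", because each alternative forbids one letter anywhere in the rest of the name rather than only at the next position. It is worth testing the expression against real object names before relying on it.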

