Application analysis configuration - Config - Advanced - Reference Finder


Overview

The Reference Finder feature enables you to define one or more rules at Application level to search for links (i.e. a word, a series of words or a string, based on Regular Expressions) between Source and Target code when an analysis is run. The Reference Finder is an extension of the Dependencies feature in which references are traced using search strings, which is less selective than the parser-based technology used for other links traced by the analyzer. This technology detects a reference to an object wherever its name is mentioned, regardless of the context in which the reference occurs. As a result, incorrect links may be traced if a string happens to match a given name even though it is not logically related to the corresponding object, and you may have to intervene to filter out incorrect references. Refer to Dynamic Links for more details on how to ignore these ambiguous links.

The Reference Finder is therefore particularly useful when the standard Dependencies provided by CAST Imaging, which are based solely on simple word-based matches between Technologies, are either too broad (creating too many “false” links between objects) or do not detect the links you require. Since the Reference Finder is based on Regular Expression matching, you can define very specific search strings to identify the links you require between source code.

Content

This section lists all Reference Finder rules that already exist with a brief summary of the configuration and various options.

Add a new Reference Finder rule

Use the Add button to create a new empty Reference Finder rule:

The creation window will then be displayed, enabling you to define the rule:

Name

Choose a name for the Reference Finder rule to identify it. The characters in the Reference Finder name must match the following regular expression:

[a-zA-Z][a-zA-Z0-9]+

For example, a name such as Test Rule (with a white space) is not permitted.
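The naming rule can be tried out before entering a name in the interface. The following is a minimal sketch, assuming the whole name must satisfy the expression (hence `fullmatch`); the helper name `is_valid_rule_name` is hypothetical and not part of CAST Imaging.

```python
import re

# Expression quoted above: a letter followed by one or more letters or digits.
NAME_RULE = re.compile(r"[a-zA-Z][a-zA-Z0-9]+")

def is_valid_rule_name(name):
    # Assumption: the entire name must match, not just a part of it.
    return NAME_RULE.fullmatch(name) is not None

print(is_valid_rule_name("TestRule"))   # True
print(is_valid_rule_name("Test Rule"))  # False: white space is not permitted
```

Under this reading, the expression also rejects names that start with a digit and names made of a single character.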

Description

This field is a simple free text field that allows you to enter an optional description of the rule, i.e. what it is intended to do.

Source

Technology

Select the Source technology in the drop down:

Object types

Now select the specific object types within your chosen Technology that you want to focus on. These object types are taken from the technology metamodel, not from the current application, so a specific object type may not be present in your application. You can select multiple object types, as shown below. Use the X icon in an object type if you change your mind and do not want to include that specific object type.

Target

Configure the Target exactly as described above for the Source fields: select the technology and object types you require.

Preview window

The preview window is used to display the source and target technologies for the current rule and any results that the rule identifies when the Check button is pressed (see below).

Expression

Pattern - mandatory

Enter the Regular Expression, word or phrase that you want the Reference Finder rule to target. For example, to match any upper case word beginning with TAB_, use:

TAB_[A-Z]+
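The behaviour of this pattern can be checked with Python's standard `re` module. The snippet below is only an illustration; the sample source line is hypothetical.

```python
import re

# Hypothetical source line containing three candidate words.
source = "MOVE TAB_CUSTOMER TO WS-REC. READ tab_orders. CALL TABLE_X."

# Only TAB_ followed by upper case letters is matched.
matches = re.findall(r"TAB_[A-Z]+", source)
print(matches)  # ['TAB_CUSTOMER']
```

The lower case `tab_orders` and the word `TABLE_X` (no underscore directly after TAB) are not matched.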

CAST Imaging uses Python Regular Expression syntax and you can find some hints and tips here:

CAST recommends using the (?i) syntax, e.g. (?i)[a-z]+ (see https://www.regular-expressions.info/modifiers.html), to ensure that the regular expression search is applied in case insensitive mode (i.e. CAST Imaging will not make a distinction between identical matched strings that differ only in case).
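The effect of the (?i) modifier can be seen by running the same expression with and without it. This is a small sketch using Python's `re` module; the sample text is hypothetical.

```python
import re

# Hypothetical source text with mixed-case table names.
code = "select * from tab_orders join Tab_Items"

case_sensitive = re.findall(r"TAB_[A-Z]+", code)
case_insensitive = re.findall(r"(?i)TAB_[A-Z]+", code)

print(case_sensitive)    # []
print(case_insensitive)  # ['tab_orders', 'Tab_Items']
```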

Begin / End - optional

You can optionally choose to restrict the pattern search to a specific zone. You can specify the begin and/or end of the zone through Regular Expressions (you can enter only Begin, only End or both Begin and End) - the pattern will then only be searched when a matching begin or end is located in the code.

For example, in a Cobol program, you want to search all code for all instances of PERFORM that can be found between IF and END-IF. To do so configure the following:

  • The limits of the zones you are searching (i.e.: the Begin and End Expressions) MUST NOT be located in comments.
  • Zones can overlap one another and a zone can be included within another zone.
  • Each zone will be searched independently for its own regular expression.
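The idea of restricting a search to a zone can be sketched as follows. This is not the analyzer's implementation: it assumes that a zone runs from a Begin match to the next End match, it skips a Begin without a closing End, and it does not model comment handling. The function name `find_in_zones`, the expressions and the Cobol fragment are all hypothetical.

```python
import re

def find_in_zones(code, pattern, begin, end):
    """Return pattern matches found between a Begin match and the next End match."""
    end_re = re.compile(end)
    hits = []
    for b in re.finditer(begin, code):
        e = end_re.search(code, b.end())
        if e is None:
            continue  # assumption: a Begin without an End defines no zone
        zone = code[b.end():e.start()]
        # Each zone is searched independently of the others.
        hits.extend(m.group(0) for m in re.finditer(pattern, zone))
    return hits

cobol = """\
IF WS-A > WS-B
    PERFORM PARA-ONE
END-IF
PERFORM PARA-TWO
"""

# (?<!-) stops the Begin expression from matching the IF inside END-IF.
zone_hits = find_in_zones(cobol, r"PERFORM\s+[A-Z0-9-]+",
                          r"(?<!-)\bIF\b", r"\bEND-IF\b")
print(zone_hits)  # ['PERFORM PARA-ONE']
```

Because each zone is searched on its own, a match located in two overlapping or nested zones would be reported once per zone in this sketch.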

Replace Pattern - optional

Activating this option enables you to apply a replacement process to the results of the Regular Expression search prior to the results being saved to the Analysis Service schema:

Each time the Regular Expression is matched in the source, the chosen replacement string is produced and is used to match the name (with the same sub/over/whole match options). Replacement is based on Regular Expression grouping. For example:

  • Using the Regular Expression R: ([a-zA-Z_])([a-zA-Z_0-9]+) each pair of parentheses generates a grouping, and that grouping can be referenced using the notation \1, \2, etc., up to \n. If R matches the text The_Cat then \1 is the character T and \2 is the string he_Cat.
  • Using the Regular Expression S: (HisFunc|MyFunc)\(([^)]*)\). If S matches the text HisFunc(his_parameter), then \1 is HisFunc and \2 is his_parameter.
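
The two groupings described above can be reproduced with Python's `re` module, where `group(1)` and `group(2)` correspond to \1 and \2. The variable names are illustrative only.

```python
import re

# R and S are the two expressions quoted above.
R = re.compile(r"([a-zA-Z_])([a-zA-Z_0-9]+)")
S = re.compile(r"(HisFunc|MyFunc)\(([^)]*)\)")

r_match = R.fullmatch("The_Cat")
s_match = S.fullmatch("HisFunc(his_parameter)")

print(r_match.group(1), r_match.group(2))  # T he_Cat
print(s_match.group(1), s_match.group(2))  # HisFunc his_parameter
```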

These examples illustrate how this feature could be used:

  • Example 1

    • Regular Expression entered: LoadLibrary\("(([^"\r\n]|""|\\")*)\.dll"\)
    • Replacement entered: \1
    • This combination will match the name of the DLLs (without the extension) called in C/C++ source code
  • Example 2

    • Regular Expression entered: com\.my_package(\.([a-zA-Z_][a-zA-Z_0-9]*))+
    • Replacement entered: \2
    • This combination matches the last part of qualified names beginning with com.my_package
  • Example 3

    • Regular Expression entered: Id(d|x)_Object_([a-zA-Z_][a-zA-Z_0-9]*)
    • Replacement entered: Id\1_\2
    • This combination matches names of the form Idd_Object_Something and then eliminates the Object in the middle, e.g. Idd_Object_Frame -> Idd_Frame, Idd_Object_Window -> Idd_Window, Idx_Object_Button -> Idx_Button
  • Example 4

    • Regular Expression entered: System.loadLibrary\([ \t]*"([^"]+)"[ \t]*\)
    • Replacement entered: \1.dll
    • This combination matches the names of the C libraries used to call functions via Java native methods
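
The four examples above can be replayed in Python, using `Match.expand` as a stand-in for the replacement step. This is a sketch only: the helper `apply_rule` and the sample inputs are hypothetical, and the product's own replacement engine may differ in detail.

```python
import re

def apply_rule(regex, replacement, text):
    """Return the replacement string produced for the first match, or None."""
    m = re.search(regex, text)
    return m.expand(replacement) if m else None

example_1 = apply_rule(r'LoadLibrary\("(([^"\r\n]|""|\\")*)\.dll"\)', r"\1",
                       'LoadLibrary("kernel32.dll")')
example_2 = apply_rule(r"com\.my_package(\.([a-zA-Z_][a-zA-Z_0-9]*))+", r"\2",
                       "import com.my_package.util.Parser;")
example_3 = apply_rule(r"Id(d|x)_Object_([a-zA-Z_][a-zA-Z_0-9]*)", r"Id\1_\2",
                       "Idx_Object_Button")
example_4 = apply_rule(r'System.loadLibrary\([ \t]*"([^"]+)"[ \t]*\)', r"\1.dll",
                       'System.loadLibrary("native");')

print(example_1)  # kernel32
print(example_2)  # Parser
print(example_3)  # Idx_Button
print(example_4)  # native.dll
```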

Order of events during execution

A source code analysis has already been carried out and the Analysis Service schema contains the objects resulting from this analysis. A Reference Pattern is then created, the Replace Pattern option is activated and a replacement text is entered in the field. When the Reference Pattern is then run, the following occurs:

  • The string chosen as the replacement string is checked for validity, then one of the following occurs:
    • If the chosen replacement string is not valid, the Reference Pattern cannot be executed. The replacement string is considered invalid if it is empty (i.e. nothing entered in the field) or if it contains a reference to a non-existent grouping (e.g. \4 when the Regular Expression contains only two groups).
    • If the chosen replacement string is valid, the Reference Pattern will then be run. During the process, each time a match with the Regular Expression is located in the selected objects, it is transformed using the chosen replacement string. The result of this transformation is then compared to the names/full names/paths of the Target objects. For each object A whose name matches the result of the transformation, a link will be created between the object containing the Regular Expression match and object A.
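
The sequence described above (validate the replacement string, transform each match, compare the result with the Target names) can be sketched in Python. This is an illustration under stated assumptions, not the product's code: the function names are hypothetical, objects are modelled as a plain dictionary of name to source code, and only exact name equality is used for the comparison.

```python
import re

def replacement_is_valid(regex, replacement):
    """Reject an empty replacement or one that references a missing group."""
    if not replacement:
        return False
    group_count = re.compile(regex).groups
    referenced = [int(d) for d in re.findall(r"\\([1-9])", replacement)]
    return all(n <= group_count for n in referenced)

def find_links(regex, replacement, sources, target_names):
    """Return (caller, callee) pairs; sources maps object name to its code."""
    if not replacement_is_valid(regex, replacement):
        raise ValueError("invalid replacement string")
    links = []
    for caller, code in sources.items():
        for m in re.finditer(regex, code):
            transformed = m.expand(replacement)
            # Assumption: whole-name equality only.
            if transformed in target_names:
                links.append((caller, transformed))
    return links

rule = r'System.loadLibrary\([ \t]*"([^"]+)"[ \t]*\)'
sources = {"Native.java": 'System.loadLibrary("native");'}
links = find_links(rule, r"\1.dll", sources, {"native.dll", "other.dll"})
print(links)  # [('Native.java', 'native.dll')]
```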

Limitations

The maximum number of groupings that you can reference is 9, which means that if you specify \10 as a replacement it will be interpreted as \1 followed by the character 0 (zero).
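
The single-digit rule can be modelled with a small helper. Note that this is a sketch of the behaviour described here, not of Python's own `Match.expand`, which reads \10 as a reference to group 10; the function name `expand_single_digit` is hypothetical.

```python
import re

def expand_single_digit(match, replacement):
    # Only \1 to \9 are treated as group references, so \10 becomes
    # group 1 followed by the literal character 0.
    return re.sub(r"\\([1-9])",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  replacement)

m = re.fullmatch(r"(\w+)-(\w+)", "foo-bar")
print(expand_single_digit(m, r"\10"))    # foo0
print(expand_single_digit(m, r"\2_\1"))  # bar_foo
```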

It is also worth remembering that if the grouping is the object of a repetition then only the last match in the grouping will be retained. Take for example:

  • Regular Expression entered: [a-zA-Z_]([a-zA-Z_0-9-])*
  • Match string: James_BrowN
  • Replacement entered: \1

In this case, \1 corresponds to N and not to ames_BrowN, thus there is a difference between the following two Regular Expressions using the same match string and replacement:

  • [a-zA-Z_]([a-zA-Z_0-9-])* = "N"
  • [a-zA-Z_]([a-zA-Z_0-9-]*) = "ames_BrowN"
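
Python's `re` module follows the same rule for repeated groups, so the difference between the two expressions can be verified directly:

```python
import re

text = "James_BrowN"

# Repetition outside the group: only the last iteration is retained.
repeated_group = re.fullmatch(r"[a-zA-Z_]([a-zA-Z_0-9-])*", text).group(1)

# Repetition inside the group: the whole run is captured.
group_with_repeat = re.fullmatch(r"[a-zA-Z_]([a-zA-Z_0-9-]*)", text).group(1)

print(repeated_group)     # N
print(group_with_repeat)  # ames_BrowN
```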

Use this option to select the link that will be created between the Source and the Target objects:

Match target

This section enables you to define what the results of the Source search will be matched to in the Target. Choose from:

  • Name > The object’s short name
  • Full Name > The object’s full name as stored in the Analysis schema

Save and Run

See below.

Check

See below.

Save

See below.

Run the Reference Finder rule manually

The Check button will run the Reference Finder check on the configuration you have entered (Source and Target technologies / object types and the Regular Expression pattern). The configuration will not be saved and no links will be generated using this option. This is purely a “preview”:

Results are available via the VIEW DETAILS button:

And in addition, any items that match this configuration will be displayed in the left hand preview box:

Click the X new References Found to view the links that have been identified:

Clicking a link in the list will also display the code of the Caller object in which the match has been located:

Save the Reference Finder rule

There are two methods to save the Reference Finder rule:

  • Save and Run > The Reference Finder rule configuration will be saved (creating a new rule or updating an existing rule) AND the rule will be executed and any links resulting from the rule will be generated.
  • Save > The Reference Finder rule configuration will be saved (creating a new rule or updating an existing rule) only.

When using either option, the rule will be displayed in the list of Reference Finder rules:

When is a Reference Finder rule run?

Any rule listed in the Reference Finder section (i.e. that has been saved) will be run the next time an analysis is run. This means that any links that are identified by any of the rules that have been saved will be created and stored as part of the analysis results. Links will appear in the Dynamic Links section as they are classed as “dynamic”: