Automatic Links Validator - 1.0

Extension ID

com.castsoftware.automaticlinksvalidator

What’s new?

Description

At the core of the CAST Imaging transaction discovery algorithm is the understanding of the links between objects discovered during the source code analysis of the target application. For cross-technology links, External Links will identify and record a link between two objects whose validity cannot be precisely determined. These links are tagged as “dynamic”. This extension provides automatic validation of these dynamic links.

In what situation should you install this extension?

The inspection of these dynamic links is necessary to determine whether the link in question is legitimate (i.e. valid) or if instead it should be rejected and removed from the Analysis Service. In many situations, however, manually reviewing Dynamic Links (although a legitimate approach) is discouraged as it will not address the underlying cases that triggered the detection in the first place and it can be very time consuming, particularly if you have a large number of dynamic links to review. This extension is therefore aimed at situations where analysis results contain a very large number of dynamic links that need to be validated automatically.

What does it do?

On completion of an analysis, this extension scans the results (stored in the Analysis Service schema) to validate, reject or skip the dynamic links automatically. The validation or rejection of a dynamic link is based on a series of heuristics which give a score θ to the dynamic link:

if θ > 0, the link is validated as true
if θ < 0, the link is rejected as false
if θ = 0, the link is skipped (generally this means that none of the heuristics can be applied to this link and in this case, you will need to review the links manually).
only links that have not yet been reviewed, either manually or by this extension in a previous analysis, will pass through the validation process.
links with several bookmarks are handled by the extension: if at least one bookmark is validated, then the entire link is validated.
the status of the link in the Analysis Service schema is modified following the validation process.
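
The bookmark rule above can be sketched as follows. This is only an illustration, assuming each bookmark of a link has already received its own score θ; the function name `link_is_validated` is hypothetical and is not part of the extension.

```python
from typing import Iterable


def link_is_validated(bookmark_scores: Iterable[float]) -> bool:
    """Return True when at least one bookmark has a positive score θ.

    Hypothetical helper: it only mirrors the documented rule that one
    validated bookmark is enough to validate the entire link.
    """
    return any(theta > 0 for theta in bookmark_scores)


# One bookmark scores positively, so the whole link counts as validated.
print(link_is_validated([-2, 0, 3]))  # True
# No bookmark scores positively, so the link is not validated.
print(link_is_validated([-2, 0]))     # False
```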

Report generation

A Microsoft Excel report is generated on completion of an analysis, which contains:

link information: caller, type, callee, and code of link
the resulting action (validated, rejected, skipped)
the description of the heuristics used

This Microsoft Excel report is stored in the LISA folder (Large Intermediate Storage Area) which is usually set to %PROGRAMDATA%\CAST\CAST\CASTMS\LISA on the analysis node.

The report can also be accessed directly in the CAST Console interface in the Overview page under Reports.

Note

In Console 2.x this report is available in:
- legacy onboarding mode without fastscan in all releases
- fastscan onboarding mode from 2.11.x
In Console 1.x this report is available from 1.25.x

Compatibility

Core release	Operating System	Supported
8.4.x	Microsoft Windows / Linux	✅
≥ 8.3.24	Microsoft Windows	✅

Download and installation instructions

This extension is automatically installed (via the Force Install mechanism).

What results can you expect?

The vast majority of the dynamic links in the Analysis Service schema will be reviewed and either validated as true or rejected as false. Below is an example of a view in CAST Enlighten, first without the extension and then with the extension. We see that three dynamic links have been (correctly) rejected as false:

Results without extension

Below is the code of the first ‘getInstance’ method: we can see that the reference is in a throw exception, so the link is not valid and needs to be rejected as false:

public static XMLCipher getInstance(String transformation, String canon)
      throws XMLEncryptionException
   {
      XMLCipher instance = XMLCipher.getInstance(transformation);
      
      if (canon != null)
      {
         try
         {
            instance._canon = Canonicalizer.getInstance(canon);
         }
         catch (InvalidCanonicalizerException ice)
         {
            throw new XMLEncryptionException("empty", ice);
         }
      }
      
      return instance;
   }

Results with extension - the false links have been rejected:

Report contents

Below is an example of the Microsoft Excel report generated by the extension:

The report contains several sheets/tabs:

Automatic DLM: This sheet shows all the link information, corresponding actions and descriptions of the heuristics used.
Remaining links: This sheet shows the links which haven’t been successfully validated or rejected and in this case, you will need to review the links manually.
Summary: This sheet shows numbers summarizing the results of the process:
1. Number of dynamic links
2. Number of links handled by the extension
3. Number of links validated, ignored or skipped
4. Rates of handling, validating, ignoring or skipping links
Conflicting links: This sheet shows the links assessed with conflicts, i.e. links that were checked with clear results but matched both validating and ignoring rules. These links will need to be reviewed manually as they are more likely to have an incorrect assessment.

How does it work - Mechanics of the validation process

The extension checks each dynamic link against a series of heuristics.
Each heuristic gives a score (positive or negative) to each dynamic link.
All scores are added up to give a final score = θ.
The decision to validate as true, reject as false or skip the links is based on the value of θ:
- if θ > 0, the link is validated as true
- if θ < 0, the link is rejected as false
- if θ = 0, the link is skipped (generally this means that none of the heuristics can be applied to this link and in this case, you will need to review the links manually)
A Microsoft Excel report is generated and stored in the LISA folder (Large Intermediate Storage Area) containing information about the status of each link after validation.
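
The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration and not the extension's actual implementation: the two toy rules, their weights (+1 and -1) and the dictionary-based link representation are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable

# Hypothetical link representation: only the code of the link is used here.
Link = Dict[str, str]
Heuristic = Callable[[Link], int]


def ignore_throw(link: Link) -> int:
    """Toy rule (assumed weight -1): reference found in a throw statement."""
    return -1 if "throw " in link["code"] else 0


def validate_sql_query(link: Link) -> int:
    """Toy rule (assumed weight +1): the code looks like a SELECT query."""
    return 1 if "select " in link["code"].lower() else 0


def final_score(link: Link, heuristics: Iterable[Heuristic]) -> int:
    """Add up the scores of all heuristics to obtain θ."""
    return sum(rule(link) for rule in heuristics)


def decide(theta: int) -> str:
    """Map θ to the documented outcome: validated, rejected or skipped."""
    if theta > 0:
        return "validated"
    if theta < 0:
        return "rejected"
    return "skipped"


HEURISTICS = [ignore_throw, validate_sql_query]
link = {"code": 'throw new XMLEncryptionException("empty", ice);'}
print(decide(final_score(link, HEURISTICS)))  # rejected
```

A link that matches none of the rules gets θ = 0 and is skipped, which corresponds to the links left for manual review.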

General information about dynamic links

The application source code is parsed and investigated by analyzers. From this analysis, references to other objects are detected and links are created when appropriate. These links are tagged as “dynamic”. Not all links generated in this fashion are valid and their validation is therefore required. Note that the term ‘dynamic’ is ambiguous: calling them ‘grep’ links would better reflect reality. These links are also described as ‘not sure’, compared to links created by parsing associated with resolution. As a consequence they need “validation”. The most common example of such links concerns parsed strings:

std::string message = "SELECT * FROM table";

One of the main purposes of dynamic links is to make links to a database visible, even when the SQL code is in client code strings and in unsupported frameworks.

Primary heuristic

Inside a program, a string may either be:

a string that will be interpreted by a human: log, message, UI, etc.
a string that represents other code, or part of code to be interpreted by a program: SQL, name of a resource, etc.

In the first case the dynamic link is incorrect. In the second case, it is ‘correct’; at least in the sense of ‘grep’.

Description of the heuristics used by the extension

Heuristic	Rationale
Ignore throws exception	A string in a `throw` statement is always a message to be interpreted by a human, so the link is invalid.
Skip reference finder	Reference finder links are skipped: the extension does not apply any heuristic to them.
Ignore message logging	Log messages are to be interpreted by a human, so the link is invalid.
Ignore SQL parameter	SQL parameters are not valid links.
Ignore WPF property changed	The reference is `RaisePropertyChanged("ObjectName")`, a classic WPF construct, so the link is invalid.
Validate or ignore when the Reference is a path	Validate a reference which is a valid file path when the callee object is a file.
Validate call to program	-
Validate or ignore link to properties element	Validate or ignore link to JSP property
Validate link to properties element as argument of a call	Validate link to JSP property when it is the argument of a method call
Validate or ignore SQL query	Validate correct SQL query syntax
Validate C# call procedure	Known functions call to database procedure.
Validate link to Spring bean	-
Ignore link JSP servlet mapping	Ignore link to JSP servlet mapping.
Ignore link from properties element to properties element	Ignore link from JSP property to JSP property.
Ignore properties element when it’s a message logging	Ignore link when caller is `JSP_PROPERTY_MAPPING` and its name contains a log marker
Ignore link to natural language	Ignore link when the reference is in a string of natural language.
Ignore link to directory	Ignore link to a directory (a directory is not an end point, nor can it call a link).
Ignore link to a column of a table	Ignore link to a table column (the link should be to a table).
Ignore link to synonym	Ignore link to a synonym (the link should be to a table).
Ignore link to a wrong type of callee	Ignore link to a wrong type of callee.
Validate .NET DataTable links	Validate link using method from ADO.Net DataTable.
Ignore link when the caller is a sourceFile	Ignore link when caller is a sourceFile and the callee is not a sourceFile.
Ignore link on database index	Ignore link when callee is an index of a database.
Validate .NET ObjectContext methods	Validate link using method from .NET `ObjectContext`.
Validate link to JPA Persistence XML	Validate link to JPA persistence XML file.
Ignore link from JSP file to table	Ignore link from JSP file to table with callee in a tag.
Validate link to SEARCHSTRING	Validate link to a `REFIND_SEARCHSTRING` callee.
Validate link to a java applet	-
Validate link from .NET object to `ENTITY_WRAPPER` object	-
Ignore invalid Struts or Spring links	Ignore Struts or Spring links with wrong type of callee (it’s a common DLM rule).
Validate or Ignore link from Java field to JPA	Validate link with `JV_FIELD` caller and `JPA_NAMED_QUERY` callee when the field is strictly equal to `jpa_entity.jpa_named_query`. Ignore link with `JV_FIELD` caller and `JPA_ENTITY` callee when the field is strictly equal to `jpa_entity.jpa_named_query`.
Ignore link from `toString` methods	-
Validate or Ignore link to JspForward	Ignore link to callee of type `JSP_FORWARD` unless a part of the fullname is found in the source.
Ignore link from wrong method or function	Some standard methods or functions can’t be used to call an object (typically manipulation of string, etc.)
Ignore link to wrong type of callee object from another technology	Some object types can’t be called from “outside” their technology
Ignore link with pattern `callee_name.XXX` or `callee_nameXXX` in code	-
Ignore link when caller is an exception handler	-
Ignore link when caller is exception constructor	-
Ignore link from java field to spring bean	Ignore link from java field caller to spring bean callee
Ignore call to IMS Transaction when the caller is not a Cobol type	Ignore call to IMS Transaction when the caller is not a Cobol type
Validate function call to named queries	Validate function call to named queries
Ignore link from EFile to Struts	Ignore link when the caller is `CAST_Web_File` and the callee is `CAST_JEE_StrutsResult`, `CAST_JEE_StrutsPackage` or `CAST_JEE_StrutsAction`
Ignore link to removed object	Ignore links to objects of certain types which are removed by another extension. Currently, links whose callee is a `CAST_JEE_Spring_Batch_Job` or `CAST_JEE_Spring_Batch_Step` are removed
Ignore link from properties to wrong type of callee	Ignore link when the caller is `JSP_PROPERTY_MAPPING` and the callee is `SPRING_BEAN`, `JSP_SERVLET`, `JPA_ENTITY`, `JPA_EMBEDDABLE` or `CAST_JEE_StrutsAction`
Validate or Ignore link to procedure query	Validate or Ignore link when callee is `CAST_MSTSQL_Procedure`, `SQLScriptProcedure`, `CAST_Oracle_Procedure` and exec, execute or call pattern found
Validate link to potential execute procedure method call	Validate link when callee is `CAST_MSTSQL_Procedure`, `SQLScriptProcedure`, `CAST_Oracle_Procedure` and callee called by a method whose name contains executeprocedure, executeproc, execprocedure, execproc
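
To make the last rule in the table more concrete, here is a rough sketch of how such a name-based check could look. The function name and the case-insensitive comparison are assumptions made for this example; the callee types and the name markers are the ones listed in the table.

```python
# Callee types and name markers taken from the table above.
PROCEDURE_TYPES = {
    "CAST_MSTSQL_Procedure",
    "SQLScriptProcedure",
    "CAST_Oracle_Procedure",
}
NAME_MARKERS = ("executeprocedure", "executeproc", "execprocedure", "execproc")


def looks_like_procedure_call(callee_type: str, calling_method_name: str) -> bool:
    """Hypothetical check: procedure callee reached through an 'exec proc' style method.

    Assumption: the method name is compared case-insensitively.
    """
    name = calling_method_name.lower()
    return callee_type in PROCEDURE_TYPES and any(marker in name for marker in NAME_MARKERS)


print(looks_like_procedure_call("CAST_Oracle_Procedure", "runExecProc"))  # True
print(looks_like_procedure_call("CAST_Oracle_Procedure", "getName"))      # False
```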

Conflicting links

As mentioned already, each heuristic rule computes a score for each link which can be positive or negative. A positive score weighs in favor of validating the link and a negative score in favor of rejecting it. A conflict exists when a link obtains both positive and negative scores, regardless of the value of the final score. These links are worth mentioning because they are the ones where there is the highest risk of an incorrect assessment.
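
As an illustration of this definition, a conflict check could be sketched as below, assuming the individual heuristic scores of a link are available as a list of numbers. The function name `is_conflicting` is hypothetical.

```python
from typing import Iterable, List


def is_conflicting(scores: Iterable[float]) -> bool:
    """Return True when a link received both positive and negative scores.

    The final score (the sum) is deliberately not looked at: a link can be
    conflicting whether it ends up validated, rejected or skipped.
    """
    values: List[float] = list(scores)
    return any(s > 0 for s in values) and any(s < 0 for s in values)


print(is_conflicting([2, -1]))  # True: final score is positive, but rules disagree
print(is_conflicting([1, 1]))   # False: all rules agree
```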

In the case of these links, the difficulty lies in estimating the marks for each rule, and sometimes a choice has been made that deserves an explanation:

Validating rule	Ignoring rule	Results	Rationale
Link to properties element as argument of a call	Probably a log message	IGNORE	A log message is a dead end in transaction analysis and here we typically have an insertion of the property value in a log message. Yes, the link toward the “property” is real but as it is inserted in a log message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application.
Link to properties element	Probably a log message	IGNORE	A log message is a dead end in transaction analysis and here we typically have an insertion of the property value in a log message. Yes, the link toward the “property” is real but as it is inserted in a log message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application.
Link to properties element as argument of a call	This is probably natural language	IGNORE	Natural language is destined to be read by a human, it is a dead end in transaction analysis and here we typically have an insertion of the property value in this message. Yes, the link toward the “property” is real but as it is inserted in a natural language message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application.
Link to properties element	This is probably natural language	IGNORE	Natural language is destined to be read by a human, it is a dead end in transaction analysis and here we typically have an insertion of the property value in this message. Yes, the link toward the “property” is real but as it is inserted in a natural language message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application.
Link to properties element as argument of a call	This is a throw exception, so an invalid link	IGNORE	An exception message is a dead end in transaction analysis and here we typically have an insertion of the property value in an exception message. Yes, the link toward the “property” is real but as it is inserted in an exception message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application.
Link to properties element	This is a throw exception, so an invalid link	IGNORE	An exception message is a dead end in transaction analysis and here we typically have an insertion of the property value in an exception message. Yes, the link toward the “property” is real but as it is inserted in an exception message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application.

Run the extension independently

This section presents the method to run the extension independently of an analysis, directly on a knowledge base. It can be useful if you have already performed your analysis without having installed the extension or if you want to use a new version of the extension on an old analysis. You have two possibilities:

run the extension directly via a python script with a list of arguments;
run the batch script run.bat present in the extension.

Warning

It is strongly advised to use the Python interpreter of your version of CAST Imaging Core. Otherwise you risk missing libraries (the CAST extension SDK, for example). The interpreter can be found in the folder “ThirdParty\Python34” of CAIP.

The extension can be run independently only if the application has already been analyzed.

Using a python command line to run the extension

The script is the main.py file in the extension folder. The command is as follows:

/path to python interpreter/python /path to com.castsoftware.automaticlinksvalidator/main.py cmd kb_name application_name src_code_root_path  [-l LOCAL_SRC_ROOT_PATH] [-r REPORT_PATH] [-p REPORT_PREFIX] [-n] [-a] [-d]

Where:

cmd asks for the command line run (MANDATORY);
kb_name is the name of the knowledge base used for the analysis (MANDATORY);
application_name is the name of the application (MANDATORY);
src_code_root_path is the path to the root folder of the code used for the analysis (MANDATORY);
local_src_root_path (-l) is the path to the root folder of the code if it is not the same as the one used for the analysis (only relevant if you have retrieved a kb and the source code of the application);
report_path (-r) is the path to the folder where you want the report to be put;
report_prefix (-p) is the prefix for the report, by default it is the application name;
-n specifies that you do not want the extension to modify the knowledge base (useful if you’re only interested in the report);
-a specifies that you want the extension to check all dynamic links including those which are already validated or ignored (to be used with great caution as it will probably change results);
-d specifies that you want the development report (only useful for developers of the extension).
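
As an illustration, the argument list for this command can be assembled programmatically. The helper below is only a sketch: the function name `build_command` and its parameter names are hypothetical, and every path and name in the usage example is a placeholder to be replaced with your own values. It only builds and prints the command; it does not run it.

```python
from typing import List, Optional


def build_command(
    python_exe: str,
    main_py: str,
    kb_name: str,
    application_name: str,
    src_code_root_path: str,
    local_src_root_path: Optional[str] = None,
    report_path: Optional[str] = None,
    report_prefix: Optional[str] = None,
    no_kb_update: bool = False,       # maps to -n
    review_all_links: bool = False,   # maps to -a
    development_report: bool = False, # maps to -d
) -> List[str]:
    """Build the documented command line as a list of arguments."""
    command = [python_exe, main_py, "cmd", kb_name, application_name, src_code_root_path]
    if local_src_root_path:
        command += ["-l", local_src_root_path]
    if report_path:
        command += ["-r", report_path]
    if report_prefix:
        command += ["-p", report_prefix]
    if no_kb_update:
        command.append("-n")
    if review_all_links:
        command.append("-a")
    if development_report:
        command.append("-d")
    return command


# Placeholder values only: report-only run that leaves the knowledge base untouched.
print(build_command("python", "main.py", "my_kb", "MyApp", "/src/root",
                    report_path="/tmp/reports", no_kb_update=True))
```

The resulting list can be passed as-is to `subprocess.run` from the Python standard library if you want to launch the command from a script.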

Using the script run.bat

Fill in the mandatory fields and the optional parameters in the script and run it.

aip_path is the path to AIP (MANDATORY);
automaticlinksvalidator_path is the path to the extension automaticlinksvalidator (MANDATORY);
kb_name is the name of the knowledge base used for the analysis (MANDATORY);
application_name is the name of the application (MANDATORY);
kb_src_root_path is the path to the root folder of the code used for the analysis (MANDATORY);
local_src_root_path is the path to the root folder of the code if it is not the same as the one used for the analysis (only relevant if you have retrieved a kb and the source code of the application);
report_path is the path to the folder where you want the report to be put;
report_prefix is the prefix for the report, by default it is the application name;
not_apply_validation specifies that you do not want the extension to modify the knowledge base (useful if you’re only interested in the report);
review_all_dynamic_links specifies that you want the extension to check all dynamic links including those which are already validated or ignored (to be used with great caution as it will probably change results);
development_report specifies that you want the development report (only useful for developers of the extension).

Limitations

Automaticlinksvalidator does not support VB6 Dynamic Links: links with a caller from Visual Basic are not processed and are not counted in the statistics.