Automatic Links Validator - 1.0
Extension ID
com.castsoftware.automaticlinksvalidator
What’s new?
See Release Notes - 1.0.
Description
At the core of the CAST Imaging transaction discovery algorithm is the understanding of the links between objects discovered during the source code analysis of the target application. For cross-technology links, External Links will identify and record a link between two objects whose validity cannot be precisely determined. These links are tagged as “dynamic”. This extension provides automatic validation of these dynamic links.
In what situation should you install this extension?
The inspection of these dynamic links is necessary to determine whether the link in question is legitimate (i.e. valid) or if instead it should be rejected and removed from the Analysis Service. In many situations, however, manually reviewing Dynamic Links (although a legitimate approach) is discouraged as it will not address the underlying cases that triggered the detection in the first place and it can be very time consuming, particularly if you have a large number of dynamic links to review. This extension is therefore aimed at situations where analysis results contain a very large number of dynamic links that need to be validated automatically.
What does it do?
On completion of an analysis, this extension will scan the results (stored in the Analysis Service schema) to validate, reject or skip the dynamic links automatically. The validation or rejection of a dynamic link is based on a series of heuristics which give a score θ to the dynamic link:
- if θ > 0, the link is validated as true
- if θ < 0, the link is rejected as false
- if θ = 0, the link is skipped (generally this means that none of the heuristics can be applied to this link and in this case, you will need to review the links manually).
- only links that have not yet been reviewed, either manually or by this extension in a previous analysis, will pass through the validation process.
- links with several bookmarks are handled by the extension; the rule is: if at least one bookmark is validated, then the entire link is validated.
- the status of the link in the Analysis Service schema is modified following the validation process.
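The rule for links with several bookmarks can be illustrated with a minimal Python sketch. This is not the extension's code: the function name `link_status` and the status strings are hypothetical, and the behavior when no bookmark is validated is an assumption, since the documentation only states the "at least one validated" case.

```python
def link_status(bookmark_statuses):
    """Combine per-bookmark outcomes into a single status for the link."""
    statuses = list(bookmark_statuses)
    # Documented rule: one validated bookmark validates the entire link.
    if "validated" in statuses:
        return "validated"
    # Assumption (not stated in the documentation): the link is rejected
    # only when every bookmark has been rejected.
    if statuses and all(status == "rejected" for status in statuses):
        return "rejected"
    # Assumption: anything else is left for manual review.
    return "skipped"


print(link_status(["rejected", "validated", "skipped"]))
print(link_status(["rejected", "rejected"]))
```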
Report generation
A Microsoft Excel report is generated on completion of an analysis, which contains:
- link information: caller, type, callee, and code of link
- the resulting action (validated, rejected, skipped)
- the description of the heuristics used
This Microsoft Excel report is stored in the LISA folder (Large Intermediate Storage Area) which is usually set to %PROGRAMDATA%\CAST\CAST\CASTMS\LISA on the analysis node.
The report can also be accessed directly in the CAST Console interface in the Overview page under Reports.
Note
- In Console 2.x this report is available in:
  - legacy onboarding mode without fastscan in all releases
  - fastscan onboarding mode from 2.11.x
- In Console 1.x this report is available from 1.25.x
Compatibility
CAST Imaging Core | Supported |
---|---|
≥ 8.3.0 | ✅ |
Download and installation instructions
This extension is automatically installed (via the Force Install mechanism).
What results can you expect?
The vast majority of the dynamic links in the Analysis Service schema will be reviewed and either validated as true or rejected as false. Below is an example of a view in CAST Enlighten, first without the extension and then with the extension. We see that three dynamic links have been (correctly) rejected as false:
Results without extension
Below is the code of the first ‘getInstance’ method: we can see that the reference is in a throw exception, so the link is not valid and needs to be rejected as false:
public static XMLCipher getInstance(String transformation, String canon)
        throws XMLEncryptionException
{
    XMLCipher instance = XMLCipher.getInstance(transformation);
    if (canon != null)
    {
        try
        {
            instance._canon = Canonicalizer.getInstance(canon);
        }
        catch (InvalidCanonicalizerException ice)
        {
            throw new XMLEncryptionException("empty", ice);
        }
    }
    return instance;
}
Results with extension - the false links have been rejected:
Report contents
Below is an example of the Microsoft Excel report generated by the extension:
The report contains several sheets/tabs:
- Automatic DLM: This sheet shows all the link information, corresponding actions and descriptions of the heuristics used.
- Remaining links: This sheet shows the links that were neither validated nor rejected; you will need to review these links manually.
- Summary: This sheet shows figures summarizing the results of the process:
  - Number of dynamic links
  - Number of links handled by the extension
  - Number of links validated, ignored or skipped
  - Rates of handling, validating, ignoring or skipping links
- Conflicting links: This sheet lists the links assessed with conflicts, that is, links that were checked with clear results but by both validating and ignoring rules. These links will need to be reviewed manually as they are more likely to have an incorrect assessment.
How does it work - Mechanics of the validation process
- The extension checks the dynamic link against a series of heuristics
- Each heuristic gives a score (positive or negative) to each dynamic link
- All scores are added up to give a final score = θ.
- The decision to validate as true, reject as false or skip the links is based on the value of θ:
  - if θ > 0, the link is validated as true
  - if θ < 0, the link is rejected as false
  - if θ = 0, the link is skipped (generally this means that none of the heuristics can be applied to this link and in this case, you will need to review the links manually)
- A Microsoft Excel report is generated and stored in the LISA folder (Large Intermediate Storage Area) containing information about the status of each link after validation
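The scoring mechanics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two heuristics, their weights and the dictionary keys (`in_log_message`, `callee_is_file`) are hypothetical and do not reflect the values actually used by the extension. Only the summation and the sign test on θ come from the description.

```python
def final_score(link, heuristics):
    """Add up the score given by each heuristic to obtain theta."""
    return sum(heuristic(link) for heuristic in heuristics)


def decide(theta):
    """Map the final score theta to the documented outcome."""
    if theta > 0:
        return "validated"
    if theta < 0:
        return "rejected"
    return "skipped"


# Hypothetical heuristics with made-up weights. A score of 0 means the
# heuristic does not apply to the link.
def log_message_rule(link):
    return -10 if link.get("in_log_message") else 0


def file_path_rule(link):
    return 5 if link.get("callee_is_file") else 0


HEURISTICS = [log_message_rule, file_path_rule]

for link in ({"in_log_message": True}, {"callee_is_file": True}, {}):
    theta = final_score(link, HEURISTICS)
    print(link, theta, decide(theta))
```

A link to which no heuristic applies ends with θ = 0 and is therefore skipped, which matches the case where a manual review is still needed.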
General information about dynamic links
The application source code is parsed and investigated by analyzers. From this analysis, references to other objects are detected and links are created when appropriate. These links are tagged as “dynamic”. Not all links generated in this fashion are valid and their validation is therefore required. Note that the term ‘dynamic’ is ambiguous: calling them ‘grep’ links would be closer to reality. These links are also described as ‘not sure’, in contrast to links created by parsing combined with resolution, and as a consequence they need “validation”. The most common example of such links concerns parsed strings:
std::string message = "SELECT * FROM table";
One of the main purposes of dynamic links is to make links to a database visible, even when the SQL code is in client code strings and in unsupported frameworks.
Primary heuristic
Inside a program, a string may either be:
- a string that will be interpreted by a human: log, message, UI etc.
- a string that represents other code, or part of code, to be interpreted by a program: SQL, name of a resource etc.
In the first case the dynamic link is incorrect. In the second case, it is ‘correct’; at least in the sense of ‘grep’.
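To make the distinction concrete, here is a deliberately naive Python sketch that sorts a string literal into one of the two cases. It is not how the extension works: the regular expression, the word-count threshold and the function name `string_kind` are assumptions chosen for illustration, and such a simple test will misclassify many real strings.

```python
import re

# Hypothetical pattern: the literal starts like a SQL statement.
SQL_START = re.compile(
    r"^\s*(SELECT|INSERT|UPDATE|DELETE|MERGE|CALL|EXEC)\b", re.IGNORECASE
)


def string_kind(literal):
    """Return 'code', 'human' or 'unknown' for a string literal."""
    if SQL_START.match(literal):
        return "code"
    words = literal.split()
    # Assumption: several words ending with sentence punctuation read
    # like a message meant for a person.
    if len(words) >= 4 and literal.rstrip().endswith((".", "!", "?", ":")):
        return "human"
    return "unknown"


print(string_kind("SELECT * FROM table"))
print(string_kind("Could not open the table file."))
print(string_kind("table"))
```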
Description of the heuristics used by the extension
Heuristic | Rationale |
---|---|
Ignore throws exception | A string in a throw exception is always a message to be interpreted by a human, so the link is invalid. |
Skip reference finder | Links created by the reference finder are skipped: no heuristic is applied to them. |
Ignore message logging | Log messages are to be interpreted by a human, so the link is invalid. |
Ignore SQL parameter | SQL parameters are not valid links. |
Ignore WPF property changed | The reference is RaisePropertyChanged("ObjectName"); this is a classic WPF construct, so the link is invalid. |
Validate or ignore when the Reference is a path | Validate a reference which is a valid file path when the callee object is a file. |
Validate call to program | - |
Validate or ignore link to properties element | Validate or ignore link to JSP property |
Validate or ignore SQL query | Validate correct SQL query syntax |
Validate C# call procedure | Known functions call to database procedure. |
Validate link to Spring bean | - |
Ignore link JSP servlet mapping | Ignore link to JSP servlet mapping. |
Ignore link from properties element to properties element | Ignore link from JSP property to JSP property. |
Ignore properties element when it’s a message logging | Ignore link when caller is JSP_PROPERTY_MAPPING and its name contains a log marker |
Ignore link to natural language | Ignore link when the reference is in a string of natural language. |
Ignore link to directory | Ignore link to a directory (a directory is not an end point, nor can it call a link). |
Ignore link to a column of a table | Ignore link to a column of a table (the link should be to a table). |
Ignore link to synonym | Ignore link to a synonym (the link should be to a table). |
Ignore link to a wrong type of callee | Ignore link to a wrong type of callee. |
Validate .NET DataTable links | Validate link using method from ADO.Net DataTable. |
Ignore link when the caller is a sourceFile | Ignore link when caller is a sourceFile and the callee is not a sourceFile. |
Ignore link on database index | Ignore link when callee is an index of a database. |
Validate .NET ObjectContext methods | Validate link using method from .NET ObjectContext . |
Validate link to JPA Persistence XML | Validate link to JPA persistence XML file. |
Ignore link from JSP file to table | Ignore link from JSP file to table with callee in a tag. |
Validate link to SEARCHSTRING | Validate link to a REFIND_SEARCHSTRING callee. |
Validate link to a java applet | - |
Validate link from .NET object to ENTITY_WRAPPER object | - |
Ignore invalid Struts or Spring links | Ignore Struts or Spring links with wrong type of callee (it’s a common DLM rule). |
Validate or Ignore link from Java field to JPA | Validate link with JV_FIELD caller and JPA_NAMED_QUERY callee when the field is strictly equal to jpa_entity.jpa_named_query. Ignore link with JV_FIELD caller and JPA_ENTITY callee when the field is strictly equal to jpa_entity.jpa_named_query. |
Ignore link from toString methods | - |
Validate or Ignore link to JspForward | Ignore link to callee of type JSP_FORWARD unless a part of the fullname is found in the source. |
Ignore link from wrong method or function | Some standard methods or functions can’t be used to call an object (typically manipulation of string, etc.) |
Ignore link to wrong type of callee object from another technology | Some object type can’t be called from “outside” their technology |
Ignore link with pattern callee_name.XXX or callee_nameXXX in code | - |
Ignore link when caller is an exception handler | - |
Ignore link when caller is exception constructor | - |
Ignore link from java field to spring bean | Ignore link from java field caller to spring bean callee |
Ignore call to IMS Transaction when the caller is not a Cobol type | Ignore call to IMS Transaction when the caller is not a Cobol type |
Validate function call to named queries | Validate function call to named queries |
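As an illustration of how a context-based rule such as "Ignore throws exception" or "Ignore message logging" could be expressed, the Python sketch below inspects the line of code carrying the reference. It is a simplified assumption, not the extension's implementation: the regular expressions, the function name `context_score` and the score of -1 are all hypothetical.

```python
import re

# Hypothetical patterns for Java-like code.
THROW = re.compile(r"\bthrow\s+new\s+\w+\s*\(")
LOG = re.compile(
    r"\b(log|logger)\s*\.\s*(trace|debug|info|warn|warning|error|fatal)\s*\(",
    re.IGNORECASE,
)


def context_score(code_line):
    """Return a negative score when the reference sits in a throw or a log call."""
    if THROW.search(code_line):
        return -1  # message of an exception, meant for a human
    if LOG.search(code_line):
        return -1  # log message, meant for a human
    return 0  # the rule does not apply


print(context_score('throw new XMLEncryptionException("empty", ice);'))
print(context_score('logger.info("table loaded");'))
print(context_score('String query = "SELECT * FROM table";'))
```

Applied to the `getInstance` example shown earlier, the `throw new XMLEncryptionException("empty", ice);` line gets a negative score, which is consistent with the link being rejected.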
Conflicting links
As mentioned already, each heuristic rule computes a score for each link which can be positive or negative. A positive score will weigh in favor of validating the link and a negative score in favor of rejecting it. A conflict exists when a link obtains both positive and negative scores, regardless of the value of the final score. These links are worth mentioning because they are the ones where there is the highest risk of an incorrect assessment.
In the case of these links the difficulty lies in the estimation of the marks for each rule, and sometimes a choice has been made that deserves an explanation:
Validating rule | Ignoring rule | Results | Rationale |
---|---|---|---|
Link to properties element as argument of a call | Probably a log message | IGNORE | A log message is a dead end in transaction analysis and here we typically have an insertion of the property value in a log message. Yes, the link toward the “property” is real but as it is inserted in a log message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application. |
Link to properties element | Probably a log message | IGNORE | A log message is a dead end in transaction analysis and here we typically have an insertion of the property value in a log message. Yes, the link toward the “property” is real but as it is inserted in a log message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application. |
Link to properties element as argument of a call | This is probably natural language | IGNORE | Natural language is destined to be read by humans, it is a dead end in transaction analysis and here we typically have an insertion of the property value in this message. Yes, the link toward the “property” is real but as it is inserted in a natural language message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application. |
Link to properties element | This is probably natural language | IGNORE | Natural language is destined to be read by humans, it is a dead end in transaction analysis and here we typically have an insertion of the property value in this message. Yes, the link toward the “property” is real but as it is inserted in a natural language message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application. |
Link to properties element as argument of a call | This is a throw exception, so an invalid link | IGNORE | An exception message is a dead end in transaction analysis and here we typically have an insertion of the property value in an exception message. Yes, the link toward the “property” is real but as it is inserted in an exception message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application. |
Link to properties element | This is a throw exception, so an invalid link | IGNORE | An exception message is a dead end in transaction analysis and here we typically have an insertion of the property value in an exception message. Yes, the link toward the “property” is real but as it is inserted in an exception message it has no real value as a link. Moreover if we validate this link we create a risk: if an incorrect analysis is done of the content of the “property” then there is a risk of creating a false transaction. We choose the safe choice of ignoring these links which has more value in the global analysis of an application. |
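The definition of a conflict given in this section (positive and negative scores on the same link, whatever the final score) can be written as a short Python check. The function name `is_conflicting` and the score values are made up for the example; the actual weights used by the extension are not documented here.

```python
def is_conflicting(scores):
    """True when the link received both positive and negative scores."""
    has_positive = any(score > 0 for score in scores)
    has_negative = any(score < 0 for score in scores)
    return has_positive and has_negative


# Made-up scores: one validating rule (+3) and one ignoring rule (-10).
scores = [3, -10]
theta = sum(scores)
print(theta, is_conflicting(scores))  # -7 True
```

In this example the final score is negative, so the link is ignored, yet it is still flagged as conflicting because a validating rule also fired.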
Run the extension independently
This section presents the method to run the extension independently of an analysis, directly on a knowledge base. This can be useful if you have already performed your analysis without having installed the extension, or if you want to use a new version of the extension on an old analysis. You have two possibilities:
- run the extension directly via a python script with a list of arguments;
- run the batch script run.bat provided with the extension.
Warning
It is strongly advised to use the python interpreter of your version of CAST Imaging Core. Otherwise, you risk missing libraries (the CAST extension SDK for example). The interpreter can be found in the folder “ThirdParty\Python34” of CAIP.
The extension can be run independently only if the application has already been analyzed.
Using a python command line to run the extension
The script is the main.py file located in the extension folder. The command is the following:
/path to python interpreter/python /path to com.castsoftware.automaticlinksvalidator/main.py cmd kb_name application_name src_code_root_path [-l LOCAL_SRC_ROOT_PATH] [-r REPORT_PATH] [-p REPORT_PREFIX] [-n] [-a] [-d]
Where:
- cmd asks for the command line run (MANDATORY);
- kb_name is the name of the knowledge base used for the analysis (MANDATORY);
- application_name is the name of the application (MANDATORY);
- src_code_root_path is the path to the root folder of the code used for the analysis (MANDATORY);
- local_src_root_path (-l) is the path to the root folder of the code if it is not the same as the one used for the analysis (only relevant if you have retrieved a kb and the source code of the application);
- report_path (-r) is the path to the folder where you want the report to be written;
- report_prefix (-p) is the prefix for the report, by default it is the application name;
- -n specifies that you do not want the extension to modify the knowledge base (useful if you’re only interested in the report);
- -a specifies that you want the extension to check all dynamic links including those which are already validated or ignored (to be used with great caution as it will probably change the results);
- -d specifies that you want the development report (only useful for developers of the extension).
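When the command has to be launched repeatedly, it may help to assemble the argument list programmatically. The Python sketch below only builds the list from the arguments documented above. The helper name `build_command`, its keyword names and all paths and names in the usage example are placeholders, not values shipped with the extension.

```python
def build_command(python_exe, main_py, kb_name, application_name,
                  src_code_root_path, local_src_root_path=None,
                  report_path=None, report_prefix=None,
                  keep_kb_unchanged=False, review_all=False,
                  development_report=False):
    """Return the argument list for a standalone run of main.py."""
    # Mandatory positional arguments, in the documented order.
    command = [python_exe, main_py, "cmd", kb_name, application_name,
               src_code_root_path]
    # Optional arguments that take a value.
    if local_src_root_path:
        command += ["-l", local_src_root_path]
    if report_path:
        command += ["-r", report_path]
    if report_prefix:
        command += ["-p", report_prefix]
    # Optional switches.
    if keep_kb_unchanged:
        command.append("-n")
    if review_all:
        command.append("-a")
    if development_report:
        command.append("-d")
    return command


# Placeholder values, for illustration only.
command = build_command(
    python_exe="python",
    main_py="main.py",
    kb_name="my_kb",
    application_name="MyApp",
    src_code_root_path="/path/to/source",
    report_path="/path/to/reports",
    keep_kb_unchanged=True,
)
print(" ".join(command))
# To launch it: import subprocess; subprocess.check_call(command)
```

Passing the arguments as a list rather than a single string avoids quoting problems when paths contain spaces. The example uses `-n` so that a first run only produces the report without modifying the knowledge base.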
Using the script run.bat
Fill in the mandatory fields and the optional parameters in the script and run it.
- aip_path is the path to AIP (MANDATORY);
- automaticlinksvalidator_path is the path to the extension automaticlinksvalidator (MANDATORY);
- kb_name is the name of the knowledge base used for the analysis (MANDATORY);
- application_name is the name of the application (MANDATORY);
- kb_src_root_path is the path to the root folder of the code used for the analysis (MANDATORY);
- local_src_root_path is the path to the root folder of the code if it is not the same as the one used for the analysis (only relevant if you have retrieved a kb and the source code of the application);
- report_path is the path to the folder where you want the report to be written;
- report_prefix is the prefix for the report, by default it is the application name;
- not_apply_validation specifies that you do not want the extension to modify the knowledge base (useful if you’re only interested in the report);
- review_all_dynamic_links specifies that you want the extension to check all dynamic links including those which are already validated or ignored (to be used with great caution as it will probably change the results);
- development_report specifies that you want the development report (only useful for developers of the extension).