Introduction
Onboarding process
- Flow diagram
- Prerequisites
Main process description
- Process when presentation layer is supported
- Process requiring alternate input methods definition or SOA layer workaround
  - CAST Management Studio - build a batch script to kick-off DataflowRunner independently
How to ensure the blackbox file for alternate input methods worked well

Summary: Detailed instructions for onboarding an Application specifically to run User Input Security checks

Introduction

Earlier versions of the Dataflow onboarding process focused on tooling (LINQ, Excel) for producing blackbox definitions. Since AIP Core 8.3.3, the process has been simplified thanks to:

pass-through mode: since 8.3.0, when the engine encounters an unknown external method, it treats the method as a collection (the flow continues) instead of stopping the current search.
automatic blackboxing: since 8.3.3, the DataflowRunner engine decides the semantics of a method "on the fly", based on internal rules (signatures and advanced patterns). This complements the predefined methods with a few additional definitions (from ten to a couple of dozen).
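
The effect of pass-through mode can be pictured with a toy model. The sketch below is not the DataflowRunner algorithm: the function, the semantic labels and the method names are all hypothetical, and a flow is reduced to a simple ordered chain of method names. It only illustrates why treating an unknown method as a collection lets a search continue.

```python
def flow_reaches_target(call_chain, known_semantics, pass_through):
    """Toy model: walk a chain of method names and say whether the
    flow reaches a target. Hypothetical labels: 'input', 'collection',
    'clear' (sanitization) and 'target'."""
    for method in call_chain:
        semantic = known_semantics.get(method)
        if semantic is None:
            if not pass_through:
                # Without pass-through, an unknown method ends the search.
                return False
            # With pass-through, an unknown method acts as a collection.
            semantic = "collection"
        if semantic == "clear":
            # Sanitized data no longer produces a flaw.
            return False
        if semantic == "target":
            return True
    return False
```

With a chain such as an input method, an undefined third-party wrapper, then a target method, the model reports a flow only when `pass_through` is true.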

However, when the presentation layer or SOA layer is unsupported, the flow stops before reaching the input layer, so alternate input methods must still be defined. As a consequence, the process is "lighter" and has two main branches, depending on the presentation layer in the source code:

supported presentation layer (top flow)
unsupported presentation layer (bottom process)

Onboarding process

Flow diagram

The tasks in grey are AIA actions.
The tasks in blue are processing time.
The tasks in red may need the help of a more experienced consultant (an SME or architect who knows the application's architecture) when they are undertaken by a less experienced SME.


Prerequisites

To track your progress, please refer to the checklist. Start this checklist at the very beginning of the onboarding process rather than deferring it to a final check; this avoids last-minute surprises and inevitable rework. There are four important actions to perform upfront (prerequisites), plus one optional external tool (LINQPad) to download and install.

Architecture review (light)

This task requires knowledge of the application architecture. It consists of:

listing the frameworks used by the application that will have an impact on dataflow onboarding
creating an architecture diagram depicting the interaction between layers, the middleware, and all the patterns of user input flows down to resources, especially the database.

This review can be light: focus on presentation layer and potential blockers (internal SOA layer), and skip the multiple targets (SQL, LDAP, XPath etc.) gap analysis, since the latter will be covered by the rule-based blackboxing. The complete task is described in Architecture review.

Deploy Security for Java extension

When using the User Input Security feature with JEE source code, you should download and install the latest release of the Security for Java extension. This extension is not provided "out of the box" in either AIP Core or Console.

Enable User Input Security option

For JEE, this will enable CASTIL production (in a ByteCode folder) and trigger the DataflowRunner task.
For .NET, the production of CASTIL is already performed by default (for C/S link resolution and devirtualization), so the option will just trigger the DataflowRunner task.
DataflowRunner is the task that consumes CASTIL, performing a series of searches.

In Console:

In the CAST Management Studio, in the tab User Input Security:

Clean JEE / .NET Analyzer / Security for Java log

Ensure you have a "clean" Security for Java / JEE / .NET log: "clean" in terms of symbol resolution, meaning no or very few unresolved symbols. When the Security for Java extension is used for CASTIL generation, the check must be made in its own log only. This log is located in the Log subfolder of the Bytecode folder, under the LISA folder, and is named for instance "Job-generation-YYYY-MM-DD.log":

\LISA\f05e07b9e6fc4c689d55aab58b7d004f\Scrb0733409f51346d79c3d81d02a713135\Log\Job-generation-2019-02-27.log.

Be aware that this log is cleaned before each new run. The missing imports are listed, if any - some examples are shown below:

Job-generation-2018-12-05.log, depicting missing (unresolved) import

08:14:11.342 [main] INFO  com.castsoftware.castil.translation.Main - START, destination = C:\CASTMS\LISA\fd5b3a6370ff42df9e3c3755e1de3e44\Scrad975bd6bc7546dbbce5f5665ab7bf7b
08:17:05.132 [main] INFO  com.castsoftware.castil.translation.Main - 0 fatal error(s)
08:17:05.134 [main] INFO  com.castsoftware.castil.translation.Main - Missing imports: 
  javax.jms
  javax.mail
  javax.transaction.UserTransaction
  javax.xml.rpc.handler.soap
  javax.xml.rpc.soap
  oracle.sql
  org.apache.commons.lang
  org.apache.xmlbeans
  org.jboss
  org.slf4j
08:17:05.135 [main] INFO  com.castsoftware.castil.translation.Main - END of main process
08:17:05.135 [main] INFO  com.castsoftware.castil.translation.Main - Execution time: 174296 ms

The remediation is to provide the missing JARs/assemblies or the required dependencies between Analysis Units. This configuration must be made in CAST MS or Console, which are responsible for generating the project.xml file that holds the Security for Java analysis configuration.

Sample of clean "Job-generation-YYYY-MM-DD.log" :

Job-generation-2018-12-05.log, depicting clean log, with NO missing import

15:00:59.358 [main] INFO com.castsoftware.castil.translation.Main - START, destination = C:\CASTMS\LISA\f05e07b9e6fc4c689d55aab58b7d004f\Scrb0733409f51346d79c3d81d02a713135
15:01:01.123 [main] INFO com.castsoftware.castil.translation.Main - 0 fatal error(s)
15:01:01.124 [main] INFO com.castsoftware.castil.translation.Main - Missing imports: 
15:01:01.124 [main] INFO com.castsoftware.castil.translation.Main - END of main process
15:01:01.124 [main] INFO com.castsoftware.castil.translation.Main - Execution time: 2370 ms
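
The check for missing imports can be scripted. The following is a minimal sketch in Python, assuming the log keeps the layout of the two samples above: timestamped lines, with each missing package listed on its own indented, non-timestamped line after "Missing imports:". The function names are hypothetical and not part of any CAST tool.

```python
import re

# Timestamp prefix as seen in the samples above, e.g. "08:17:05.134 ".
TIMESTAMPED = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} ")

def missing_imports(log_text):
    """Return the names listed under 'Missing imports:' in the log text."""
    found = []
    in_section = False
    for line in log_text.splitlines():
        if TIMESTAMPED.match(line):
            # A timestamped line either opens the section or closes it.
            in_section = line.rstrip().endswith("Missing imports:")
            continue
        if in_section and line.strip():
            found.append(line.strip())
    return found

def report(log_path):
    """Read a Job-generation log file and print its missing imports."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        names = missing_imports(handle.read())
    print(f"{len(names)} missing import(s)")
    for name in names:
        print("  " + name)
    return names
```

An empty list corresponds to the clean log shown above. Since the log is cleaned before each new run, the script only reflects the latest generation.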

Each "unresolved symbol" warning in the JEE log or Security for Java log could stop the flow, preventing the engine from finding an end-to-end flow and in turn leading to false negative results:

In JEE Analyzer:
- the main unresolved warning is JAVA124 - Cannot resolve '%NAME%' as %TYPE%%IN%%FROM% (see JEE - Analysis messages).
- Also, a syntax error such as JAVA044 - Syntax not recognized will have severe consequences on CASTIL generation.
In Security for Java:
- the missing imports are listed in \Log\Job-generation-YYYY-MM-DD.log

In .NET Analyzer, the main unresolved warnings are the two warnings from Roslyn, namely:
- error CS0234: The type or namespace name 'ATypeOrNamespace' does not exist in the namespace 'aNamespace' (are you missing an assembly reference?)
- error CS0246: The type or namespace name 'ATypeOrNamespace' could not be found (are you missing a using directive or an assembly reference?)
- Since version 1.2.4 of the .NET Analyzer, these messages have been replaced by DOTNET.0150 - No definition found for the name 'xxxxx'. Therefore no link will be drawn to that object.

All unresolved symbol warnings should be fixed (you must add the required assemblies, NuGet packages or missing projects).
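
To see how far an analysis log is from "clean", the message identifiers quoted above can be counted. This is a minimal sketch: it only searches each line for the identifier text and makes no assumption about the rest of the log layout. The function name is hypothetical.

```python
from collections import Counter

# Message identifiers quoted in the text above.
UNRESOLVED_CODES = ("JAVA124", "JAVA044", "CS0234", "CS0246", "DOTNET.0150")

def count_unresolved(log_lines):
    """Count the log lines that mention each identifier."""
    counts = Counter()
    for line in log_lines:
        for code in UNRESOLVED_CODES:
            if code in line:
                counts[code] += 1
    return counts
```

Passing an open log file as `log_lines` works, since a file object iterates over its lines. Comparing the counts between two runs shows whether a configuration change actually reduced the number of unresolved symbols.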

Consequence of missing import

Failure to perform this log cleaning step will jeopardize the ability to find flaws, due to either:

interrupted flow: when the flow encounters a call to an unresolved method
undefined target: when the JAR defining the resource access is missing, there is no way to start a search from that target method

When to stop cleaning the log?

Depending on the context (ease of access to the dev team, length of the analysis process, nature of the missing imports and how much they are used in the application), the AIA can decide to stop remediating and go for the results, keeping in mind the above consequences on result accuracy. The flaws that are found are not affected by the missing imports: missing imports and unresolved symbols can only cause false negatives.

Optional - External tools (for queries in CASTIL and custom blackbox generation from Excel)

Download LINQPad 4.x or 5.x (Linqpad download page). Ensure you choose the AnyCPU version from the More Download Options list.

This will download an installer named LINQPad5Setup-AnyCPU.exe.
The default version, LINQPad5Setup.exe (other than AnyCPU), won't be able to connect to the API, due to the way the AIP Core .NET assemblies are compiled.
For some LINQ scripts, you will have to adapt the path of the AIP Core assemblies using the F4 dialog (Query Properties/Additional References).

Main process description

The process is:

either straightforward (limited to checking the prerequisites): this is the upper branch, when the presentation layer is supported.
or requires additional configuration to cope with the application's architecture, mainly through the definition of alternate input methods.

Process when presentation layer is supported

As stated above, this process is straightforward, i.e. it does not require specific configuration. However, checking the SecurityAnalyzer.log and filling in the checklist are two important checks to perform to ensure all goes well.

Checking the SecurityAnalyzer.log will provide useful information: how many searches have been started, how many found flaws, which input methods are called, the external methods without definition, the rule-based blackboxing decisions, etc.
The checklist ensures all necessary work is done and results are accurate.

Process requiring alternate input methods definition or SOA layer workaround

An initial run (analysis only) is required to produce the CASTIL. A first snapshot is also a way to see which flaws are found without any configuration, and to compare them with the additional flaws found with your custom configuration.

CAST Management Studio - build a batch script to kick-off DataflowRunner independently

When using Console, creating a batch script to run the DataflowRunner independently is not required. In Console you can run it using the start button highlighted below when you are in the Security Dataflow options screen:

Since one or more iterations could be required, having a batch script will allow you to run DataflowRunner outside of a snapshot, saving some runtime. The batch script can be created in any work folder.

Build the RunSecurityAnalyzer.BAT script (expand for details)

Open the SecurityAnalyzer log and copy the command line that starts at line 3 (the line beginning with "PrimaryLog: Security analyzer called with :"), from --jobId=3 --flawSpec ... to the end of the command line, which spans several lines.
Save into a text file, with the extension .bat.
Change the --batch parameter as follows:

from --batch=<a file in LTSA folder> (LTSA is emptied at the end of an analysis, or more precisely when the CMS task window is closed)
to -b=<path to the Bytecode folder in LISA>

Example:

--batch=D:\CASTMS\LILT833\LT\LTSA/afd0c86d5ff64763a9d18e4e777aaaa0/DataflowInput.txt
is replaced by:

--b=D:\CASTMS\LILT833\SA\LISA\afd0c86d5ff64763a9d18e4e777aaaa0\Scr40d9debf8a13407ea81a23b7ba5f6af9

This is the full path to your Bytecode folder, that is, the folder where the generated CASTIL resides. It can be taken, for instance, from the -log parameter.

Add DataflowRunner.exe in front of this command line (or its full path if you cannot save the BAT script in the same folder as the CASTIL). The CASTIL folder is where you will get the results to inspect: the 2 logs and the 2 BuildAgent.* result files.
You are done.
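
The edits above can also be applied programmatically. The following Python sketch assumes the logged command line has already been copied out of the SecurityAnalyzer log as text, and that the --batch value contains no spaces. The function name is hypothetical, and the replacement flag is left as a parameter so that it can be set to the form your DataflowRunner version accepts.

```python
import re

def build_bat_command(logged_command, bytecode_folder,
                      flag="-b", exe="DataflowRunner.exe"):
    """Turn the command line copied from the log into a standalone one."""
    # The logged command spans several lines: join them into one.
    parts = (part.strip() for part in logged_command.splitlines())
    one_line = " ".join(part for part in parts if part)
    # Swap --batch=<file in LTSA> for the Bytecode folder argument.
    # A function is used as replacement so that backslashes in
    # Windows paths are inserted literally.
    replaced, count = re.subn(
        r"--batch=\S+",
        lambda match: f"{flag}={bytecode_folder}",
        one_line,
    )
    if count != 1:
        raise ValueError(f"expected one --batch argument, found {count}")
    return f"{exe} {replaced}"
```

The returned string is the single line to save in RunSecurityAnalyzer.BAT. Pass a full path as `exe` when the script cannot live next to the executable.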

Run DataflowRunner from the BAT script.
Check results in SecurityAnalyzer.log using How to read SecurityAnalyzer log in one go
1. This check verifies that the execution went well, from custom blackbox loading to detection of flaws.

Tagging a series of methods with LINQPad-Excel-LINQPad-Notepad++ :

Identify alternate methods to define (See Architecture review)
Generate a list of methods (with mangling) using a SQL query or CASTIL query
Copy the result set into column 2 of the Excel template workbook [ Alternate_input_methods_preparation.xlsx ]
Tag each method with the desired action: Input (User Input), Target-XYZ, Clear (sanitization = neutralization), Collection (all 3 types of collection).
1. Use the list of values (LOV) in column 1 to do that (pick 1 of the 11 possible values).
2. This is the functional part of the process, which remains manual and requires some Java or .NET skills to determine what the method does, i.e. its semantics.
3. This also requires some knowledge of the 8/11 main CWE-xxx supported by AIP Core. Reading the existing standard blackboxes for rt.jar / mscorlib, as well as the shared blackboxes for popular frameworks, is a good source of inspiration.
Save the Excel file in .CSV format.
1. Say Yes to all dialogs.
2. Make sure all items have a tag in column 1. Failure to tag all methods will trigger an error in the next step.
3. Make sure to remove any trailing blank line and anything in column C.
4. Tip: edit in Notepad++
  1. The .linq script expects ";" as the field delimiter. Change the regional settings in the Windows Control Panel if necessary: change "List Separator" to ";".
5. Close the .csv file in Notepad++ (to prevent the LINQ script from failing on a file lock).
Use LINQ script 2.GenerateBlackboxXMLfromCSV.linq to generate the custom blackbox XML file from the .csv file.
1. Edit and adapt string filename=@"D:\SOURCES\MYAPP\somefolder\my_input_methods.csv";
2. Run (F5). If successful, a my_input_methods.csv.blackbox.xml file will be automatically generated and saved in the same location as the .csv file.
3. Open the my_input_methods.csv.blackbox.xml file, add/override with a user-friendly blackbox name and the mandatory namespace on line 2: <BlackBox name="a_unique_name" xmlns="http://tempuri.org/BlackBoxes.xsd">
4. Copy the generated file my_input_methods.csv.blackbox.xml (in same folder as above .csv) to your preferred custom blackbox folder (for instance: your_df_extension\configuration\blackboxes\)
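
The CSV requirements listed above (a tag in column 1 for every method, no blank line, nothing in column C, ";" as delimiter) can be checked before running the LINQ script. This is a minimal sketch in Python, assuming the file has no header row. The function name and the method names used to try it out are hypothetical, and the sketch does not check the tag against the list of allowed values.

```python
import csv
import io

def check_tagged_csv(csv_text, delimiter=";"):
    """Return a list of problems found in the tagged-methods CSV text."""
    problems = []
    rows = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    for number, row in enumerate(rows, start=1):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            problems.append(f"line {number}: blank line")
            continue
        if len(cells) < 2 or not cells[1]:
            # Also triggered when the file uses another delimiter.
            problems.append(f"line {number}: missing method in column 2")
        if not cells[0]:
            problems.append(f"line {number}: missing tag in column 1")
        if any(cells[2:]):
            problems.append(f"line {number}: unexpected content after column 2")
    return problems
```

An empty result means the file satisfies the checks above; it does not guarantee that the tags themselves are semantically right, which remains the manual part of the process.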

How to ensure the blackbox file for alternate input methods worked well

The blackbox file loading message must appear in SecurityAnalyzer-Timestamp.log, with no error.
The number of flaws found should rise dramatically (from 0 or a handful of flaws to several dozen, if no sanitization is in place).
1. If the number of flaws found is unchanged from the previous execution, it may mean either:
  1. there is some other blocking issue in the CASTIL:
    1. too many unresolved symbols lead to interrupted flows → fix the unresolved warnings through the JEE or .NET analysis configuration.
    2. an SOA layer interrupts the flow → define 2 half-flows
  2. the application really has no injection flaw → perform all the remaining checks in the checklist, and when OK, the results are good to deliver.

User Input Security - Detailed onboarding process instructions