*Purpose

This page helps you troubleshoot the performance problems you may face during the Compute snapshot step.

A performance issue means that the snapshot computation is either stuck or takes more time than expected to complete.

For more information, refer to:

Official documentation 

*Observed in CAST AIP

Release

Yes/No

8.3.x (tick)
8.2.x (tick)
8.1.x (tick)
8.0.x (tick)
7.3.x (tick)
*Observed on RDBMS

RDBMS

Yes/No

Oracle Server (tick)
Microsoft SQL Server (tick)
CSS3 (tick)
CSS2 (tick)
CSS1 (tick)
*Step by Step scenario
  1. Compute snapshot
*Action Plan
  1. Check that the minimum system configuration requirements for deploying CAST AIP are met; for this, refer to Minimum system requirements. If they are not met, configure the system as required by CAST, then compute the snapshot by skipping the analysis.
  2. If the schemas are hosted on an Oracle Server, CAST recommends forcing the use of the ALL_ROWS optimizer mode instead of the default RULE mode; to do so, refer to Performance issues when using an Oracle Server host for the CAST AIP schemas.

  3. Check that the character sets in use are supported: retrieve the character sets of your RDBMS by following the page RDBMS Information - Common to all RDBMS - Get the character set, and compare them with the ones listed in the official documentation Supported Character Sets.
  4. Check that the server is configured according to the CAST recommendations; for this, perform the following:
    1. Run the following query on a Postgres session:

      SELECT name,
             setting
      FROM   pg_settings

      As a query result sample, refer to PG_param.sql

    2. Compare the server configuration retrieved from the previous step with the server configuration recommended by CAST: Server configuration
      If the configurations are not identical, change the parameters in postgresql.conf (the file is located in the db_data subdirectory of the PostgreSQL installation directory) as recommended by CAST, restart the PostgreSQL service, then compute a snapshot by skipping the analysis.
    The issue can be due to a change in the work_mem parameter.
  5. Optimize the local and central schemas by following the page CSS Tools - Optimize - Updating the statistics - using Cssoptimize exe on a schema, then compute a snapshot by skipping the analysis.
  6. VACUUM FULL the tables of the local and central schemas by following the page Tools - How to VACUUM FULL tables of a schema, then compute a snapshot by skipping the analysis.
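    As an illustration of this step, a VACUUM FULL can be run table by table from any SQL session; the schema and table names below are placeholders, not taken from the referenced page:

    ```sql
    -- Rewrites the table on disk to reclaim dead space and refreshes
    -- planner statistics. Note that VACUUM FULL takes an exclusive lock
    -- on the table, so run it while no snapshot is in progress.
    -- <local_schema> is a placeholder for your local or central schema.
    VACUUM FULL ANALYZE <local_schema>.dss_history;
    ```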
  7. Check if there are indexes that have been dropped in the local and central schemas:
    1. Running the query below provides a list of tables on which no index scan was triggered. If this list includes large tables, those tables may be missing indexes.

      DSS_HISTORY
        SELECT relname,
               n_live_tup
        FROM   pg_stat_all_tables
        WHERE  schemaname = '<schema_name>'
               AND idx_scan = 0
        ORDER  BY 2 DESC

      Query result sample:

        "relname";"n_live_tup"
        "keys";1234567890
      

      Query interpretation:

      The query returns 2 fields:
      relname: the table name
      n_live_tup: the number of rows
    2. To verify missing indexes, compare the index counts of the current local and central schemas with those of a newly created local and central schema. For more details, refer to the page SQL Queries - Common SQL Queries - How to retrieve the count of indexes used for a schema. If any indexes are missing, perform a component reinstall using the Server Manager Tool.
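      A minimal way to retrieve the index count, sketched here against the standard pg_indexes catalog view (the schema name is a placeholder); run it on both the current schema and a freshly created one, then compare the results:

      ```sql
      -- Count the indexes defined in a given schema
      SELECT schemaname,
             count(*) AS index_count
      FROM   pg_indexes
      WHERE  schemaname = '<schema_name>'
      GROUP  BY schemaname;
      ```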
  8. You can also identify the procedure that takes a long time to execute using the query mentioned in step 10, and retrieve the names of the tables involved in that procedure. Then check in your schema, for example with pgAdmin, whether those tables are missing any indexes.
  9. Check the CPU usage on the RDBMS server and on the analysis machine; for this, you can use the Task Manager:

    If the CPU usage has reached 100%, manage the currently running processes in order to free enough resources for CAST AIP.

  10. At this stage, check whether the computation is stuck in a specific stored procedure/command line, or whether one or several stored procedures/command lines are consuming a long time. Start by getting the time consumed by each stored procedure/command line that has both a Start status and an End status:

    DSS_HISTORY
    SELECT Substr(START_HIST.description, 7)                 AS Action, 
           START_HIST.action_date                            Start_date, 
           END_HIST.action_date                              end_date, 
           ( END_HIST.action_date - START_HIST.action_date ) AS consumed_time 
    FROM   dss_history START_HIST, 
           dss_history END_HIST 
    WHERE  START_HIST.history_id + 1 = END_HIST.history_id 
           AND START_HIST.description LIKE 'Start%' 
           AND END_HIST.description LIKE 'End%' 
    ORDER  BY ( END_HIST.action_date - START_HIST.action_date ) DESC 

    Query result sample:

    DSS_HISTORY
    "action";                                          "start_date";             "end_date";               "consumed_time"
    "DIAG_SCOPE_JAVANAM004 -3 61017 4736 4737";"2017-11-17 18:08:41.962";"2017-11-17 18:08:45.296";"00:00:03.334"

    Query interpretation:

    The query returns 4 fields:
    Action: name and parameters of the stored procedure/command line
    Start_date: start date of the action
    End_date: end date of the action
    Consumed_time: time consumed by the action
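    A variant of the same measurement, sketched with a window function: it pairs each Start row with the next history entry, so it does not rely on history_id values being strictly consecutive (the assumption made by the self-join above):

    ```sql
    SELECT substr(description, 7)  AS action,
           action_date             AS start_date,
           next_date               AS end_date,
           next_date - action_date AS consumed_time
    FROM  (SELECT description,
                  action_date,
                  -- date of the next history entry, assumed to be the End row
                  lead(action_date) OVER (ORDER BY history_id) AS next_date
           FROM   dss_history) h
    WHERE  description LIKE 'Start%'
    ORDER  BY consumed_time DESC;
    ```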


    1. If you can identify, from the above query, the action(s) that take the longest time, then the issue is due to the performance of those action(s).
    2. If you cannot identify the action(s) that take the longest time, the computation may be stuck in a specific action. To identify that action, run the following query on the local and central schemas:

      DSS_HISTORY
      SELECT description 
      FROM   dss_history 
      ORDER  BY history_id DESC 
      LIMIT  1 

      Query result sample:

      DSS_HISTORY
      "description";                                         "action_date";           "history_id"
      "Start DIAG_SCOPE_JAVANAM004 -3 61017 4736 4737";"2017-11-17 14:40:32.405";32

      Query interpretation:

      The query returns the last action in progress: the description of the action, the action date, and the id of the action.
    3. If you cannot determine whether the issue is due to the performance of an action or to the computation being stuck, the cause may be a query that is stuck without leaving any history in the schema. To identify that query, do the following:

      1. Get the active query:

        1. Using the statistics table: run the following in an SQL session:

          SELECT application_name,
                 client_addr,
                 query_start,
                 state,
                 query
          FROM   pg_stat_activity
          WHERE  client_addr = inet '<IP address of the machine hosting the triplet>'
          AND    state != 'idle'

          Query result sample:

          "pgAdmin III - Query Tool";"192.168.17.61";"2018-03-06 16:45:15.814+00";"active";"SELECT CASE WHEN typbasetype=0 THEN oid else typbasetype END AS basetype FROM pg_type WHERE oid=20"

          Query interpretation

          The query returns the application name, the IP address of the machine hosting the triplet, the query start date, the query state (active or not), and the query text.
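          To see at a glance how long each active query has been running, here is a sketch based on the same pg_stat_activity view (this query is an illustration, not taken from the CAST documentation):

          ```sql
          -- Elapsed time of each currently active query, longest first
          SELECT pid,
                 now() - query_start AS runtime,
                 query
          FROM   pg_stat_activity
          WHERE  state = 'active'
          ORDER  BY runtime DESC;
          ```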

        2. Using the server activity view:
          Connect to the CSS server, click Tools, then click Server Status.

          Then check in the server status which functions are currently in progress.

      2. From the central schema, get the start date of the snapshot as follows:

        DSS_HISTORY
        SELECT compute_start_date 
        FROM   dss_snapshots 
        WHERE  snapshot_id = -1   

        Query result sample:

        DSS_HISTORY
        "2018-03-06 16:45:15.814+00"

        Query result interpretation:

        The query returns the start date of the snapshot.

      3. If the active query accounts for most of the snapshot execution time, then you have identified the root cause of the issue; contact CAST Technical Support with the Relevant input listed below.

  11. If you have identified that the issue occurred in a command line, for example:

    DSS_HISTORY
    Start DssRun -CONNECT_LOCAL('CASTStorageService',''@//host:port/postgres'',''user'',''*****'',''local_schema'') -LOCAL_IFPUG_GRAPH                1/31/2018 11:27:49 AM  314184       
    DssRun.exe - command line arguments: -CONNECT_LOCAL('CASTStorageService',''@//host:port/postgres'',''user'',''*****'',''local_schema'') -LOCAL_IFPUG_GRAPH  1/31/2018 11:27:49 AM  314185      
    1. Place the command line in a batch file with the -ShowSQL option and redirect the output to a text file; in our example the command line becomes:

      DSS_HISTORY
       DssRun.exe -CONNECT_LOCAL('CASTStorageService',''@//host:port/postgres'',''user'',''*****'',''local_schema'') -LOCAL_IFPUG_GRAPH -ShowSQL > output.txt
    2. Retrieve the last query from the output file.

    3. Identify the tables involved in the SQL queries
    4. Compare the primary keys of those tables with the primary keys from a fresh schema; for this, refer to SQL Queries - Common SQL Queries - How to retrieve the primary key of a table. If a primary key is missing, add it on the table or reinstall the schema by following the page component reinstall using the Server Manager Tool. Note that this issue was encountered for the table objdsc; for more information, refer to CMS Snapshot Analysis - Compute Snapshot - Performance issue - Snapshot stuck while IFPUG graph computation in DssEngine

      If the above steps do not resolve your issue, generate a DUMP folder of the executable corresponding to the command line by following the page What is a dump, and how do I create one?, then contact CAST Technical Support with the Relevant input listed below.

  12. If you have identified that the issue occurs in a stored procedure, for example:

    DSS_HISTORY
    "description";                                         "action_date";           "history_id"
    "Start DIAG_SCOPE_JAVANAM004 -3 61017 4736 4737";"2017-11-17 14:40:32.405";32
    1. Check if the stored procedure corresponds to a detailed or a total procedure of a Quality Rule; for this, refer to the page SQL Queries - CAST Knowledge Base - Queries on metrics and diagnostics - How to list all Quality Rules and their definition from the Metric Tree, which lists the Quality Rules and their corresponding detailed and total procedures. For example, if we search for the procedure "DIAG_SCOPE_JAVANAM004" in the column "detailed_proc_name" after applying that page, the result will be:

      DSS_HISTORY
      "JEE";140029;4736;"Method naming convention - case control";0;" ";0;"NONE";"DIAG_SCOPE_JAVANAM004";"DIAG_JAVA_ANA_METHODS_TOTAL"

      This procedure corresponds to the Quality Rule "Method naming convention - case control".
      As a workaround, you can deactivate the Quality Rule as described in Activating/Deactivating a Metric/Quality Rule, then compute a snapshot by skipping the analysis. Note that, as a consequence, the Quality Rule will be missing from the dashboards (CED, AED and AAD).
      If the problem is not solved even after applying the workaround, contact CAST Technical Support with the Relevant input listed below, so that the deep root cause can be identified and a better workaround provided if one exists.

      We list the following known cases:

      1. CMS Snapshot Analysis - Compute Snapshot - Performance issue - Snapshot performance problem due to DIA_MANY_UNREFFUNCTION SP in KB
      2. CMS Snapshot Analysis - Compute Snapshot - Performance issue - Snapshot performance problem due to DIAG_ALL_ANA_XLSQL_TOTAL SP in KB
    2. If the stored procedure does not correspond to a detailed or a total procedure of a Quality Rule, then check the following known cases:

      1. If the stored procedure is LOCAL_INIT_IFPUG, refer to the page CMS Snapshot Analysis - Compute Snapshot - Performance Issue - Stuck at Compute Metrics - LOCAL_INIT_IFPUG

      2. If the stored procedure is DSSAPP_FP_APP_DATAFUNC, refer to the page CMS Snapshot Analysis - Compute Snapshot - Performance Issue - SQL error Local ORA-01008 not all variables bound

      3. If the stored procedure is dss_update_any_module_links, refer to the page CMS Snapshot Analysis - Compute Snapshot - Performance Issue - Stuck while executing the procedure DSS_UPDATE_ANY_MODULE_LINKS

      4. If the stored procedure is DSSAPP_INIT_SQL, refer to the page CMS Snapshot Analysis - Compute Snapshot - Performance issue - Snapshot stuck while executing procedure DSSAPP_INIT_SQL

      5. If the stored procedure is ADGCQ_COUNT_ALL_VIOLATIONS, refer to the page CMS Snapshot Analysis - Compute Snapshot - Performance issue - Snapshot stuck while executing procedure ADGCQ_COUNT_ALL_VIOLATIONS
  13. Adjust PostgreSQL's memory settings to optimize its performance for your specific environment; you may refer to How to tune PostgreSQL for memory.
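    As a rough illustration, the memory-related parameters in postgresql.conf typically look as follows; the values shown are placeholders for illustration only, not CAST recommendations, and should be taken from the tuning page referenced above:

    ```ini
    # postgresql.conf - memory-related parameters (illustrative values only)
    shared_buffers = 2GB            # memory dedicated to PostgreSQL's buffer cache
    work_mem = 32MB                 # memory per sort/hash operation, per session
    maintenance_work_mem = 256MB    # memory for VACUUM and index creation
    effective_cache_size = 6GB      # planner hint: OS + PostgreSQL cache size
    ```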
  14. If the above steps do not solve your issue, contact CAST Technical Support with the Relevant input listed below.

*Relevant input

  • CAST Support Tool (CST) - alias Sherlock export with the following options CAST Bases Checker, Export Logs, Export Configuration files, Export Computer Environment, Export CAST Bases with Management Base.
  • The DUMP folder, if you have generated one after following the action plan.
*Notes/comments


*Related Pages