*Purpose

This page helps you troubleshoot the performance problems you may face during the Compute snapshot step.

A performance issue means that the snapshot computation is either stuck or takes more time than expected to complete.

For more information, refer to:

Official documentation 

*Observed in CAST AIP

Release

Yes/No

8.3.x (tick)
8.2.x (tick)
8.1.x (tick)
8.0.x (tick)
7.3.x (tick)
*Observed on RDBMS

RDBMS

Yes/No

Oracle Server (tick)
Microsoft SQL Server (tick)
CSS3 (tick)
CSS2 (tick)
CSS1 (tick)
*Step by Step scenario
  1. Compute snapshot
*Action Plan
  1. Check that the minimum system configuration requirements for deploying CAST AIP are met; for this, refer to Minimum system requirements. If they are not met, configure the system as required by CAST, then compute the snapshot by skipping the analysis.
  2. If the schemas are hosted on an Oracle Server, CAST recommends forcing the use of the ALL_ROWS optimizer mode instead of the default RULE mode; to do so, refer to Performance issues when using an Oracle Server host for the CAST AIP schemas.

  3. Check that the character sets in use are supported: retrieve the character sets of your RDBMS by following the page RDBMS Information - Common to all RDBMS - Get the character set, and compare them with the ones listed in the official documentation Supported Character Sets.
  4. Check that the server is configured according to the CAST recommendations; for this, perform the following:
    1. Run the following query on a Postgres session:

      SELECT name,
             setting
      FROM   pg_settings

      As a query result sample, refer to PG_param.sql

    2. Compare the server configuration retrieved from the previous step with the server configuration recommended by CAST: Server configuration
      If the configurations are not identical, change the parameters in postgresql.conf (the file is located in the db_data subdirectory of the PostgreSQL installation directory) as recommended by CAST, restart the PostgreSQL service, then compute a snapshot by skipping the analysis.
    The issue can be due to a change in the work_mem parameter.
  5. Optimize the local and central schemas by following the page CSS Tools - Optimize - Updating the statistics - using Cssoptimize exe on a schema, then compute a snapshot by skipping the analysis.
  6. VACUUM FULL the tables of the local and central schemas by following the page Tools - How to VACUUM FULL tables of a schema, then compute a snapshot by skipping the analysis.
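    As an illustration of this step, a VACUUM FULL can be run table by table from any SQL session; the schema and table names below are placeholders, not taken from the referenced page:

    ```sql
    -- Rewrites the table on disk to reclaim dead space and refreshes
    -- planner statistics. Note that VACUUM FULL takes an exclusive lock
    -- on the table, so run it while no snapshot is in progress.
    -- <local_schema> is a placeholder for your local or central schema.
    VACUUM FULL ANALYZE <local_schema>.dss_history;
    ```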
  7. Check if there are indexes that have been dropped in the local and central schemas:
    1. Running the query below provides a list of tables on which no index scan was triggered. If this list includes large tables, those tables may be missing indexes.

      DSS_HISTORY
        SELECT relname,
               n_live_tup
        FROM   pg_stat_all_tables
        WHERE  schemaname = '<schema_name>'
               AND idx_scan = 0
        ORDER  BY 2 DESC

      Query result sample:

        "relname";"n_live_tup"
        "keys";1234567890
      

      Query interpretation:

      The query returns 2 fields:
      relname: the table name
      n_live_tup: the number of rows
    2. To verify missing indexes, compare the index counts of the current local and central schemas with those of a newly created local and central schema. For more details, refer to the page SQL Queries - Common SQL Queries - How to retrieve the count of indexes used for a schema. If any indexes are missing, perform a component reinstall using the Server Manager Tool.
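      A minimal way to retrieve the index count, sketched here against the standard pg_indexes catalog view (the schema name is a placeholder); run it on both the current schema and a freshly created one, then compare the results:

      ```sql
      -- Count the indexes defined in a given schema
      SELECT schemaname,
             count(*) AS index_count
      FROM   pg_indexes
      WHERE  schemaname = '<schema_name>'
      GROUP  BY schemaname;
      ```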
  8. You can also identify the procedure that takes a long time to execute using the query mentioned in step 10, and retrieve the names of the tables involved in that procedure. Then check in your schema, for example with pgAdmin, whether those tables are missing any indexes.
  9. Check the CPU usage on the RDBMS server and on the analysis machine; for this, you can use the Task Manager:

    If the CPU usage has reached 100%, manage the currently running processes in order to free enough resources for CAST AIP.

  10. At this stage, check whether the computation is stuck in a specific stored procedure/command line, or whether one or several stored procedures/command lines are consuming a long time. Start by getting the time consumed by each stored procedure/command line that has both a Start status and an End status:

    DSS_HISTORY
    SELECT Substr(START_HIST.description, 7)                 AS Action, 
           START_HIST.action_date                            Start_date, 
           END_HIST.action_date                              end_date, 
           ( END_HIST.action_date - START_HIST.action_date ) AS consumed_time 
    FROM   dss_history START_HIST, 
           dss_history END_HIST 
    WHERE  START_HIST.history_id + 1 = END_HIST.history_id 
           AND START_HIST.description LIKE 'Start%' 
           AND END_HIST.description LIKE 'End%' 
    ORDER  BY ( END_HIST.action_date - START_HIST.action_date ) DESC 

    Query result sample:

    DSS_HISTORY
    "action";                                          "start_date";             "end_date";               "consumed_time"
    "DIAG_SCOPE_JAVANAM004 -3 61017 4736 4737";"2017-11-17 18:08:41.962";"2017-11-17 18:08:45.296";"00:00:03.334"

    Query interpretation:

    The query returns 4 fields:
    Action: name and parameters of the stored procedure/command line
    Start_date: start date of the action
    End_date: end date of the action
    Consumed_time: time consumed by the action
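    A variant of the same measurement, sketched with a window function: it pairs each Start row with the next history entry, so it does not rely on history_id values being strictly consecutive (the assumption made by the self-join above):

    ```sql
    SELECT substr(description, 7)  AS action,
           action_date             AS start_date,
           next_date               AS end_date,
           next_date - action_date AS consumed_time
    FROM  (SELECT description,
                  action_date,
                  -- date of the next history entry, assumed to be the End row
                  lead(action_date) OVER (ORDER BY history_id) AS next_date
           FROM   dss_history) h
    WHERE  description LIKE 'Start%'
    ORDER  BY consumed_time DESC;
    ```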


    1. If you can identify, from the above query, the action(s) that take the longest time, then the issue is due to the performance of those action(s).
    2. If you cannot identify the action(s) that take the longest time, the computation may be stuck in a specific action. To identify that action, run the following query on the local and central schemas:

      DSS_HISTORY
      SELECT description 
      FROM   dss_history 
      ORDER  BY history_id DESC 
      LIMIT  1 

      Query result sample:

      DSS_HISTORY
      "description";                                         "action_date";           "history_id"
      "Start DIAG_SCOPE_JAVANAM004 -3 61017 4736 4737";"2017-11-17 14:40:32.405";32

      Query interpretation:

      The query returns the last action in progress: the description of the action, the action date, and the id of the action.
    3. If you cannot determine whether the issue is due to the performance of an action or to the computation being stuck, the cause may be a query that is stuck without leaving any history in the schema. To identify that query, do the following:

      1. Get the active query:

        1. Using the statistics table: run the following in an SQL session:

          SELECT application_name,
                 client_addr,
                 query_start,
                 state,
                 query
          FROM   pg_stat_activity
          WHERE  client_addr = inet '<IP address of the machine hosting the triplet>'
          AND    state != 'idle'

          Query result sample:

          "pgAdmin III - Query Tool";"192.168.17.61";"2018-03-06 16:45:15.814+00";"active";"SELECT CASE WHEN typbasetype=0 THEN oid else typbasetype END AS basetype FROM pg_type WHERE oid=20"

          Query interpretation

          The query returns the application name, the IP address of the machine hosting the triplet, the query start date, the query state (active or not), and the query text.
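          To see at a glance how long each active query has been running, here is a sketch based on the same pg_stat_activity view (this query is an illustration, not taken from the CAST documentation):

          ```sql
          -- Elapsed time of each currently active query, longest first
          SELECT pid,
                 now() - query_start AS runtime,
                 query
          FROM   pg_stat_activity
          WHERE  state = 'active'
          ORDER  BY runtime DESC;
          ```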

        2. Using the server activity view:
          Connect to the CSS server, click Tools, then click Server Status.

          Then check in the server status which functions are currently in progress.

      2. From the central schema, get the start date of the snapshot as follows:

        DSS_HISTORY
        SELECT compute_start_date 
        FROM   dss_snapshots 
        WHERE  snapshot_id = -1   

        Query result sample:

        DSS_HISTORY
        "2018-03-06 16:45:15.814+00"

        Query result interpretation:

        The query returns the start date of the snapshot.

      3. If the active query accounts for most of the snapshot execution time, then you have identified the root cause of the issue; contact CAST Technical Support with the Relevant input listed below.

  11. If you have identified that the issue occurred in a command line, for example:

    DSS_HISTORY
    Start DssRun -CONNECT_LOCAL('CASTStorageService',''@//host:port/postgres'',''user'',''*****'',''local_schema'') -LOCAL_IFPUG_GRAPH                1/31/2018 11:27:49 AM  314184       
    DssRun.exe - command line arguments: -CONNECT_LOCAL('CASTStorageService',''@//host:port/postgres'',''user'',''*****'',''local_schema'') -LOCAL_IFPUG_GRAPH  1/31/2018 11:27:49 AM  314185      
    1. Place the command line in a batch file with the -ShowSQL option and redirect the output to a text file; in our example the command line becomes:

      DSS_HISTORY
       DssRun.exe -CONNECT_LOCAL('CASTStorageService',''@//host:port/postgres'',''user'',''*****'',''local_schema'') -LOCAL_IFPUG_GRAPH -ShowSQL > output.txt
    2. Retrieve the last query from the output file.

    3. Identify the tables involved in the SQL queries
    4. Compare the primary keys of those tables with the primary keys from a fresh schema; for this, refer to SQL Queries - Common SQL Queries - How to retrieve the primary key of a table. If a primary key is missing, add it on the table or reinstall the schema by following the page component reinstall using the Server Manager Tool. Note that this issue was encountered for the table objdsc; for more information, refer to CMS Snapshot Analysis - Compute Snapshot - Performance issue - Snapshot stuck while IFPUG graph computation in DssEngine

      If the above steps do not resolve your issue, generate a DUMP folder of the executable corresponding to the command line by following the page What is a dump, and how do I create one?, then contact CAST Technical Support with the Relevant input listed below.

  12. If you have identified that the issue occurs in a stored procedure, for example:

    DSS_HISTORY
    "description";                                         "action_date";           "history_id"
    "Start DIAG_SCOPE_JAVANAM004 -3 61017 4736 4737";"2017-11-17 14:40:32.405";32
    1. Check if the stored procedure corresponds to a detailed or a total procedure of a Quality Rule; for this, refer to the page SQL Queries - CAST Knowledge Base - Queries on metrics and diagnostics - How to list all Quality Rules and their definition from the Metric Tree, which lists the Quality Rules and their corresponding detailed and total procedures. For example, if we search for the procedure "DIAG_SCOPE_JAVANAM004" in the column "detailed_proc_name" after applying that page, the result will be:

      DSS_HISTORY
      "JEE";140029;4736;"Method naming convention - case control";0;" ";0;"NONE";"DIAG_SCOPE_JAVANAM004";"DIAG_JAVA_ANA_METHODS_TOTAL"

      This procedure corresponds to the Quality Rule "Method naming convention - case control".
      As a workaround, you can deactivate the Quality Rule as described in Activating/Deactivating a Metric/Quality Rule, then compute a snapshot by skipping the analysis. Note that, as a consequence, the Quality Rule will be missing from the dashboards (CED, AED and AAD).
      If the problem is not solved even after applying the workaround, contact CAST Technical Support with the Relevant input listed below, so that the deep root cause can be identified and a better workaround provided if one exists.

      We list the following known cases:

      1. CMS Snapshot Analysis - Compute Snapshot - Performance issue - Snapshot performance problem due to DIA_MANY_UNREFFUNCTION SP in KB
      2. CMS Snapshot Analysis - Compute Snapshot - Performance issue - Snapshot performance problem due to DIAG_ALL_ANA_XLSQL_TOTAL SP in KB
    2. If the stored procedure does not correspond to a detailed or a total procedure of a Quality Rule, then check the following known cases:

      1. If the stored procedure is LOCAL_INIT_IFPUG, refer to the page CMS Snapshot Analysis - Compute Snapshot - Performance Issue - Stuck at Compute Metrics - LOCAL_INIT_IFPUG

      2. If the stored procedure is DSSAPP_FP_APP_DATAFUNC, refer to the page CMS Snapshot Analysis - Compute Snapshot - Performance Issue - SQL error Local ORA-01008 not all variables bound

      3. If the stored procedure is dss_update_any_module_links, refer to the page CMS Snapshot Analysis - Compute Snapshot - Performance Issue - Stuck while executing the procedure DSS_UPDATE_ANY_MODULE_LINKS

      4. If the stored procedure is DSSAPP_INIT_SQL, refer to the page CMS Snapshot Analysis - Compute Snapshot - Performance issue - Snapshot stuck while executing procedure DSSAPP_INIT_SQL

      5. If the stored procedure is ADGCQ_COUNT_ALL_VIOLATIONS, refer to the page CMS Snapshot Analysis - Compute Snapshot - Performance issue - Snapshot stuck while executing procedure ADGCQ_COUNT_ALL_VIOLATIONS
  13. Adjust PostgreSQL's memory settings to optimize its performance for your specific environment; you may refer to How to tune PostgreSQL for memory.
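    As a rough illustration, the memory-related parameters in postgresql.conf typically look as follows; the values shown are placeholders for illustration only, not CAST recommendations, and should be taken from the tuning page referenced above:

    ```ini
    # postgresql.conf - memory-related parameters (illustrative values only)
    shared_buffers = 2GB            # memory dedicated to PostgreSQL's buffer cache
    work_mem = 32MB                 # memory per sort/hash operation, per session
    maintenance_work_mem = 256MB    # memory for VACUUM and index creation
    effective_cache_size = 6GB      # planner hint: OS + PostgreSQL cache size
    ```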
  14. If the above steps do not solve your issue, contact CAST Technical Support with the Relevant input listed below.

*Relevant input

  • CAST Support Tool (CST) - alias Sherlock export with the following options CAST Bases Checker, Export Logs, Export Configuration files, Export Computer Environment, Export CAST Bases with Management Base.
  • The DUMP folder, if you have generated one after following the action plan.
*Notes/comments


*Related Pages