This is internal documentation. There is a good chance you’re looking for something else. See Disclaimer.

Unreliable Application

Warning

This document is a work in progress. Expect to find errors.

Slowness

This information should be gathered first:

  • What is slow?

    • Does it only affect a certain entity or selection? Which one(s)?

  • Is it slow consistently or sporadically?

  • Has it been faster in the past?

  • If possible, exact steps to reproduce.

Connectivity Issues (Application not available)

TODO: thread/memory dumps, analyzing logs

  1. This information should be gathered first:

    • Who is experiencing the issue? (one person, several, all, only people at one location, etc.)

    • Where does the issue appear? (intranet, extranet, specific URL, etc.)

    • How does the issue materialize? (particular error message, application not available, etc.)

    • What network is affected? (school network, city network)

    • What client was used:

      • OS (incl. version)

      • Browser (incl. version)

      • Proxies

    • Is there a way to trigger the issue? Are there any steps that need to be taken to reproduce the issue?

  2. If the issue cannot be reproduced, ask the customer to create a HAR file.

  3. Then check if there have been restarts:

    Restarts are often caused by out-of-memory errors.

    $ oc get pods
    NAME               READY     STATUS              RESTARTS   AGE
    nice-69-kvjlf      2/2       Running             3          7d

If you see restarts, check the log for an OutOfMemoryError before the restart:

$ oc logs -c nice nice-69-kvjlf | n2log-unscramble 'OutOfMemoryError'
Terminating due to java.lang.OutOfMemoryError: Java heap space

If there is none, check for thread starvation messages:

$ oc logs -c nice nice-69-kvjlf | n2log-unscramble 'Thread starvation'
2020-05-12 06:22:58.983 WARN  com.zaxxer.hikari.pool.HikariPool [HikariPool-1 housekeeper]
HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=1m9s640ms987µs196ns)

Technical note:

Thread starvation often occurs when the GC threads are using all available resources, which usually happens shortly before an OutOfMemoryError.
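The restart check from step 3 can be scripted when a project contains many pods. The following is a minimal sketch, not part of the existing tooling: `restarted_pods` is a hypothetical helper name, and it assumes the default `oc get pods` table layout shown above, with a header row that contains a RESTARTS column.

```shell
# Hypothetical helper: print the name and restart count of every pod that
# has restarted at least once. Reads the table from `oc get pods` on stdin.
restarted_pods() {
    awk '
        # Header row: remember which column holds the restart count.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "RESTARTS") col = i; next }
        # Data rows: keep pods whose restart count is greater than zero.
        col && $col + 0 > 0 { print $1, $col }
    '
}
```

Usage: `oc get pods | restarted_pods`. For the pod listing above, this prints `nice-69-kvjlf 3`. Each reported pod is then a candidate for the OutOfMemoryError and thread starvation checks.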

  4. Check for unusual activity in the logs:

    $ oc logs -c nice nice-69-kvjlf | n2log-unscramble

    Warnings and errors only:

    $ oc logs -c nice nice-69-kvjlf | n2log-unscramble -l warn

  5. Check for unusual events:

    $ oc describe pod nice-69-kvjlf | grep -A 999 Events:$
    
    TODO: What should one look for?

    Note: Liveness and readiness probe failures are expected during application start.
    

Failing or Slow Logins

TODO (detecting REST use without nice_auth cookie)

  • In versions <2.25, this is frequently caused by using the REST API without the nice_auth cookie. Check for frequent logins:

    $ oc logs -c nice nice-69-kvjlf | n2log-unscramble AuthenticationHandler
    ====================================================================================================
    2020-05-20 12:28:26 INFO  - thread: qtp1544300373-17493, logger: nice2.userbase.DbAuthenticationHandler
    
    Successful Login: Principal[PK:5020, username:rretep@tocco.ch] Session[PK:460890] IP:38.175.164.17
    ====================================================================================================
    2020-05-20 12:39:32 INFO  - thread: qtp1544300373-17551, logger: nice2.userbase.DbAuthenticationHandler
    
    Successful Login: Principal[PK:4478, username:data-import] Session[PK:460891] IP:52.127.123.220
    ====================================================================================================
    …
    

    Look for a high number of logins made with a single account. Also look for logins that indicate a non-human client, like data-import in the example above. If this happens, please inform the customer that the nice_auth cookie needs to be set according to the documentation (section nice_auth).

    Should this prevent users from logging in, consider deactivating the offending login temporarily. In the worst case, deactivate it via SQL:

    UPDATE nice_principal
      SET fk_principal_status = (SELECT pk FROM nice_principal_status WHERE unique_id = 'inactive')
      WHERE username = '${USERNAME}';
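
To find out which login produces an unusually high number of sessions, the `Successful Login` entries shown above can be tallied per username. The following is a minimal sketch, assuming the log line format from the example output; `count_logins` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: count successful logins per username.
# Reads unscrambled log output on stdin and prints "<count> <username>",
# most frequent login first.
count_logins() {
    # Extract the username from each "Successful Login" line, drop all other lines.
    sed -n 's/.*Successful Login: Principal\[PK:[0-9]*, username:\([^]]*\)].*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'   # normalize the padding added by uniq -c
}
```

Usage: `oc logs -c nice nice-69-kvjlf | n2log-unscramble AuthenticationHandler | count_logins`. A technical account such as data-import at the top of the list, with far more logins than any human user, suggests a REST client that does not send the nice_auth cookie.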