MidPoint / MID-5904

Cache Invalidation causes Exceptions during (Docker) Node startup


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.1
    • Fix Version/s: 4.1, 4.0.2
    • Component/s: Repository
    • Labels:
      None
    • Environment:

      3 Dockerized Nodes with the same base-image.

    • Subscription:
      Active subscription
    • Milestone:
      M2

      Description

      Because of how Docker works, a new hostname is created whenever a node terminates and a new one spins up. In our Docker environment it is not possible to assign static node IDs to all containers, as the number of replicas is not static. Only one "Debug" instance has a fixed ID.

      The containers restart or are recreated whenever the health check URL stops responding, and during the weekly maintenance.

      This leads to two major problems:

      1) The "Nodes" list under Server Tasks gets polluted with "Turned Off" nodes that will never come back alive (I once had 1600 nodes in the list!).

      2) Once the number of dead nodes starts to increase, midPoint has difficulty starting because of hundreds of networking-related exceptions (see attached log snippet).

      As far as I understand, midPoint tries to invalidate the objects on all other nodes after they have been imported, which will obviously fail for many of them. This increases the startup time, which eventually exceeds the startup timeout of 3 minutes. Docker then kills the container, but by then it is already registered as a node. The next container to start has one more dead node to check and exceeds the startup timeout again. There we go: a circle of death!

      When that happens, I need to wipe the m_node table in the database.
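
      The feedback loop described above can be illustrated with a small model. This is only a sketch under stated assumptions: the base startup time of 60 seconds and the cost of 1 second per unreachable node are made-up numbers, and the function names are hypothetical. Only the 3-minute limit comes from this report.

```python
def startup_seconds(dead_nodes, base_seconds=60.0, per_node_timeout=1.0):
    """Estimated startup time: base work plus one failed call per dead node.

    base_seconds and per_node_timeout are illustrative assumptions.
    """
    return base_seconds + dead_nodes * per_node_timeout


def simulate_restarts(initial_dead, attempts, startup_limit=180.0, **kwargs):
    """Return a list of (dead_nodes_seen, started_ok), one entry per attempt.

    An attempt that exceeds the limit is killed, but its node entry stays
    registered, so the next attempt sees one more dead node.
    """
    dead = initial_dead
    history = []
    for _ in range(attempts):
        ok = startup_seconds(dead, **kwargs) <= startup_limit
        history.append((dead, ok))
        if ok:
            break
        dead += 1  # the killed container leaves a stale node entry behind
    return history
```

      In this model, once the number of dead nodes pushes the estimate past the limit, every further attempt only makes it worse, which matches the behaviour I see.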

       

      The situation gets even worse when multiple nodes start up in parallel (e.g. after automatic server maintenance): the nodes see each other before they can accept those cache invalidation calls. I once disabled the health check, but the startup still never completed because of this deadlock.
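
      One way to keep unreachable peers from blocking a caller is to run the calls in parallel and collect failures instead of raising them. The following is a minimal sketch of that idea, not how midPoint implements cache invalidation; `invalidate_on_peers` and the `send` callable are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def invalidate_on_peers(peers, send, max_workers=8):
    """Call send(peer) for every peer in parallel; return the peers that failed.

    `send` is a hypothetical callable that performs one invalidation call and
    raises on a network error. Failures are collected, not re-raised, so an
    unreachable peer cannot abort the caller.
    """
    def attempt(peer):
        try:
            send(peer)
            return None
        except Exception:  # broad on purpose: any failure marks the peer as failed
            return peer

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of peers
        results = list(pool.map(attempt, peers))
    return [peer for peer in results if peer is not None]
```

      The returned list could be logged once as a summary rather than as one stack trace per dead node.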

       

      Maybe you can improve the handling of Docker nodes by adding a cleanup feature for old nodes, and have a look at the exceptions.
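
      As a rough idea of what such a cleanup could select, here is a minimal sketch. It assumes each node entry carries a last check-in timestamp; the data model, the one-day threshold, and the name `select_stale_nodes` are all assumptions made for illustration, not midPoint's actual schema or API.

```python
from datetime import datetime, timedelta, timezone


def select_stale_nodes(last_seen_by_node, now, max_age=timedelta(days=1), keep=()):
    """Return sorted node IDs whose last check-in is older than max_age.

    last_seen_by_node: dict of node ID -> timezone-aware datetime
                       (hypothetical model of the node list).
    keep: node IDs that must never be removed, e.g. a fixed-ID debug instance.
    """
    cutoff = now - max_age
    return sorted(
        node_id
        for node_id, last_seen in last_seen_by_node.items()
        if last_seen < cutoff and node_id not in keep
    )
```

      The `keep` parameter is there so that the one instance with a fixed ID is not deleted just because it has been offline for a while.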

       

      Best regards,

      Martin

       

       

       


            People

            • Assignee:
              hoffm_ma Martin Hoffmann
            • Reporter:
              hoffm_ma Martin Hoffmann