Details
Description
We are running the following task to recompute users on a 2-node cluster with coordinator/worker tasks:
<task xmlns:apti="http://midpoint.evolveum.com/xml/ns/public/common/api-types-3"
      xmlns:c="http://midpoint.evolveum.com/xml/ns/public/common/common-3"
      xmlns:gen159="http://prism.evolveum.com/xml/ns/public/debug"
      xmlns:icfs="http://midpoint.evolveum.com/xml/ns/public/connector/icf-1/resource-schema-3"
      xmlns:q="http://prism.evolveum.com/xml/ns/public/query-3"
      xmlns:ri="http://midpoint.evolveum.com/xml/ns/public/resource/instance-3"
      xmlns:t="http://prism.evolveum.com/xml/ns/public/types-3"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      oid="d05baf59-0534-498b-a5d7-0033cad8b2bf"
      xmlns="http://midpoint.evolveum.com/xml/ns/public/common/common-3">
    <name>Recompute Users Buckets</name>
    <extension xmlns:mext="http://midpoint.evolveum.com/xml/ns/public/model/extension-3" xsi:type="c:ExtensionType">
        <mext:objectType>c:UserType</mext:objectType>
        <mext:objectQuery>
            <q:filter>
                <q:not>
                    <q:inOid>
                        <q:value>00000000-0000-0000-0000-000000000002</q:value> <!-- filter out administrator -->
                    </q:inOid>
                </q:not>
            </q:filter>
        </mext:objectQuery>
    </extension>
    <ownerRef oid="00000000-0000-0000-0000-000000000002" type="c:UserType" />
    <category>Recomputation</category>
    <executionStatus>suspended</executionStatus>
    <binding>loose</binding>
    <recurrence>recurring</recurrence>
    <schedule>
        <cronLikePattern>0 0 0 * * ?</cronLikePattern>
        <misfireAction>executeImmediately</misfireAction>
    </schedule>
    <threadStopAction>restart</threadStopAction>
    <handlerUri>http://midpoint.evolveum.com/xml/ns/public/task/workers-creation/handler-3</handlerUri>
    <workManagement>
        <taskKind>coordinator</taskKind>
        <buckets>
            <oidSegmentation>
                <depth>2</depth>
            </oidSegmentation>
        </buckets>
        <workers>
            <handlerUri>http://midpoint.evolveum.com/xml/ns/public/model/synchronization/task/recompute/handler-3</handlerUri>
            <workersPerNode>
                <count>8</count>
            </workersPerNode>
        </workers>
    </workManagement>
</task>
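For reference, the configuration above implies a fixed amount of work per run. The Python sketch below works it out, assuming that `oidSegmentation` with `depth` 2 splits users by the first two characters of their OID and that OIDs use the lowercase hexadecimal alphabet 0-9a-f. Under that assumption there are 16^2 = 256 buckets shared by 2 nodes x 8 workers = 16 workers, or roughly 262 of the ~67K users per bucket. `oid_buckets` and `bucket_of` are hypothetical helpers for illustration, not midPoint APIs.

```python
import itertools

# Assumption: OIDs are lowercase hex UUIDs, so the segmentation alphabet is 0-9a-f.
HEX_ALPHABET = "0123456789abcdef"


def oid_buckets(depth: int, alphabet: str = HEX_ALPHABET) -> list[str]:
    """Hypothetical helper: every OID prefix of the given depth, in lexicographic order."""
    return ["".join(p) for p in itertools.product(alphabet, repeat=depth)]


def bucket_of(oid: str, depth: int) -> str:
    """Hypothetical helper: the prefix bucket an OID would fall into."""
    return oid[:depth].lower()


NODES = 2               # cluster size from the report
WORKERS_PER_NODE = 8    # <workersPerNode><count>8</count>
TOTAL_USERS = 67_000    # approximate user count from the report

buckets = oid_buckets(depth=2)
workers = NODES * WORKERS_PER_NODE
# Prints the bucket count, the worker count and the average users per bucket.
print(len(buckets), workers, round(TOTAL_USERS / len(buckets)))
```

If every bucket is processed exactly once, the per-bucket counts should add up to the number of users matched by the query, which is the expectation the second problem below contradicts.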
We are seeing the following problems:
- Parallelization works nicely, but the coordinator task usually reaches 99% quickly and then lingers for hours before completing.
- The sum of users processed by all workers is far above the total number of users in the system (67K).
- Usually at least one worker gets stalled.
- (The coordinator task header should report progress as a percentage instead of 1/1.)
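A summed worker count above the user total suggests that some buckets were processed more than once, and a bucket-based percentage is one plausible form for the requested progress display. The Python sketch below shows both checks on per-bucket completion records. The `BucketRecord` shape is a hypothetical assumption: midPoint does not necessarily expose completions in this form, so the records would have to be extracted from the worker tasks or logs separately. Computing progress as distinct completed buckets out of all buckets is an illustrative choice, not midPoint's actual progress formula.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketRecord:
    """One completed bucket as reported by a worker (hypothetical record shape)."""
    bucket: str     # OID prefix, e.g. "a3"
    worker: str     # worker task name
    processed: int  # objects processed while handling this bucket


def duplicate_buckets(records: list[BucketRecord]) -> dict[str, int]:
    """Buckets completed more than once, mapped to how many times they were completed."""
    counts = Counter(r.bucket for r in records)
    return {bucket: n for bucket, n in counts.items() if n > 1}


def overcount(records: list[BucketRecord], total_objects: int) -> int:
    """How far the summed per-worker counts exceed the number of objects (0 if they match)."""
    return sum(r.processed for r in records) - total_objects


def progress_percent(records: list[BucketRecord], total_buckets: int) -> float:
    """Progress as distinct completed buckets out of all buckets, in percent."""
    done = len({r.bucket for r in records})
    return 100.0 * done / total_buckets


# Synthetic example: bucket "00" was handled by two different workers.
records = [
    BucketRecord("00", "worker-1", 250),
    BucketRecord("01", "worker-2", 270),
    BucketRecord("00", "worker-3", 250),
]
print(duplicate_buckets(records))    # {'00': 2}
print(overcount(records, 520))       # 250
print(progress_percent(records, 256))
```

A non-empty result from `duplicate_buckets` would account for the inflated totals. With 256 buckets, the percentage advances in steps of about 0.4%, which is finer-grained than the current 1/1 display.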
Logs provided privately.