Shall you have a WebLogic domain running on a K8s cluster that, for some reason, is not starting properly and what is really happening is that the liveness probe is fencing the pods on and on.
After taking a look to the logs, the root problem was the different lock file mechanisms that WebLogic uses for several checks. Probably the situation was originated by an improperly shutdown operation.
This is the way I tried to solve it in a hurry.
Stop the domain
kubectl edit domain <yourdomainname> -n <yournamespace>
Get into the shared file system
ssh to a machine that has the shared files system mounted (the persistent volume in which the WebLogic domain is stored)
Execute the following
WORKINGDIR=<your domain dir> rm $WORKINGDIR/servers/*/tmp/*.lok rm $WORKINGDIR/servers/*/data/ldap/ldapfiles/*.lok rm $WORKINGDIR/servers/*/data/nodemanager/*.lck rm $WORKINGDIR/servers/*/data/store/default/*.DAT rm $WORKINGDIR/servers/*/data/store/diagnostics/*.DAT
Start the WebLogic domain again
Edit the domain resource and put IF_NEEDED in the serverStartPolicy.
Comments & disclaimer
The recipe here deletes several lock and data files that WebLogic utilises. Probably not all them should be deleted depending on the use case, anyway what this recipe does is a complete reset. If you don’t know what you are doing you’d better open a Service Request in My Oracle Support.
That’s all folks, hope it helps and stay safe! -:)