Troubleshooting in a hurry: a WebLogic domain on Kubernetes (domain on a persistent volume) that keeps bouncing and never starts properly


Problem:

Suppose you have a WebLogic domain running on a Kubernetes cluster that, for some reason, does not start properly. What is really happening is that the liveness probe keeps killing and restarting the pods over and over.

Diagnosis:

After looking at the logs, the root cause turned out to be the lock files that WebLogic uses for several checks (server, embedded LDAP, Node Manager). The situation was probably caused by an improper shutdown, which left stale lock files behind.
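To confirm that you are hitting the same problem, it can help to check the pod events for liveness probe failures and to scan the log of the previous (killed) container for lock-related messages. This is a minimal sketch assuming bash and kubectl access to the namespace; `filter_lock_messages` and `inspect_pod` are hypothetical helper names, and the grep pattern is a rough heuristic, not the exact text WebLogic writes to its log.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting the symptom described above.
# The pod name and namespace are placeholders for your own values.

# Keep only log lines that mention locks or lock files (case-insensitive heuristic).
# "|| true" keeps the exit status at 0 when nothing matches.
filter_lock_messages() {
  grep -iE 'lock|\.lok|\.lck' || true
}

# Print the Events section of the pod (look for "Liveness probe failed"
# and restarts), then scan the previous container's log for lock messages.
inspect_pod() {
  local pod="$1" ns="$2"
  kubectl describe pod "$pod" -n "$ns" | sed -n '/^Events:/,$p'
  kubectl logs "$pod" -n "$ns" --previous | filter_lock_messages
}
```

Usage: `inspect_pod <your server pod> <yournamespace>`. The `--previous` flag only returns output once the container has been restarted at least once, which is exactly the bouncing case described here.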

Solution:

This is the way I tried to solve it in a hurry.

Stop the domain

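Edit the domain resource and set serverStartPolicy to NEVER:

kubectl edit domain <yourdomainname> -n <yournamespace>

Wait until all the server pods have terminated before going on.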
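If you prefer a non-interactive command over `kubectl edit`, the same change can be applied as a JSON merge patch, followed by a wait for the pods to disappear. A sketch, assuming WebLogic Kubernetes Operator 3.x (where the values are NEVER, ADMIN_ONLY and IF_NEEDED) and assuming the domain resource name equals the domainUID used in the `weblogic.domainUID` pod label; `policy_patch` and `stop_domain` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Assumes operator 3.x serverStartPolicy values and that the domain resource
# name matches the weblogic.domainUID label on the server pods.

# Build a JSON merge patch that sets spec.serverStartPolicy.
policy_patch() {
  printf '{"spec":{"serverStartPolicy":"%s"}}' "$1"
}

# Stop every server of the domain and block until its pods are gone.
stop_domain() {
  local domain="$1" ns="$2"
  # Custom resources accept merge patches, hence --type=merge.
  kubectl patch domain "$domain" -n "$ns" --type=merge -p "$(policy_patch NEVER)"
  kubectl wait pod -n "$ns" -l "weblogic.domainUID=$domain" \
    --for=delete --timeout=600s
}
```

Usage: `stop_domain <yourdomainname> <yournamespace>`. If the pods are already gone, `kubectl wait` may report that no matching resources were found, which here just means there is nothing left to wait for.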

Get into the shared file system

SSH to a machine that has the shared file system mounted (the persistent volume where the WebLogic domain is stored).

Execute the following

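WORKINGDIR=<your domain dir>
# Server lock files
rm -f "$WORKINGDIR"/servers/*/tmp/*.lok
# Embedded LDAP lock files
rm -f "$WORKINGDIR"/servers/*/data/ldap/ldapfiles/*.lok
# Node Manager lock files
rm -f "$WORKINGDIR"/servers/*/data/nodemanager/*.lck
# Default and diagnostics persistent store files (their contents are lost)
rm -f "$WORKINGDIR"/servers/*/data/store/default/*.DAT
rm -f "$WORKINGDIR"/servers/*/data/store/diagnostics/*.DAT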
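Because these commands are destructive, a more cautious variant is to move the files into a timestamped backup directory instead of deleting them, so they can be put back if needed. A minimal sketch, assuming bash on the host that mounts the volume and that all server pods are already stopped; `backup_and_clear` and the default backup location (a sibling of the domain directory) are hypothetical choices, while the file patterns are the same ones used in the rm commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move WebLogic lock and store files into a backup
# directory instead of deleting them. Run only with all servers stopped.

backup_and_clear() {
  local domain_dir="${1%/}"
  local backup_dir="${2:-$domain_dir-lock-backup-$(date +%Y%m%d%H%M%S)}"
  local patterns=(
    'servers/*/tmp/*.lok'
    'servers/*/data/ldap/ldapfiles/*.lok'
    'servers/*/data/nodemanager/*.lck'
    'servers/*/data/store/default/*.DAT'
    'servers/*/data/store/diagnostics/*.DAT'
  )
  local pattern file rel restore
  # Remember the current nullglob setting so it can be restored afterwards.
  restore="$(shopt -p nullglob)"
  # nullglob: a pattern with no matches expands to nothing, not to itself.
  shopt -s nullglob
  for pattern in "${patterns[@]}"; do
    # The quoted directory stays literal; the unquoted pattern is globbed.
    for file in "$domain_dir"/$pattern; do
      rel="${file#"$domain_dir"/}"
      mkdir -p "$backup_dir/$(dirname "$rel")"
      mv "$file" "$backup_dir/$rel"
      echo "moved $rel"
    done
  done
  eval "$restore"
}
```

Usage: `backup_and_clear "$WORKINGDIR"`, or pass a second argument to choose the backup directory. Keeping the relative paths makes it easy to move a file back if the reset turns out to be too aggressive.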

Start the WebLogic domain again

Edit the domain resource again and set serverStartPolicy to IF_NEEDED.
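The restart can also be done non-interactively with a merge patch, then watching the server pods come back. A sketch, assuming WebLogic Kubernetes Operator 3.x (IF_NEEDED is a 3.x value; check which values your operator version accepts) and assuming the domain resource name matches the `weblogic.domainUID` pod label; `policy_patch` and `start_domain` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Assumes operator 3.x serverStartPolicy values and that the domain resource
# name matches the weblogic.domainUID label on the server pods.

# Build a JSON merge patch that sets spec.serverStartPolicy.
policy_patch() {
  printf '{"spec":{"serverStartPolicy":"%s"}}' "$1"
}

# Start the domain again and follow the server pods as they come up.
start_domain() {
  local domain="$1" ns="$2"
  kubectl patch domain "$domain" -n "$ns" --type=merge -p "$(policy_patch IF_NEEDED)"
  # Stream pod status changes; stop watching with Ctrl+C.
  kubectl get pods -n "$ns" -l "weblogic.domainUID=$domain" --watch
}
```

Usage: `start_domain <yourdomainname> <yournamespace>`. If the pods reach READY 1/1 without new restarts, the liveness probe has stopped killing them.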

Comments & disclaimer

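This recipe deletes several lock and data files that WebLogic uses. Depending on your use case, not all of them may need to be deleted; what the recipe does is a complete reset. In particular, the *.DAT files are persistent file stores, so removing them discards whatever they hold (for example, transaction log records kept in the default store). If you are not sure what you are doing, you had better open a Service Request in My Oracle Support.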

That’s all folks, hope it helps and stay safe! -:)
