Eureka’s self-preservation mode can be a lifesaver for your peers and microservices, so let’s dig into the rationale and math to see how it works.
In Eureka, self-preservation can be defined as follows: “Eureka servers stop expiring clients from the registry when the heartbeats they receive (from peers and client microservices) fall below a certain threshold.”
Let’s take a look at the diagram below.
Suppose EC2 used to invoke EC4 after discovering it from the Eureka registry. Due to a network partition, EC4 and EC5 lose connectivity with the servers. Suppose also that, per our threshold configuration, the Eureka servers enter self-preservation mode once two of the clients stop sending heartbeats. From then on, the servers stop expiring entries from the registry: EC3 has gone down, but it’s still listed in the registry.
Servers not receiving heartbeats could be due to a network problem such as a partition; it does not necessarily mean the clients are down.
Even when connectivity between the servers and a client is lost, the clients might still be able to reach each other (for example, EC2 might still be able to call EC4).
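The following Java sketch models the scenario above with a hypothetical `ToyRegistry` class (it is not part of Eureka's API). It assumes a 90-second lease duration, which is Eureka's usual default, and simply skips eviction while self-preservation is on, so a dead instance such as EC3 stays listed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy registry: it tracks the last heartbeat time of each instance.
// This is NOT Eureka's real registry class; it only illustrates that eviction
// is skipped while self-preservation is on.
class ToyRegistry {
    private final Map<String, Long> lastHeartbeatMillis = new LinkedHashMap<>();
    private final long leaseDurationMillis;

    ToyRegistry(long leaseDurationMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
    }

    // Registers an instance, or records a heartbeat (renewal) from it.
    void heartbeat(String instanceId, long nowMillis) {
        lastHeartbeatMillis.put(instanceId, nowMillis);
    }

    boolean contains(String instanceId) {
        return lastHeartbeatMillis.containsKey(instanceId);
    }

    // Removes instances whose lease has lapsed, unless self-preservation is on.
    List<String> evictExpired(long nowMillis, boolean selfPreservationOn) {
        List<String> evicted = new ArrayList<>();
        if (selfPreservationOn) {
            return evicted; // registry is frozen: stale entries stay visible
        }
        Iterator<Map.Entry<String, Long>> it = lastHeartbeatMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > leaseDurationMillis) {
                evicted.add(entry.getKey());
                it.remove();
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry(90_000); // 90 s lease (assumed)
        for (String id : new String[] {"EC1", "EC2", "EC3", "EC4", "EC5"}) {
            registry.heartbeat(id, 0);
        }
        // Only EC1 and EC2 keep renewing; EC3 is down, EC4/EC5 are partitioned.
        registry.heartbeat("EC1", 100_000);
        registry.heartbeat("EC2", 100_000);
        System.out.println(registry.evictExpired(120_000, true)); // []
        System.out.println(registry.contains("EC3"));            // true
    }
}
```

With self-preservation off, the same call would evict EC3, EC4, and EC5. That is the desired result for EC3, but it also removes the partitioned instances, which might still be reachable by other clients.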
Suppose:
The number of registered application instances at some point in time = N
The configured heartbeat threshold (to turn on self-preservation) = 0.85 (default)
Number of heartbeats expected from one client instance per minute = 2 (one heartbeat every 30 seconds)
Number of heartbeats expected from N instances per minute = 2 * N
Minimum expected heartbeats per minute (the threshold) = 2 * N * 0.85
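These values translate directly into a few lines of arithmetic. The Java sketch below uses hypothetical names. The 30-second interval and 0.85 percentage are the defaults quoted in this article, and truncating the threshold to a whole number is an assumption of the sketch.

```java
// Worked version of the values above. Class and method names are hypothetical.
class RenewalThreshold {
    static final int RENEWAL_INTERVAL_SECONDS = 30;       // one heartbeat every 30 s
    static final double RENEWAL_PERCENT_THRESHOLD = 0.85; // default threshold

    // Heartbeats expected from one instance per minute: 60 / 30 = 2.
    static double expectedRenewsPerInstancePerMin() {
        return 60.0 / RENEWAL_INTERVAL_SECONDS;
    }

    // Heartbeats expected from N instances per minute: 2 * N.
    static double expectedRenewsPerMin(int registeredInstances) {
        return registeredInstances * expectedRenewsPerInstancePerMin();
    }

    // Minimum expected heartbeats per minute: 2 * N * 0.85,
    // truncated to a whole number (an assumption of this sketch).
    static int renewsPerMinThreshold(int registeredInstances) {
        return (int) (expectedRenewsPerMin(registeredInstances) * RENEWAL_PERCENT_THRESHOLD);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100}) {
            System.out.printf("N=%d expected=%.0f threshold=%d%n",
                    n, expectedRenewsPerMin(n), renewsPerMinThreshold(n));
        }
    }
}
```

For example, with N = 10 the servers expect 20 heartbeats per minute, and the threshold works out to 17.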
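Since N changes as instances register and deregister, a scheduler recalculates 2 * N * 0.85 every 15 minutes (by default). When the heartbeats received in the last minute no longer exceed this threshold, the servers enter self-preservation mode.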
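The recalculation and the resulting on/off decision can be sketched as follows. `SelfPreservationMonitor` is a hypothetical class, loosely modelled on how a Eureka server decides whether lease expiry is allowed. It uses the JDK's `ScheduledExecutorService` for the periodic recalculation. Expiry is allowed only while the heartbeats received in the last minute exceed the threshold, and it is always allowed when self-preservation is disabled.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: periodically recompute the threshold from the current
// number of registered instances, and decide whether expiry is allowed.
class SelfPreservationMonitor {
    private final AtomicInteger registeredInstances = new AtomicInteger();
    private final boolean selfPreservationEnabled;
    private volatile int renewsPerMinThreshold;

    SelfPreservationMonitor(boolean selfPreservationEnabled) {
        this.selfPreservationEnabled = selfPreservationEnabled;
    }

    void setRegisteredInstances(int n) {
        registeredInstances.set(n);
    }

    // Recomputes 2 * N * 0.85 (truncated) from the current N.
    void recomputeThreshold() {
        renewsPerMinThreshold = (int) (registeredInstances.get() * 2 * 0.85);
    }

    int threshold() {
        return renewsPerMinThreshold;
    }

    // Expiry is allowed only while more heartbeats than the threshold arrived
    // in the last minute; otherwise the server is in self-preservation.
    boolean isExpiryAllowed(int renewsInLastMin) {
        if (!selfPreservationEnabled) {
            return true; // self-preservation turned off: always expire
        }
        return renewsPerMinThreshold > 0 && renewsInLastMin > renewsPerMinThreshold;
    }

    // Runs recomputeThreshold() now and then every periodMinutes (15 by default
    // in the text). The caller should shut the returned executor down.
    ScheduledExecutorService startScheduler(long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::recomputeThreshold, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        SelfPreservationMonitor monitor = new SelfPreservationMonitor(true);
        monitor.setRegisteredInstances(10);
        monitor.recomputeThreshold(); // the scheduler would do this every 15 minutes
        System.out.println(monitor.threshold());         // 17
        System.out.println(monitor.isExpiryAllowed(18)); // true
        System.out.println(monitor.isExpiryAllowed(16)); // false: self-preservation
    }
}
```

Once heartbeats recover above the threshold (for example, after the partition heals), `isExpiryAllowed` returns true again and expiry resumes.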
Self-preservation can mistake a few genuinely down microservice instances for a network partition.
Self-preservation mode does not end until the down microservices are brought back (or the network glitch is resolved).
With self-preservation mode on, we cannot fine-tune the client heartbeat interval, since self-preservation assumes heartbeats are received at intervals of 30 seconds.
Unless these kinds of network glitches are common in your environment, you can turn self-preservation off (even though most people recommend keeping it on).
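The hypothetical Java sketch below shows why the fixed 30-second assumption makes the heartbeat interval hard to tune. It counts how many instances must go silent before self-preservation starts. As described above, it assumes the server always sizes its threshold for 30-second heartbeats, and it reuses the 0.85 default.

```java
// Hypothetical sketch: the server computes its threshold as if every client
// renews every 30 seconds, whatever interval the clients actually use.
class HeartbeatIntervalMismatch {
    static final double ASSUMED_INTERVAL_SECONDS = 30.0;
    static final double RENEWAL_PERCENT_THRESHOLD = 0.85;

    // Server-side threshold: N * (60 / 30) * 0.85, truncated.
    static int threshold(int registeredInstances) {
        return (int) (registeredInstances * (60.0 / ASSUMED_INTERVAL_SECONDS)
                * RENEWAL_PERCENT_THRESHOLD);
    }

    // Number of instances that must stop sending heartbeats before the server
    // enters self-preservation, i.e. before live * renewsPerInstance <= threshold.
    // Returns 0 if the server would already be in self-preservation.
    static int silentInstancesToTrigger(int registeredInstances, double clientIntervalSeconds) {
        double renewsPerInstancePerMin = 60.0 / clientIntervalSeconds;
        int maxLive = (int) Math.floor(threshold(registeredInstances) / renewsPerInstancePerMin);
        return registeredInstances - Math.min(maxLive, registeredInstances);
    }

    public static void main(String[] args) {
        for (double interval : new double[] {30, 10}) {
            System.out.printf("interval=%.0fs -> %d of 10 instances must go silent%n",
                    interval, silentInstancesToTrigger(10, interval));
        }
    }
}
```

With 10 registered instances and the default 30-second interval, self-preservation starts once 2 instances go silent, which matches the two-client example earlier. If the clients renewed every 10 seconds instead, 8 of the 10 would have to go silent first.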