Exponential Backoff and Jitter

table of contents
"Take this, Exponential Backoff!!"
"What kind of move is that!?"
It seems like a rather confusing story is about to unfold... but
thank you for your hard work.
This is Matsuyama, the living corpse from the Systems Development Department.
I thought I'd update my blog after a long time, but
I didn't have the energy to write any sample code and create something that works, so
this time I'll write about a little bit of algorithms.
for example
When an error is returned from the server during communication between the client and the server,
what are some possible behaviors on the client side?
① Displays an error message and cancels the process.
② Attempts to perform the same communication again (retry).
③ Becomes unresponsive.
③ is simply a bug and therefore out of the question, but I think most cases will be either ① or ②.
In particular, in the case of a flow where processing cannot be interrupted, you will likely be hoping for recovery through ②.
retry
Let's think a bit about retries.
retry because the response was an error
If we simply
If the server experiences a failure for any reason,
all users will continue retrying (essentially a DoS attack), potentially leading to
server crash
and even a
I have bitter memories of this actually happening…
Therefore, the algorithm used to distribute such high loads
is the "Exponential Backoff" algorithm mentioned at the beginning.
Exponential Backoff
A direct translation would be:
exponential = exponential function
, backoff = backward
. This algorithm exponentially delays (backs down) the retry interval.
If an error occurs, the retry
1 second, 2 seconds, 4 seconds, 8 seconds, and so on
, thereby reducing the overall number of retries and achieving efficient retrying.
AWS also provides explanations.
AWS explanation
jitter
However, a simple exponential function can result in many requests occurring simultaneously
(because users accessing at the same time will have the same retry interval)
. This is where "jitter" comes in, used in conjunction with exponential functions.
Literally translated
a time lag
seems to mean
The idea is to distribute simultaneous retries by introducing a range of random values into the retry interval.
an article on the AWS Solution Architect blog
A detailed analysis of the findings can be found in
summary
In reality, we also need to consider limits on the number of retries and timeouts, but
for now, we will only introduce exponential backoff and jitter.
In the case of temporary errors or failures, retries often resolve the issue.
When recovery takes some time, instead of performing continuous retries,
that using the efficient retries described here can improve the situation without placing unnecessary load on the
remember
By the way,
"Exponential backoff sounds like a special move, doesn't it?"
I used the opening line as a little joke because I had heard someone say,
It's simple, but that's all for now
16
