Exponential Backoff and Jitter

"Take that, Exponential Backoff!!"
"What, what technique is that!?"

looks like a story that's hard to understand is about to unfold... but
thank you for your hard work.
This is Matsuyama, the Living Corpse, from the Systems Development Department.

I thought I'd update my blog for the first time in a while, but
I didn't have the energy to write any sample code and make something that works, so
this time I'll write about a little algorithm.

for example

When an error is returned from the server during communication between the client and the server,
the possible behaviors on the client side are:

① Display an error message and cancel the process
② Execute the same communication again (retry)
③ The process becomes impossible

③ is just a bug, so it's out of the question, but I think most cases will be ① or ②.
In particular, in the case of a flow where processing cannot be interrupted, I think you will hope for recovery by ②.

retry

Let's think about retries for a moment.

If we simply
retry because the response was an error If there is a problem on the server side for some reason,
all users will keep retrying, which could
the server load rises sharply (a situation no different from a DoS attack) and the server eventually goes down
.
I have bitter memories of this actually happening to me...

The algorithm used to distribute such high loads
is the "exponential backoff" mentioned at the beginning.

Exponential Backoff

Literally,
exponential
backoff
is an algorithm that exponentially backs off (delays) the retry interval.

If an error occurs, the retry
1 second, 2 seconds, 4 seconds, 8 seconds, etc.
, reducing the overall number of retries and achieving efficient retries.

AWS also explains it.
AWS explanation

jitter

However, a simple exponential function can result in many simultaneous requests
(because users who access at the same time will have the same retry interval)
. This is why "jitter" is used in conjunction with exponential.
Literally
, jitter = time lag
.
By using a random value to give a range to the retry interval, you can spread out simultaneous retries.


A detailed summary of the verification results is available in a post on the AWS Solution Architect blog

summary

In reality, you will need to consider things like the maximum number of retries and timeouts, but
this time we will only introduce exponential backoff and jitter.

In the case of temporary errors or failures, retrying can often resolve the issue.
If it will take some time to recover,
that by performing efficient retries as introduced here
on the system.

By the way,

I used the opening line as a little joke to grab the attention of the audience, because I had heard someone say, "Exponential backoff sounds like a special move."

It's simple, but that's all for now

If you found this article useful, please click [Like]!
16
Loading...
16 votes, average: 1.00 / 116
6,527
X Facebook Hatena Bookmark pocket

The person who wrote this article

About the author

Matsuyama Kensho

He has long worked in programming and project management at a game development company.
In 2019, he joined Beyond Inc. and works in the Yokohama office.
He is primarily responsible for project management of server-side development work (occasionally programming).
His hobbies are cycling (road racing) and watching horse racing.