A Story About Exponential Backoff and Jitter

“Damn it, exponential backoff!!”
“What is that technique?”

A story that makes no sense is about to unfold...
Hello, everyone.
This is Matsuyama, also known as the Living Dead, from the System Development Department.

It's been a while since I last updated the blog, and
I didn't have the motivation to write sample code and get something running, so
I thought I'd write a little about algorithms this time.

For example

If the server returns an error during client ↔︎ server communication,
the client can behave in one of the following ways:

① Display an error message and cancel the process
② Execute the same communication again (retry)
③ Become unable to proceed

③ is out of the question since it's simply a bug, so in practice it is mostly ① or ②.
Especially in a flow where processing cannot be interrupted, you would expect to recover via ②.

Retry

Let's think about retries for a moment.
The simplest approach: the response was an error, so send the request again right away.
In that case, the cycle of request → response (error) → retry repeats endlessly.
If a failure occurs on the server side for some reason,
every user keeps retrying (a situation no different from a DoS attack),
the load on the server spikes, and in the worst case the server eventually goes down.
I have bitter memories of this actually happening...
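
To get a rough feel for the scale of the problem, here is a back-of-the-envelope sketch. The function name and every number in it (1,000 clients, 10 retries per second, a 30-second outage) are hypothetical values chosen only for illustration, not measurements from a real incident.

```python
def requests_during_outage(clients, outage_seconds, retries_per_second):
    """Total requests sent if every client retries at a fixed rate for the whole outage."""
    return clients * outage_seconds * retries_per_second


# Hypothetical numbers: 1,000 clients, each retrying 10 times per second, for 30 seconds.
print(requests_during_outage(clients=1000, outage_seconds=30, retries_per_second=10))  # 300000
```

Even with these modest assumptions, a server that is already struggling receives hundreds of thousands of extra requests.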

The algorithm used to spread out this kind of load
is the "exponential backoff" mentioned at the beginning.

Exponential backoff

Translated literally,
・Exponential = growing as a power (as in an exponential function)
・Backoff = backing away
so it is an algorithm that pushes back (delays) the retry interval exponentially.

If an error occurs, the client retries after 1 second, then 2 seconds, 4 seconds, 8 seconds, and so on, doubling the wait each time.
This reduces the overall number of retries and makes retrying more efficient.
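
The doubling intervals can be sketched in a few lines of Python. This is a minimal illustration, not code from a real project: the function names, the 1-second base, and the 60-second outage used for comparison are all assumptions made for the example.

```python
def backoff_delays(base=1.0, factor=2.0, retries=4):
    """Return the wait before each retry: base * factor**attempt."""
    return [base * factor ** attempt for attempt in range(retries)]


def retries_within(window, delays):
    """Count how many retries fire within `window` seconds, given the waits in order."""
    elapsed = 0.0
    count = 0
    for delay in delays:
        elapsed += delay
        if elapsed > window:
            break
        count += 1
    return count


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0]

# Hypothetical 60-second outage: fixed 1-second retries vs. exponential backoff.
print(retries_within(60, [1.0] * 100))                 # 60
print(retries_within(60, backoff_delays(retries=10)))  # 5
```

Under these assumptions, a single client sends 5 retries during the outage instead of 60.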

It is also explained on AWS.
AWS explanation

Jitter

However, with plain exponential backoff, many retries can still occur at the same moment.
(Users who accessed at the same time will have the same retry intervals.)
Therefore, "jitter" is used together with exponential backoff.
Translated literally, jitter = fluctuation in timing.
By using a random value to give the retry interval some spread, simultaneous retries can be distributed over time.
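
One common way to add the randomness, often called "full jitter", is to pick the wait uniformly between 0 and the exponential value. The sketch below assumes that variant; the function name, the 1-second base, and the 60-second cap are illustrative choices, and other jitter strategies exist.

```python
import random


def jittered_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Pick a random wait in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * 2 ** attempt)
    return rng.uniform(0, ceiling)


# Two clients that failed at the same moment, each with its own random source,
# will usually end up waiting different amounts of time before each retry.
client_a = random.Random(1)
client_b = random.Random(2)
for attempt in range(4):
    print(attempt, jittered_delay(attempt, rng=client_a), jittered_delay(attempt, rng=client_b))
```

The exponential value now acts as an upper bound on the wait rather than the wait itself, so retries from many clients are spread across the whole interval.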


Detailed verification results are summarized in an article on the AWS Solution Architect blog.

Summary

In reality, you need to consider things like the upper limit on the number of retries and timeouts, but
this time I'll just introduce exponential backoff and jitter.
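
For reference, an upper limit on the number of retries could be combined with backoff and jitter roughly as follows. This is only a sketch under assumptions: `call_with_retries`, its parameters, and the `flaky` test operation are hypothetical names, it retries on any exception, and it does not implement request timeouts.

```python
import random
import time


def call_with_retries(operation, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call operation(); on failure, wait with capped, jittered backoff and retry.

    Re-raises the last exception once max_retries retries have been used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # give up: retry limit reached
            ceiling = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, ceiling))


# Hypothetical operation that fails twice with a temporary error, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# The sleep function is swapped out here so the example runs instantly.
print(call_with_retries(flaky, sleep=lambda seconds: None))  # ok
```

In real code you would normally retry only on errors that are known to be temporary, and leave the default `time.sleep` in place.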

For temporary errors or failures, retrying often resolves the problem.

If recovery takes some time, efficient retries like the ones introduced here
can improve the situation without putting unnecessary load on the server. I would be glad if you keep this in mind.


By the way, I used the exchange at the beginning only because
"exponential backoff" sounds like the name of a special move.

It's simple, but that's all for today's story.

About the author

Kensho Matsuyama

He worked for many years at a game development company, in programming and project management.
He joined Beyond Co., Ltd. in 2019.
He works in the Yokohama office, where he is mainly responsible for project management of server-side development work
(and sometimes programming). His hobbies are cycling (road bikes) and watching horse races.