[Microsoft de:code 2019] We participated in de:code 2019!

Hello.

The other day, Beyond was invited by Microsoft to participate in de:code 2019!
I would like to share with you my impressions of the sessions I participated in and the de:code 2019 party.
Since I am an infrastructure engineer, I mainly attended infrastructure sessions rather than developer-oriented ones.

Keynote speech

The following people spoke.
Mr. Takuya Hirano (Microsoft Japan Co., Ltd.)
Mr. Jared Spataro (Microsoft Corporation)
Ms. Julia White (Microsoft Corporation)
Mr. Alex Kipman (Microsoft Corporation)
The keynote speech can be viewed on YouTube from the link below.
https://youtu.be/GtVcDo1G8r8

Mr. Hirano mainly talked about AI, introducing projects such as a device to measure the age of Ashura statues and a project to create a 3D visual model to help restore Notre Dame Cathedral, which was damaged by fire.
In addition, to expand its open source efforts, Microsoft seems to be progressing with collaboration and joint development with Red Hat OpenShift and VMware.

Mr. Spataro talked about Windows Terminal.
I would like to try Windows Terminal and use it to connect to both Linux and Windows.

Ms. White gave a detailed explanation of the latest services and trends in Azure.
In particular, when she talked about Azure DevOps, she spoke in a really excited tone, mainly focusing on Azure Kubernetes Service (AKS).
Mr. Kipman demonstrated HoloLens 2, and the virtual Alex standing next to him felt very real.
If I get the chance to attend again, I would like to try out HoloLens 2!

Sessions I attended

Understanding the mechanism of Azure Kubernetes Service (AKS) ~Understanding the basics of Kubernetes from a developer's perspective~

Honestly, I came to this session wondering what Kubernetes actually is, and I wanted to understand it more deeply.
The explanation was very easy to understand and covered the important points to configure, so I would like to review the material again!
As the speaker said at the end, the most important thing when using Kubernetes is to first thoroughly understand how it works. I think that is exactly right.

HashiCorp Terraform Azure Provider Tutorial

During the short 20-minute session, the speaker explained what Terraform is, its benefits, and how convenient it is.
Teraoka at our company likes Terraform, so the session was a good review of what I had previously learned at an in-house study session.
He said that he initially found it troublesome to write configuration files in HCL, but that he now finds building without Terraform more troublesome.

Managed Kubernetes real production operation in ZOZOTOWN

Yes, that ZOZOTOWN.
Personally, I use it all the time.

ZOZOTOWN seems to have many on-premises servers, and the number keeps increasing as the service grows.
Because enough servers to withstand sale periods are kept ready at all times, the operational load is high.

The reasons they chose Kubernetes are as follows.
(1) Flexible scaling of the number of servers
ZOZOTOWN holds regular sales, so it needs to be able to scale flexibly.
(2) They want to use containers
The settings required to run the application are troublesome.

Kubernetes can autoscale. The purpose of the adoption is to increase the availability of ZOZOTOWN by ensuring stable system operation, reducing the operational load, and designing for microservices with multi-cloud in view.
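As a rough illustration of what "autoscale" means here, the following is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The function name, the minimum and maximum bounds, and the sample numbers are assumptions for illustration only and were not shown in the session. The real autoscaler also applies a tolerance and stabilization windows, which are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return the replica count suggested by the HPA scaling formula.

    Computes ceil(current_replicas * current_metric / target_metric)
    and clamps the result to [min_replicas, max_replicas].
    The bounds are hypothetical defaults chosen for this example.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical sale-day load: 4 pods at 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))
```

With these assumed numbers, a load spike during a sale raises the replica count, and a quiet period lowers it again, which is the flexible increase and decrease described above.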

Apparently, the system replacement has been under way with five members since August 2017, and they are currently looking for new members.

Kubernetes does make many aspects of operation easier, but there is a wide range of things to learn, from the low layers to the high layers.

When asked why they chose Microsoft Azure, they said that Microsoft's excellent Unified Support was a big factor.
What appealed to them was that they could contact Microsoft as many times as they wanted, solve problems together with Microsoft engineers, and even discuss on-premises issues.
It is true that getting an immediate response from support is faster, and more likely to be accurate, than researching the issue yourself.

Log management and security in the DevSecOps era

The talk was about log management from a DevOps perspective.
DevOps requires managing three types of logs: development logs, operational logs, and audit logs.
I myself have no experience with DevOps, but log management is also extremely important for MSPs.
Without logs, all you can check during troubleshooting is the state of the system at that moment.

The speaker explained the key points of each of the three types of logs: development logs, operation logs, and audit logs.

Development logs
- Decide what kind of logs you want according to the work performed by the developers
- With an automated test environment, there may be no need to design development logs separately, since they are included in the development environment
- Retention period: it does not have to be long, but keeping development logs for a certain period makes it possible to analyze how people work
⇒ It is better to keep work records so that, even after development is completed, the team can keep learning and improving development efficiency.

Operation logs
- Design a "dashboard" and a "self-service interface" that allow operators to make their own decisions and do their work
⇒ A dashboard would be very helpful. You may be able to spot a likely cause at a glance, and it may well turn out to be the actual cause.
- Operators' work must also be recorded as a work log
⇒ I think keeping a work log is important.
- In operational work, make "changes" in a way that can be rolled back
⇒ If something goes wrong as a result of the work, it may take a long time to switch back, or it may not be possible to switch back at all.
I think it is best to write down the work details as a procedure in advance.

Audit logs
- Design the log contents so that auditors can produce reports that support their judgments
- In security audits, it is important not only to check that the data is "encrypted," but also to ensure that the data is being handled appropriately by authorized personnel
- Prove the objectivity of logs by using remote journaling, etc.
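To make the separation of the three log types a little more concrete, here is a minimal sketch using Python's standard `logging` module. It is only one possible way to structure this and was not shown in the session. The logger names, the message fields (`operator`, `rollback`, `authorized`), and the sample values are all hypothetical. In-memory streams stand in for real destinations, and timestamps are left out of the format to keep the example short.

```python
import io
import logging


def make_logger(name: str, stream: io.StringIO) -> logging.Logger:
    """Create a logger that writes only to its own dedicated stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep each log type out of the root logger
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# One destination per log type (in-memory here; hypothetical stand-ins).
streams = {kind: io.StringIO() for kind in ("development", "operation", "audit")}
loggers = {kind: make_logger(f"example.{kind}", s) for kind, s in streams.items()}

# Development log: what the developer's tooling did.
loggers["development"].info("unit tests passed for build %s", "hypothetical-123")

# Operation log: who changed what, and how to roll it back.
loggers["operation"].info(
    "operator=%s action=%s rollback=%s", "alice", "resize-db", "resize-db --undo"
)

# Audit log: whether the data was handled by an authorized person.
loggers["audit"].info(
    "user=%s accessed=%s authorized=%s", "alice", "customer-table", True
)

print(streams["audit"].getvalue(), end="")
```

In a real system the audit stream would be shipped off the host so that it cannot be altered locally, for example with `logging.handlers.SysLogHandler` pointing at a separate log server.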

Practical NoOps - Will NoOps really change the way we work?

I was able to hear how major companies have actually put NoOps into practice. NoOps here stands for "No Uncomfortable Ops": eliminating the "unhappy" aspects of operation and maintenance.

To eliminate the "unhappy" things in system operation and maintenance, NoOps aims for the following.
1. System operation and maintenance that does not interfere with the user's experience
(downtime due to failures, planned outages, performance degradation under concentrated load, etc.)
2. Reducing the toil that occurs during system operation and maintenance
3. Optimization of system operation and maintenance costs
(no surplus resources, appropriate quality, overtime work, human resource utilization, etc.)

Toil seems to mean "labor".
Work that is manual, repetitive, automatable, tactical, without long-term value, and that grows O(n) as the service grows is called toil.
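To make the O(n) point concrete, here is a minimal sketch comparing manual work that grows with the number of servers against a one-time automation effort. All of the numbers (15 minutes per server, 240 minutes to automate) and the function names are hypothetical and chosen only for illustration.

```python
import math


def manual_minutes(servers: int, minutes_per_server: float = 15.0) -> float:
    """Manual work grows linearly with the number of servers: O(n)."""
    return servers * minutes_per_server


def break_even_servers(automation_minutes: float, minutes_per_server: float) -> int:
    """Smallest server count at which one manual pass costs as much as automating once."""
    return math.ceil(automation_minutes / minutes_per_server)


# Doubling the fleet doubles the manual effort.
print(manual_minutes(10), manual_minutes(20))

# With the assumed numbers, automation pays for itself at this fleet size.
print(break_even_servers(240.0, 15.0))
```

The break-even here counts only a single pass of the task. Since toil is repetitive by definition, the automation would pay off even sooner in practice.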

The speaker explained that a system is made up of "value" to the user and "burden" to the provider.
Indeed, as an operator, ideally I would like to increase the "value" and minimize the "burden".
However, in reality, the "value" is small and the "burden" is large.

That is why NoOps is about reducing the "unhappy" "burden" of operations.
At our company, I belong to a team that is working to reduce waste and streamline operations, so I was intrigued by the title.

NoOps seems to have "defense" and "offense".
Features of "defensive" NoOps
- Automation of monitoring notifications
- Automation of retries
- Automation of configuration changes
- Standardization of methods
- Visualization of status
(other SRE activities, etc.)
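As one example from the "defensive" list above, automation of retries can be sketched as a small helper that retries a failing operation with exponential backoff. This is a generic pattern rather than anything specific shown in the session. The function name, the default attempt count, and the delays are assumptions, and the `sleep` parameter exists only so the waiting can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or attempts are exhausted.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between attempts. The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: hand the failure to a human
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Only when every attempt fails does the error surface, so operators are notified about persistent failures rather than transient ones.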

Features of "offensive" NoOps
- Containers
- Microservices
- Serverless
The idea is to design systems that structurally do not require Ops.

Next came talks from companies that are actually working on NoOps.
Fujifilm Software Co., Ltd. is working on NoOps with "IMAGE WORKS", a cloud service for managing and sharing photos and designs.
The speaker talked from both an engineer's perspective and a manager's perspective.

Engineer's perspective

■Before NoOps
- Want to reduce the release work load
  - The system can only be stopped late at night, and as the number of servers increases, so does the release work
- A large-scale operation by one person puts the entire service at risk, and it is difficult both to investigate who did what and to recover

■What they wanted to do with NoOps
- Want to reduce the release work load
  - Use App Service and Azure Functions
  - Automate build and deployment using Azure DevOps
- Want to reduce the man-hours required to respond to failures
  - Split services by process instead of gathering all functions in one App Service
  - Place a standby machine in another region so that it can be switched over at any time

■Good things about NoOps
- Want to reduce the release work load
  - Can release at any time without stopping the service
  - Can release to the production environment with a single button
- Want to reduce the man-hours required to respond to failures
  - Even if an unexpected error occurs, the entire service does not stop
  - Since the standby machine in another region takes over, there is more time for recovery

■Things that went wrong with NoOps
- Want to reduce the release work load
  - Confirmation is necessary because there is a 10% chance of failure
  - Visual confirmation is essential because the status shows as normal even when a release fails
- Want to reduce the man-hours required to respond to failures
  - The more you separate processes, the more items you need to monitor and check
  - Developers and operators must understand the process flow accordingly

Being able to release at any time is very convenient, but on the other hand, it is not all good.
It is true that the more items you have to check, the more time it takes.

Manager's perspective

Don't rush the results of NoOps

The speaker said that, at first, all you do is shift costs from one category to another.
Specifically, the cost ratios change as follows.
Before the NoOps initiative: business costs (60%), maintenance costs (30%), improvement costs (10%)
Early stage: business costs (60%), maintenance costs (20%), improvement costs (20%)
Goal of NoOps: business costs (70%), maintenance costs (10%), improvement costs (20%)
The aim is to reduce maintenance costs and redirect them to business costs.

The speaker said that trying to reduce the overall cost, as shown below, would not work.
Business costs (40%), maintenance costs (10%), improvement costs (10%)
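The arithmetic behind these ratios can be checked with a short sketch. The percentages are the ones quoted above. The stage labels ("early stage", "cost cutting") and the function names are my own for illustration.

```python
# Cost ratios (percent of the original total) quoted in the session.
stages = {
    "before NoOps": {"business": 60, "maintenance": 30, "improvement": 10},
    "early stage": {"business": 60, "maintenance": 20, "improvement": 20},
    "NoOps goal": {"business": 70, "maintenance": 10, "improvement": 20},
    "cost cutting": {"business": 40, "maintenance": 10, "improvement": 10},
}


def total(stage: str) -> int:
    """Sum of all cost categories for one stage."""
    return sum(stages[stage].values())


def shift(start: str, end: str) -> dict:
    """Change in each category, in percentage points, between two stages."""
    return {key: stages[end][key] - stages[start][key] for key in stages[start]}


for name in stages:
    print(name, total(name))

print(shift("before NoOps", "NoOps goal"))
```

The first three stages keep the total at 100%, so the effort freed from maintenance is reinvested, while the "cost cutting" plan shrinks the total to 60% and leaves less for business work than before.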

About the composition of the SRE team

- After all, things do not go as planned
The required skills and mindset are different from traditional development and operations.
Opinions such as "It is impossible to cover everything from development to support" and "Feature development is the star after all" (from Dev members) are often heard, and the speaker said it is difficult to aim for what the SRE book describes.

- Convert existing members or train new ones?
Referring to the world's top soccer teams and the J League, the speaker said that even if you imitate the tactics of a top league, they will not work unless they match the players' abilities.
I see. That is a very easy-to-understand example.

It seems that team composition is still being explored.

Asahi Pro Management Co., Ltd.
ITOCHU Techno Solutions Co., Ltd.
What they did
- Moved from on-premises to fully managed services
- Automatic recovery architecture
- Autonomous operation

Even fully managed architectures seem to have operational challenges.
A comment from the operations front line: "Customers' operations are becoming more and more automated, but our own operations are not automated..."
⇒ [Challenge] Let's leave it to RPA
⇒ [Result] Succeeded in automating routine tasks
⇒ [Issue] The robot execution environment requires maintenance, and processing delays often occur
Even so, RPA seems to be very useful for routine tasks (Excel work, etc.).
NoOps is not just about configuring infrastructure.

"Adjust resources according to system usage status"
⇒ [Challenge] Realize a scalable architecture
⇒ [Result] Reduced work during peak usage

I got the impression that a scalable architecture is taken for granted in NoOps, and that it is hard to apply to systems already in operation.

Summary

The article has become quite long.
It was such a rich event.
I also participated in the following sessions, all of which were interesting and filled with things I didn't know.
"How do you migrate that on-premises DB to Azure SQL Database? - Case study of Benesse Shinkenzemi -"
"PDCA in the era of 100-year lifespan - Old and new PDCA methods that can be used for career and work style reform -"
"Azure Serverless for AWS Engineers"

A luxurious party was held at the end of the first day!
We enjoyed the lively party atmosphere while eating lots of food and sweets.
The DJ was the real thing, and the music was proper club music.

By the way, Keisuke Honda also came as a guest.
I already followed him on Twitter, but I was a little moved to see him in person.
It wasn't "Junichi Davidson." It was the real Honda.

Every session I attended was worth it.
I hope our junior colleagues will go and experience it next year.

About the author

Kenta Miyazaki

I joined Beyond in 2017 as a new graduate.

We provide 24-hour, 365-day operation, maintenance, and monitoring services for servers and clouds used by companies that primarily provide web-based services.
I belong to the System Solutions Department, and my job is to improve Beyond's operations so that our customers can focus on their business.

Certifications: AWS Certified Solutions Architect, GCP Professional Cloud Architect, LinuC-1