[Microsoft de:code 2019] I participated in de:code 2019!

Hello

The other day, Beyond was invited by Microsoft to attend de:code 2019!
I'd like to share my impressions of the sessions I attended and the de:code 2019 party.
Since I'm an infrastructure engineer, I mainly attended the infrastructure-oriented sessions rather than the developer-oriented ones.

Keynote Speech

We listened to the following speakers:
Takuya Hirano (Microsoft Japan Co., Ltd.)
Jared Spataro (Microsoft Corporation)
Julia White (Microsoft Corporation)
Alex Kipman (Microsoft Corporation)
You can watch the keynote speech on YouTube at the following link:
https://youtu.be/GtVcDo1G8r8

Mr. Hirano mainly talked about AI, mentioning projects such as developing a device to measure the age of the Ashura statue and creating a 3D visual model to help repair Notre-Dame Cathedral after the fire.
He also mentioned that Microsoft is expanding its open source approach through collaboration and joint development with Red Hat OpenShift and VMware.

Jared Spataro mentioned Windows Terminal.
I would like to try using Windows Terminal to connect to both Linux and Windows.

Julia White gave a detailed explanation of the latest Azure services and trends.
In particular, when she talked about Azure DevOps, she spoke with real excitement, focusing on Azure Kubernetes.

Alex Kipman demonstrated the HoloLens 2, and the virtual Alex standing next to him felt very real.
If I go again, I would like to try out the HoloLens 2 myself!

Sessions I Attended

Understanding Azure Kubernetes Service (AKS) - Understanding the basics of Kubernetes from a developer's perspective

To be honest, I was still confused about what Kubernetes was, but after this session I understand it better, thanks to the explanation and demos of what it is and what it can do.
The explanation was very easy to follow and also covered the important points for setting it up, so I'm going to review the materials again!
At the end, the speaker said that the most important thing in using Kubernetes is to first thoroughly understand how it works, and I think that's right.

HashiCorp Terraform Azure Provider Tutorial

In a short 20-minute session, the speaker explained what Terraform is, its benefits, and how convenient it is.
Teraoka from our company likes Terraform, so this was a review of what had been covered at a previous in-house study session. He said that at first he found it troublesome to write configuration files in HCL, but now he finds it more troublesome to build without Terraform.

Managed Kubernetes in production at ZOZOTOWN

This session was about ZOZOTOWN, a service I'm personally always grateful for.

ZOZOTOWN appears to have many on-premises servers. As the service grows, the number of servers increases, and the need to always keep enough servers on hand to handle sales events has led to a high operational load.

The reasons for adopting Kubernetes are:
1) flexibility in increasing or decreasing the number of servers (ZOZOTOWN holds regular sales, so it needs to be able to scale flexibly);
2) the application is difficult to set up, so they wanted to containerize it;
3) the ability to auto-scale.

The purpose of adopting Kubernetes is to increase the availability of ZOZOTOWN by designing for stable system operation, a lower operational load, and the multi-cloud setup that comes with the shift to microservices.
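
To picture what auto-scaling means in practice, here is a minimal Python sketch of the scaling rule given in the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas = ceil(current replicas × current metric / target metric). It is only an illustration. The function name, the replica bounds, and the CPU figures are my own assumptions and were not part of the ZOZOTOWN session, and the real autoscaler adds details (such as a tolerance band) that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    """Replica count suggested by the HPA-style rule, clamped to the bounds."""
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical sale-day spike: 10 pods at 90% average CPU, target 50%.
    print(desired_replicas(10, 90.0, 50.0))
    # Hypothetical quiet period: 10 pods at 20% average CPU, target 50%.
    print(desired_replicas(10, 20.0, 50.0))
```

With these made-up numbers, the rule scales out during the spike and scales back in when the load drops, which is the kind of flexibility a site with regular sales needs.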

Apparently, the system replacement has been underway since August 2017 with five members, and they are still recruiting new members.

Regarding the operation of Kubernetes, it is true that many aspects make operation easier, but there is a wide range of knowledge to learn, from the low layers to the high layers.

When asked why they chose Microsoft Azure, they said the generous Microsoft Unified Support was a major factor. They were attracted by the fact that they could contact support as many times as they wanted, have problems solved by Microsoft engineers, and even consult on on-premises issues. It's certainly true that if you can contact support and get a quick response, it's likely to be faster and more accurate than looking into it yourself, which is a big attraction.

Log Management and Security in the DevSecOps Era

This was a talk about log management from a DevOps perspective.
In DevOps, you need to manage three logs: development logs, operation logs, and audit logs.
I don't have any experience with DevOps myself, but log management is also very important for MSPs. If you don't keep logs in the first place, the information you need for troubleshooting, or about the moment a problem occurred, can only be seen while it is happening.

We were taught the key points of each of the three logs: development logs, operation logs, and audit logs.

Development logs
- Decide what kind of logs you want based on the work that developers do.
- In an automated testing environment, the development log is included in the development environment, so designing it may not be necessary.
- Development logs do not need to be stored for a long period of time, but they can be kept for a certain period of time to analyze how developers work.
⇒ Even after development is completed, it is a good idea to keep work records for learning and growth to improve development efficiency.

Operational logs
- Design a "dashboard" and "self-service interface" that allow operators to make decisions and take action themselves.
- Operator actions must also be recorded as work logs.
- Operational work is performed in accordance with the "change management" process, so design the work so that it can be rolled back.
⇒ A dashboard is extremely helpful.
Sometimes you can guess the cause at a glance, and the guess matches the actual cause.
⇒ I think it's also important to keep a work log.
If a problem occurs as a result of that work, it may take a long time to revert, or it may not be possible to revert at all.
I think it's best to write down the work details as procedures in advance.

Audit logs
- Create detailed logs so that auditors can make decisions based on the reports they create.
- In security audits, it is not enough to simply say that data is encrypted; it is also necessary to be able to explain that authorized personnel are handling the data appropriately.
- Utilize remote journaling, etc. to prove the objectivity of logs.
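
One way to picture "proving the objectivity of logs" is a tamper-evident log in which every entry carries a hash of the entry before it. To be clear, this is not remote journaling, which is what the session named. It is a different and simpler technique (a hash chain) with a similar goal, and the class and field names below are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous hash together with a canonical form of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log; each entry is chained to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, actor: str, action: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"actor": actor, "action": action}
        self.entries.append({"record": record,
                             "hash": _digest(prev_hash, record)})

    def verify(self) -> bool:
        """Recompute the chain; editing an entry, or deleting one that has
        successors, makes the stored hashes stop matching."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["hash"] != _digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append("operator-a", "read customer table")
    log.append("operator-b", "rotated encryption key")
    print(log.verify())  # chain is intact

    log.entries[0]["record"]["action"] = "nothing"  # simulate tampering
    print(log.verify())  # chain no longer matches
```

Someone who can rewrite the whole log could also recompute every hash, so a sketch like this only helps if the latest hash is also kept somewhere the operator cannot change. I assume that is the role that shipping logs to a remote system plays.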

Practical NoOps - Will NoOps really change the way we work?

We heard from major companies about how they have actually put the NoOps (No Uncomfortable Ops) approach into practice.

"Unpleasant" aspects of system operation and maintenance
1. Achieving system operation and maintenance that does not disrupt the user experience
(downtime during failures, planned shutdowns, performance degradation during concentrated loads, etc.)
2. Minimizing the "toil" that occurs in system operation and maintenance
(release procedures, patch application, resource monitoring, standby, etc.)
3. Optimizing system operation and maintenance costs
(not having excess resources, appropriate quality, overtime work, human resource utilization, etc.)

Toil seems to mean "labor." It refers to manual work, repetitive work, work that can be automated, tactical work, work that has no long-term value, and work that grows O(n) with the growth of the service.
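
The O(n) point is easier to see with a toy calculation. The sketch below assumes, purely for illustration, that a manual release costs a fixed number of minutes per server while an automated pipeline costs the same amount no matter how many servers there are. Both figures and both function names are made up.

```python
def manual_release_minutes(servers: int, minutes_per_server: int = 15) -> int:
    """Toil: effort grows linearly, O(n), with the number of servers."""
    return servers * minutes_per_server


def automated_release_minutes(servers: int, pipeline_minutes: int = 20) -> int:
    """Automated: one pipeline run, assumed constant for any server count."""
    return pipeline_minutes


if __name__ == "__main__":
    for servers in (1, 10, 100):
        print(servers,
              manual_release_minutes(servers),
              automated_release_minutes(servers))
```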

We were told that a system is made up of "value" for users and "burden" for providers.
Indeed, as an operator, the ideal would be to increase "value" and reduce "burden."
However, in reality, "value" is often small and "burden" is often large.

So, NoOps is about reducing the unpleasant burden of operations.
I'm part of a team at our company that is working to reduce waste and streamline operations, so the title itself intrigued me.

NoOps seems to have both "defensive" and "offensive" aspects.
The characteristics of "defensive" NoOps:
- Automated monitoring notifications
- Automated retries
- Automated configuration changes
- Standardized methods
- Status visualization
(as well as SRE activities).
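
As an example of the "automated retries" item, here is a minimal Python sketch of retrying a failing operation with exponential backoff. The session did not show any code. The function name, the parameters, and the flaky health check are all hypothetical, and this is just one common way to do it.

```python
import time


def call_with_retries(operation, max_attempts=3, base_delay=0.5,
                      sleep=time.sleep):
    """Run operation(); on failure wait base_delay * 2**n seconds and retry.

    Re-raises the last exception once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to a human
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_health_check():
        # Hypothetical operation that fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(call_with_retries(flaky_health_check, base_delay=0.01))
```

The point of the "defensive" approach, as I understood it, is that a person only gets involved once the automatic attempts have run out.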

The characteristics of "offensive" NoOps:
- Containers
- Microservices
- Serverless
These are about designing a system that structurally does not require Ops.

Next, we heard from companies that are actually working on NoOps.
Fujifilm Software Co., Ltd. is working on NoOps for IMAGE WORKS, its cloud service for managing and sharing photos and designs. They spoke from the perspectives of both an engineer and a manager.

From an engineer's perspective

■ Before NoOps
・We want to reduce the burden of release work.
The system can only be stopped late at night.
As the number of servers increases, the amount of release work also increases.
A bulk operation by one person poses a risk to the entire service.
It is difficult to investigate who did what and to recover.

■ What we want to do with NoOps
・To reduce the burden of release work.
Use App Service and Azure Functions (this is where Azure comes in).
Automate build and deployment using Azure DevOps.

・To reduce the amount of work required to respond to failures.
Rather than consolidating functions into one App Service, separate the services into separate processes.
Place a standby machine in another region so that we can switch over at any time.

■ The benefits of NoOps
・To reduce the burden of release work.
Release at any time without stopping the service.
Release to the production environment with the push of a button.

・To reduce the man-hours required to respond to failures.
Even if an unexpected error occurs, the entire service will not stop.
Standby machines in other regions keep running, allowing ample time for recovery.

■ The drawbacks of NoOps
・To reduce the burden of release work.
There is a 10% chance that a release fails, so confirmation is necessary.
Even when it fails, the status still shows as normal, so visual confirmation is essential.

・To reduce the man-hours required to respond to failures.
The more the processes are divided, the more items there are to monitor and check.
Developers and operators must understand the process flow for every part it has been divided into.

While it's very convenient to be able to release at any time, it's not all good.
It's true that the more items you have to check, the more time it takes.

From a manager's perspective

Don't rush NoOps results

The speaker said that at first, NoOps simply shifts where the costs are incurred.
Specifically:
Before NoOps: business costs 60%, maintenance costs 30%, improvement costs 10%.
At the beginning of NoOps implementation: business costs 60%, maintenance costs 20%, improvement costs 20%.
The goal with NoOps: business costs 70%, maintenance costs 10%, improvement costs 20%, which means reducing maintenance costs and shifting them to business costs.

What they found is that NoOps does not reduce the overall cost; an outcome such as the following was not achievable:
Business costs (40%), Maintenance costs (10%), Improvement costs (10%).
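
The percentages in this section can be checked with a little arithmetic. The sketch below only restates the figures quoted in the session; the variable and function names are my own.

```python
# Cost shares (percent) as quoted in the session.
before = {"business": 60, "maintenance": 30, "improvement": 10}
early = {"business": 60, "maintenance": 20, "improvement": 20}
goal = {"business": 70, "maintenance": 10, "improvement": 20}
not_achievable = {"business": 40, "maintenance": 10, "improvement": 10}


def shift(src: dict, dst: dict) -> dict:
    """Change in each share, in percentage points, going from src to dst."""
    return {key: dst[key] - src[key] for key in src}


if __name__ == "__main__":
    for name, shares in [("before", before), ("early", early),
                         ("goal", goal), ("not achievable", not_achievable)]:
        print(name, sum(shares.values()))
    print(shift(before, early))
    print(shift(before, goal))
```

The first three breakdowns each add up to 100, so the total stays the same and only its distribution changes. The last one adds up to 60, which, as I understood it, is the kind of overall reduction that did not materialize.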

SRE team organization

・After all, things don't always go as planned.
The skills and mindset required are different from traditional development and operations.
"It's impossible to do everything from development to support."
"I can't write programs." (opinion of an Ops member)
"Feature development is still the main attraction." (opinion of a Dev member)
As is often said, it's difficult to do things exactly as the SRE book describes.

・Convert existing players or develop new ones?
The speaker compared the world's top soccer teams to the J-League and said that simply imitating the tactics of the top leagues is no good unless the players' abilities match.
I see... that's a very easy-to-understand analogy.

The team composition is still being explored.

Asahi Professional Management Co., Ltd.
Itochu Techno-Solutions Corporation
What they have done:
- Moved from on-premises to fully managed services
- Auto-recoverable architecture
- Autonomous operation
Even with a fully managed architecture, operational challenges seem to exist.
From the operations front, they heard:
"Customers' work is increasingly being automated, but ours isn't..."
⇒ [Attempt] Let's try RPA
⇒ [Result] Successfully automated routine tasks
⇒ [Issue] The robot execution environment needs maintenance, and processing delays often occur.
I knew very little about RPA, but it seems to play a major role in routine tasks (such as Excel work).
So NoOps isn't just about tweaking infrastructure configuration.

"Adjust resources according to system usage"
⇒ [Attempt] Realize a scalable architecture
⇒ [Result] Reduced workload during peak usage

I got the impression that it is common for a scalable architecture not to be applicable to a system that is already running.

Summary

This article ended up being quite long.
That's how rich the event was.
I also attended the following sessions, all of which were interesting and filled with new information.
"How do you migrate your on-premise database to Azure SQL Database? - Benesse Shinken Zemi Case Study"
"PDCA in the Era of 100-Year Life - An old and new way to use PDCA for career and work style reform"
"Azure Serverless for AWS Engineers"

At the end of the first day, a luxurious party was held!
We all enjoyed the lively party atmosphere while eating lots of food and sweets.
The DJ was professional, and the music was really club-like, making me feel like I was in a club.

By the way, the famous "Keisuke Honda" was also there as a guest.
I was already following him on Twitter, so I was a little moved to see the real thing.
It wasn't "Junichi Davidson." It was the real thing.

Every session I attended was one I'm glad I went to.
I would like my junior colleagues to also attend next year.


About the author

Kenta Miyazaki

I joined Beyond as a new graduate in 2017.

We provide 24/7 operation, maintenance, and monitoring services for servers and clouds used by companies that primarily provide web-based services. I work in the System Solutions Department, with the goal of improving Beyond's operations so that our customers can focus on their business.

Certifications: AWS Certified Solutions Architect, GCP Professional Cloud Architect, LinuC Level 1