[Microsoft de:code 2019] I participated in de:code 2019!

2019.07.17


Table of contents

Keynote Speech

Participating Sessions

Hello

Recently, Beyond was invited by Microsoft to attend de:code 2019!
I'd like to share my impressions of the sessions I attended and what the de:code 2019 party was like.
Since I'm an infrastructure engineer, I mainly attended sessions aimed at infrastructure engineers rather than developers.

Keynote Speech

We heard from the following speakers: Takuya Hirano (Microsoft Japan Co., Ltd.), Jared Spataro (Microsoft Corporation), Julia White (Microsoft Corporation), and Alex Kipman (Microsoft Corporation).
The keynote speeches can be viewed on YouTube at the following link:
https://youtu.be/GtVcDo1G8r8

Mr. Hirano spoke mainly about AI, mentioning projects such as estimating the age of the Ashura statue and creating a 3D visual model to help restore Notre-Dame Cathedral after it was damaged by fire. He also mentioned expanding Microsoft's open-source approach, including collaboration and joint development with Red Hat (OpenShift) and VMware.

Jared Spataro mentioned Windows Terminal.
I'd like to try Windows Terminal and test connecting to both Linux and Windows machines from it.

Julia White gave a detailed explanation of the latest Azure services and trends.
In particular, her talk on Azure DevOps, centered on Azure Kubernetes Service, was truly exciting.
Alex Kipman gave a HoloLens 2 demo, and the virtual Alex standing next to him looked incredibly real.
If I ever go again, I'd love to try HoloLens 2 myself!

Participating Sessions

Understanding Azure Kubernetes Service (AKS) - Understanding the basics of Kubernetes from a developer's perspective

To be honest, I didn't really know what Kubernetes was, but this session, which used demonstrations to explain what Kubernetes is and what it can do, deepened my understanding.
The explanation was very easy to follow, and the materials even covered key configuration points, so I'd like to review them again!
At the end, they said the most important thing when using Kubernetes is to first thoroughly understand how it works, and I think that's absolutely right.

HashiCorp Terraform Azure Provider Tutorial

In a short 20-minute session, we got an overview of Terraform, its benefits, and how convenient it is.
Our colleague Teraoka is a big fan of Terraform, and this session served as a review of what he had taught us at an earlier internal study session. He initially found writing configuration files in HCL cumbersome, but now he says he would rather not build a system without Terraform.

Managed Kubernetes in production at ZOZOTOWN

Yes, that ZOZOTOWN.
I personally use their services all the time.

ZOZOTOWN apparently runs on many on-premises servers. Their challenges included the heavy operational burden as the number of servers grew along with the service, and the need to keep enough servers on hand at all times to handle sales.

The reasons for adopting Kubernetes were:
① Flexible scaling of the number of servers: ZOZOTOWN holds regular sales, so it must be able to scale flexibly.
② Setting up the environment to run applications was cumbersome, so they wanted to containerize them.
③ Autoscaling.

The purpose of adopting Kubernetes is to improve ZOZOTOWN's availability through stable system operation, to reduce the operational burden, and to design the system with an eye to the multi-cloud migration that accompanies the move to microservices.
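The session didn't show ZOZOTOWN's actual scaling configuration. As a rough illustration of what "autoscaling" means in Kubernetes, the Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), and it skips scaling when that ratio is within a tolerance (0.1 by default). The function below is a minimal sketch of that rule only; the function name and the sale-day numbers are hypothetical, and the real controller also applies min/max replica bounds and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Sketch of the Horizontal Pod Autoscaler's replica calculation."""
    ratio = current_metric / target_metric
    # Within the tolerance band, the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally, rounding up so the result errs
    # on the side of extra capacity.
    return math.ceil(current_replicas * ratio)


# Hypothetical sale-day numbers with a target of 50% average CPU utilization.
print(desired_replicas(4, 90, 50))  # sale starts: 8
print(desired_replicas(8, 10, 50))  # sale ends: 2
print(desired_replicas(4, 52, 50))  # within tolerance: 4
```

Rounding up rather than to the nearest integer means a busy service is never left slightly under-provisioned, which matters most during a sale.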

Apparently, the system replacement has been underway since August 2017 with five members, and they are still recruiting new members.

Kubernetes certainly makes many aspects of operation easier, but it requires a wide range of knowledge, from the low layers to the high layers.

When asked why they chose Microsoft Azure, a major factor seemed to be the extensive Microsoft Unified Support. Being able to contact support as many times as needed, solve problems together with Microsoft engineers, and even get advice on on-premises issues was a big draw.
Indeed, contacting support and getting a quick answer is likely to be faster and more accurate than researching on your own, which is a significant advantage.

Log Management and Security in the DevSecOps Era

This was a discussion about log management from a DevOps perspective.
In DevOps, it is necessary to manage three types of logs: development logs, operational logs, and audit logs.
I have no experience with DevOps myself, but log management is also extremely important for an MSP like us.
After all, if you don't keep logs, the only information you have for incident response is what you can observe at the very moment the incident occurs.

We were taught the key points for each of the three log types: development logs, operational logs, and audit logs.

Development logs
These record the tasks performed by developers.
In automated testing environments, development logs may not need a separate design because they are already part of the development environment.
Development logs don't need to be kept for a long time, but it's good to keep them for a certain period so you can analyze how people work.
⇒ Even after development is complete, it's good to keep work records for learning and growth, to improve development efficiency.

Operational logs
Design a "dashboard" and a "self-service interface" that let operators make their own decisions and perform tasks based on operational logs.
Operator tasks should also be recorded as work logs.
Operational tasks should follow a "change management" process, and the system should be designed to allow rollback.
⇒ A dashboard is extremely helpful. At a glance you can narrow down the cause, and sometimes that guess turns out to be right.
⇒ Keeping work logs is also important. If a problem occurs as a result of the work, rolling back may take a long time or may be impossible, so it's best to write the work down as a procedure beforehand.

Audit logs
To produce reports that auditors can use to make informed decisions, detailed log information must be prepared.
In security audits, it's not enough to simply state that data is encrypted; you must also show that authorized personnel are handling the data appropriately.
Remote journaling and similar methods should be used to demonstrate the objectivity of the logs.

Practical NoOps - Will NoOps really change the way we work?

We heard from a major company about how they have actually put the NoOps (No Uncomfortable Ops) approach into practice.

Three goals for eliminating the "uncomfortable" parts of system operation and maintenance:
1. Achieving system operation and maintenance that does not disrupt the user experience
(downtime due to failures, planned shutdowns, performance degradation during periods of high load, etc.)
2. Minimizing the "toil" that occurs in system operation and maintenance
(release procedures, patch application, resource monitoring, standby, etc.)
3. Optimizing system operation and maintenance costs
(avoiding surplus resources, appropriate quality, overtime work, personnel utilization, etc.)

"Toil" literally means "labor" or "hard work." Here it refers to tasks that are manual, repetitive, automatable, and tactical, that lack long-term value, and that grow linearly (O(n)) with the growth of the service.

We were told that a system consists of "value" for the user and "burden" for the provider.
Certainly, for someone in charge of operations, the ideal is to maximize "value" and minimize "burden."
In reality, though, "value" is often small and "burden" is large.

So, NoOps is about reducing the "uncomfortable" burden of operations.
I was interested in the title from the start because I belong to a team at my company that is working to reduce waste and streamline operations.

NoOps seems to have two aspects: "defensive" and "offensive."
Characteristics of "defensive" NoOps:
- Automation of monitoring and notifications
- Automation of retries
- Automation of configuration changes
- Standardization of methods
- Visualization of status
(and other SRE activities)
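As a concrete picture of "automation of retries," here is a minimal sketch of retrying a failing operation with exponential backoff. It was not shown in the session; the function name, attempt count, and delays are hypothetical choices, and a real system would usually retry only errors known to be transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Out of attempts: re-raise so monitoring can notify a human.
            if attempt == attempts - 1:
                raise
            # Wait 0.5 s, 1 s, 2 s, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Usage: a hypothetical operation that fails twice, then succeeds.
calls = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []
print(retry(flaky, sleep=delays.append))  # ok
print(delays)  # [0.5, 1.0]
```

Injecting `sleep` keeps the example fast to run and makes the backoff schedule easy to inspect.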

Characteristics of "offensive" NoOps:
- Containers
- Microservices
- Serverless
The idea is to design systems that inherently do not require operations.

Next, we heard from a company that is actually implementing NoOps. Fujifilm Software Co., Ltd. is implementing NoOps for "IMAGE WORKS," their cloud service for managing and sharing photos and designs. They shared their perspectives from both an engineer's and a manager's point of view.

From an engineer's perspective

■ Before NoOps
- Reducing the workload of release work
The system could only be stopped at night, and the release work grew as the number of servers increased.
Large operations performed by one person put the entire service at risk, and investigating who did what and recovering was difficult.

■ What we want to achieve with NoOps
- Reducing the workload of release work
Use App Service and Azure Functions (this is where Azure comes in).
Automate build and deployment with Azure DevOps.

- Reducing the effort required to respond to failures
Separate services by process rather than consolidating all functions into a single App Service.
Keep a standby machine in another region so we can switch over at any time.

■ Benefits of NoOps
- Reducing the workload of release work
We can release at any time without service downtime.
Releasing to the production environment takes one click.

- Reducing the effort required to respond to failures
Even if an unexpected error occurs, the entire service does not stop.
The standby machine in another region takes over, giving us more time to recover.

■ What was bad about NoOps
- Reducing the workload of release work
Releases failed about 10% of the time, so verification was necessary.
Even when a release failed, the status still showed as normal, so visual verification was essential.

- Reducing the effort required to respond to failures
The more finely you divide processes, the more items you need to monitor and verify. Developers and operators must understand the processing flow of each piece.

Being able to release at any time is incredibly convenient, but it's not without drawbacks.
Having more items to check certainly means more time is required.

Manager's perspective

Don't expect NoOps results right away

At first, they said, NoOps just shifts where costs are incurred:
- Before NoOps: business-related 60%, maintenance 30%, improvement 10%
- Initial stage of NoOps: business-related 60%, maintenance 20%, improvement 20%
- NoOps goal: business-related 70%, maintenance 10%, improvement 20%
This means reducing maintenance costs and reallocating the savings to business-related costs.

They said that simply cutting the overall cost, for example to business-related 40%, maintenance 10%, and improvement 10% (only 60% in total), would not work.
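The percentages above can be checked with a little arithmetic. This is my own worked check of the figures from the session, not something presented there; the phase labels are my shorthand, and I'm assuming every figure is a share of the original total budget. Each NoOps phase keeps the total at 100% and only moves money between categories, while the rejected plan simply shrinks the total.

```python
# Cost shares (% of the original total budget) as reported in the session.
PHASES = {
    "before NoOps": {"business": 60, "maintenance": 30, "improvement": 10},
    "early NoOps": {"business": 60, "maintenance": 20, "improvement": 20},
    "NoOps goal": {"business": 70, "maintenance": 10, "improvement": 20},
}
# The overall cost cut that, according to the speaker, would not work.
CUT_PLAN = {"business": 40, "maintenance": 10, "improvement": 10}


def shift(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Percentage-point change per category between two allocations."""
    return {category: after[category] - before[category] for category in before}


for name, shares in PHASES.items():
    print(name, sum(shares.values()))  # every NoOps phase totals 100

print(shift(PHASES["before NoOps"], PHASES["early NoOps"]))
# {'business': 0, 'maintenance': -10, 'improvement': 10}
print(shift(PHASES["early NoOps"], PHASES["NoOps goal"]))
# {'business': 10, 'maintenance': -10, 'improvement': 0}
print(sum(CUT_PLAN.values()))  # 60
```

Read this way, maintenance savings first fund improvement work and only later flow into business-related costs.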

SRE team organization

- Things don't always go as planned.
The necessary skills and mindset differ from those of traditional development and operations.
It's impossible to do everything from development to support.
"I can't write programs" (an Ops member's opinion).
"Feature development is still the glamorous part" (a Dev member's opinion).
I hear this a lot; it's hard to achieve what the SRE book describes.

Should we convert existing players or develop new ones?
He used top soccer teams from around the world and the J.League as examples, explaining that simply copying the tactics of a top league won't work if the players don't have the ability.
I see... that's a very clear example.

The team composition is still being explored

Asahi Pro Management Co., Ltd. and Itochu Techno-Solutions Corporation
What they did:
- From on-premise to fully managed
- Automated recovery architecture
- Autonomous operation

Even with a fully managed architecture, operational challenges seem to remain.
From the operations team:
"Our customers' tasks are being automated more and more, but our own tasks aren't..."
⇒ [Initiative] Try using RPA
⇒ [Result] Successfully automated routine tasks
⇒ [Issue] The robot execution environment needs maintenance, and processing delays occur frequently.
I didn't know much about RPA, but it seems very useful for routine tasks (Excel work, etc.).
NoOps is not just about changing the infrastructure configuration.

"Adjusting resources according to system usage"
⇒ [Initiative] Build a scalable architecture
⇒ [Result] Reduced workload during peak usage
⇒ [Issue] Not yet applied to systems already in operation
I got the impression that scalable architecture is taken for granted in NoOps.

Summary

This article has turned out to be quite long.
That's because it was such a substantial event.
I also attended the following sessions, all of which were interesting and full of new information:
"How to Migrate Your On-Premise Database to Azure SQL Database? – A Case Study from Benesse Shinkenzemi"
"PDCA in the Era of 100-Year Lifespans – An Old Yet New Way to Implement PDCA for Career and Work Style Reform"
"Azure Serverless for AWS Engineers"

A fantastic party was held at the end of the first day!
We enjoyed the lively party atmosphere while eating lots of food and sweets.
The DJ was really professional; the music was so good I thought we were in a club.

By the way, Keisuke Honda was also there as a guest.
I already followed him on Twitter, but I was a little moved to see the real thing.
It wasn't Junichi Davidson. It was the real deal.

Every session I attended was worthwhile.
I'd definitely like our junior colleagues to attend next year's event.



About the author

Kenta Miyazaki

I joined Beyond as a new graduate in 2017.

We provide 24/7/365 operation, maintenance, and monitoring services for servers and clouds, mainly for companies that develop web-based services. I belong to the System Solutions Department, and my work is driven by the desire to improve Beyond's operations so that our customers can focus on their own businesses.

Certifications: AWS Certified Solutions Architect, GCP Professional Cloud Architect, LinuC Level 1
