A fledgling ("hiyoko") SRE's report on attending the HashiCorp Meetup
My name is Teraoka and I am an infrastructure engineer.
I am a member of the SRE team within the company and am responsible for the design, construction, and operation of infrastructure.
By the way, SRE generally stands for "Site Reliability Engineering," but
we call it "Service Reliability Engineering" because "we look at the entire service, not just the site."
This time I'm going to talk about attending the HashiCorp Meetup, an event
about the HashiCorp tools that are currently a hot topic for supporting DevOps.
The venue was Shibuya Hikarie in Tokyo, so it was a bit of a trip: it takes about 3 hours from Osaka by Shinkansen, which is quite far...
I'm looking forward to a future event in Osaka!
■ Impressions in bullet points
- Prosciutto and beer are the best
- Mitchell Hashimoto likes puns (he handed out chopsticks, "hashi" in Japanese, as a play on HashiCorp)
- Roughly 80% of attendees use Terraform
- Packer is at about 40%
- Few people seem to be using Consul/Vault/Nomad yet
- I'm interested in Consul Connect (it can encrypt communication between services)
- I want to use Sentinel (Policy as Code), so I hope it gets released as OSS!
DeNA provided us with prosciutto and beer, which were delicious.
We even received a whole leg of prosciutto! pic.twitter.com/gkvdEB20zK
— HashiCorp Japan (@hashicorpjp) September 12, 2018
Mitchell Hashimoto's pun was fun.
True to his word, chopsticks really were being handed out at the venue, so I picked some up.
In terms of technology, the popularity of Terraform is tremendous.
When the audience was asked "Has anyone used it?", most people raised their hands.
Packer was at roughly half of that.
Even so, I personally think it is a best-practice tool for managing machine images, so
it is worth learning to use.
At our company we use Terraform/Packer in only some projects, so I think we need to
first modularize and standardize the code into a proper framework, and
then start evaluating the tools we have not adopted yet, such as Consul...
With that, here are my notes on each talk.
■HashiCorp products at work in an AI & analytics platform!
Hidenori Matsuki, AI System Department, System Division, DeNA Co., Ltd.
・Responsible for the infrastructure of the AI & analytics platform
・Builds and operates infrastructure on which AI R&D engineers can develop easily and freely
・The talk: solving AI & analytics infrastructure issues with HashiCorp tools
AI infrastructure
There are multiple small AI projects, each with two or three people.
Each uses AWS and GCP and handles highly confidential data and huge data volumes, so
infrastructure has to be prepared quickly for each of them.
Solving AI infrastructure issues
・Using Terraform
  ・Terraform is the infrastructure provisioning tool
  ・Instances are launched with Terraform and middleware is installed with itamae
  ・Instances can now be built easily with tools
  ・GPU-equipped instances are launched as the build infrastructure for machine learning
・Using Packer (planned)
  ・Dockerization of the AI infrastructure is under way
  ・They want to use Packer to manage images in the AWS/GCP Docker repositories
  ・They also want to write Packer templates in HCL (I found myself nodding in strong agreement)
Analytics infrastructure
・Collect service logs with fluentd
・Analyze the collected logs with Hadoop
・Load the results into BigQuery with embulk
・Visualize with ArgusDashboard
・Upgrading Hadoop is difficult...
・Even verifying the analytics platform is difficult in the first place
Solving analytics infrastructure issues
・Trying to verify the analytics platform with Vagrant!
・But it is still difficult... (it seems some issues remain)
Impressions
・Terraform, being multi-provider, seemed to match their requirements very well
・I fully agree with not using Terraform's provisioner (though we use Ansible rather than itamae)
・I also agree with wanting to write Packer templates in HCL
■HashiCorp software that supports container infrastructure
Tomoo Oda, Principal Engineer, Technology Infrastructure Team, Technology Department, GMO Pepabo, Inc.
What is Lolipop! Managed Cloud?
・Container-based PaaS
・A cloud-style rental server that comes with managed operations
・Scales dynamically according to container load
・Neither k8s nor Docker is used
・Uses its own container environment: Oohori/FastContainer/Haconiwa
・All developed in-house (written in Go)
Why develop and use their own?
・Containers must be packed at high density to offer an inexpensive container environment
・User-managed containers must be kept secure continuously
・Container resources and permissions must be flexibly configurable, and adjustable while running (the container itself changes its resources dynamically)
・In short, it was necessary to meet the requirements!
Overall features
・Hybrid environment of OpenStack and bare metal
・Images built with Packer are minimal and shared by all roles
・Knife-Zero is used instead of Terraform's provisioner
・The development environment is Vagrant
・Because major specification changes are frequent and many roles are stateless, the strategy is Immutable Infrastructure: servers are discarded rather than modified
Terraform
・Code is reused as modules as much as possible
・tfstate is managed in S3, as you would expect
・Workspaces are used for Production and Staging
・lifecycle settings are essential to prevent accidents
・terraform fmt is run in CI
Consul
・Mainly used for service discovery
・Roles are split with Mackerel between external monitoring and service monitoring
・Name resolution uses Consul's internal DNS together with Unbound
・Used as the Vault storage backend
・Every node runs the Consul agent and the Prometheus Consul/node/blackbox exporters
・Resolving names with Consul on the bastion server makes multi-hop SSH convenient
・Consul DNS makes round-robin across proxy upstreams possible
Vault
・Uses the PKI and Transit secrets engines
・All secret information held in the DB is encrypted with Vault
・Issued secrets are distributed with Chef
・Tokens are used while their TTL is extended
・They expire at max-lease-ttl
PKI root CA distribution issue
What to do about TLS when connecting to the Vault server?
・With Consul as the storage backend, vault.service.consul is set automatically for the active Vault
・Vault is set up as a certificate authority and issues its own server certificate
・Vault is sealed when it is restarted
Using Consul Template
・Vault reloads the server certificate on the SIGHUP signal
・SIGHUP does not seal Vault
・SIGHUP can also be used for logrotate of the audit log
・The root CA issued by Vault is distributed with Chef
Application token distribution issue
・The application authenticates to Vault with AppRole
・On successful authentication, a token with the specified TTL is issued
・The application uses the token to communicate with Vault
・The application itself does not need to hold the AppRole role_id and secret_id
・The token is issued outside the application process
Solution to the application token distribution issue
・A command that performs AppRole auth is run when the application is deployed
・The command POSTs a TTL extension for the secret_id, which acts as the authentication password
・It then authenticates and obtains the token
・The obtained token is placed at a path the application can read
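The deploy-time steps above can be sketched roughly as follows. This is a minimal illustration and not the speaker's actual command: it uses Vault's AppRole login endpoint (`/v1/auth/approle/login`), while the Vault address, the injected `post` function, and the token path are assumptions made for the example. The secret_id TTL extension step is left out because the talk did not say which request was used for it.

```python
import json
import os
from typing import Callable

# Assumed address: the host name comes from the talk, scheme and port are guesses.
VAULT_ADDR = "https://vault.service.consul:8200"


def approle_login(role_id: str, secret_id: str,
                  post: Callable[[str, bytes], bytes]) -> str:
    """Authenticate with AppRole and return the issued client token.

    `post(url, body) -> response body` is injected (hypothetical helper) so the
    sketch can be exercised without a real Vault server.
    """
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode()
    response = json.loads(post(f"{VAULT_ADDR}/v1/auth/approle/login", body))
    return response["auth"]["client_token"]


def write_token(token: str, path: str) -> None:
    """Place the token at a path the application can read, owner-only (0600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
```

In a real deploy script, `post` would wrap an HTTPS client that trusts the root CA distributed with Chef, and the application would only ever read the token file.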
An incident where the Vault token expired
・A custom secret_id is set so that AppRole's secret_id can be renewed
・The token used to set the custom secret_id is issued by auth/token
・There is a max-lease-ttl setting for each mount, and also a max-lease-ttl for the system as a whole
・If neither is set, the upper limit is 32 days, which is the system's max TTL
・When the token expired, 403 errors appeared frequently in audit.log
・A Consul check that monitors audit.log was added on the Vault server
・Detections are notified to Slack with Consul Alert
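To make the TTL behaviour concrete, here is a small sketch of how a renewal ends up capped. It is a simplified model, not Vault's code: the precedence (mount setting first, then system setting, then the 32-day default) and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# 32 days: the system-wide upper limit quoted in the talk.
SYSTEM_DEFAULT_MAX_TTL = timedelta(days=32)


def effective_max_ttl(mount_max: Optional[timedelta],
                      system_max: Optional[timedelta]) -> timedelta:
    """Pick the cap that applies (assumed order: mount, system, default)."""
    if mount_max is not None:
        return mount_max
    if system_max is not None:
        return system_max
    return SYSTEM_DEFAULT_MAX_TTL


def renewed_expiry(issued_at: datetime, now: datetime,
                   ttl: timedelta, max_ttl: timedelta) -> datetime:
    """A renewal extends the token by `ttl`, but never past issued_at + max_ttl."""
    return min(now + ttl, issued_at + max_ttl)
```

With nothing configured, a token renewed faithfully every week still stops being renewable 32 days after it was issued, which matches the sudden 403 errors described above.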
Impressions
・It's amazing that they develop their own stack without using k8s or Docker
・Many companies avoid Terraform's provisioner and use a separate configuration management tool instead
・Declaring a backend and managing tfstate in S3 is the stable choice
・I don't usually write lifecycle blocks, so I need to learn them
・One of the projects I'm in charge of uses multi-hop SSH through a bastion, so I'd like to try Consul there
・To be honest, I couldn't follow much of the Vault part because I haven't studied it enough. I'll come back to it...
■Terraform/Packer supporting Applibot's DevOps
System Operations Engineer, Planning and Development Div., Applibot Co., Ltd. (CyberAgent Group)
・"Team = company" structure, with one team per application
・There are only two SYSOPs (the people who prepare the server environments)!
・SYSOPs often work together with the server engineers in the application teams
・The talk: past problems solved with Terraform/Packer
Case 1: Image creation
・Using RUNDECK/Ansible/Packer
・Machine images are created in conjunction with Ansible
・Advantage: the image creation flow can be turned into a template
・Before
  ・Launch EC2 from an AMI
  ・Apply configuration changes with Ansible
  ・Create the image and delete the server
  ・Everything was executed manually
・After
  ・Done in one shot
Case 2: Building a load-test environment
・Using Terraform/hubot
・SYSOP builds, reconfigures, and reviews the environment for load testing
・What has to be prepared on the infrastructure side
  ・An environment equivalent to production
    ・Created when the test is run: keeping it at production scale all the time is costly, so it is created for each test
    ・Kept small when not in use during the test period
  ・Servers that generate the load, on a separate network and account
    ・Master-slave configuration; slaves are scaled with Auto Scaling
    ・Created for each test and deleted when not in use
  ・Monitoring
    ・grafana/prometheus/kibana+elasticsearch
・Before
  ・awscli was run from a management instance
  ・The startup order was adjusted with a script
  ・The result had turned into a "secret sauce" that only its authors understood
  ・SYSOP handled scaling up and down every morning and night
・After
  ・All components were converted to Terraform
  ・Already existing resources can be brought in with terraform import
  ・If the state cannot be reconciled any other way, tfstate is modified directly
  ・Startup management was consolidated
  ・You can see the configuration by reading the Terraform code
  ・Creation and scaling are executed with terraform apply
  ・The build order is specified with depends_on
  ・Switching vars scales the infrastructure down or up
  ・Everything can now be completed with just a message from Slack
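The "switch vars to scale" idea could look something like the sketch below, for example inside a chat-bot handler. The variable file names and the mode names are hypothetical; only the `terraform apply` flags `-var-file` and `-auto-approve` are real Terraform CLI options.

```python
from typing import List

# Hypothetical variable files, one per size of the load-test environment.
VAR_FILES = {
    "expand": "loadtest-full.tfvars",
    "reduce": "loadtest-small.tfvars",
}


def build_apply_command(mode: str) -> List[str]:
    """Build the terraform command that scales the environment up or down."""
    if mode not in VAR_FILES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["terraform", "apply", "-auto-approve",
            f"-var-file={VAR_FILES[mode]}"]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the mapping easy to review and test.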
Case 3: Building a new environment
・Using Terraform/GitLab
・Before
  ・Sealing the root account
  ・Consolidated Billing settings
  ・Creating IAM users, groups, and switch roles
  ・Creating S3 buckets for CloudTrail/logs
  ・Network settings, opening ports for monitoring
・After
  ・The settings of the account itself can now be managed
  ・Seal the root account
  ・Configure Consolidated Billing
  ・Create an IAM user for Terraform
  ・After that, just run Terraform
  ・Setting changes can also be managed with merge requests
・It is important to turn routine work into templates!
Impressions
・Creating machine images with Packer + Ansible is really convenient
・While Packer builds the machine image, the commands to run can be managed as Ansible configuration
・Besides Ansible, Packer can also execute shell scripts
・Our company also uses Terraform to build load-test environments, so this was helpful
・We don't have chat notifications via hubot yet, so I'd like to add them
・Turning everything needed for a new environment into modules looks like it would speed up a lot of things
■tfnotify - Show Terraform execution plan beautifully on GitHub
b4b4r07, Software Engineer, Mercari, Inc.
What is tfnotify?
・A CLI tool written in Go
・Parses Terraform execution results and sends them to a notification destination of your choice
・Used via a pipe: terraform apply | tfnotify apply
Why is it necessary?
・Terraform is used in the microservices area
・From an ownership perspective, review and merge should be done by each microservice team, but there are cases where the Platform team should review as well
・Both the Platform and microservice teams should understand the importance of IaC, get into the habit of writing Terraform code, and look at the execution plan every time
・They want to save the hassle of going to the CI screen (and check it quickly on GitHub)
HashiCorp tools at Mercari
・Over 70 microservices
・Terraform is used to build the infrastructure for all microservices and their platform
・Developers are encouraged to practice IaC
・Number of tfstates: 140
・All Terraform code is managed in one central repository
Repository configuration
・Separate directory for each microservice
・Separate tfstate for each service
・Separate file for each resource
・Delegation of authority with CODEOWNERS
Delegation of authority
・A GitHub feature
・A change cannot be merged until it is approved by someone listed in CODEOWNERS
・Nothing can be changed without approval
Implementation of tfnotify
・Uses io.TeeReader
・Turns Terraform execution results into structured data with its own parser
・Posted messages can be written as Go templates
・Settings are written in YAML
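The tee-and-parse idea can be sketched in a few lines. This is a Python illustration of the concept only, not tfnotify's implementation (which is written in Go and uses io.TeeReader): output is passed through unchanged while a summary line is extracted. The regular expression assumes the classic `Plan: N to add, N to change, N to destroy.` summary line, and the function name is made up for this example.

```python
import re
from typing import Dict, Iterable, Optional, TextIO

# Assumed format of the plan summary line printed by terraform plan.
PLAN_SUMMARY = re.compile(
    r"Plan: (\d+) to add, (\d+) to change, (\d+) to destroy")


def tee_and_parse(source: Iterable[str],
                  passthrough: TextIO) -> Optional[Dict[str, int]]:
    """Copy every line to `passthrough` and return the parsed plan summary.

    Returns None when no summary line is seen (for example "No changes.").
    """
    summary = None
    for line in source:
        passthrough.write(line)  # the "tee" half: output is not swallowed
        match = PLAN_SUMMARY.search(line)
        if match:
            add, change, destroy = (int(n) for n in match.groups())
            summary = {"add": add, "change": change, "destroy": destroy}
    return summary
```

Called as `tee_and_parse(sys.stdin, sys.stdout)` at the end of a pipe, the CI log stays intact while the structured result is available for a notification template.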
Impressions
・Managing Terraform in one central repository was a helpful idea, since I had been unsure how to organize the code in Git
・I would be even happier if tfnotify supported GitLab CI and Chatwork
・Getting developers to practice IaC is important, and so is checking Terraform's execution plan carefully every time
・I want to use GitHub's CODEOWNERS feature (though I would have to migrate to GitHub first...)
■Using HashiCorp OSS at Dwango
Eigo Suzuki, Dwango Co., Ltd., Second Service Development Division, Dwango Cloud Service Department, Consulting Section; Second Product Development Department, First Section
Dwango infrastructure
・VMware vSphere
・AWS
・Bare metal
・Plus some Azure/OpenStack etc.
HashiCorp tools at Dwango
・Packer
  ・Used to create Vagrant boxes
  ・Used to create AMIs as the number of AWS environments has grown
・Consul
  ・Used for service discovery
  ・Rewrites application configuration files when a MySQL server goes down
  ・Used as a KVS and for health checks
  ・Used as a dynamic inventory for Ansible playbooks
・Terraform
  ・Used for configuration management
  ・Used in NicoNico Survey, friends.nico, Nikonare, part of N Preparatory School, etc.
CI for Terraform (tools used)
・GitHub
・Jenkins
・Terraform
・tfenv
CI for Terraform (flow)
1. A PR is opened
2. Jenkins runs plan -> apply -> destroy against the pr_test environment
3. If step 2 passes, plan -> apply against the sandbox environment
4. If everything passes, merge
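The flow above can be written down as a short gate function. This is only a sketch of the ordering, not Dwango's Jenkins job: `run_step` is a hypothetical callback that would invoke Terraform for the given environment and report success, and stopping at the first failure (without a cleanup destroy) is an assumption.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (environment, terraform subcommand)

# Step 2 of the flow: a throwaway environment per PR.
PR_TEST_STEPS: List[Step] = [
    ("pr_test", "plan"), ("pr_test", "apply"), ("pr_test", "destroy"),
]
# Step 3 of the flow: only reached when pr_test passed.
SANDBOX_STEPS: List[Step] = [("sandbox", "plan"), ("sandbox", "apply")]


def run_ci(run_step: Callable[[str, str], bool]) -> bool:
    """Run the steps in order and return True when the PR may be merged."""
    for env, command in PR_TEST_STEPS + SANDBOX_STEPS:
        if not run_step(env, command):
            return False  # stop at the first failing step
    return True
```

Actually applying in pr_test is what catches the cases where plan succeeds but apply fails.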
What's nice about CI for Terraform
・You can see on GitHub whether the apply succeeded or failed
・Because the test environments have the same configuration as dev/prod, you can confirm in advance that the apply can be performed safely
・With tfenv, it is easy to verify an upgrade of Terraform itself
CD for Terraform
・If CI passes, they want to deploy automatically to dev etc.
・A Jenkins job is created so that this can be done safely
・No matter who does the work, it can be deployed without problems
・Results are notified via Slack, so they are easy to check
Impressions
・Packer is convenient for creating AMIs
・Since we also use Ansible, I would like to use Consul as a dynamic inventory
・tfenv makes switching Terraform versions convenient
・Even if terraform plan passes, an error may occur on apply, so deploying to a pr_test environment seems like a good idea
・For CD, it's good old "Uncle Jenkins" after all. Creating a job is easier and more reliable than running commands directly
That was mostly bullet points, but that's my summary.
We would like to actively use HashiCorp's tools to realize IaC at our company!