[MySQL compatible] About the distributed SQL database “TiDB” [OSS]
This is Ohara from the technical sales department.
This time, I will write about TiDB.
TiDB overview
■ TiDB was developed by PingCAP, an open source developer, and is currently managed by the CNCF.
■ TiDB is an open source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads.
■ It is compatible with MySQL and offers horizontal scalability, strong consistency, and high availability. With HTAP it covers both OLTP (online transaction processing) and OLAP (online analytical processing), making it suitable for use cases that require high availability and strong consistency over large-scale data.
■ As an example of implementation at a Japanese company, it is also used in the infrastructure of PayPay, which operates a QR payment service.
◇ Quote: Payment platform engineer supporting transactions
TiDB features
Horizontal distribution scale out/scale in
- TiDB's architecture separates compute from storage, so compute and storage capacity can each be scaled out or scaled in independently, online, as needed.
Multi-replica and high availability
・Data is stored in multiple replicas, and the Multi-Raft protocol is used to replicate the transaction log.
A transaction commits only when its data has been successfully written to a majority of replicas, which ensures strong consistency and availability even if a minority of replicas go down.
- You can configure the geographic location and number of replicas as needed to meet the requirements of different disaster tolerance levels.
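The majority rule described above can be illustrated with a little arithmetic. This is only a minimal sketch of the quorum idea, not TiDB's actual Raft implementation; the function names are hypothetical and chosen for illustration.

```python
def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def can_commit(acks: int, replica_count: int) -> bool:
    """A write commits only once a majority of replicas has persisted it."""
    return acks >= majority(replica_count)


def tolerated_failures(replica_count: int) -> int:
    """How many replicas may be down while a majority is still reachable."""
    return replica_count - majority(replica_count)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"replicas={n} majority={majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

With three replicas a write needs two acknowledgements, so one replica can fail; with five replicas, two can fail. This is why raising the replica count raises the level of disaster tolerance.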
Real-time HTAP
TiDB provides two storage engines: TiKV, a row-based storage engine, and TiFlash, a columnar storage engine.
TiFlash uses the Multi-Raft Learner protocol to replicate data from TiKV in real time, ensuring data consistency between the TiKV row-based storage engine and the TiFlash columnar storage engine.
TiKV and TiFlash can be deployed on different machines as needed to solve HTAP resource separation issues.
Cloud-native distributed database
TiDB is a distributed database designed for the cloud. It provides scalability, reliability, and security on cloud platforms, and users can scale TiDB flexibly to match their workload requirements.
TiDB keeps at least three replicas of each piece of data, and these can be scheduled in different cloud availability zones to tolerate the outage of an entire data center.
TiDB Operator helps manage TiDB on Kubernetes and automates tasks related to operating TiDB clusters, making it easy to deploy TiDB on any cloud that provides managed Kubernetes.
TiDB Cloud is a fully managed TiDB service that allows you to deploy and run a TiDB cluster in the cloud with just a few clicks.
*TiDB Cloud is a managed service (paid service) deployed within cloud platforms such as AWS, Azure, and GCP.
Compatible with the MySQL 5.7 protocol and the MySQL ecosystem
・TiDB is compatible with the MySQL 5.7 protocol, common MySQL features, and the MySQL ecosystem, so in many cases existing applications can be migrated to TiDB with no code changes or with only a small amount of changes.
TiDB also provides data migration tools to help move existing data into TiDB.
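As a rough illustration of what protocol compatibility means in practice, an application that already talks to MySQL can often be pointed at TiDB by changing only its connection settings. The sketch below assumes hypothetical host names and credentials and uses the third-party PyMySQL driver as one example of a MySQL client; port 4000 is TiDB's default SQL port.

```python
# Hypothetical settings an application might already use for MySQL.
mysql_settings = {
    "host": "mysql.example.internal",  # hypothetical host name
    "port": 3306,                      # MySQL's default port
    "user": "app",
    "password": "secret",
    "database": "shop",
}

# Pointing the same application at TiDB: only the endpoint changes.
tidb_settings = {
    **mysql_settings,
    "host": "tidb.example.internal",   # hypothetical host name
    "port": 4000,                      # TiDB's default SQL port
}


def changed_keys(old: dict, new: dict) -> list:
    """Return the setting names whose values differ between two configs."""
    return sorted(k for k in new if old.get(k) != new[k])


def connect(settings: dict):
    """Open a connection with PyMySQL (third-party; needs a reachable server)."""
    import pymysql  # imported lazily so the rest of the sketch runs without it
    return pymysql.connect(**settings)


if __name__ == "__main__":
    print(changed_keys(mysql_settings, tidb_settings))
```

Whether an application really needs no further changes depends on which MySQL features it uses, so this should be checked against the application's actual queries.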
TiDB architecture
◇ Quote: TiDB Architecture
As a distributed database, TiDB is designed to be composed of multiple components. These components communicate with each other and form a complete TiDB system.
TiDB server
・TiDB Server is a stateless SQL layer that exposes connection endpoints of the MySQL protocol to the outside world. The TiDB server receives SQL requests, performs SQL parsing and optimization, and ultimately generates a distributed execution plan.
- It is horizontally scalable and presents a unified interface to the outside world through load balancing components such as Linux Virtual Server (LVS), HAProxy, or F5. It does not store data; it handles only computation and SQL analysis, and sends the actual data read requests to TiKV nodes (or TiFlash nodes).
PD (Placement Driver) server
・The PD server is a component that is responsible for managing metadata for the entire cluster and consists of at least three nodes.
- It stores the real-time data distribution metadata of every TiKV node and the topology of the entire TiDB cluster, provides the TiDB Dashboard management UI, and assigns transaction IDs to distributed transactions.
・The PD server not only stores cluster metadata but also sends data scheduling commands to specific TiKV nodes according to the data distribution status reported by the TiKV nodes in real time.
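To give a feel for how a central component can hand out IDs for distributed transactions, here is a minimal sketch of a timestamp oracle that combines a physical clock with a logical counter so that every ID is strictly larger than the previous one. The class name, the injectable clock, and the 18-bit logical width are assumptions for illustration, not a description of PD's actual code.

```python
import time

LOGICAL_BITS = 18  # assumption for this sketch: low bits hold a logical counter


class TimestampOracle:
    """Hands out strictly increasing 64-bit style timestamps."""

    def __init__(self, clock=None):
        # clock returns the current time in milliseconds; injectable for testing
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._physical = 0
        self._logical = 0

    def next_ts(self) -> int:
        now = self._clock()
        if now > self._physical:
            # The clock moved forward: adopt it and restart the counter.
            self._physical = now
            self._logical = 0
        else:
            # Same millisecond (or clock went backwards): bump the counter.
            self._logical += 1
            if self._logical >= (1 << LOGICAL_BITS):
                # Counter exhausted: borrow the next millisecond.
                self._physical += 1
                self._logical = 0
        return (self._physical << LOGICAL_BITS) | self._logical


if __name__ == "__main__":
    oracle = TimestampOracle()
    first, second = oracle.next_ts(), oracle.next_ts()
    print(first < second)
```

Because every transaction gets its ID from the same oracle, the IDs give a total order across all TiDB servers, which is what snapshot-based isolation needs.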
Storage server
◇ TiKV server
・TiKV is a distributed transactional key-value storage engine, and TiKV servers are responsible for storing data.
・The basic unit of data storage is the Region. Each Region stores the data for a specific key range, which is the left-closed, right-open interval from StartKey to EndKey, and each TiKV node holds multiple Regions. The TiKV API provides native support for distributed transactions at the key-value pair level and uses snapshot isolation by default.
- After processing a SQL statement, the TiDB server converts the SQL execution plan into actual calls to the TiKV API. Data is stored in TiKV, and all data in TiKV is automatically maintained in multiple replicas (three by default), so TiKV has native high availability and supports automatic failover.
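The left-closed, right-open key ranges described above can be sketched as a simple lookup. This is a minimal illustration, assuming Regions are kept sorted by start key and that an empty end key means "no upper bound"; the `Region` class and `locate_region` function are hypothetical names, not the TiKV API.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    region_id: int
    start_key: bytes  # inclusive
    end_key: bytes    # exclusive; b"" means unbounded (assumption of this sketch)


def locate_region(regions: list, key: bytes) -> Region:
    """Find the Region whose [start_key, end_key) interval contains key.

    regions must be sorted by start_key.
    """
    starts = [r.start_key for r in regions]
    idx = bisect_right(starts, key) - 1  # last Region starting at or before key
    if idx < 0:
        raise KeyError(key)
    region = regions[idx]
    if region.end_key != b"" and key >= region.end_key:
        raise KeyError(key)  # key falls in a gap between Regions
    return region


REGIONS = [
    Region(1, b"", b"g"),
    Region(2, b"g", b"m"),
    Region(3, b"m", b""),
]

if __name__ == "__main__":
    for k in (b"apple", b"g", b"zebra"):
        print(k, locate_region(REGIONS, k).region_id)
```

Note that the key `b"g"` belongs to Region 2, not Region 1: the end key of one Region is the start key of the next, so no key is ever covered twice.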
◇ TiFlash server
- The TiFlash server is a special type of storage server. Unlike ordinary TiKV nodes, TiFlash stores data by column and is designed mainly to speed up analytical processing.
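Why column-by-column storage helps analytics can be shown with a toy example. The sketch below uses made-up order data and hypothetical function names; it only contrasts the two layouts and says nothing about how TiKV or TiFlash are implemented internally.

```python
# Row layout: each record is stored together (good for reading one order).
ROWS = [
    {"order_id": 1, "user": "alice", "amount": 120},
    {"order_id": 2, "user": "bob", "amount": 80},
    {"order_id": 3, "user": "alice", "amount": 200},
]


def to_columns(rows: list) -> dict:
    """Pivot row records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_amount_from_rows(rows: list) -> int:
    """Aggregate over rows: every whole record is visited to read one field."""
    return sum(row["amount"] for row in rows)


def total_amount_from_columns(columns: dict) -> int:
    """Aggregate over columns: only the 'amount' column is read."""
    return sum(columns["amount"])


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(total_amount_from_rows(ROWS), total_amount_from_columns(columns))
```

Both functions return the same total, but the columnar version touches only the one column the query needs, which is the property that matters when a table has many columns and many rows.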
Summary
TiDB is open source (OSS), horizontally scalable, highly available, and compatible with MySQL, so it may be worth considering as a database for web services such as social games and e-commerce sites.