[DWH] Snowflake Features and Architecture [Big Data]

This is Ohara from the Technical Sales Department.
In this article, I will describe the features and architecture of the data warehouse (DWH) "Snowflake".
Well-known cloud-based DWHs include "Google BigQuery" on GCP and "Amazon Redshift" on AWS, but recently "Snowflake" has also been gaining popularity.
Furthermore, Snowflake lets you choose AWS, GCP, or Azure as the platform, and the Snowflake service then runs on that provider's infrastructure.
*Information as of September 2020
Snowflake Features
● All data in one source
Snowflake creates a single, query-ready source for effectively managing all your data, including JSON and XML, with nearly unlimited, low-cost cloud storage. You can also access shared data and provide it to your customers and partners through a unique private data exchange.
● Fully SQL compatible / Multi-cluster
Support a near-unlimited number of concurrent users and queries on multi-cluster computing resources. Semi-structured data is natively supported and can be queried directly with ANSI-compatible SQL, using your choice of analytics and machine learning tools.
● Near-zero maintenance
Automatic updates with no planned downtime reduce system administration and maintenance to near zero. Snowflake usage also scales up and down automatically with per-second pricing, and global data access and cross-cloud data synchronization are supported.
Snowflake Architecture
A key feature of the Snowflake architecture is its "three-tier design," which uses separate layers for storage, computing, and cloud services.
Although computing and storage resources are physically separated, they form a logically unified data platform system, enabling uninterrupted scaling.
● Service:
It consists of stateless computing resources running across multiple Availability Zones.
This layer provides a highly available and distributed metadata store for global state management, enabling services such as data pruning, data exchange, and cross-cloud data replication.
The service layer provides security and encryption key management and enables all SQL, DML, and DDL functions, including:
- Authenticate and manage user sessions
- Apply security features
- Compile and optimize queries
- Coordinate all transactions
For example, to perform data pruning, the service tier compiles query metadata to determine which micropartitions need to be scanned to complete queries quickly.
This ensures that only the data necessary to complete the query is scanned, resulting in improved performance.
Additionally, automated metadata processing is performed by a separate integrated subsystem, which collects statistics and performs other metadata operations without requiring user computing resources.
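The data pruning described above can be illustrated with a small Python sketch. This is not Snowflake's actual implementation; the PartitionMeta class, the prune function, and the sample value ranges are all hypothetical names and data chosen for illustration. The sketch assumes each micropartition keeps the minimum and maximum value of one column, and that a range predicate only needs the partitions whose range overlaps it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMeta:
    """Hypothetical per-micropartition metadata: min/max of one column."""
    partition_id: int
    min_value: int
    max_value: int


def prune(partitions, lo, hi):
    """Keep only partitions whose [min, max] range overlaps [lo, hi]."""
    return [p for p in partitions if p.max_value >= lo and p.min_value <= hi]


# Illustrative metadata for four micropartitions (made-up ranges).
partitions = [
    PartitionMeta(0, 1, 100),
    PartitionMeta(1, 101, 200),
    PartitionMeta(2, 201, 300),
    PartitionMeta(3, 150, 250),
]

# Predicate equivalent to: WHERE value BETWEEN 120 AND 180
to_scan = prune(partitions, 120, 180)
scan_ids = [p.partition_id for p in to_scan]
print(scan_ids)  # partitions 0 and 2 are skipped without reading their data
```

The decision is made from metadata alone, which is why the data in the skipped partitions never has to be read.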
● Compute:
The compute layer is the backbone of Snowflake: a computing engine designed to process large amounts of data quickly and efficiently. It performs all data processing:
- Retrieve the minimum data required from the storage layer to satisfy queries, as dictated by Snowflake's data pruning algorithms.
- Run multiple compute engines on the same data simultaneously with system-wide transactional consistency and full ACID compliance, so that read operations (SELECT) always reference consistent data as isolated workloads. (Write operations never block readers.)
- Locally cache data and query results to significantly improve performance and reduce costs. (No computing charges are incurred for cached query results.)
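As a rough illustration of why cached query results avoid compute work, here is a minimal Python sketch of a result cache. It is a toy model under stated assumptions, not how Snowflake implements caching: ResultCache, fake_engine, and the whitespace-only query normalization are hypothetical, and a real system must also invalidate results when the underlying data changes.

```python
class ResultCache:
    """Toy query-result cache keyed by whitespace-normalized SQL text."""

    def __init__(self, execute):
        self._execute = execute   # function that actually runs a query
        self._results = {}
        self.executions = 0       # how many times the engine really ran

    def query(self, sql):
        key = " ".join(sql.split())  # collapse runs of whitespace
        if key not in self._results:
            self.executions += 1
            self._results[key] = self._execute(sql)
        return self._results[key]

    def invalidate(self):
        """Drop all cached results, e.g. after the data has changed."""
        self._results.clear()


def fake_engine(sql):
    # Hypothetical stand-in for a compute engine doing the expensive work.
    return [("rows for", sql)]


cache = ResultCache(fake_engine)
first = cache.query("SELECT COUNT(*) FROM orders")
second = cache.query("SELECT  COUNT(*)  FROM orders")  # served from cache
print(cache.executions)
```

The second call returns the stored result without invoking the engine, so the execution counter stays at one.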
● Storage:
The storage layer performs the following operations when processing data:
- Divide data into micropartitions, creating hundreds of thousands of partitions for each data file.
- Extract metadata (such as timestamps and minimum/maximum values) to enable efficient query processing.
- Compress micropartitions to save on storage and space costs.
- Fully encrypt data using a secure key hierarchy.
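The first three storage steps above (partition, extract metadata, compress) can be sketched in Python as follows. This is a simplified model, not Snowflake's storage format: build_micropartitions, read_partition, the "amount" column, the JSON encoding, and the tiny partition size are assumptions made for illustration. Encryption is left out because the sketch sticks to the standard library.

```python
import json
import zlib


def build_micropartitions(rows, rows_per_partition):
    """Split rows into chunks, record min/max metadata, compress each chunk."""
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        amounts = [row["amount"] for row in chunk]
        raw = json.dumps(chunk).encode("utf-8")
        partitions.append({
            # Metadata is kept outside the compressed payload so it can be
            # inspected without decompressing the data.
            "metadata": {
                "row_count": len(chunk),
                "min_amount": min(amounts),
                "max_amount": max(amounts),
            },
            "payload": zlib.compress(raw),
        })
    return partitions


def read_partition(partition):
    """Decompress one partition back into its rows."""
    return json.loads(zlib.decompress(partition["payload"]).decode("utf-8"))


# Hypothetical sample data: ten rows split into partitions of four rows.
rows = [{"id": i, "amount": i * 10} for i in range(10)]
parts = build_micropartitions(rows, rows_per_partition=4)

for part in parts:
    print(part["metadata"])
```

Keeping the metadata separate from the compressed payload is what lets a query planner decide which partitions to read before any decompression happens.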
Summary
Snowflake is a service that runs on the infrastructure of a cloud platform.
Although it competes in some areas with AWS, GCP, and Azure, which already have their own data warehouse services,
Snowflake's dedicated data cloud service is worth considering depending on your needs.
