[DWH] Snowflake Features and Architecture [Big Data]

This is Ohara from the Technical Sales Department.

In this article, I will describe the features and architecture of the data warehouse (DWH) "Snowflake".
Well-known cloud-based DWHs include "Google BigQuery" on GCP and "Amazon Redshift" on AWS, but recently "Snowflake" has also been gaining popularity.
Furthermore, Snowflake lets you choose AWS, GCP, or Azure as the platform and runs its services on that provider's infrastructure.

*Information as of September 2020

Snowflake Features

● All data in one source

Snowflake creates a single, query-ready source for effectively managing all your data, including JSON and XML, with nearly unlimited, low-cost cloud storage. You can also access shared data and provide it to your customers and partners through a private data exchange.

● Fully SQL compatible / Multi-cluster

Multi-cluster computing resources support a near-unlimited number of concurrent users and queries. Semi-structured data is natively supported and can be queried directly with ANSI-compatible SQL, and you can use the analytics and machine learning tools of your choice.

● Near-zero maintenance

Automatic updates with no planned downtime remove most system administration and maintenance work. Snowflake usage also scales up and down automatically with per-second pricing, and the service enables global data access and cross-cloud data synchronization.

Snowflake architecture

A key feature of the Snowflake architecture is its "three-tier design," which uses separate layers for storage, computing, and cloud services.
Although computing and storage resources are physically separated, they form a logically unified data platform system, enabling uninterrupted scaling.

● Service:

It consists of stateless computing resources running across multiple Availability Zones.
This layer provides a highly available and distributed metadata store for global state management, enabling services such as data pruning, data exchange, and cross-cloud data replication.

The service layer provides security and encryption key management and enables all SQL, DML, and DDL functions, including:

- Provide authentication and management of user sessions
- Apply security features
- Compile and optimize queries
- Coordinate all transactions

For example, to perform data pruning, the service tier compiles query metadata to determine which micropartitions need to be scanned to complete queries quickly.
This ensures that only the data necessary to complete the query is scanned, resulting in improved performance.
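
The pruning idea described above can be illustrated with a minimal Python sketch. It is not Snowflake's actual implementation: the `PartitionMeta` record, the `prune` function, and the date values are hypothetical, and the sketch only assumes that each micropartition has a stored minimum and maximum value for the filtered column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMeta:
    """Hypothetical metadata record kept per micropartition."""
    partition_id: int
    min_value: int
    max_value: int


def prune(partitions, low, high):
    """Return ids of partitions whose [min, max] range overlaps [low, high]."""
    return [
        p.partition_id
        for p in partitions
        if p.max_value >= low and p.min_value <= high
    ]


# Made-up metadata for four micropartitions of a date column,
# encoded as YYYYMMDD integers.
catalog = [
    PartitionMeta(0, 20200101, 20200331),
    PartitionMeta(1, 20200401, 20200630),
    PartitionMeta(2, 20200701, 20200930),
    PartitionMeta(3, 20201001, 20201231),
]

# Roughly equivalent to: WHERE order_date BETWEEN 20200515 AND 20200710
to_scan = prune(catalog, 20200515, 20200710)
print(to_scan)  # [1, 2]
```

Only two of the four partitions overlap the requested range, so the other two never have to be read from storage.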

Additionally, automated metadata processing is performed by a separate integrated subsystem, which collects statistics and performs other metadata operations without requiring user computing resources.

● Compute:

The compute layer is the backbone of Snowflake: a computing engine designed to process large amounts of data quickly and efficiently. It performs all data processing.

- Retrieve only the minimum data required from the storage layer to satisfy queries, as determined by Snowflake's data pruning algorithms.

- Run multiple compute engines on the same data simultaneously as isolated workloads, with system-wide transactional consistency and full ACID compliance, so that read operations (SELECT) always reference consistent data.
(Write operations never block readers.)

- Locally cache data and query results to significantly improve performance and reduce costs.
(No computing charges are incurred for cached query results.)
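
To show why cached query results need no additional compute, here is a small Python sketch of a result cache. Everything in it is an assumption made for illustration: `CachedWarehouse`, `fake_execute`, and the text-based cache key are hypothetical and do not reflect Snowflake's API or its real cache rules.

```python
class CachedWarehouse:
    """Toy compute engine with a result cache (hypothetical, not Snowflake's API)."""

    def __init__(self, execute):
        self._execute = execute   # function that actually runs a query
        self._results = {}        # normalized query text -> cached result
        self.compute_runs = 0     # how many times compute was really used

    def query(self, sql):
        # Crude normalization: lowercase and collapse whitespace.
        # (This would be wrong for case-sensitive string literals.)
        key = " ".join(sql.lower().split())
        if key not in self._results:
            self.compute_runs += 1
            self._results[key] = self._execute(sql)
        return self._results[key]

    def invalidate(self):
        """Drop cached results, for example after the underlying data changes."""
        self._results.clear()


def fake_execute(sql):
    # Stand-in for real query execution; returns a fixed, made-up result.
    return {"rows": 42}


wh = CachedWarehouse(fake_execute)
wh.query("SELECT COUNT(*) FROM orders")
wh.query("select count(*)   from orders")  # served from the cache
print(wh.compute_runs)  # 1
```

The second call returns the stored result without invoking `fake_execute` again, which is the behavior the bullet above describes in billing terms.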

● Storage:

The storage layer performs the following operations when processing data:

- Divide data into micropartitions (a large table can consist of hundreds of thousands of partitions).
- Extract metadata (such as timestamps and minimum/maximum values) to enable efficient query processing.
- Compress micropartitions to save on storage costs.
- Fully encrypt data using a secure key hierarchy.
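
The first three steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not how Snowflake stores data: `build_micropartitions` and `read_partition` are hypothetical helpers, partitions are cut by row count rather than by size, rows are serialized as JSON instead of a columnar format, and the encryption step is omitted.

```python
import json
import zlib


def build_micropartitions(rows, column, rows_per_partition):
    """Split rows into fixed-size chunks and record min/max metadata for one column."""
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        values = [row[column] for row in chunk]
        payload = zlib.compress(json.dumps(chunk).encode("utf-8"))
        partitions.append({
            "min": min(values),          # metadata used later for pruning
            "max": max(values),
            "row_count": len(chunk),
            "payload": payload,          # compressed bytes; encryption omitted
        })
    return partitions


def read_partition(partition):
    """Decompress a partition back into its rows."""
    return json.loads(zlib.decompress(partition["payload"]).decode("utf-8"))


rows = [{"id": i, "amount": i * 10} for i in range(10)]
parts = build_micropartitions(rows, "id", rows_per_partition=4)
print([(p["min"], p["max"], p["row_count"]) for p in parts])
# [(0, 3, 4), (4, 7, 4), (8, 9, 2)]
```

The metadata lives outside the compressed payload, so a query planner can decide which partitions to skip without decompressing any of them.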

Summary

Snowflake is a service that runs on the infrastructure of a cloud platform.
It competes in some areas with AWS, GCP, and Azure, which already have their own data warehouse services,
but depending on your needs, Snowflake's dedicated data cloud service is worth considering.


About the author

Ohara

He started his career in the telecommunications industry as a salesperson responsible for the implementation of IT products such as corporate network services, office equipment, and groupware.

He then worked at a system integrator-affiliated data center company as a pre-sales engineer for physical servers and hosting services, and as a customer engineer for SaaS-based SFA/CRM and B2B e-commerce, before joining Beyond, where he currently works.

He is currently stationed in China (Shenzhen), and his daily routine is watching Chinese dramas and Bilibili.

Qualifications: Bookkeeping Level 2