[DWH] Snowflake features and architecture [Big Data]

This is Ohara from the technical sales department.

I'll be writing about the features and architecture of the data warehouse (DWH) "Snowflake". Well-known cloud-based DWHs include "Google BigQuery" on GCP and "Amazon Redshift" on AWS, but "Snowflake" has recently been gaining popularity.
Snowflake also lets you choose AWS, GCP, or Azure as the platform, and runs the Snowflake service on that provider's infrastructure.

*Information as of September 2020

Snowflake Features

● All data in one source

Snowflake creates a single, query-ready source for managing all of your data, including semi-structured formats such as JSON and XML, on nearly unlimited, low-cost cloud storage. Through a private data exchange, you can also access shared data and provide it to your customers and partners.

● Fully SQL compatible / Multi-cluster

Multi-cluster compute resources support a nearly unlimited number of concurrent users and queries. Semi-structured data can be queried directly with SQL, which is fully ANSI-compliant and natively supported, and you can use your choice of analytics and machine learning tools.
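To make "multi-cluster" a little more concrete, here is a toy model of how a multi-cluster warehouse might decide how many clusters to run for the current load. Snowflake lets you set a minimum and maximum cluster count for a multi-cluster warehouse, but the per-cluster capacity and the scaling rule below are simplifying assumptions, not Snowflake's actual scaling policy; `clusters_needed` is a hypothetical helper.

```python
import math


def clusters_needed(queued_queries: int, per_cluster_capacity: int,
                    min_clusters: int, max_clusters: int) -> int:
    """Return how many clusters to run for the current load (simplified model)."""
    if per_cluster_capacity <= 0:
        raise ValueError("per_cluster_capacity must be positive")
    # Assumption: each cluster can handle a fixed number of concurrent queries.
    wanted = math.ceil(queued_queries / per_cluster_capacity)
    # Clamp to the configured range, like a min/max cluster count setting.
    return max(min_clusters, min(max_clusters, wanted))


for load in (0, 5, 20, 100):
    print(load, clusters_needed(load, per_cluster_capacity=8,
                                min_clusters=1, max_clusters=4))
```

The clamp is the important part: even under very heavy load the warehouse never exceeds its configured maximum, and it never drops below its minimum.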

● Near-zero maintenance

Automatic updates with no planned downtime eliminate system administration and maintenance work. Compute scales up and down automatically and is billed per second, and Snowflake supports global data access and cross-cloud data synchronization.
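As a rough illustration of per-second pricing, the sketch below estimates the credits consumed by one run of a warehouse. It assumes the commonly published rates for standard warehouses (credits per hour doubling with each size step) and a 60-second minimum charge each time a warehouse resumes; actual rates and rules should be checked against Snowflake's current pricing documentation. `credits_for_run` is a hypothetical helper.

```python
# Credits consumed per hour by warehouse size (assumed standard-warehouse rates).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
MINIMUM_BILLED_SECONDS = 60  # assumed minimum charge each time a warehouse resumes


def credits_for_run(size: str, seconds_running: int) -> float:
    """Credits for one resume-to-suspend period under per-second billing."""
    billed_seconds = max(seconds_running, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


print(credits_for_run("M", 30))   # short run, billed as 60 seconds
print(credits_for_run("M", 900))  # 15 minutes on a Medium warehouse
```

Because billing is per second after the minimum, suspending an idle warehouse quickly translates directly into lower cost.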

Snowflake Architecture

The Snowflake architecture is characterized by a three-layer design with separate layers for storage, compute, and cloud services.
Although compute and storage resources are physically separated, they are logically integrated into a single data platform, so each can scale without disrupting the other.

● Cloud services:

Composed of stateless compute resources running across multiple availability zones, this layer provides a highly available, distributed metadata store for global state management, enabling services such as data pruning, data exchange, and cross-cloud data replication.

The cloud services layer also handles security and encryption key management and supports all SQL, DML, and DDL functionality. Its responsibilities include:

Authenticating users and managing user sessions
Enforcing security functions
Compiling and optimizing queries
Coordinating all transactions

For example, for data pruning, the service layer uses metadata to determine which micro-partitions must be scanned to complete a query. Because only the data the query actually needs is scanned, queries complete faster.
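The idea behind pruning with per-partition metadata can be sketched as a simple range-overlap check: if a micro-partition's min/max range for a column cannot satisfy the query's filter, it is skipped. The `MicroPartition` class, partition names, and values below are hypothetical; Snowflake's real metadata and pruning logic are internal and considerably more elaborate.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    name: str
    min_value: int  # smallest value of the filtered column in this partition
    max_value: int  # largest value of the filtered column in this partition


def partitions_to_scan(partitions, low, high):
    """Keep only partitions whose [min, max] range overlaps the predicate range [low, high]."""
    return [p.name for p in partitions
            if p.max_value >= low and p.min_value <= high]


metadata = [
    MicroPartition("mp1", 1, 100),
    MicroPartition("mp2", 101, 200),
    MicroPartition("mp3", 201, 300),
]
# Equivalent to a filter such as: WHERE order_id BETWEEN 150 AND 210
print(partitions_to_scan(metadata, 150, 210))  # → ['mp2', 'mp3']
```

Only metadata is consulted here; the partition contents are never read, which is why pruning saves work before any scanning starts.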

In addition, automatic metadata processing is performed by a separate, integrated subsystem that collects statistics and performs other metadata operations without consuming the user's compute resources.

● Compute:

The compute layer is the backbone of Snowflake: an engine designed to process large volumes of data quickly and efficiently. It performs all data processing.

It retrieves only the minimum data needed from the storage layer to satisfy a query, as determined by Snowflake's data pruning algorithms.

Multiple compute engines can operate on the same data simultaneously with system-wide transactional consistency and full ACID compliance. Workloads stay isolated from one another, and read operations (SELECT) always see consistent data because write operations never block readers.
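One general way to let writes proceed without blocking reads is multi-versioning: a reader pins a snapshot when its query starts, and a writer commits a new version instead of modifying the old one. The toy class below illustrates that general idea only; it is not Snowflake's implementation, and `VersionedTable` is hypothetical.

```python
import threading


class VersionedTable:
    """Toy multi-version table: each commit creates a new immutable snapshot."""

    def __init__(self, rows):
        self._versions = [tuple(rows)]         # version 0
        self._commit_lock = threading.Lock()  # serializes writers only

    def snapshot(self) -> int:
        """Return the latest committed version number; readers never take the lock."""
        return len(self._versions) - 1

    def read(self, version: int):
        return list(self._versions[version])

    def insert(self, new_rows):
        with self._commit_lock:
            latest = self._versions[-1]
            # Old versions are never modified; a new one is appended instead.
            self._versions.append(latest + tuple(new_rows))


table = VersionedTable([("a", 1)])
v = table.snapshot()       # a reader (e.g. one warehouse) starts a query
table.insert([("b", 2)])   # a writer (e.g. another warehouse) commits
print(table.read(v))                 # the reader still sees only ('a', 1)
print(table.read(table.snapshot()))  # a new query sees both rows
```

The reader's result stays consistent for the whole query even though a write committed in the middle of it, and the reader never waited on the writer's lock.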

Data and query results are cached locally, which significantly improves performance and reduces costs (returning a cached query result incurs no compute charges).
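Why a cached result costs no compute can be sketched like this: if the same query runs again and the underlying data has not changed, the stored result is returned without running the warehouse at all. Keying the cache on the query text plus a table version is a simplifying assumption; Snowflake's actual result-reuse rules have additional conditions. `ResultCache` and `table_version` are hypothetical.

```python
class ResultCache:
    """Toy result cache keyed by query text and the table version it read."""

    def __init__(self):
        self._results = {}
        self.compute_runs = 0  # counts how often the "warehouse" actually ran

    def run(self, sql: str, table_version: int, execute):
        key = (sql.strip(), table_version)
        if key in self._results:
            return self._results[key]  # cache hit: no compute used
        self.compute_runs += 1
        result = execute()
        self._results[key] = result
        return result


cache = ResultCache()
query = "SELECT COUNT(*) FROM sales"
cache.run(query, table_version=1, execute=lambda: 42)
cache.run(query, table_version=1, execute=lambda: 42)  # served from cache
cache.run(query, table_version=2, execute=lambda: 43)  # data changed, recompute
print(cache.compute_runs)  # → 2
```

Including the data version in the key is what keeps the cache safe: once the table changes, the old result can no longer be returned.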

● Storage:

The storage layer performs the following operations when processing data:

Divides data into micro-partitions, creating hundreds of thousands of partitions per data file.
Extracts metadata (such as timestamps and min/max values) to enable efficient query processing.
Compresses micro-partitions to reduce storage costs.
Fully encrypts data using a secure key hierarchy.
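The partition, extract, and compress steps of the storage layer can be sketched as follows. Fixed row counts per partition, JSON serialization, and zlib compression are assumptions chosen for illustration; Snowflake uses its own columnar format and partition sizing, and encryption is omitted here. `build_micro_partitions` is a hypothetical helper.

```python
import json
import zlib


def build_micro_partitions(rows, rows_per_partition, column):
    """Split rows into fixed-size chunks, record min/max for one column, and compress each chunk."""
    partitions = []
    for start in range(0, len(rows), rows_per_partition):
        chunk = rows[start:start + rows_per_partition]
        values = [row[column] for row in chunk]
        # Assumption: JSON + zlib stand in for Snowflake's real columnar compression.
        payload = zlib.compress(json.dumps(chunk).encode("utf-8"))
        partitions.append({
            "min": min(values),        # metadata used later for pruning
            "max": max(values),
            "row_count": len(chunk),
            "compressed_bytes": payload,
        })
    return partitions


rows = [{"order_id": i, "amount": i * 10} for i in range(1, 11)]
parts = build_micro_partitions(rows, rows_per_partition=4, column="order_id")
for p in parts:
    print(p["min"], p["max"], p["row_count"], len(p["compressed_bytes"]))
```

The min/max values collected here are exactly the kind of metadata the service layer can later consult to skip partitions without decompressing them.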

Summary

Snowflake runs on top of a cloud platform's infrastructure, so it competes to some extent with the DWH services that AWS, GCP, and Azure already offer. Because it is a service built specifically around Snowflake's data cloud, however, it may be worth trying depending on your use case.



About the author

ohara

I started my career in the telecommunications industry as a salesperson, introducing IT products such as network services, office automation (OA) equipment, and groupware to corporate clients.

After that, I worked as a pre-sales engineer for physical servers and hosting services, and as a customer engineer for SaaS-based SFA/CRM/B2B e-commerce products at a data center company run by a system integrator (SIer), before joining my current company, Beyond.

I am currently based in Shenzhen, China, where I enjoy watching Chinese dramas and bilibili.

Qualification: Grade 2 Bookkeeping