What is Google's ultra-fast data warehouse "BigQuery"?
This is Ohara from the China office.
This time, we will focus on the fully managed data warehouse "BigQuery" provided by Google.
What is BigQuery anyway?
BigQuery is a big data analysis service provided by Google. It was announced as an official service at Google I/O (Google's event for developers) in 2012.
It originated as Dremel, a data analysis system used internally at Google, which was then improved and made available to external users.
Japanese system vendors also offer many big data analysis services and software products, but BigQuery stands out because it runs SQL-like queries against datasets ranging from several TB (terabytes) to several PB (petabytes) and returns results in just a few seconds to tens of seconds.
Why BigQuery is fast
BigQuery is fast because it uses two mechanisms:
Column-oriented datastore
A traditional RDBMS stores data row by row: in a record-oriented (= row-oriented) layout, the entire record is kept together in the same storage.
In a column-oriented layout, by contrast, each record is split into its columns and each column is placed in separate storage. A query then reads only the columns it needs, which minimizes traffic, and because values in the same column tend to be similar, the data can be stored at a high compression ratio. Together these enable high-speed data access during query execution.
○ "Record = row-oriented" in a traditional RDBMS
○ "Column = column-oriented" in BigQuery
* Information source: Dremel: Interactive Analysis of Web-Scale Datasets
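The difference between the two layouts can be illustrated with a small sketch. This is only a toy model in plain Python, not how BigQuery is implemented: the table, its column names, and the run-length encoding helper are all assumptions made up for illustration. It counts how many values must be read to sum a single column in each layout, and shows how a column with repeated values compresses.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The table contents and column names are hypothetical.

rows = [
    {"user_id": 1, "country": "JP", "amount": 120},
    {"user_id": 2, "country": "JP", "amount": 80},
    {"user_id": 3, "country": "CN", "amount": 200},
    {"user_id": 4, "country": "CN", "amount": 50},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_amount_row_store(rows):
    """Sum 'amount' from a row store; every field of every row is read."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole record is fetched
        total += row["amount"]
    return total, values_read


def sum_amount_column_store(columns):
    """Sum 'amount' from a column store; only that one column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


def run_length_encode(values):
    """Compress runs of repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(columns)
country_rle = run_length_encode(columns["country"])

print(row_total, row_reads)  # 450 12
print(col_total, col_reads)  # 450 4
print(country_rle)           # [['JP', 2], ['CN', 2]]
```

Both layouts give the same total, but the column store reads 4 values instead of 12. On a real table with hundreds of columns and billions of rows, that gap is what "minimizing traffic" refers to.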
Tree architecture
BigQuery has a tree-style distributed processing structure.
The root server receives the query from the client and passes it down through intermediate servers to the leaf servers. The leaf servers run the query in parallel against the column-oriented data described above, and the results they read are aggregated quickly on the way back up the tree and returned as the query result.
(According to some sources, even petabyte-class data with billions of rows returns results in just a few seconds.)
○ Column-oriented datastore
○ Tree architecture
* Information source: Dremel: Interactive Analysis of Web-Scale Datasets
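The fan-out and aggregation flow can be sketched roughly as follows. This is a simplified illustration, assuming a fixed two-level tree and a single aggregate (sum, count, average) over one column. The function names `leaf_scan`, `merge`, and `run_query` are hypothetical, and a thread pool stands in for the leaf servers; the real system distributes work across many machines.

```python
from concurrent.futures import ThreadPoolExecutor


def leaf_scan(shard):
    """Leaf server: scan one shard of a column, return a partial (sum, count)."""
    return sum(shard), len(shard)


def merge(partials):
    """Intermediate or root server: combine partial results from its children."""
    partials = list(partials)
    return sum(p[0] for p in partials), sum(p[1] for p in partials)


def run_query(shard_groups):
    """Root server: fan the query out, then merge results back up the tree.

    shard_groups holds one group of shards per intermediate server.
    """
    with ThreadPoolExecutor() as pool:
        # Each intermediate server merges the results of its own leaves,
        # which scan their shards in parallel.
        intermediate_results = [
            merge(pool.map(leaf_scan, group)) for group in shard_groups
        ]
    total, count = merge(intermediate_results)
    return total, count, total / count


# Hypothetical data: two intermediate servers with two leaf shards each.
shard_groups = [
    [[10, 20, 30], [40, 50]],
    [[60], [70, 80, 90, 100]],
]

total, count, average = run_query(shard_groups)
print(total, count, average)  # 550 10 55.0
```

Because each level only passes small partial results upward, the root never has to touch the raw data itself.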
The above two points are the reasons why BigQuery is fast.
What about pricing?
Even so, cost is always a concern when adopting BigQuery, so here is a brief summary.
BigQuery's pricing structure changed significantly in 2023 and now consists of two models: on-demand pricing and capacity pricing. The prices below are for the Tokyo region.
● On-demand pricing = $7.5 per TiB
- You are charged based on the number of bytes each query processes on BigQuery. The first 1 TiB of query data processed each month is free.
● Capacity pricing = $0.051 per slot-hour (Standard Edition)
- You are charged for query processing capacity, measured in slots (virtual CPUs).
* Prices vary depending on the edition.
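As a rough guide, the two models can be compared with a few lines of arithmetic. The sketch below uses the Tokyo figures quoted above and makes several simplifying assumptions: the capacity price is treated as a per-slot-hour rate, the free 1 TiB applies once per month, and details such as per-query minimums and autoscaling are ignored. The function names are hypothetical, and actual charges should be checked against the official pricing page.

```python
TIB = 2 ** 40  # bytes in one tebibyte

# Figures quoted in this article for the Tokyo region (assumed still current).
ON_DEMAND_USD_PER_TIB = 7.5
FREE_TIB_PER_MONTH = 1
STANDARD_USD_PER_SLOT_HOUR = 0.051  # assumption: rate is per slot-hour


def on_demand_cost(bytes_processed_in_month):
    """Estimated monthly on-demand cost after the free 1 TiB is used up."""
    billable_tib = max(0.0, bytes_processed_in_month / TIB - FREE_TIB_PER_MONTH)
    return billable_tib * ON_DEMAND_USD_PER_TIB


def capacity_cost(slots, hours):
    """Estimated Standard Edition cost for a number of slots over some hours."""
    return slots * hours * STANDARD_USD_PER_SLOT_HOUR


# Example: 5 TiB scanned in a month vs. 100 slots running for 10 hours.
print(on_demand_cost(5 * TIB))             # 30.0
print(round(capacity_cost(100, 10), 2))    # 51.0
```

Which model is cheaper depends on the workload: occasional large scans tend to favor on-demand pricing, while steady, predictable query load may favor capacity pricing.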
For more information, please see the official BigQuery pricing page.
Summary
It's cheap anyway, so why not give it a try? (If you have a Google account, you can start right away)
▼ For more information on BigQuery's services, click here ▼
https://cloud.google.com/bigquery/?hl=ja
If you want to consult a cloud professional
Since its founding, our company, Beyond, has built up expertise as a multi-cloud integrator and managed service provider (MSP), designing, building, and migrating systems on a variety of cloud platforms, including AWS, GCP, Azure, and Oracle Cloud.
We offer custom-built cloud server environments optimized for the specifications and functions of the systems and applications our customers need, so if you are interested in the cloud, please feel free to contact us.
● Cloud/server design and construction
● Cloud/server migration
● Cloud/server operation, maintenance, and monitoring (24 hours a day, 365 days a year)