What is Google's super-fast data warehouse, BigQuery?

This is Ohara from the China office.
This time, we will look at BigQuery, a fully managed data warehouse provided by Google.
First of all, what is BigQuery?
BigQuery is a big data analysis service provided by Google. It was officially announced as a service at Google I/O (a developer event hosted by Google) in 2012.
It grew out of Dremel, a data analysis system used inside Google, which was improved for external users and made available as a service.
Japanese system vendors also offer a wide range of big data analysis services and software. What sets BigQuery apart is that it runs SQL-like queries against data sets of several TB (terabytes) or even PB (petabytes) and returns the results in a few seconds to a few tens of seconds.
Why BigQuery is fast
BigQuery is fast because it uses the following two mechanisms:
Column-structured data store
A traditional RDBMS is record-oriented (= row-oriented): it stores data row by row, keeping each entire record together in the same storage.
A column-oriented database, by contrast, splits each record by column and stores the columns separately. A query reads only the columns it references, which minimizes traffic, and values of the same kind sit together, which allows high compression. Both make data lookups fast when queries run.
○ Traditional RDBMS's "record = row-oriented"
○ BigQuery's "column = column-oriented"
*Source: Dremel: Interactive Analysis of Web-Scale Datasets
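To make the difference between the two layouts concrete, here is a minimal Python sketch. It is a toy model, not BigQuery's actual storage format: the sample records and the helper names (`to_columns`, `sum_row_oriented`, `sum_column_oriented`) are made up for illustration. It counts how many field values have to be read to sum a single column under each layout.

```python
# Toy model only: the records and function names are hypothetical,
# and real column stores add encoding and compression on top of this.
rows = [
    {"user_id": 1, "country": "JP", "amount": 120},
    {"user_id": 2, "country": "CN", "amount": 80},
    {"user_id": 3, "country": "JP", "amount": 200},
]


def to_columns(records):
    """Split row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_oriented(records, column):
    """Row layout: each whole record is read even though one field is needed."""
    fields_read = 0
    total = 0
    for record in records:
        fields_read += len(record)  # the entire record comes off storage
        total += record[column]
    return total, fields_read


def sum_column_oriented(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_fields = sum_row_oriented(rows, "amount")
col_total, col_fields = sum_column_oriented(columns, "amount")
print(row_total, row_fields)  # 400 9
print(col_total, col_fields)  # 400 3
```

Both layouts give the same answer, but the column layout touches a third of the values here. The gap widens as tables get wider and queries reference fewer columns.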
Tree Architecture
BigQuery has a tree-based distributed processing structure.
The root server receives queries from clients and passes them down through the intermediate servers directly below it to the leaf servers. The leaf servers execute the query, processing the column-oriented data in parallel, and the partial results are aggregated on the way back up so that the root can return the query results quickly.
(There are also reports that results can be obtained in a few seconds even for huge, petabyte-scale data sets with 500 million to 1 billion rows.)
○ Column structure datastore
○ Tree architecture
*Source: Dremel: Interactive Analysis of Web-Scale Datasets
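The root / intermediate / leaf flow can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the "servers" are plain functions in one process, the shards are small in-memory lists, and the names `leaf_scan`, `merge`, `run_query` and the `fanout` parameter are hypothetical, not part of BigQuery or Dremel.

```python
from concurrent.futures import ThreadPoolExecutor


def leaf_scan(shard):
    """Leaf server: scan its own shard and return a partial aggregate."""
    return {"count": len(shard), "total": sum(shard)}


def merge(partials):
    """Intermediate or root server: combine partial aggregates from children."""
    return {
        "count": sum(p["count"] for p in partials),
        "total": sum(p["total"] for p in partials),
    }


def run_query(shards, fanout=2):
    """Root server: fan the work out to the leaves, then merge upward."""
    # Leaves process their shards in parallel.
    with ThreadPoolExecutor() as pool:
        leaf_results = list(pool.map(leaf_scan, shards))
    # Each intermediate server merges the results of `fanout` leaves.
    intermediate = [
        merge(leaf_results[i:i + fanout])
        for i in range(0, len(leaf_results), fanout)
    ]
    # The root merges the intermediate results and answers the client.
    return merge(intermediate)


shards = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
result = run_query(shards)
print(result)  # {'count': 10, 'total': 55}
```

The point of the design is that no single server ever sees all the data: each level only combines small partial results, so adding more leaves lets the same query scan more data in roughly the same time.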
These two mechanisms are the reasons why BigQuery is fast.
Curious about the price?
Even so, cost is still a concern when you start using BigQuery, so I have put together a brief summary.
BigQuery's pricing structure changed significantly in 2023 and now consists of two components: on-demand pricing and capacity pricing. This article lists the prices for the Tokyo region.
● On-demand pricing = $7.5 (per TiB)
・Charges are based on the number of bytes processed by each query on BigQuery.
・Up to 1 TiB of query data per month is free.
● Capacity pricing = $0.051 per slot-hour (for Standard Edition)
・Charges are incurred for query processing capacity, measured in slots (virtual CPUs).
*Prices vary depending on the edition.
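As a rough way to compare the two models, the figures above can be plugged into a small Python calculator. This is a back-of-the-envelope sketch, not an official pricing tool: the function names are made up, the capacity price is assumed to be charged per slot-hour, and real bills depend on details (minimum billing units, storage, edition features) that are not modeled here.

```python
# Prices quoted in this article for the Tokyo region; check the official
# pricing page for current values.
ON_DEMAND_USD_PER_TIB = 7.5
FREE_TIB_PER_MONTH = 1.0
# Assumption: the Standard Edition price is per slot-hour.
STANDARD_USD_PER_SLOT_HOUR = 0.051


def on_demand_cost(tib_scanned_in_month):
    """Monthly on-demand cost after subtracting the free 1 TiB."""
    billable_tib = max(0.0, tib_scanned_in_month - FREE_TIB_PER_MONTH)
    return billable_tib * ON_DEMAND_USD_PER_TIB


def capacity_cost(slots, hours):
    """Capacity cost for holding `slots` slots for `hours` hours."""
    return slots * hours * STANDARD_USD_PER_SLOT_HOUR


print(on_demand_cost(0.5))                # 0.0
print(on_demand_cost(5))                  # 30.0
print(round(capacity_cost(100, 10), 2))   # 51.0
```

In this example, scanning 5 TiB in a month costs $30 on demand, while running 100 slots for 10 hours costs about $51, so which model is cheaper depends on how much data the queries scan and how long the slots are kept.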
For more information, please see the official BigQuery pricing page.
Summary
It's cheap, so why not give it a try? (If you have a Google account, you can get started right away.)
▼ For details on BigQuery services, click here ▼
https://cloud.google.com/bigquery/?hl=ja
If you want to talk to a cloud professional
Since our founding, Beyond has used the technical capabilities we have cultivated as a multi-cloud integrator and managed service provider (MSP) to design, build, and migrate systems on a variety of cloud server platforms, including AWS, GCP, Azure, and Oracle Cloud.
We provide custom-made cloud server environments optimized for each customer, based on the specifications and functions of the systems and applications they require. If you are interested in the cloud, please feel free to contact us.
● Cloud / Server design and construction
● Cloud / Server migration
● Cloud / Server operation, maintenance and monitoring (24 hours a day, 365 days a year)