What is Google's ultra-fast data warehouse "BigQuery"?
This is Ohara from the technical sales department.
This time, we will focus on the fully managed data warehouse "BigQuery" provided by Google.
What is BigQuery anyway?
BigQuery is a big data analysis service provided by Google, and was announced as an official service at Google I/O (an event for developers hosted by Google) in 2012.
It grew out of Dremel, a data analysis system used internally at Google, which was improved and then made available to external users.
Japanese system vendors also offer many big data analysis services and software products, but what sets BigQuery apart is that it runs SQL-like queries against data sets ranging from several TB (terabytes) to several PB (petabytes) and returns the results in just a few seconds or tens of seconds.
Why BigQuery is fast
BigQuery is fast because it uses two mechanisms:
Column-oriented datastore
A traditional RDBMS is record-oriented (row-oriented): it stores each record in its entirety in the same storage location.
A column-oriented store instead splits each record into its columns and places each column in separate storage. A query then reads only the columns it needs, which minimizes traffic, and because the values within one column are similar, the data can be stored with a high compression ratio. Together these enable high-speed data access during query execution.
○ "Record = row-oriented" in a traditional RDBMS
○ "Column = column-oriented" in BigQuery
* Information source: Dremel: Interactive Analysis of Web-Scale Datasets
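To make the difference concrete, here is a minimal Python sketch of the two layouts. It is only a toy model, not BigQuery's actual storage format: the table, the column names, and the run-length encoding used to illustrate compression are all assumptions made for this example.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The sample table and column names are illustrative assumptions.

ROWS = [
    {"user_id": 1, "country": "JP", "amount": 120},
    {"user_id": 2, "country": "JP", "amount": 300},
    {"user_id": 3, "country": "JP", "amount": 80},
    {"user_id": 4, "country": "US", "amount": 50},
]


def to_columns(rows):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_store(rows):
    """A row store reads every field of every record, even unused ones."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns, wanted):
    """A column store reads only the columns the query references."""
    return sum(len(columns[name]) for name in wanted)


def run_length_encode(values):
    """Collapse runs of equal values; used here only to illustrate compression."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


COLUMNS = to_columns(ROWS)

# Equivalent of "SELECT SUM(amount)": only the "amount" column is needed.
total = sum(COLUMNS["amount"])
row_reads = values_read_row_store(ROWS)
col_reads = values_read_column_store(COLUMNS, ["amount"])
compressed_country = run_length_encode(COLUMNS["country"])
```

In this sketch the row layout touches all 12 stored values to answer the query, while the column layout touches only the 4 values in the amount column. The country column also shrinks from 4 values to 2 runs, because similar values sit next to each other.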
Tree architecture
BigQuery has a tree-style distributed processing structure.
The root server receives the query from the client and passes it down through intermediate servers to the leaf servers. The leaf servers execute the query in parallel against the column-oriented data described above, and the results they read are quickly aggregated back up the tree and returned as the result of the query.
(According to some reports, results come back in just a few seconds even for petabyte-class data sets with billions of rows.)
○ Column-oriented datastore
○ Tree Architecture
* Information source: Dremel: Interactive Analysis of Web-Scale Datasets
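The flow from root to leaves and back can be sketched roughly as follows. This is a simplified model running on one machine, not how BigQuery is actually implemented: the function names, the shard data, and the fan-out of 2 are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def leaf_scan(shard):
    """Leaf server: scan one shard of a column and return a partial aggregate."""
    return {"sum": sum(shard), "count": len(shard)}


def merge(partials):
    """Intermediate or root server: combine partial aggregates from its children."""
    return {
        "sum": sum(p["sum"] for p in partials),
        "count": sum(p["count"] for p in partials),
    }


def run_query(shards, fanout=2):
    """Run a SUM/COUNT over all shards using a tree of merges.

    Assumes at least one shard and fanout >= 2.
    """
    # Leaves work in parallel, each on its own shard.
    with ThreadPoolExecutor() as pool:
        level = list(pool.map(leaf_scan, shards))
    # Each pass models one layer of intermediate servers merging
    # groups of `fanout` children, until only the root result remains.
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


# Hypothetical column data spread over four leaf servers.
SHARDS = [[120, 300], [80], [50, 50, 100], [10]]
result = run_query(SHARDS)
```

Because each leaf returns only a small partial aggregate rather than raw rows, the amount of data moving up the tree stays small no matter how large the shards are.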
The above two points are the reasons why BigQuery is fast.
What about pricing?
Even with all that speed, cost is still a concern when adopting BigQuery, so here is a brief summary.
● Storage = $0.020 / GB / month
・Amount of data stored in BigQuery
* Even 1 TB of data costs only $20 (approximately 2,000 yen) per month
● Query processing = $5 / TB
・Amount of data scanned when executing a query
* The first 1 TB of data processed by queries each month is free
● Streaming inserts = $0.01 / 200 MB
・An API used for real-time data collection; you are charged for the amount of data inserted into the table
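As a rough way to apply these rates, here is a small Python estimator. It is only a sketch based on the prices listed above: the function name and the example workload are made up for illustration, and 1 TB is treated as 1,000 GB to match the $20 figure. Check the official pricing page for current rates before relying on it.

```python
# Rates taken from the list above; the official pricing may have changed.
STORAGE_USD_PER_GB_MONTH = 0.020
QUERY_USD_PER_TB = 5.0
FREE_QUERY_TB_PER_MONTH = 1.0
STREAMING_USD_PER_200MB = 0.01


def monthly_cost_usd(stored_gb, scanned_tb, streamed_mb=0.0):
    """Estimate one month's bill from storage, query scans, and streaming inserts."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    # Only the scanned volume beyond the free monthly allowance is billed.
    billable_tb = max(0.0, scanned_tb - FREE_QUERY_TB_PER_MONTH)
    query = billable_tb * QUERY_USD_PER_TB
    streaming = streamed_mb / 200 * STREAMING_USD_PER_200MB
    return {
        "storage": storage,
        "query": query,
        "streaming": streaming,
        "total": storage + query + streaming,
    }


# Hypothetical workload: 1 TB stored, 3 TB scanned, 2,000 MB streamed in.
estimate = monthly_cost_usd(stored_gb=1000, scanned_tb=3, streamed_mb=2000)
```

For this hypothetical workload the estimate comes to roughly $30 per month: about $20 for storage, $10 for the 2 TB of scans beyond the free allowance, and $0.10 for streaming inserts.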
Summary
BigQuery is inexpensive, so why not give it a try? (If you have a Google account, you can start right away.)
▼ For more information on BigQuery's services, click here ▼
https://cloud.google.com/bigquery/?hl=ja