What is Google's ultra-fast data warehouse "BigQuery"?
This is Ohara from the technical sales department.
This time, we will focus on BigQuery, a fully managed data warehouse provided by Google.
What is BigQuery anyway?
BigQuery is a big data analysis service provided by Google, and was announced as an official service at Google I/O (an event for developers hosted by Google) in 2012.
It grew out of Dremel, a data analysis system used internally at Google, which was improved and then made available to external users.
Japanese system vendors also offer many big data analysis services and software products, but BigQuery can handle data sets ranging from several TB (terabytes) up to PB (petabytes) using SQL-like queries. It executes a query, completes the processing in seconds or tens of seconds, and returns the results.
Why BigQuery is fast
BigQuery is fast because it uses two mechanisms:
Column structured datastore
A traditional RDBMS stores data by row: in this record-oriented (= row-oriented) layout, an entire record is kept together in the same storage.
With a column-oriented layout, however, each record is split into columns and the columns are placed in separate storage. Because a query reads only the columns it needs, this makes it possible to "minimize traffic", and because values of the same type sit next to each other, data can be stored with a "high compression ratio". Together these enable high-speed data access during query execution.
* Left diagram: conventional RDBMS, record-oriented (= row-oriented)
* Right diagram: BigQuery, column-oriented
Image source: Dremel: Interactive Analysis of Web-Scale Datasets
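To make the difference concrete, here is a minimal toy sketch in Python. It is not how BigQuery is actually implemented: the table, the column names, and the use of run-length encoding as the compression method are all assumptions chosen purely for illustration. It counts how many values a scan has to read in each layout when a query needs only one column.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The table contents and column names are made up for illustration.
rows = [
    {"user_id": 1, "country": "JP", "amount": 120},
    {"user_id": 2, "country": "JP", "amount": 300},
    {"user_id": 3, "country": "US", "amount": 80},
]


def sum_amount_row_store(records):
    """Row-oriented: every record is fetched whole, even for one field."""
    total = 0
    values_read = 0
    for record in records:
        values_read += len(record)  # all fields of the record are read
        total += record["amount"]
    return total, values_read


def to_column_store(records):
    """Split records into one list per column (column-oriented layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_amount_column_store(columns):
    """Column-oriented: only the column the query needs is read."""
    amount = columns["amount"]
    return sum(amount), len(amount)


def run_length_encode(values):
    """Compress consecutive repeats into [value, count] pairs.

    Run-length encoding is only a stand-in here to show why similar
    values stored side by side compress well.
    """
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_column_store(rows)
row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(columns)
compressed_country = run_length_encode(columns["country"])
```

Both scans compute the same total, but the column-oriented scan touches only the `amount` values, and the repeated `country` values collapse into fewer entries once they are stored next to each other.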
Tree architecture
BigQuery has a tree-style distributed processing structure.
The root server receives queries from clients and passes them down through the intermediate servers directly beneath it to the leaf servers. The leaf servers process the query in parallel, reading the column-oriented data described above. The partial results they produce are then quickly aggregated on the way back up, and the root server returns the result of the query.
Image source: Dremel: Interactive Analysis of Web-Scale Datasets
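The flow described above can be sketched roughly as follows. This is a simplified illustration, not Google's implementation: the function names (`leaf_scan`, `intermediate`, `root`), the use of a thread pool to stand in for parallel leaf servers, and the sum/count aggregation are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def leaf_scan(shard):
    """Leaf server: scan its own shard of a column, return a partial result."""
    return {"sum": sum(shard), "count": len(shard)}


def merge(partials):
    """Combine partial results while passing them back up the tree."""
    return {
        "sum": sum(p["sum"] for p in partials),
        "count": sum(p["count"] for p in partials),
    }


def intermediate(shards, pool):
    """Intermediate server: fan the query out to its leaves in parallel."""
    return merge(list(pool.map(leaf_scan, shards)))


def root(shards_by_intermediate):
    """Root server: receive the query, delegate downward, return the result."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = [intermediate(shards, pool) for shards in shards_by_intermediate]
    return merge(partials)


# Two intermediate servers; the first has two leaves, the second has one.
result = root([[[1, 2], [3]], [[4, 5, 6]]])
average = result["sum"] / result["count"]
```

The point of the design is that each leaf only touches its own slice of the data, and each level only has to merge small partial results rather than the raw rows.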
- Column structured datastore
- tree architecture
The above two points are the reasons why BigQuery is fast.
*Some reports also say that results can be obtained in a few seconds even for huge, petabyte-class data sets with 100 million rows or more.
What about the price?
Even so, cost is still a concern when you consider using BigQuery, so I've summarized it briefly.
● Storage capacity = $0.020 / GB / month
・Amount of data stored in BigQuery
* Even 1 TB of data costs only $20 (approximately 2,000 yen) per month
● Query processing = $5 / TB
・Amount of data scanned when executing a query
* The first 1 TB of data processed by queries each month is free
● Streaming insert = $0.01 / 200MB
・This is an API used for real-time data collection, and you are charged for the amount of data inserted into the table.
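As a rough worked example, the rates listed above can be turned into a small monthly estimate. This is a sketch under assumptions: the function name `estimate_monthly_cost` is made up, 1 TB is treated as 1,000 GB to match the "$20 for 1 TB" figure above, and the rates are simply the ones quoted in this article, so check the official pricing page for current values.

```python
# Rates as quoted in this article (USD); they may differ from current pricing.
STORAGE_USD_PER_GB_MONTH = 0.020
QUERY_USD_PER_TB = 5.0
FREE_QUERY_TB_PER_MONTH = 1.0
STREAMING_USD_PER_200MB = 0.01


def estimate_monthly_cost(stored_gb, scanned_tb, streamed_mb=0.0):
    """Estimate one month's cost from storage, query scans, and streaming inserts."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    # Only the scanned data beyond the free monthly allowance is billed.
    billable_tb = max(0.0, scanned_tb - FREE_QUERY_TB_PER_MONTH)
    query = billable_tb * QUERY_USD_PER_TB
    streaming = streamed_mb / 200 * STREAMING_USD_PER_200MB
    return {
        "storage": storage,
        "query": query,
        "streaming": streaming,
        "total": storage + query + streaming,
    }


# Hypothetical workload: 1 TB stored, 3 TB scanned, 2,000 MB streamed in.
estimate = estimate_monthly_cost(stored_gb=1000, scanned_tb=3, streamed_mb=2000)
```

For this hypothetical workload the storage part is the $20 mentioned above, and only 2 of the 3 scanned TB are billed because the first 1 TB is free.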
Summary
It's cheap anyway, so why not give it a try? (If you have a Google account, you can start right away)
■ For details on BigQuery services ▼ [Click here] ▼
https://cloud.google.com/bigquery/?hl=ja