What is Google's ultra-fast data warehouse "BigQuery"?

2016.03.12

Table of contents

1 What is BigQuery anyway?
2 How BigQuery is fast
- 2.1 Column structured datastore
- 2.2 Tree architecture
3 What are the prices you are interested in?
4 Summary

This is Ohara from the technical sales department.

This time, we will focus on the fully managed data warehouse "BigQuery" provided by Google.


What is BigQuery anyway?

BigQuery is a big data analysis service provided by Google, and was announced as an official service at Google I/O (an event for developers hosted by Google) in 2012.

BigQuery originated from Dremel, a data analysis system used internally at Google, which was improved and then made available to external users.

Japanese system vendors also offer many big data analysis services and software products, but BigQuery can run SQL-like queries against datasets ranging from several TB (terabytes) to several PB (petabytes), finish the processing in a few seconds to a few tens of seconds, and return the results.

How BigQuery is fast

BigQuery is fast because it uses two mechanisms:

Column structured datastore

A traditional RDBMS stores data by row: in a record-oriented (= row-oriented) layout, the entire record is kept together in the same storage.

With a column-oriented layout, however, each record is divided into columns and the columns are placed in separate storage. This makes it possible to "minimize traffic", because a query reads only the columns it needs, and to store data at a "high compression ratio", because values in the same column tend to be similar. The result is high-speed data access during query execution.

○ "Record = row-oriented" in a traditional RDBMS
○ "Column = column-oriented" in BigQuery

* Information source: Dremel: Interactive Analysis of Web-Scale Datasets
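
As a rough illustration of the difference, the following Python sketch stores the same small table both ways and sums one field. The table contents, field names, and helper functions are assumptions made up for this example, and run-length encoding is used only as one simple compression scheme; this is not how BigQuery itself is implemented.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The table, field names, and helpers are illustrative assumptions.

records = [
    {"user": "a", "country": "JP", "bytes": 120},
    {"user": "b", "country": "JP", "bytes": 300},
    {"user": "c", "country": "JP", "bytes": 50},
    {"user": "d", "country": "US", "bytes": 75},
]

# Row-oriented: each record is stored as one unit.
row_store = [tuple(r.values()) for r in records]

# Column-oriented: each column is stored separately.
column_store = {name: [r[name] for r in records] for name in records[0]}


def sum_bytes_row(store):
    """Sum the 'bytes' field; every whole record has to be read."""
    values_read = 0
    total = 0
    for _user, _country, nbytes in store:
        values_read += 3  # all three fields of the record are touched
        total += nbytes
    return total, values_read


def sum_bytes_column(store):
    """Sum the 'bytes' field; only that one column is read."""
    column = store["bytes"]
    return sum(column), len(column)


def run_length_encode(values):
    """Compress runs of equal neighbouring values into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded


row_total, row_reads = sum_bytes_row(row_store)
col_total, col_reads = sum_bytes_column(column_store)
country_rle = run_length_encode(column_store["country"])

print(row_total, row_reads)  # same total, 12 values read
print(col_total, col_reads)  # same total, 4 values read
print(country_rle)           # repeated values collapse into short runs
```

Both layouts give the same answer, but the column layout touches only the values the query needs, and a column of similar values compresses into far fewer entries.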

Tree architecture

BigQuery has a tree-style distributed processing structure.

The root server receives the query from the client and passes it down through the intermediate servers directly below it to the leaf servers. The leaf servers execute the query in parallel against the column-oriented data described above, and the results they read are quickly aggregated back up the tree and returned as the result of the query.
(According to some reports, results come back in just a few seconds even for huge, petabyte-class datasets with billions of rows.)

○ Column structured datastore
○ Tree Architecture

* Information source: Dremel: Interactive Analysis of Web-Scale Datasets
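
The flow described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names, the two-level fan-out, and the sample partitions are all invented for illustration, and threads stand in for separate servers.

```python
# Toy model of a tree-style query: root -> intermediate servers -> leaf servers.
# The partitioning and all function names are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor


def leaf_scan(partition):
    """A leaf server scans its own slice of a column and returns a partial result."""
    return {"count": len(partition), "sum": sum(partition)}


def merge(partials):
    """Combine partial results; used by the intermediate servers and by the root."""
    return {
        "count": sum(p["count"] for p in partials),
        "sum": sum(p["sum"] for p in partials),
    }


def intermediate(partitions):
    """An intermediate server fans out to its leaves in parallel and merges the answers."""
    with ThreadPoolExecutor() as pool:
        return merge(list(pool.map(leaf_scan, partitions)))


def root(partition_groups):
    """The root server receives the query, delegates downward, and merges the results."""
    return merge([intermediate(group) for group in partition_groups])


# Two intermediate servers, each responsible for two leaf partitions.
column_partitions = [
    [[1, 2, 3], [4, 5]],
    [[6], [7, 8, 9, 10]],
]
result = root(column_partitions)
print(result)  # {'count': 10, 'sum': 55}
```

Because each leaf returns only a small partial result, the amount of data moving up the tree stays small no matter how large each partition is.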

The above two points are the reasons why BigQuery is fast.

What are the prices you are interested in?

Of course, if you are going to use BigQuery, the cost is a concern, so I have summarized it briefly.

● Storage capacity = $0.020 / GB / month
・Data capacity stored in BigQuery
* Even with 1 TB of data, "$20 = approximately 2,000 yen / month"

● Query processing = $5 / TB
・Amount of data scanned when executing a query
* Query processing is free for the first 1 TB per month

● Streaming insert = $0.01 / 200MB
・This is an API used for real-time data collection, and you are charged for the amount of data inserted into the table.
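
To get a feel for the numbers, here is a minimal Python sketch that estimates a monthly bill from the three prices listed above. The function name and parameters are made up for this example, and it assumes 1 TB = 1,000 GB (as in the $20 example above) and that streaming inserts are billed in proportion to the data inserted.

```python
# Rough monthly cost estimate from the prices listed in this article.
# Assumptions: 1 TB = 1,000 GB; streaming inserts are billed proportionally.

STORAGE_USD_PER_GB_MONTH = 0.020
QUERY_USD_PER_TB = 5.0
QUERY_FREE_TB_PER_MONTH = 1.0
STREAMING_USD_PER_200MB = 0.01


def estimate_monthly_cost_usd(stored_gb, scanned_tb, streamed_mb=0.0):
    """Return a per-item breakdown and the total in US dollars."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    # Only the data scanned beyond the free monthly allowance is charged.
    billable_tb = max(0.0, scanned_tb - QUERY_FREE_TB_PER_MONTH)
    query = billable_tb * QUERY_USD_PER_TB
    streaming = streamed_mb / 200 * STREAMING_USD_PER_200MB
    return {
        "storage": storage,
        "query": query,
        "streaming": streaming,
        "total": storage + query + streaming,
    }


# Example: 1 TB stored, 3 TB scanned by queries, 1,000 MB streamed in one month.
example = estimate_monthly_cost_usd(stored_gb=1000, scanned_tb=3, streamed_mb=1000)
print(example)
```

In this example the storage comes to about $20, the queries to about $10 (only 2 TB is billable after the free 1 TB), and the streaming inserts to about $0.05.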

Summary

It's cheap anyway, so why not give it a try? (If you have a Google account, you can start right away)

▼ For more information on BigQuery's services, click here ▼
https://cloud.google.com/bigquery/?hl=ja


About the author

ohara

I started my career in the telecommunications industry as a salesperson in charge of introducing IT products such as NW services, OA equipment, and groupware for corporations.

After that, I worked as a pre-sales engineer for physical servers and hosting services, and as a customer engineer for SaaS-type SFA/CRM/BtoB e-commerce at an SIer-affiliated data center company, before joining my current company, Beyond.

Currently, I am stationed in Shenzhen, China, the Silicon Valley of Asia, and my daily routine is watching Chinese dramas and bilibili.

Qualification: Second class bookkeeping
