I participated in GCP's Big Data and Machine Learning training "CPB100"

My name is Ito, and I am an infrastructure engineer.

Personally, I feel that Google Cloud Platform (GCP) has been gaining a lot of momentum lately.

In December 2016, I attended a GCP seminar called Google Cloud OnBoard,
which apparently had over 1,000 participants.

Participated in Google Cloud OnBoard | Beyond Inc

This growth in Japan is likely thanks to the launch of the Tokyo region.
Google Cloud Region in Tokyo | Google Cloud Platform

I think that was around October or November 2016.

That was a long introduction, but as the title of this post suggests,
I participated in the GCP training course "CPB100."
The training focused on big data and machine learning.


Google Cloud Platform Free Training Tour | Topgate Co., Ltd

About Big Data


We did not do any hands-on work in this seminar, but we were given a thorough overview of the services.

When it comes to Google's big data services, the first thing that comes to mind is "BigQuery."
BigQuery - Analytics Data Warehouse | Google Cloud Platform

Of course, this was also discussed at OnBoard.

BigQuery is capable of completing regular expression replacement of 10 billion rows in just under 10 seconds.
So, how does it work internally?
It splits the data into many chunks and stores them across HDDs. When a query runs, a container is spun up for each chunk to read it.
Since disk I/O is the bottleneck in query processing, spreading the work across many containers is what enables such high-speed analysis.

By the way, BigQuery does not build indexes; every query is a full scan. Apparently,
with data this large, maintaining indexes would be harder than simply scanning everything.

Participated in Google Cloud OnBoard | Beyond Inc
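To get a feel for what that looks like in practice, here is a minimal sketch of running a regular-expression replacement as a BigQuery query from Python with the google-cloud-bigquery client library. The project, dataset, table, and column names are placeholders I made up for illustration.

```python
# A minimal sketch: regex replacement over a BigQuery table using the
# google-cloud-bigquery client. Project, dataset, table, and column names
# are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# REGEXP_REPLACE is evaluated as part of a full scan; no index is involved.
query = """
    SELECT REGEXP_REPLACE(page_title, r'[0-9]+', '#') AS normalized_title
    FROM `my-project.my_dataset.page_views`
"""

for row in client.query(query).result():
    print(row.normalized_title)
```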

Other Services

GCP            | AWS                                | Overview
Cloud Dataflow | Amazon Elastic MapReduce           | Managed service for batch processing and similar workloads
Cloud Dataproc | Amazon Elastic MapReduce           | Managed Spark and Hadoop service
Cloud Pub/Sub  | Amazon Simple Notification Service | Simple messaging service

These are just a few examples.
I've included the AWS equivalents to make them easier to understand.

I imagine the flow goes something like this:

  1. Distribute data by process using Compute Engine
  2. Store the data in Cloud Storage
  3. Pick up the Cloud Storage data with Pub/Sub and route it to the appropriate destination
  4. Process the data with Dataflow or Dataproc
  5. Store the processed data back in Cloud Storage

For simple data, Compute Engine alone could handle this, but
it would become a single point of failure.
Making good use of managed services is, I think, what really makes the cloud worthwhile.
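As a rough illustration of step 3 in the flow above, here is a small Pub/Sub sketch in Python with the google-cloud-pubsub library: one side publishes a notification that an object has landed in Cloud Storage, and the other side receives it and decides where to route it. The project, topic, subscription, bucket, and object names are all hypothetical.

```python
# A rough sketch of step 3 with the google-cloud-pubsub client.
# Project, topic, subscription, and object names are hypothetical.
from google.cloud import pubsub_v1

PROJECT = "my-project"

# Publisher side: announce that a new object arrived in Cloud Storage.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT, "new-data")
publisher.publish(topic_path, b"object created",
                  bucket="my-bucket", name="logs/2017-03-01.csv").result()

# Subscriber side: receive the notification and route it for processing.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT, "new-data-sub")

def callback(message):
    print("Route gs://{}/{} to Dataflow or Dataproc".format(
        message.attributes["bucket"], message.attributes["name"]))
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull.result(timeout=30)  # listen only briefly for this sketch
except Exception:
    streaming_pull.cancel()
```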

About Machine Learning


Machine learning shows up in everyday situations whenever you use Google's services.

Take Gmail (Inbox) for example: Smart Reply, currently available only in English,
uses machine learning to infer the context of an email and suggest replies.

Computer, respond to this email: Introducing Smart Reply in Inbox by Gmail

In addition, Google has used machine learning to optimize cooling in its data centers, successfully cutting cooling power consumption by 40%.
News - Google reduces data center cooling power consumption by 40%, using DeepMind's AI: ITpro

Various APIs

Google makes the machine learning technology it has built up over the years available as APIs;
the technology behind Google Translate, for example, is offered as the Translation API.

For example, take this one:
Speech API - Speech Recognition | Google Cloud Platform

It transcribes what you say into text.
Google apps and YouTube also have this feature.
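As a small illustration (not something shown in the seminar), here is roughly how a transcription request might look with the google-cloud-speech Python client. The Cloud Storage path and audio settings are placeholders, and the exact class names may differ by library version.

```python
# A minimal sketch of calling the Speech API from Python.
# The Cloud Storage path and audio parameters are hypothetical.
from google.cloud import speech

client = speech.SpeechClient()

audio = speech.RecognitionAudio(uri="gs://my-bucket/meeting.flac")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    # Print the most likely transcription for each utterance.
    print(result.alternatives[0].transcript)
```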

There are also image recognition and character recognition (OCR) APIs, and at Google Cloud Next '17 the following was announced:
Cloud Video Intelligence - Video Content Analysis | Google Cloud Platform

It's a video version of image recognition. It's in public beta, so you'll need to sign up if you want to try it out.

Make it yourself

With the existing APIs, if you run a photo of a person through the image recognition API, it can recognize that the subject is a "person" or "male," but
it cannot recognize things like the person's name.

This is because the APIs Google already provides have not been trained on individual people's names.

A fairly well-known example: TensorFlow, a machine learning library provided by Google,
was used by a cucumber farm to sort "good cucumbers" from merely "big cucumbers."
Google Cloud Platform Japan Official Blog: TensorFlow connects cucumber farmers with deep learning

Roughly speaking, the flow you need is as follows.
You have to write quite a lot of this yourself.

  1. Prepare training data, design an algorithm, and build a "trained model"
  2. Use the trained model to make predictions
  3. Keep training it to improve accuracy

However, implementing the algorithm is quite difficult,
which is where TensorFlow comes in.

TensorFlow is a library for implementing Deep Learning.
As mentioned earlier, it was developed by Google, released as a GCP service, and made open source.

C++ and Python APIs are available.
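To make the three steps above a bit more concrete, here is a toy sketch of the "prepare data, train a model, use it" flow with TensorFlow's Python API (Keras). The data is random placeholder data standing in for real labelled examples such as the cucumber images, and the code is written against a current TensorFlow rather than the version available at the time.

```python
# A toy sketch of the train-then-predict flow with TensorFlow's Python API.
# The data here is random and only stands in for real labelled examples.
import numpy as np
import tensorflow as tf

# 1. Prepare training data (hypothetical: 4 features, 3 classes).
x_train = np.random.rand(1000, 4).astype("float32")
y_train = np.random.randint(0, 3, size=(1000,))

# 2. Define a small model and train it.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32)

# 3. Use the trained model; keep feeding it new data to improve accuracy.
print(model.predict(x_train[:1]))
```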

Additionally, machine learning needs a very large amount of compute, mainly GPUs and CPUs, while it is training (for workloads such as image recognition).
For this purpose, there is "Cloud Machine Learning Engine."
Since the heavy resources are only needed "while training," this is a very good fit for the cloud:
GPUs can be used, with a large number of GPU-equipped machines provisioned behind the scenes.

Predictive Analytics - Cloud Machine Learning Engine | Google Cloud Platform
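For reference, a training job can be submitted to Cloud Machine Learning Engine through its REST API; here is a rough sketch using the google-api-python-client discovery interface. The project, bucket, package, job ID, and region values are hypothetical.

```python
# A rough sketch of submitting a training job to Cloud ML Engine via its
# REST API. All names (project, bucket, package, region) are hypothetical.
from googleapiclient import discovery

ml = discovery.build("ml", "v1")

job_spec = {
    "jobId": "cucumber_training_001",
    "trainingInput": {
        "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],
        "pythonModule": "trainer.task",
        "region": "asia-east1",
        "scaleTier": "BASIC_GPU",  # a GPU worker, paid for only while training
    },
}

request = ml.projects().jobs().create(
    parent="projects/my-project", body=job_spec)
print(request.execute())
```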

If you are interested in TensorFlow, there is a TensorFlow User Group, so
it might be a good idea to join their study group.
TensorFlow User Group Tokyo - connpass

That said, it's incredibly popular; a study group meant for around 20 people ended up with 200 participants...!!

Summary

There was a lot more covered than what I've written here, but that part was only for those who attended.

Lunch boxes were provided. They were delicious.
(No need to tell me I should take photos that make them look more appetizing.)

Oh, and GCP is often compared to AWS; the following point made me go, "Ah, I see!"

AWS "puts products already available as open source (e.g., Memcached, ElasticSearch, etc.) on AWS and provides them in an easy-to-use format for users," while
GCP "uses products that they have developed themselves and
provides them to users as GCP services."

For example, MapReduce, developed inside Google, evolved into Dremel, which was released as GCP's "BigQuery," while the ideas behind MapReduce are now available as open source in the form of Hadoop.

GCP basically takes the opposite approach to AWS.

