I participated in GCP's Big Data and Machine Learning training "CPB100"

My name is Ito, and I'm an infrastructure engineer.

One cloud service that, in my personal view, has been growing rapidly lately is
Google Cloud Platform (GCP).

I attended a Google Cloud OnBoard seminar in December 2016, and
it seems there were over 1,000 participants.

Participated in Google Cloud OnBoard | Beyond Inc

The main reason for this growth in Japan is likely the launch of the Tokyo region.
(Tokyo Google Cloud Region | Google Cloud Platform)

I believe the region opened around October or November 2016.

So, after that long introduction, as the title suggests,
I participated in the GCP training "CPB100".
The training mainly covers big data and machine learning.


Google Cloud Platform Free Training Tour | Topgate Co., Ltd

About Big Data


Although we didn't do any hands-on work in this seminar, we were given a broad overview of the services.

When it comes to Google's big data services, "BigQuery" is the one that comes to mind.
BigQuery - Analytics Data Warehouse | Google Cloud Platform

Of course, this was also discussed at OnBoard.

BigQuery can run a regular-expression replacement over 10 billion rows in just under 10 seconds.
Internally, BigQuery splits the data into sections, stores each section on HDDs, and, when a query runs, reads each section in its own container.
Because disk I/O is the bottleneck when running queries, spreading the data across many containers that read in parallel is what enables such fast analysis.

By the way, I heard that BigQuery doesn't create indexes and instead always performs full scans. Apparently,
the data is so large that creating indexes would be more difficult.
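To make the "split the data, then full-scan every piece in parallel" idea concrete, here is a toy, single-machine sketch. It is not how BigQuery is actually implemented; the function names (`split_into_shards`, `scan_shard`, `parallel_regex_replace`) and the shard count are hypothetical. Threads are used only for simplicity: because of CPython's GIL they won't really speed up CPU-bound regex work, whereas BigQuery spreads shards across many machines and disks.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_shards(rows: List[str], num_shards: int) -> List[List[str]]:
    """Split rows into contiguous, roughly equal shards (stand-ins for data on separate disks)."""
    size = max(1, -(-len(rows) // num_shards))  # ceiling division, at least 1 (num_shards must be > 0)
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def scan_shard(shard: List[str], pattern: str, replacement: str) -> List[str]:
    """Full scan of one shard: there is no index, so every row is read and rewritten."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, row) for row in shard]


def parallel_regex_replace(rows: List[str], pattern: str, replacement: str,
                           num_shards: int = 4) -> List[str]:
    """Apply a regex replacement to every row by scanning all shards concurrently."""
    shards = split_into_shards(rows, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # Each shard is scanned by its own worker; map() yields results in shard order.
        results = pool.map(lambda shard: scan_shard(shard, pattern, replacement), shards)
        return [row for shard_result in results for row in shard_result]


rows = ["user-001 visited", "user-002 visited", "guest visited"]
print(parallel_regex_replace(rows, r"user-\d+", "user-XXX", num_shards=2))
# prints ['user-XXX visited', 'user-XXX visited', 'guest visited']
```

Note that every row is touched on every query; the speed comes entirely from how many shards can be read at once, which is the trade-off described above.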


Other Services

GCP | Closest AWS equivalent | Overview
Cloud Dataflow | Amazon Elastic MapReduce (EMR) | Managed data processing, such as batch processing
Cloud Dataproc | Amazon Elastic MapReduce (EMR) | Managed Spark and Hadoop service
Cloud Pub/Sub | Amazon Simple Notification Service (SNS) | Simple messaging service

These are some of the other services on offer.
I've added the closest AWS equivalent for each to make them easier to understand.

I imagine a typical flow looks something like this:

  1. Distribute the data by process using Compute Engine
  2. Store the data in Cloud Storage
  3. Receive the Cloud Storage data with Pub/Sub and send it to the appropriate destination
  4. Process the data with Dataflow or Dataproc
  5. Store the processed data in Cloud Storage as well
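The five steps above can be sketched as a tiny in-memory simulation. Nothing here uses the real GCP client libraries: `Bucket`, `Topic`, `ingest`, `count_words`, and `build_pipeline` are hypothetical stand-ins for Cloud Storage, Pub/Sub, a Compute Engine front end, and a Dataflow/Dataproc job, and routing by object-name prefix is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Optional


class Topic:
    """Stand-in for a Pub/Sub topic: delivers each message to every subscriber."""
    def __init__(self) -> None:
        self.subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, message: dict) -> None:
        for callback in self.subscribers:
            callback(message)


class Bucket:
    """Stand-in for a Cloud Storage bucket that can notify a topic on each write."""
    def __init__(self, notify: Optional[Topic] = None) -> None:
        self.objects: Dict[str, str] = {}
        self.notify = notify

    def write(self, name: str, data: str) -> None:
        self.objects[name] = data
        if self.notify is not None:
            self.notify.publish({"bucket": self, "name": name})


def ingest(raw: Bucket, source: str, record_id: int, data: str) -> None:
    # Steps 1-2: a Compute Engine-style front end files each record under its source's prefix.
    raw.write(f"{source}/{record_id}.txt", data)


def count_words(text: str) -> str:
    """Stand-in for a Dataflow/Dataproc job: a simple word count."""
    counts: Dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return ",".join(f"{w}={c}" for w, c in sorted(counts.items()))


def build_pipeline():
    notifications = Topic()
    raw = Bucket(notify=notifications)  # raw data lands here
    processed = Bucket()                # step 5: results land here

    def route(message: dict) -> None:
        # Step 3: route each new object to the right job by its name prefix.
        name = message["name"]
        if name.startswith("logs/"):
            data = message["bucket"].objects[name]
            # Step 4: process the data, then step 5: store the result.
            processed.write(name.replace("logs/", "wordcount/", 1), count_words(data))

    notifications.subscribe(route)
    return raw, processed


raw, processed = build_pipeline()
ingest(raw, "logs", 1, "b a b")
ingest(raw, "images", 2, "binary")
print(processed.objects)  # prints {'wordcount/1.txt': 'a=1,b=2'}
```

The point of the split is that each stage can be replaced by a managed service independently, instead of one machine doing everything.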

Simple workloads can be handled entirely with Compute Engine, but
that instance then becomes a single point of failure.
I believe that knowing how to effectively use managed services is key to "making good use of the cloud."

About Machine Learning


Machine learning is already part of everyday life whenever you use Google services.

Take Gmail, for example. While it's currently only available in English,
it uses machine learning to suggest replies based on context.

Computer, respond to this email: Introducing Smart Reply in Inbox by Gmail

Furthermore, Google has used machine learning to reduce the energy used to cool its data centers by 40%.
News - Google reduces data center cooling power by 40% using DeepMind's AI: ITpro

Various APIs

Google offers the technology it has cultivated over the years as APIs.
Google Translate, of course, is also available as a machine learning API (the Translation API).

For example, this:
Speech API - Speech Recognition | Google Cloud Platform

It's exactly what it sounds like: it transcribes what you say into text.
Google apps and YouTube also have this feature, right?

Image recognition and character recognition APIs already exist, and at Google Cloud Next '17 it seems something like this was announced:
Cloud Video Intelligence - Video Content Analysis | Google Cloud Platform

It's essentially image recognition for video. It's in public beta, so you'll need to sign up if you want to try it.

Make It Yourself

With the existing APIs, if you pass a photo of a person through the image recognition API, it can recognize things like "person" and "male," but it
can't tell you who that person is by name, right?

That's because the APIs Google already provides have not been trained on individuals' names.

A fairly well-known example is using TensorFlow, a machine learning library provided by Google,
to sort cucumbers into grades such as "good cucumbers" and "large cucumbers."
Google Cloud Platform Japan Official Blog: Connecting Cucumber Farmers with Deep Learning - TensorFlow

Roughly speaking, this is the flow you'll need,
and you'll have to write quite a lot of detailed code along the way:

  1. Prepare the training data, create an algorithm, and build a "trained model"
  2. Use the trained model to make predictions
  3. Continue training to improve accuracy
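To show what those three steps mean in code, here is a minimal sketch in plain Python rather than TensorFlow, so no TensorFlow API is implied. The model is a toy linear regression `y = w * x + b` fitted by gradient descent; `train`, `predict`, and `mse` are hypothetical helper names, and the training data (`y = 2x + 1`) is made up for illustration.

```python
from typing import List, Tuple

Model = Tuple[float, float]  # (weight, bias) for the toy model y = w * x + b


def train(data: List[Tuple[float, float]], model: Model = (0.0, 0.0),
          epochs: int = 200, lr: float = 0.05) -> Model:
    """Fit (or keep fitting) the model with full-batch gradient descent on squared error."""
    if not data:
        raise ValueError("training data must not be empty")
    w, b = model
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def predict(model: Model, x: float) -> float:
    w, b = model
    return w * x + b


def mse(model: Model, data: List[Tuple[float, float]]) -> float:
    return sum((predict(model, x) - y) ** 2 for x, y in data) / len(data)


# Step 1: prepare training data (here y = 2x + 1) and create a trained model
data = [(x, 2 * x + 1) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
model = train(data, epochs=50)

# Step 2: use the trained model to make a prediction
print(predict(model, 3.0))

# Step 3: continue training from the current model to improve accuracy
better = train(data, model=model, epochs=500)
print(mse(better, data) < mse(model, data))  # prints True
```

Step 3 simply resumes from the existing weights instead of starting over, which is why the error keeps shrinking. A library like TensorFlow takes care of the gradient computation that is written out by hand here.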

However, implementing the algorithm is quite difficult.
That's where TensorFlow comes in.

TensorFlow is a library for implementing deep learning.
As I mentioned earlier, it was developed by Google, released as open source, and can also be run as a managed service on GCP.

It provides C++ and Python APIs.

Furthermore, machine learning needs a very large amount of compute during the training phase,
mainly CPUs and GPUs (for example, when training image recognition).
That's why Google offers "Cloud Machine Learning Engine."
Since those heavy resources are needed only while training, machine learning is very well suited to the cloud.
Cloud Machine Learning Engine lets you use GPUs, with a large number of GPU-equipped machines running behind the scenes.

Predictive Analytics - Cloud Machine Learning Engine | Google Cloud Platform

If you're interested in TensorFlow, there's a TensorFlow User Group, and
I recommend checking out their study sessions.
(TensorFlow User Group Tokyo - connpass)

That said, it's incredibly popular: one study session meant for around 20 people had 200 participants...!!

Summary

There was much more content than I've covered here, but some of it was only for those who attended.

Lunch was provided in the form of a bento box. It was delicious.
Let's not talk about how I should have taken a more appetizing photo.

Oh, GCP is often compared to AWS, but the following part made me think, "I see!"

AWS takes products that already exist as open source (e.g., Memcached and Elasticsearch) and offers them on AWS in an easy-to-use form, while
GCP develops its own products, uses them extensively in-house, and then
offers those products to users as GCP services.

For example, Google developed MapReduce and later Dremel, which was released as GCP's "BigQuery," while MapReduce itself is available as open source in the form of Hadoop, an implementation of the design Google published.

In short, GCP basically takes the opposite approach to AWS.
