Participated in GCP's BigData and Machine Learning training "CPB100"

2017.03.14

others

table of contents

1 About BigData
- 1.1 Other services
2 About MachineLearning
- 2.1 Various APIs
- 2.2 make it yourself
3 summary

My name is Ito and I am an infrastructure engineer.

Google Cloud Platform (GCP) is one of the cloud technologies that (in my opinion) has been growing rapidly of late.

In December 2016, I attended a GCP seminar called Google Cloud OnBoard, and
there seemed to be over 1,000 participants at that time.

Participated in Google Cloud OnBoard | Beyond Inc.

I think it was the launch of the Tokyo region that made it so widespread in Japan.
Google Cloud region in Tokyo | Google Cloud Platform

That was, I think, around October or November 2016.

That was a long introduction, but as the title says,
I participated in the GCP training "CPB100".
The training mainly covered big data and machine learning.

Google Cloud Platform Free Training Tour | Top Gate Co., Ltd.

About BigData

We did not do any hands-on work in this seminar, but we were given a broad overview.

When it comes to Google's big data services, the first one to mention is definitely "BigQuery".
BigQuery- Analytics Data Warehouse | Google Cloud Platform

Of course, this was also covered at OnBoard.

BigQuery can run a regular-expression replace over 10 billion rows in under 10 seconds.
So, how does BigQuery work internally?
Data is split up and stored across many HDDs, and when a query runs, a container is created for each piece of data to read and process it.
Disk I/O is the bottleneck when running queries, so the work is spread over many containers to make analysis fast.

By the way, BigQuery does not use any indexes; it performs a full scan.
It seems the data is too large for indexing to be practical.
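
To make the "split the data, scan every piece in parallel" idea concrete, here is a minimal sketch in plain Python. It is not how BigQuery is implemented: the function names, the shard count, and the sample rows are all assumptions for illustration, and Python threads only imitate the shape of the work rather than delivering real parallel speed.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(rows, shard_count):
    """Divide rows into shards, like data spread over many disks."""
    return [rows[i::shard_count] for i in range(shard_count)]


def scan_shard(shard, pattern, replacement):
    """Full scan of one shard: no index, every row is visited."""
    compiled = re.compile(pattern)
    return [compiled.sub(replacement, row) for row in shard]


def parallel_regex_replace(rows, pattern, replacement, shard_count=4):
    """Run the same scan on every shard and merge the partial results."""
    shards = split_into_shards(rows, shard_count)
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        results = pool.map(
            lambda shard: scan_shard(shard, pattern, replacement), shards
        )
    merged = []
    for part in results:
        merged.extend(part)
    return merged


# Hypothetical sample data; row order is not preserved across shards.
rows = [f"user{i}@example.com" for i in range(10)]
replaced = parallel_regex_replace(rows, r"@example\.com$", "@example.org")
print(sorted(replaced)[:3])
```

Each worker touches only its own shard, so adding more shards (and more workers) shortens the time spent waiting on any single disk, which is the point made above about disk I/O.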

Participated in Google Cloud OnBoard | Beyond Inc.

Other services

GCP	AWS	Overview
Cloud Dataflow	Amazon Elastic MapReduce	Managed service for batch processing and the like
Cloud Dataproc	Amazon Elastic MapReduce	Managed service for Spark and Hadoop
Cloud Pub/Sub	Amazon Simple Notification Service	Simple messaging service

These are some of the other services in this area.
I have included a comparison with AWS services to make them easier to understand.

I imagine the flow looks something like this:

Sort out the incoming data by processing it with Compute Engine
Store the data in Cloud Storage
Receive the Cloud Storage data with Pub/Sub and route it to the appropriate place
Process the data with Dataflow or Dataproc
Save the processed data to Cloud Storage as well
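
The flow above can be sketched with in-memory stand-ins. This is only an illustration of the shape of the pipeline: `FakeStorage` and `FakeTopic` are hypothetical classes, not the real Cloud Storage or Pub/Sub client libraries, and the "processing" step is a trivial clean-up standing in for a Dataflow or Dataproc job.

```python
from collections import deque


class FakeStorage:
    """In-memory stand-in for Cloud Storage: object name -> content."""

    def __init__(self):
        self.objects = {}

    def upload(self, name, content):
        self.objects[name] = content

    def download(self, name):
        return self.objects[name]


class FakeTopic:
    """In-memory stand-in for a Pub/Sub topic: a simple message queue."""

    def __init__(self):
        self.messages = deque()

    def publish(self, message):
        self.messages.append(message)

    def pull(self):
        return self.messages.popleft() if self.messages else None


def ingest(storage, topic, name, raw_lines):
    """Store the raw data, then announce the new object on the topic."""
    storage.upload(name, raw_lines)
    topic.publish({"object": name})


def process_pending(storage, topic):
    """Consume notifications, transform each object, save the result."""
    processed = []
    while (message := topic.pull()) is not None:
        lines = storage.download(message["object"])
        # Stand-in for the real processing job: trim, lowercase, sort.
        result = sorted(line.strip().lower() for line in lines if line.strip())
        out_name = "processed/" + message["object"]
        storage.upload(out_name, result)
        processed.append(out_name)
    return processed


storage = FakeStorage()
topic = FakeTopic()
ingest(storage, topic, "logs/day1.txt", ["  Banana\n", "apple\n", "\n", "Cherry \n"])
done = process_pending(storage, topic)
print(done, storage.download(done[0]))
```

The producer and the consumer never call each other directly; they only share the storage and the topic, which is what lets each stage be replaced by a managed service.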

For simple data, you could do the whole job on Compute Engine alone, but
that makes it a single point of failure.
I believe that using managed services well is what leads to "effective use of the cloud."

About MachineLearning

Machine learning is at work in everyday places when you use Google services.

Take Gmail, for example. The feature is currently limited to English, but
it uses machine learning to suggest replies based on the context of an email.

Computer, respond to this email: Introducing Smart Reply in Inbox by Gmail

In addition, Google used machine learning to tune cooling in its data centers and succeeded in cutting the power used for cooling by 40%.
News - Google reduces data center cooling power by 40%, leverages DeepMind's AI: ITpro

Various APIs

Google offers the technology it has built up so far as APIs.
Of course, Google Translate also uses a machine learning API (the Translation API).

For example, this one.
Speech API - Speech Recognition | Google Cloud Platform

As the name says, it turns what you say into text.
Google apps and YouTube also have this feature.

There are also image recognition and character recognition APIs, and this one seems to have been announced at Google Cloud Next '17.
Cloud Video Intelligence - Video Content Analysis | Google Cloud Platform

This is a video version of image recognition. It's a public beta, so if you want to try it out, you'll need to sign up for now.

make it yourself

With the existing APIs, if you pass a photo of a person to the image recognition API, it can recognize things like "person" and "male," but
it cannot recognize things like the person's name.

This is because the APIs Google already provides have not been trained on individual names.

A fairly well-known example is the use of TensorFlow, a machine learning library provided by Google,
to sort cucumbers into "good cucumbers" and "big cucumbers."
Google Cloud Platform Japan Official Blog: TensorFlow connecting cucumber farmers and deep learning

Roughly speaking, building something yourself requires the following flow.
This is a part I have to describe carefully.

Prepare training data, create an algorithm, and build a "trained model"
Use the trained model
Keep training it to improve accuracy
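
As a toy illustration of these three steps, here is a sketch in plain Python. It uses a nearest-centroid classifier, which is far simpler than deep learning and is not what TensorFlow or the cucumber example actually does. The feature values (length in cm, curvature) and the labels are made-up sample data.

```python
def train(samples, model=None):
    """Build or update a 'trained model': per-label feature sums and counts."""
    model = dict(model or {})
    for features, label in samples:
        sums, count = model.get(label, ([0.0] * len(features), 0))
        sums = [s + f for s, f in zip(sums, features)]
        model[label] = (sums, count + 1)
    return model


def predict(model, features):
    """Return the label whose centroid (mean feature vector) is closest."""

    def squared_distance(label):
        sums, count = model[label]
        return sum((s / count - f) ** 2 for s, f in zip(sums, features))

    return min(model, key=squared_distance)


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(model, features) == label for features, label in samples)
    return hits / len(samples)


# Hypothetical data: (length in cm, curvature) -> label
first_batch = [((18.0, 0.1), "good"), ((28.0, 0.2), "big")]
more_data = [((23.0, 0.2), "good"), ((25.0, 0.1), "good"),
             ((33.0, 0.3), "big"), ((31.0, 0.1), "big")]
test_set = [((20.0, 0.1), "good"), ((32.0, 0.2), "big"), ((24.0, 0.2), "good")]

model = train(first_batch)            # step 1: create a trained model
before = accuracy(model, test_set)    # step 2: use it
model = train(more_data, model)       # step 3: learn more
after = accuracy(model, test_set)
print(before, after)
```

With only two training samples the model misjudges the borderline 24 cm cucumber. After seeing four more samples the centroids move and all three test cases are classified correctly.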

However, the algorithm is quite difficult to implement.
That's where TensorFlow comes in.

TensorFlow is a library for implementing deep learning.
As I said earlier, it is "something developed by Google that appeared as a GCP service and became open source."

C++ and Python APIs are available.

Also, machine learning needs a very large amount of resources during training,
mainly GPU and CPU (because it does things like image recognition).
"Cloud Machine Learning Engine" is available for that purpose.
Machine learning needs those resources only while training, so it is well suited to the cloud.
GPUs are now available, and a large number of GPU machines are being launched behind the scenes.

Predictive Analytics - Cloud Machine Learning Engine | Google Cloud Platform

If you are interested in TensorFlow, there is a TensorFlow User Group, so
I think it would be a good idea to join a study session there.
TensorFlow User Group Tokyo - connpass

However, it is extremely popular: a study session with room for about 20 people had 200 people sign up...!!

summary

There was much more than this, but the rest is reserved for those who attended.

A boxed lunch was provided at noon. It was delicious.
Don't ask me to take better photos.

Oh, and GCP is often compared to AWS, but the following point made sense to me as I listened to the talks.

AWS takes "products that already exist as open source (e.g. Memcached, Elasticsearch) and offers them on AWS in a form that is easy for users to use," whereas
GCP takes "products that Google developed itself
and offers them to users as GCP services."

For example, Google's in-house MapReduce led on to Dremel, which has been released as GCP's "BigQuery," while MapReduce itself is available in open source form as Hadoop.

GCP basically takes the opposite approach from AWS.
