Introducing phirehose, a library that makes it easy to work with the Twitter Streaming API from PHP!
Hello.
I'm Mandai, in charge of Wild on the development team.
I'm not sure whether Twitter is still trendy these days or simply an established fixture, but I became interested in the Twitter Streaming API, so I gave it a try.
Writing a program from scratch would be quite a lot of work, so I tried a convenient library instead. It turned out to be very easy, so I'd like to introduce it here.
Obtain an access token and access secret for the Twitter API
Nothing works without these, so get them first.
You can either create a dedicated account or obtain them as an existing user.
Log in to Twitter and open the application management page; there will be a link called "Create New App", so register your app from there.
The Streaming API has no limit on the number of requests (or rather, since it's streaming, you keep receiving data over a single open connection), so you don't have to worry about being banned for calling it too often.
However, if you repeatedly retry while a connection cannot be established, you may be cut off, so you would normally need to write your own logic for reconnection, response monitoring, and so on.
That takes time to build, so this time I'll introduce a library that covers that area for you.
Introducing phirehose to easily use the Streaming API
phirehose is a PHP library that interacts with the Twitter Streaming API.
Its classes handle everything from authentication to connecting and receiving data, so you can just copy and paste the sample, fill in your access key information, and run it from the console to start getting data.
The repository is published on GitHub, so you can use it immediately after a git clone.
# git clone https://github.com/fennb/phirehose.git (run in a directory within your project)
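As a minimal sketch of what running it looks like, here is a consumer modeled on the OAuth filter sample bundled with phirehose (the require paths, credentials, and keyword list below are placeholders you would adjust):

```php
<?php
// Minimal phirehose consumer, modeled on the bundled example/filter-oauth.php.
// Paths and credential values are placeholders.
require_once 'phirehose/lib/Phirehose.php';
require_once 'phirehose/lib/OauthPhirehose.php';

// The OAuth sample expects the consumer key/secret as constants.
define('TWITTER_CONSUMER_KEY', 'your-consumer-key');
define('TWITTER_CONSUMER_SECRET', 'your-consumer-secret');

class FilterTrackConsumer extends OauthPhirehose
{
    // Called once for each status received from the stream; $status is raw JSON.
    public function enqueueStatus($status)
    {
        $data = json_decode($status, true);
        if (is_array($data) && isset($data['user']['screen_name'])) {
            echo $data['user']['screen_name'] . ': ' . $data['text'] . "\n";
        }
    }
}

// The access token and access secret go to the constructor.
$sc = new FilterTrackConsumer('your-access-token', 'your-access-secret',
                              Phirehose::METHOD_FILTER);
$sc->setTrack(array('php'));   // keywords to filter on
$sc->consume();                // blocks and streams indefinitely
```

Since consume() blocks forever, you would normally run this from the console as a long-lived process.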
That's all it takes, so next I'd like to introduce the classes and options that correspond to the Streaming API endpoints.
Public streams
Public Streams is an API that retrieves data from the entire Twitter timeline.
There are two endpoints.
POST statuses/filter
POST statuses/filter is defined with the POST method but also reads GET-style parameters, making it a relatively flexible, well-designed API.
It narrows down the tweets to be streamed by Twitter user ID, keyword, or location.
follow (user ID)
User IDs to filter on.
In phirehose, pass them as an array to the setFollow() method.
track (keyword)
Keywords to filter on.
In phirehose, pass them as an array to the setTrack() method.
location (location information)
Location information to filter on.
Selection is based on rectangles, each defined by two points (the lower-left and upper-right corners), so if you want to cover only Japan you will need to combine several pieces of location information.
In phirehose, pass them as an array to the setLocations() method.
Note that it must always be a two-dimensional array: each element is itself an array of four values, in this order:
- longitude of the lower-left (southwestmost) point
- latitude of the lower-left (southwestmost) point
- longitude of the upper-right (northeastmost) point
- latitude of the upper-right (northeastmost) point
By specifying multiple such rectangles, you can build up a range that covers Japan.
Alternatively, it is also possible to filter by targeting only a number of large cities.
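Putting the three filter options together, a sketch might look like the following, reusing a consumer class like the one in phirehose's bundled filter-oauth.php example. The user IDs are placeholders, and the bounding boxes are rough, illustrative coordinates, not an exact cover of Japan:

```php
<?php
// Sketch: combining follow/track/locations on a filter-stream consumer
// (a subclass of OauthPhirehose, as in phirehose's bundled sample).
// Credentials, IDs, and coordinates are placeholders.
$sc = new FilterTrackConsumer('your-access-token', 'your-access-secret',
                              Phirehose::METHOD_FILTER);

// follow: user IDs, passed as an array
$sc->setFollow(array(123456789, 987654321));

// track: keywords, passed as an array
$sc->setTrack(array('tokyo', 'osaka'));

// locations: a two-dimensional array; each inner array is
// [sw longitude, sw latitude, ne longitude, ne latitude].
// Two rough boxes purely for illustration:
$sc->setLocations(array(
    array(129.0, 31.0, 142.0, 41.5),  // roughly Kyushu through Tohoku
    array(139.3, 41.3, 145.9, 45.6),  // roughly Hokkaido
));

$sc->consume();
```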
GET statuses/sample
GET statuses/sample is an API that extracts a small random sample from all public tweets on Twitter.
The sample is said to be about 1% of all tweets, but it still arrives at quite a pace, so be careful.
There are no options here.
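A sample-stream consumer looks just like the filter example bundled with phirehose; as a sketch, only the method constant changes (credentials are placeholders):

```php
<?php
// Sketch: consuming GET statuses/sample. The consumer class extends
// OauthPhirehose exactly like the bundled filter example; only the
// method constant differs, and no options are set.
class SampleConsumer extends OauthPhirehose
{
    public function enqueueStatus($status)
    {
        $data = json_decode($status, true);
        if (is_array($data) && isset($data['text'])) {
            echo $data['text'] . "\n";
        }
    }
}

$sc = new SampleConsumer('your-access-token', 'your-access-secret',
                         Phirehose::METHOD_SAMPLE);
$sc->consume();  // no setTrack()/setFollow()/setLocations(): sample takes no options
```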
User streams
This is an API that targets a single user and retrieves timelines, profile updates, events, etc.
If you stream on behalf of an unspecified number of user accounts via this API, connections from the same IP address may be restricted, so in that case you should consider another option.
The User Streams API also has options for including replies to target users, retweets, etc.
* In phirehose there is a dedicated method for the track option, but there does not appear to be one for with or replies. Since I did not use the User Streams API this time, I have not investigated how to set them, so I will not go into it here (what follows is just a translation of Twitter's documentation).
with (handling of other accounts the user follows)
The default is "with=followings", meaning the stream contains data about the user and the users they follow.
If you specify "with=user", only the account user's data will be included.
replies (handling of replies)
By default, only replies between users who follow each other are streamed, but by setting "replies=all" you can receive every reply.
track (additional tweets by keyword)
By setting a track, you can additionally stream tweets that match keywords.
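For the track option, which does have a dedicated method, a sketch using the UserstreamPhirehose class that ships in the repository's lib/ directory might look like the following. I have not verified how (or whether) the with/replies options can be set through phirehose, so this covers track only:

```php
<?php
// Sketch (unverified details): a user stream with an additional track keyword.
// UserstreamPhirehose ships in phirehose's lib/ directory; whether with/replies
// have setters was not investigated. Credentials are placeholders.
require_once 'phirehose/lib/Phirehose.php';
require_once 'phirehose/lib/OauthPhirehose.php';
require_once 'phirehose/lib/UserstreamPhirehose.php';

class UserConsumer extends UserstreamPhirehose
{
    public function enqueueStatus($status)
    {
        // User streams deliver events and other messages, not only tweets,
        // so just print the raw JSON here.
        echo $status . "\n";
    }
}

$us = new UserConsumer('your-access-token', 'your-access-secret',
                       Phirehose::METHOD_USER);
$us->setTrack(array('php'));  // additionally stream keyword matches
$us->consume();
```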
Site Streams
The Site Streams API has been in beta for quite some time now; unlike the User Streams API, it streams the timelines of multiple users over a single stream.
As a beta, it comes with various restrictions, listed below. Will it ever go GA, I wonder?
- A single connection gives access to the timelines of 100 users (profile updates and similar data are also delivered). This can be raised to up to 1,000 users using Control Streams.
- Up to 25 new connections per second. You need to implement exponential backoff in case you run into errors from making too many calls.
- If you open more than about 1,000 connections, you need to coordinate testing and launch with the Twitter Platform team (I looked up what the "Twitter Platform team" is but couldn't find out; perhaps a team inside Twitter? This translation is questionable, so take it with a grain of salt).
Sample of JSON data that can be obtained
A sample of the obtained JSON data looks like the following (identifying values are masked with x's and zeros).
Each item is relatively large, and a wide variety of data can be collected.
array(25) {
  ["created_at"]=> string(30) "Mon Mar 27 04:41:07 +0000 2017"
  ["id"]=> float(0.000000000000E+17)
  ["id_str"]=> string(18) "0000000000000000"
  ["text"]=> string(78) "xxxxxxxxxxxxxxxx"
  ["source"]=> string(82) "xxxxxxxxxxxxxxxx"
  ["truncated"]=> bool(false)
  ["in_reply_to_status_id"]=> NULL
  ["in_reply_to_status_id_str"]=> NULL
  ["in_reply_to_user_id"]=> NULL
  ["in_reply_to_user_id_str"]=> NULL
  ["in_reply_to_screen_name"]=> NULL
  ["user"]=> array(38) {
    ["id"]=> float(0000000000)
    ["id_str"]=> string(10) "0000000000"
    ["name"]=> string(13) "xxx xxx"
    ["screen_name"]=> string(8) "xxxxxxxx"
    ["location"]=> string(27) "xxxxxxxxxxxxxxxx"
    ["url"]=> NULL
    ["description"]=> string(135) "xxxxxxxxxx"
    ["protected"]=> bool(false)
    ["verified"]=> bool(false)
    ["followers_count"]=> int(669)
    ["friends_count"]=> int(533)
    ["listed_count"]=> int(1)
    ["favourites_count"]=> int(2267)
    ["statuses_count"]=> int(3727)
    ["created_at"]=> string(30) "Fri Mar 20 09:23:52 +0000 2015"
    ["utc_offset"]=> NULL
    ["time_zone"]=> NULL
    ["geo_enabled"]=> bool(true)
    ["lang"]=> string(2) "ja"
    ["contributors_enabled"]=> bool(false)
    ["is_translator"]=> bool(false)
    ["profile_background_color"]=> string(6) "C0DEED"
    ["profile_background_image_url"]=> string(48) "xxxxxxxxxxxx"
    ["profile_background_image_url_https"]=> string(49) "xxxxxxxxxxxx"
    ["profile_background_tile"]=> bool(false)
    ["profile_link_color"]=> string(6) "1DA1F2"
    ["profile_sidebar_border_color"]=> string(6) "C0DEED"
    ["profile_sidebar_fill_color"]=> string(6) "DDEEF6"
    ["profile_text_color"]=> string(6) "333333"
    ["profile_use_background_image"]=> bool(true)
    ["profile_image_url"]=> string(74) "xxxxxxxxxx"
    ["profile_image_url_https"]=> string(75) "xxxxxxxxxx"
    ["profile_banner_url"]=> string(59) "xxxxxxxxxx"
    ["default_profile"]=> bool(true)
    ["default_profile_image"]=> bool(false)
    ["following"]=> NULL
    ["follow_request_sent"]=> NULL
    ["notifications"]=> NULL
  }
  ["geo"]=> NULL
  ["coordinates"]=> NULL
  ["place"]=> array(9) {
    ["id"]=> string(16) "5ab538af7e3d614b"
    ["url"]=> string(56) "https://api.twitter.com/1.1/geo/id/5ab538af7e3d614b.json"
    ["place_type"]=> string(4) "city"
    ["name"]=> string(16) "Yokohama City Asahi Ward"
    ["full_name"]=> string(23) "Kanagawa Yokohama City Asahi Ward"
    ["country_code"]=> string(2) "JP"
    ["country"]=> string(6) "Japan"
    ["bounding_box"]=> array(2) {
      ["type"]=> string(7) "Polygon"
      ["coordinates"]=> array(1) {
        [0]=> array(4) {
          [0]=> array(2) { [0]=> float(139.488892) [1]=> float(35.440878) }
          [1]=> array(2) { [0]=> float(139.488892) [1]=> float(35.506665) }
          [2]=> array(2) { [0]=> float(139.570535) [1]=> float(35.506665) }
          [3]=> array(2) { [0]=> float(139.570535) [1]=> float(35.440878) }
        }
      }
    }
    ["attributes"]=> array(0) { }
  }
  ["contributors"]=> NULL
  ["is_quote_status"]=> bool(false)
  ["retweet_count"]=> int(0)
  ["favorite_count"]=> int(0)
  ["entities"]=> array(4) {
    ["hashtags"]=> array(0) { }
    ["urls"]=> array(0) { }
    ["user_mentions"]=> array(0) { }
    ["symbols"]=> array(0) { }
  }
  ["favorited"]=> bool(false)
  ["retweeted"]=> bool(false)
  ["filter_level"]=> string(3) "low"
  ["lang"]=> string(2) "ja"
  ["timestamp_ms"]=> string(13) "1490589667759"
}
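Each status arrives in the enqueueStatus() callback as a raw JSON string, and the dump above shows the structure after json_decode(). Pulling out a few of the fields shown there might look like this sketch (the $status value is a hand-made stand-in, not real stream data):

```php
<?php
// Sketch: picking fields out of a decoded status, following the structure above.
// $status is a hand-made stand-in for what enqueueStatus() would receive.
$status = '{"text":"hello","user":{"screen_name":"example"},'
        . '"place":{"full_name":"Yokohama City Asahi Ward, Kanagawa"},'
        . '"timestamp_ms":"1490589667759"}';

$data = json_decode($status, true);

echo $data['user']['screen_name'] . "\n";   // who tweeted
echo $data['text'] . "\n";                  // the tweet body

// "place" can be NULL, so check before dereferencing
if (!empty($data['place'])) {
    echo $data['place']['full_name'] . "\n";  // coarse location, if any
}

// "timestamp_ms" is a string of milliseconds since the epoch
echo date('c', (int)($data['timestamp_ms'] / 1000)) . "\n";
```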
Summary
In the past, using the Twitter API meant implementing a lot yourself, such as OAuth authentication (a pain, even if it's commonplace now) and access token handling, but this time it took less than 5 minutes to start getting data. Even allowing for some trial and error, it's nice to have a library that gets you satisfying data in under an hour.
That's it.