Introducing phirehose, a library that makes it easy to work with the Twitter Streaming API from PHP!
Hello.
I'm Mandai, in charge of Wild on the development team.
I'm not sure whether Twitter is still trendy these days or simply an established fixture, but I became interested in the Twitter Streaming API, so I gave it a try.
Writing a program from scratch would be quite a lot of work, so I tried a convenient library instead. It turned out to be very easy, so I'd like to introduce it here.
Obtain an access token and access secret for the Twitter API
Nothing works without these, so get them first.
You can either create a dedicated account or obtain them as an existing user.
Log in to Twitter and open the application management page; there will be a link called "Create New App", so register your app from there.
The Streaming API has no limit on the number of requests (or rather, since it's streaming, you keep receiving data over a single open connection), so you don't have to worry about being banned for calling it too often.
However, if you repeatedly retry while a connection cannot be established, you may be cut off, so you would normally need to write your own logic for reconnection, response monitoring, and so on.
That takes time to build, so this time I'll introduce a library that covers that area for you.
Introducing phirehose to easily use the Streaming API
phirehose is a PHP library that interacts with the Twitter Streaming API.
Its classes handle everything from authentication to connecting and receiving data, so you can just copy and paste the sample, fill in your access key information, and run it from the console to start getting data.
The repository is published on GitHub, so you can use it immediately after a git clone.
# git clone https://github.com/fennb/phirehose.git (run in a directory within your project)
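As a minimal sketch of what running it looks like, here is a consumer modeled on the OAuth filter sample bundled with phirehose (the require paths, credentials, and keyword list below are placeholders you would adjust):

```php
<?php
// Minimal phirehose consumer, modeled on the bundled example/filter-oauth.php.
// Paths and credential values are placeholders.
require_once 'phirehose/lib/Phirehose.php';
require_once 'phirehose/lib/OauthPhirehose.php';

// The OAuth sample expects the consumer key/secret as constants.
define('TWITTER_CONSUMER_KEY', 'your-consumer-key');
define('TWITTER_CONSUMER_SECRET', 'your-consumer-secret');

class FilterTrackConsumer extends OauthPhirehose
{
    // Called once for each status received from the stream; $status is raw JSON.
    public function enqueueStatus($status)
    {
        $data = json_decode($status, true);
        if (is_array($data) && isset($data['user']['screen_name'])) {
            echo $data['user']['screen_name'] . ': ' . $data['text'] . "\n";
        }
    }
}

// The access token and access secret go to the constructor.
$sc = new FilterTrackConsumer('your-access-token', 'your-access-secret',
                              Phirehose::METHOD_FILTER);
$sc->setTrack(array('php'));   // keywords to filter on
$sc->consume();                // blocks and streams indefinitely
```

Since consume() blocks forever, you would normally run this from the console as a long-lived process.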
That's all it takes, so next I'd like to introduce the classes and options that correspond to the Streaming API endpoints.
Public streams
Public Streams is an API that retrieves data from the entire Twitter timeline.
There are two endpoints.
POST statuses/filter
POST statuses/filter is defined with the POST method but also reads GET-style parameters, making it a relatively flexible, well-designed API.
It narrows down the tweets to be streamed by Twitter user ID, keyword, or location.
follow (user ID)
User IDs to filter on.
In phirehose, pass them as an array to the setFollow() method.
track (keyword)
Keywords to filter on.
In phirehose, pass them as an array to the setTrack() method.
location (location information)
Location information to filter on.
Selection is based on rectangles, each defined by two points (the lower-left and upper-right corners), so if you want to cover only Japan you will need to combine several pieces of location information.
In phirehose, pass them as an array to the setLocations() method.
Note that it must always be a two-dimensional array: each element is itself an array of four values, in this order:
- longitude of the lower-left (southwestmost) point
- latitude of the lower-left (southwestmost) point
- longitude of the upper-right (northeastmost) point
- latitude of the upper-right (northeastmost) point
By specifying multiple such rectangles, you can build up a range that covers Japan.
Alternatively, it is also possible to filter by targeting only a number of large cities.
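Putting the three filter options together, a sketch might look like the following, reusing a consumer class like the one in phirehose's bundled filter-oauth.php example. The user IDs are placeholders, and the bounding boxes are rough, illustrative coordinates, not an exact cover of Japan:

```php
<?php
// Sketch: combining follow/track/locations on a filter-stream consumer
// (a subclass of OauthPhirehose, as in phirehose's bundled sample).
// Credentials, IDs, and coordinates are placeholders.
$sc = new FilterTrackConsumer('your-access-token', 'your-access-secret',
                              Phirehose::METHOD_FILTER);

// follow: user IDs, passed as an array
$sc->setFollow(array(123456789, 987654321));

// track: keywords, passed as an array
$sc->setTrack(array('tokyo', 'osaka'));

// locations: a two-dimensional array; each inner array is
// [sw longitude, sw latitude, ne longitude, ne latitude].
// Two rough boxes purely for illustration:
$sc->setLocations(array(
    array(129.0, 31.0, 142.0, 41.5),  // roughly Kyushu through Tohoku
    array(139.3, 41.3, 145.9, 45.6),  // roughly Hokkaido
));

$sc->consume();
```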
GET statuses/sample
GET statuses/sample is an API that extracts a small random sample from all public tweets on Twitter.
The sample is said to be about 1% of all tweets, but it still arrives at quite a pace, so be careful.
There are no options here.
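A sample-stream consumer looks just like the filter example bundled with phirehose; as a sketch, only the method constant changes (credentials are placeholders):

```php
<?php
// Sketch: consuming GET statuses/sample. The consumer class extends
// OauthPhirehose exactly like the bundled filter example; only the
// method constant differs, and no options are set.
class SampleConsumer extends OauthPhirehose
{
    public function enqueueStatus($status)
    {
        $data = json_decode($status, true);
        if (is_array($data) && isset($data['text'])) {
            echo $data['text'] . "\n";
        }
    }
}

$sc = new SampleConsumer('your-access-token', 'your-access-secret',
                         Phirehose::METHOD_SAMPLE);
$sc->consume();  // no setTrack()/setFollow()/setLocations(): sample takes no options
```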
User streams
This is an API that targets a single user and retrieves timelines, profile updates, events, etc.
If you stream on behalf of an unspecified number of user accounts via this API, connections from the same IP address may be restricted, so in that case you should consider another option.
The User Streams API also has options for including replies to target users, retweets, etc.
* In phirehose there is a dedicated method for the track option, but there does not appear to be one for with or replies. Since I did not use the User Streams API this time, I have not investigated how to set them, so I will not go into it here (what follows is just a translation of Twitter's documentation).
with (handling of other accounts the user follows)
The default is "with=followings", meaning the stream contains data about the user and the users they follow.
If you specify "with=user", only the account user's data will be included.
replies (handling of replies)
By default, only replies between users who follow each other are streamed, but by setting "replies=all" you can receive every reply.
track (additional tweets by keyword)
By setting a track, you can additionally stream tweets that match keywords.
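For the track option, which does have a dedicated method, a sketch using the UserstreamPhirehose class that ships in the repository's lib/ directory might look like the following. I have not verified how (or whether) the with/replies options can be set through phirehose, so this covers track only:

```php
<?php
// Sketch (unverified details): a user stream with an additional track keyword.
// UserstreamPhirehose ships in phirehose's lib/ directory; whether with/replies
// have setters was not investigated. Credentials are placeholders.
require_once 'phirehose/lib/Phirehose.php';
require_once 'phirehose/lib/OauthPhirehose.php';
require_once 'phirehose/lib/UserstreamPhirehose.php';

class UserConsumer extends UserstreamPhirehose
{
    public function enqueueStatus($status)
    {
        // User streams deliver events and other messages, not only tweets,
        // so just print the raw JSON here.
        echo $status . "\n";
    }
}

$us = new UserConsumer('your-access-token', 'your-access-secret',
                       Phirehose::METHOD_USER);
$us->setTrack(array('php'));  // additionally stream keyword matches
$us->consume();
```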
Site Streams
The Site Streams API has been in beta for quite some time now; unlike the User Streams API, it streams the timelines of multiple users over a single stream.
As a beta, it comes with various restrictions, listed below. Will it ever go GA, I wonder?
- A single connection gives access to the timelines of 100 users (profile updates and similar data are also delivered). This can be raised to up to 1,000 users using Control Streams.
- Up to 25 new connections per second. You need to implement exponential backoff in case you run into errors from making too many calls.
- If you open more than about 1,000 connections, you need to coordinate testing and launch with the Twitter Platform team (I looked up what the "Twitter Platform team" is but couldn't find out; perhaps a team inside Twitter? This translation is questionable, so take it with a grain of salt).
Sample of JSON data that can be obtained
A sample of the obtained JSON data looks like the following (identifying values are masked with x's and zeros).
Each item is relatively large, and a wide variety of data can be collected.
array(25) {
  ["created_at"]=> string(30) "Mon Mar 27 04:41:07 +0000 2017"
  ["id"]=> float(0.000000000000E+17)
  ["id_str"]=> string(18) "0000000000000000"
  ["text"]=> string(78) "xxxxxxxxxxxxxxxx"
  ["source"]=> string(82) "xxxxxxxxxxxxxxxx"
  ["truncated"]=> bool(false)
  ["in_reply_to_status_id"]=> NULL
  ["in_reply_to_status_id_str"]=> NULL
  ["in_reply_to_user_id"]=> NULL
  ["in_reply_to_user_id_str"]=> NULL
  ["in_reply_to_screen_name"]=> NULL
  ["user"]=> array(38) {
    ["id"]=> float(0000000000)
    ["id_str"]=> string(10) "0000000000"
    ["name"]=> string(13) "xxx xxx"
    ["screen_name"]=> string(8) "xxxxxxxx"
    ["location"]=> string(27) "xxxxxxxxxxxxxxxx"
    ["url"]=> NULL
    ["description"]=> string(135) "xxxxxxxxxx"
    ["protected"]=> bool(false)
    ["verified"]=> bool(false)
    ["followers_count"]=> int(669)
    ["friends_count"]=> int(533)
    ["listed_count"]=> int(1)
    ["favourites_count"]=> int(2267)
    ["statuses_count"]=> int(3727)
    ["created_at"]=> string(30) "Fri Mar 20 09:23:52 +0000 2015"
    ["utc_offset"]=> NULL
    ["time_zone"]=> NULL
    ["geo_enabled"]=> bool(true)
    ["lang"]=> string(2) "ja"
    ["contributors_enabled"]=> bool(false)
    ["is_translator"]=> bool(false)
    ["profile_background_color"]=> string(6) "C0DEED"
    ["profile_background_image_url"]=> string(48) "xxxxxxxxxxxx"
    ["profile_background_image_url_https"]=> string(49) "xxxxxxxxxxxx"
    ["profile_background_tile"]=> bool(false)
    ["profile_link_color"]=> string(6) "1DA1F2"
    ["profile_sidebar_border_color"]=> string(6) "C0DEED"
    ["profile_sidebar_fill_color"]=> string(6) "DDEEF6"
    ["profile_text_color"]=> string(6) "333333"
    ["profile_use_background_image"]=> bool(true)
    ["profile_image_url"]=> string(74) "xxxxxxxxxx"
    ["profile_image_url_https"]=> string(75) "xxxxxxxxxx"
    ["profile_banner_url"]=> string(59) "xxxxxxxxxx"
    ["default_profile"]=> bool(true)
    ["default_profile_image"]=> bool(false)
    ["following"]=> NULL
    ["follow_request_sent"]=> NULL
    ["notifications"]=> NULL
  }
  ["geo"]=> NULL
  ["coordinates"]=> NULL
  ["place"]=> array(9) {
    ["id"]=> string(16) "5ab538af7e3d614b"
    ["url"]=> string(56) "https://api.twitter.com/1.1/geo/id/5ab538af7e3d614b.json"
    ["place_type"]=> string(4) "city"
    ["name"]=> string(16) "Yokohama City Asahi Ward"
    ["full_name"]=> string(23) "Kanagawa Yokohama City Asahi Ward"
    ["country_code"]=> string(2) "JP"
    ["country"]=> string(6) "Japan"
    ["bounding_box"]=> array(2) {
      ["type"]=> string(7) "Polygon"
      ["coordinates"]=> array(1) {
        [0]=> array(4) {
          [0]=> array(2) { [0]=> float(139.488892) [1]=> float(35.440878) }
          [1]=> array(2) { [0]=> float(139.488892) [1]=> float(35.506665) }
          [2]=> array(2) { [0]=> float(139.570535) [1]=> float(35.506665) }
          [3]=> array(2) { [0]=> float(139.570535) [1]=> float(35.440878) }
        }
      }
    }
    ["attributes"]=> array(0) { }
  }
  ["contributors"]=> NULL
  ["is_quote_status"]=> bool(false)
  ["retweet_count"]=> int(0)
  ["favorite_count"]=> int(0)
  ["entities"]=> array(4) {
    ["hashtags"]=> array(0) { }
    ["urls"]=> array(0) { }
    ["user_mentions"]=> array(0) { }
    ["symbols"]=> array(0) { }
  }
  ["favorited"]=> bool(false)
  ["retweeted"]=> bool(false)
  ["filter_level"]=> string(3) "low"
  ["lang"]=> string(2) "ja"
  ["timestamp_ms"]=> string(13) "1490589667759"
}
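Each status arrives in the enqueueStatus() callback as a raw JSON string, and the dump above shows the structure after json_decode(). Pulling out a few of the fields shown there might look like this sketch (the $status value is a hand-made stand-in, not real stream data):

```php
<?php
// Sketch: picking fields out of a decoded status, following the structure above.
// $status is a hand-made stand-in for what enqueueStatus() would receive.
$status = '{"text":"hello","user":{"screen_name":"example"},'
        . '"place":{"full_name":"Yokohama City Asahi Ward, Kanagawa"},'
        . '"timestamp_ms":"1490589667759"}';

$data = json_decode($status, true);

echo $data['user']['screen_name'] . "\n";   // who tweeted
echo $data['text'] . "\n";                  // the tweet body

// "place" can be NULL, so check before dereferencing
if (!empty($data['place'])) {
    echo $data['place']['full_name'] . "\n";  // coarse location, if any
}

// "timestamp_ms" is a string of milliseconds since the epoch
echo date('c', (int)($data['timestamp_ms'] / 1000)) . "\n";
```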
Summary
In the past, using the Twitter API meant implementing a lot yourself, such as OAuth authentication (a pain, even if it's commonplace now) and access token handling, but this time it took less than 5 minutes to start getting data. Even allowing for some trial and error, it's nice to have a library that gets you satisfying data in under an hour.
That's it.