Introducing phirehose, a library that makes it easy to use the Twitter Streaming API from PHP!

Hello.
I'm Mandai, the Wild team member in charge of development.
I'm not sure whether Twitter is still trendy these days or has simply become well-established, but I thought I'd try using the Twitter Streaming API.
Writing a program from scratch would be too difficult, so I tried using a convenient library, and it turned out to be incredibly easy, so I'd like to share it with you.
Obtain an access token and access secret for the Twitter API
Nothing can begin without this, so let's get it quickly.
You can either create a dedicated account or use an existing user account.
Log in to Twitter and open this page. You will find a link called "Create New App," so register using that link.
The Streaming API has no limits on the number of accesses (or rather, since it's streaming, the idea is to keep the connection open and continuously acquire data), so there's no need to worry about being banned for overuse.
However, it seems that if you keep retrying without ever establishing a connection, you may be blocked, so your program needs to handle reconnection and response monitoring.
Developing that yourself takes time, so this time we will introduce a library that covers that part for you.
Introducing phirehose, which makes it easy to use the Streaming API
phirehose is a PHP library that handles communication with the Twitter Streaming API.
This provides classes that handle everything from authentication to connection and data retrieval, so all you have to do is copy and paste the sample and fill in the access key information, and run it from the console to retrieve the data
It's a public repository on GitHub, so you can use it immediately by git cloning it
# In a directory within the project
git clone https://github.com/fennb/phirehose.git
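To give a rough idea of what "copy the sample and fill in the keys" looks like: as far as I can tell, the examples bundled with phirehose subclass OauthPhirehose and override enqueueStatus(), which receives each status as a raw JSON string. The sketch below keeps the handling logic in a standalone function so it can be tried without a connection. The name formatStatus, the class name SampleConsumer, the output format and the placeholder credentials are all my own assumptions, and the commented-out wiring follows the pattern of the library's filter example rather than anything verified here.

```php
<?php
// Hypothetical helper (not part of phirehose): turns one raw JSON status
// into a printable line. Returns null for payloads that are not tweets,
// such as empty keep-alive lines or delete notices.
function formatStatus(string $rawJson): ?string
{
    $data = json_decode($rawJson, true);
    if (!is_array($data) || !isset($data['user']['screen_name'], $data['text'])) {
        return null;
    }
    return $data['user']['screen_name'] . ': ' . $data['text'];
}

// Wiring sketch, assuming phirehose was cloned into ./phirehose:
//
// require_once 'phirehose/lib/Phirehose.php';
// require_once 'phirehose/lib/OauthPhirehose.php';
// define('TWITTER_CONSUMER_KEY', '...');     // placeholder
// define('TWITTER_CONSUMER_SECRET', '...');  // placeholder
//
// class SampleConsumer extends OauthPhirehose
// {
//     public function enqueueStatus($status)
//     {
//         $line = formatStatus($status);
//         if ($line !== null) {
//             echo $line, PHP_EOL;
//         }
//     }
// }
//
// $stream = new SampleConsumer('ACCESS_TOKEN', 'ACCESS_SECRET', Phirehose::METHOD_FILTER);
// $stream->setTrack(array('php'));
// $stream->consume();
```

Keeping the parsing separate from the consumer class also makes it easy to test against a saved payload before connecting to the live stream.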
That's all there is to the setup, so next I'd like to go through the Streaming API endpoints that the library's classes correspond to.
Public Streams
Public Streams is an API that retrieves data from the entire Twitter timeline.
There are two endpoints.
POST statuses/filter
The POST statuses/filter API is quite user-friendly, as it accepts GET parameters even though it is defined as a POST method.
It narrows down the tweets to be read using three criteria: Twitter user ID, keywords, and location information.
follow (user ID)
The user IDs to filter.
In phirehose, these are passed as an array using the setFollow() method.
track (keyword)
Keywords for filtering.
In phirehose, these are passed as an array to the setTrack() method.
locations (location information)
The location information to be filtered.
The selection is made using a rectangle represented by two points, the bottom left and top right corners. Therefore, if you want to filter only within Japan, you need to combine multiple location information points.
In phirehose, you pass this as an array using the setLocations() method.
The argument must always be a two-dimensional array. Each rectangle consists of:
- Bottom left (southwesternmost) longitude
- Bottom left (southwesternmost) latitude
- Upper right (northeasternmost) longitude
- Upper right (northeasternmost) latitude
Each inner array holds these four values in that order, and the outer array you pass as the argument holds one inner array per rectangle.
Therefore, by setting multiple rectangles, it is possible to create an area that covers the entire country of Japan.
You can also filter to target only certain major cities
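The array layout described above is easy to get wrong, so here is a minimal sketch of building it. The boundingBox() helper is hypothetical (it is not part of phirehose), and the coordinates are rough, illustrative rectangles that I picked for the example, not surveyed boundaries.

```php
<?php
// Hypothetical helper: validates one rectangle and returns it in the order
// described above (SW longitude, SW latitude, NE longitude, NE latitude).
function boundingBox(float $swLon, float $swLat, float $neLon, float $neLat): array
{
    if ($swLon < -180 || $neLon > 180 || $swLat < -90 || $neLat > 90) {
        throw new InvalidArgumentException('Coordinate out of range');
    }
    if ($swLon >= $neLon || $swLat >= $neLat) {
        throw new InvalidArgumentException('South-west corner must be below and to the left of the north-east corner');
    }
    return [$swLon, $swLat, $neLon, $neLat];
}

// Always a two-dimensional array: one inner array per rectangle.
// The values are approximate and for illustration only.
$locations = [
    boundingBox(139.5, 35.5, 140.0, 35.9), // roughly around Tokyo
    boundingBox(135.3, 34.5, 135.7, 34.8), // roughly around Osaka
];

// With a phirehose consumer instance in $stream:
// $stream->setLocations($locations);
```

Validating the corner order up front catches the most common mistake, which is swapping latitude and longitude.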
GET statuses/sample
The GET statuses/sample API retrieves a small sample of tweets randomly selected from all tweets on Twitter.
It seems to represent only about 1% of the total data, but even so, the volume of data is quite high, so be careful.
This endpoint has no filter options.
User Streams
This API targets a single user and retrieves their timeline, profile updates, events, and other information.
Streaming from an unspecified number of user accounts via this API could trigger connection restrictions for your IP address, so in that case you should consider a different approach.
The User Streams API also gives you the option to include replies and retweets to the target user
* phirehose has a dedicated method for the track option, but there does not seem to be one for with or replies. We did not use the User Streams API this time and have not looked into how to set them, so the descriptions below are simply translated from the documentation provided by Twitter.
with (handling other accounts the user follows)
By default, "with=followings" is set, so data about the user and the users they follow will be included in the stream
"with=user" will only include data for the account user
replies (handling replies)
By default, only replies between users who follow each other are streamed, but you can receive all replies by specifying "replies=all".
track (additional tweets by keyword)
By setting up a track, you can include additional tweets that match your keywords in your stream
Site Streams
The Site Streams API is currently in beta (and has been for quite some time). Whereas the User Streams API covers only one user's timeline, Site Streams combines the timelines of multiple users into a single stream.
As it is a beta version, there are various limitations, such as the following. Will it ever become GA?
- A single connection delivers timelines (and other data such as profile updates) for up to 100 users. This can be expanded to up to 1,000 users by using Control Streams.
- Up to 25 connections can be opened per second. You must implement exponential back-off to handle errors caused by excessive calls and the like.
- If you open more than about 1,000 connections, you'll need to coordinate testing and launch with the Twitter Platform team. (I looked up what the Twitter Platform Team means but couldn't figure it out. Is it a team within Twitter? I'm pretty skeptical, so don't take it at face value.)
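Since exponential back-off comes up in the list above, here is a minimal sketch of the idea: the wait doubles after every failed attempt, up to a ceiling. The function names, the 1-second base and the 320-second cap are my own assumptions for illustration and are not values taken from Twitter's documentation or from phirehose, which handles reconnection on its own.

```php
<?php
// Hypothetical helper: seconds to wait before retry number $attempt (0-based).
// The delay doubles on every attempt and is capped at $max.
function backoffDelay(int $attempt, float $base = 1.0, float $max = 320.0): float
{
    return min($max, $base * (2 ** $attempt));
}

// Hypothetical retry loop. $connect is any callable returning true on success.
// $sleep can be swapped out (for tests, for example); by default it really sleeps.
function connectWithBackoff(callable $connect, int $maxAttempts = 5, ?callable $sleep = null): bool
{
    $sleep = $sleep ?? function (float $seconds): void {
        usleep((int) round($seconds * 1000000));
    };
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        if ($connect()) {
            return true;
        }
        // Wait longer after each consecutive failure.
        $sleep(backoffDelay($attempt));
    }
    return false;
}
```

Passing the sleep function in as a parameter is just a convenience so the loop can be exercised without actually waiting.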
Sample of JSON data that can be obtained
Here's a sample of the retrieved JSON data, decoded into a PHP array and dumped.
Each data entry is relatively large, and you can get a variety of data.
array(25) {
  ["created_at"]=> string(30) "Mon Mar 27 04:41:07 +0000 2017"
  ["id"]=> float(0.000000000000E+17)
  ["id_str"]=> string(18) "0000000000000000"
  ["text"]=> string(78) "xxxxxxxxxxxxxxx"
  ["source"]=> string(82) "xxxxxxxxxxxxxxxx"
  ["truncated"]=> bool(false)
  ["in_reply_to_status_id"]=> NULL
  ["in_reply_to_status_id_str"]=> NULL
  ["in_reply_to_user_id"]=> NULL
  ["in_reply_to_user_id_str"]=> NULL
  ["in_reply_to_screen_name"]=> NULL
  ["user"]=> array(38) {
    ["id"]=> float(0000000000)
    ["id_str"]=> string(10) "0000000000"
    ["name"]=> string(13) "xxx xxx"
    ["screen_name"]=> string(8) "xxxxxxxx"
    ["location"]=> string(27) "xxxxxxxxxxxxxxxx"
    ["url"]=> NULL
    ["description"]=> string(135) "xxxxxxxxxx"
    ["protected"]=> bool(false)
    ["verified"]=> bool(false)
    ["followers_count"]=> int(669)
    ["friends_count"]=> int(533)
    ["listed_count"]=> int(1)
    ["favourites_count"]=> int(2267)
    ["statuses_count"]=> int(3727)
    ["created_at"]=> string(30) "Fri Mar 20 09:23:52 +0000 2015"
    ["utc_offset"]=> NULL
    ["time_zone"]=> NULL
    ["geo_enabled"]=> bool(true)
    ["lang"]=> string(2) "ja"
    ["contributors_enabled"]=> bool(false)
    ["is_translator"]=> bool(false)
    ["profile_background_color"]=> string(6) "C0DEED"
    ["profile_background_image_url"]=> string(48) "xxxxxxxxxxxx"
    ["profile_background_image_url_https"]=> string(49) "xxxxxxxxxxxx"
    ["profile_background_tile"]=> bool(false)
    ["profile_link_color"]=> string(6) "1DA1F2"
    ["profile_sidebar_border_color"]=> string(6) "C0DEED"
    ["profile_sidebar_fill_color"]=> string(6) "DDEEF6"
    ["profile_text_color"]=> string(6) "333333"
    ["profile_use_background_image"]=> bool(true)
    ["profile_image_url"]=> string(74) "xxxxxxxxxx"
    ["profile_image_url_https"]=> string(75) "xxxxxxxxxx"
    ["profile_banner_url"]=> string(59) "xxxxxxxxxx"
    ["default_profile"]=> bool(true)
    ["default_profile_image"]=> bool(false)
    ["following"]=> NULL
    ["follow_request_sent"]=> NULL
    ["notifications"]=> NULL
  }
  ["geo"]=> NULL
  ["coordinates"]=> NULL
  ["place"]=> array(9) {
    ["id"]=> string(16) "5ab538af7e3d614b"
    ["url"]=> string(56) "https://api.twitter.com/1.1/geo/id/5ab538af7e3d614b.json"
    ["place_type"]=> string(4) "city"
    ["name"]=> string(16) "Yokohama City Asahi Ward"
    ["full_name"]=> string(23) "Kanagawa Yokohama City Asahi Ward"
    ["country_code"]=> string(2) "JP"
    ["country"]=> string(6) "Japan"
    ["bounding_box"]=> array(2) {
      ["type"]=> string(7) "Polygon"
      ["coordinates"]=> array(1) {
        [0]=> array(4) {
          [0]=> array(2) { [0]=> float(139.488892) [1]=> float(35.440878) }
          [1]=> array(2) { [0]=> float(139.488892) [1]=> float(35.506665) }
          [2]=> array(2) { [0]=> float(139.570535) [1]=> float(35.506665) }
          [3]=> array(2) { [0]=> float(139.570535) [1]=> float(35.440878) }
        }
      }
    }
    ["attributes"]=> array(0) { }
  }
  ["contributors"]=> NULL
  ["is_quote_status"]=> bool(false)
  ["retweet_count"]=> int(0)
  ["favorite_count"]=> int(0)
  ["entities"]=> array(4) {
    ["hashtags"]=> array(0) { }
    ["urls"]=> array(0) { }
    ["user_mentions"]=> array(0) { }
    ["symbols"]=> array(0) { }
  }
  ["favorited"]=> bool(false)
  ["retweeted"]=> bool(false)
  ["filter_level"]=> string(3) "low"
  ["lang"]=> string(2) "ja"
  ["timestamp_ms"]=> string(13) "1490589667759"
}
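One thing worth noting in the dump above is that id comes back as a float. By default, json_decode() converts integers that do not fit PHP's native int (on a 32-bit build, for example) into floats, which can lose digits. A small sketch of a safer way to read the ID follows; the tweetId() helper and the sample ID are made up for illustration, while id_str and the JSON_BIGINT_AS_STRING flag are the real field and the real json_decode() option.

```php
<?php
// Hypothetical helper: returns the tweet ID as a string without losing precision.
function tweetId(string $rawJson): ?string
{
    // JSON_BIGINT_AS_STRING keeps oversized integers as strings instead of floats.
    $data = json_decode($rawJson, true, 512, JSON_BIGINT_AS_STRING);
    if (!is_array($data)) {
        return null;
    }
    // Prefer id_str, which is always delivered as a string.
    if (isset($data['id_str'])) {
        return (string) $data['id_str'];
    }
    return isset($data['id']) ? (string) $data['id'] : null;
}

// Made-up ID, used only to show the call.
echo tweetId('{"id":123456789012345678,"id_str":"123456789012345678"}'), PHP_EOL;
```

Treating IDs as strings throughout (database keys included) sidesteps the problem regardless of the platform the script runs on.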
Summary
Implementing OAuth authentication for the Twitter API used to be a pain. It's commonplace now, but in the past there was a lot you had to implement yourself, such as access token handling. So it's nice to know there is a library that lets you obtain the data you need, if not in five minutes, then within an hour even if you struggle with it.
That's all
