We'd like to introduce you to Phirehose, a library that makes it easy to use the Twitter Streaming API from PHP!

Hello,
I'm Mandai, the Wild Team member of the development team.
I'm not sure whether Twitter is still booming or simply an established fixture by now, but I decided to give the Twitter Streaming API a try. Writing a program from scratch would be a pain, so I used a convenient library instead, and it turned out to be so easy that I'd like to share it with you.
Obtain an access token and access secret for the Twitter API
Without these, nothing can get started, so it's best to obtain them early.
You can either create a dedicated account or use an existing user account.
If you log in to Twitter and open this page, you will see a link called "Create New App"; register your application from there.
The Streaming API has no limit on the number of times you can access it (or rather, since it's streaming, you just keep connecting and receiving data), so there's no need to worry about being banned for overdoing it.
However, if you retry multiple times when communication is not possible, you may be stopped, so you will need to create a program that handles reconnection and response monitoring.
Building that takes time, so this time I'll introduce a library that covers that part for you.
Introducing Phirehose, which makes it easy to use the Streaming API
Phirehose is a PHP library for interacting with the Twitter Streaming API.
It provides classes that handle everything from authentication to connecting and retrieving data, so all you have to do is copy the sample, fill in your access key information, and run it from the console to start receiving data.
It's a public repository on GitHub, so you can start using it right away with a git clone:
# In a directory within the project
git clone https://github.com/fennb/phirehose.git
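For reference, here's a minimal sketch of that copy-and-paste step, modeled on the filter-oauth example bundled with the repository (the require paths, key values, and the keyword are placeholders you replace with your own):

<?php
// Adjust the require paths to wherever you cloned phirehose
require_once 'phirehose/lib/Phirehose.php';
require_once 'phirehose/lib/OauthPhirehose.php';

// OAuth credentials from the app you registered earlier (placeholders)
define('TWITTER_CONSUMER_KEY', 'your-consumer-key');
define('TWITTER_CONSUMER_SECRET', 'your-consumer-secret');
define('OAUTH_TOKEN', 'your-access-token');
define('OAUTH_SECRET', 'your-access-token-secret');

// Subclass OauthPhirehose and implement enqueueStatus(),
// which is called once for every status received (a raw JSON string)
class FilterTrackConsumer extends OauthPhirehose
{
    public function enqueueStatus($status)
    {
        $data = json_decode($status, true);
        if (is_array($data) && isset($data['user']['screen_name'])) {
            print $data['user']['screen_name'] . ': ' . $data['text'] . "\n";
        }
    }
}

// Connect to the statuses/filter endpoint and start consuming
$sc = new FilterTrackConsumer(OAUTH_TOKEN, OAUTH_SECRET, Phirehose::METHOD_FILTER);
$sc->setTrack(array('php')); // placeholder keyword
$sc->consume();

Save it as, say, filter.php, run php filter.php from the console, and matching tweets start printing; the reconnection handling mentioned above happens inside consume().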
With that out of the way, I'd like to introduce the classes that correspond to the Streaming API endpoints.
Public Streams
Public Streams is an API that retrieves data from the entire Twitter timeline.
It has two endpoints:
POST statuses/filter
statuses/filter is defined as a POST endpoint, but it also reads GET parameters, which makes it a fairly easy API to work with.
It narrows down the tweets to read using three criteria: Twitter user IDs, keywords, and location information.
follow (user ID)
User IDs to filter.
In Phirehose, pass them as an array to the setFollow() method.
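For example, continuing from the sketch above (the IDs are placeholder values):

// Restrict the stream to tweets posted by these user IDs (placeholders)
$sc->setFollow(array(123456, 7890123));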
track (keyword)
Keywords to filter.
In Phirehose, pass them as an array to the setTrack() method.
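For example (the keywords are arbitrary):

// Restrict the stream to tweets containing any of these keywords
$sc->setTrack(array('php', 'streaming'));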
location
Location information to filter.
Selection is done with rectangles, each defined by two points (bottom left and top right), so if you want to cover only Japan you will need to combine several of them.
In Phirehose, these are passed as an array to the setLocations() method.
The argument must always be a two-dimensional array, where each inner array holds four values in this order:
- Bottom left (southwesternmost) longitude
- Bottom left (southwesternmost) latitude
- Upper right (northeasternmost) longitude
- Upper right (northeasternmost) latitude
So each bounding box is simply an array of those four values in that order.
By combining multiple rectangles you can build an area that covers the whole of Japan, or narrow things down to target only certain major cities.
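As a rough, untested sketch, covering Japan with two boxes might look like this; the coordinates are approximate values picked for illustration, each inner array in the order southwest longitude, southwest latitude, northeast longitude, northeast latitude:

// Each inner array is one bounding box: sw lon, sw lat, ne lon, ne lat
$sc->setLocations(array(
    array(128.0, 30.0, 146.0, 45.8), // roughly the four main islands (approximate)
    array(122.5, 24.0, 129.0, 28.0), // roughly Okinawa and nearby islands (approximate)
));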
GET statuses/sample
GET statuses/sample is an API that retrieves a small, randomly selected sample of all tweets on Twitter.
It only captures about 1% of the data, but even that apparently amounts to a huge volume, so be careful.
This endpoint has no options.
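Using it from Phirehose is, as far as I can tell, just a matter of passing a different method constant to the same kind of consumer class as before:

// Same consumer class as in the earlier sketch, but connected to statuses/sample;
// no setTrack()/setFollow()/setLocations() calls are set up
$sc = new FilterTrackConsumer(OAUTH_TOKEN, OAUTH_SECRET, Phirehose::METHOD_SAMPLE);
$sc->consume();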
User Streams
This is an API that targets a single user and retrieves their timeline, profile updates, events, etc.
Streaming an unspecified number of user accounts through this API could get connections from the same IP address restricted, so for that kind of use we'll consider a different approach.
The User Streams API also gives you options for whether to include replies and retweets directed at the target user.
* Phirehose has a dedicated method for the track option, but there does not seem to be one for with or replies. Since we did not use the User Streams API this time, we have not investigated how to set them, so we will not cover that here (what follows is just a translation of the documentation provided by Twitter).
with (handling other accounts the user follows)
By default, "with=followings" is set, so data about the user and the users they follow will be included in the stream
"with=user" will only include data for the account user
replies (handling replies)
By default, only replies between users who follow each other are streamed; specify "replies=all" to receive all replies.
track (additional tweets by keyword)
By setting track, you can add tweets that match your keywords to the stream.
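We didn't actually run User Streams this time, but based on the UserstreamPhirehose class shipped in the repository, a consumer would presumably look something like the following; treat it as an untested sketch (it reuses the require/define lines from the earlier example, plus lib/UserstreamPhirehose.php):

// Untested sketch: stream one account's timeline, with an extra keyword filter
require_once 'phirehose/lib/UserstreamPhirehose.php';

class UserStreamConsumer extends UserstreamPhirehose
{
    public function enqueueStatus($status)
    {
        var_dump(json_decode($status, true)); // just dump whatever arrives
    }
}

$us = new UserStreamConsumer(OAUTH_TOKEN, OAUTH_SECRET, Phirehose::METHOD_USER);
$us->setTrack(array('php')); // the dedicated track method mentioned above
$us->consume();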
Site Streams
The Site Streams API is currently in beta (and has been for quite some time). Whereas the User Streams API covers only a single user's timeline, Site Streams combines the timelines of multiple users into a single stream.
As it is a beta version, there are various limitations, such as the following. Will it ever become GA?
- A single connection can stream the timelines (plus profile updates and other data) of up to 100 users. Users are managed via Control Streams.
- Up to 25 new connections per second; you must implement exponential back-off for errors caused by calling too often, etc.
- If you open more than about 1,000 connections, you'll need to coordinate testing and launch with the Twitter Platform team. (I looked up what the Twitter Platform team is but couldn't figure it out. A team inside Twitter? I'm not sure, so don't take this at face value.)
Sample of JSON data that can be obtained
A sample of the retrieved JSON data looks like this.
Each item is quite large, and it contains a wide variety of fields.
array(25) {
  ["created_at"]=> string(30) "Mon Mar 27 04:41:07 +0000 2017"
  ["id"]=> float(0.000000000000E+17)
  ["id_str"]=> string(18) "0000000000000000"
  ["text"]=> string(78) "xxxxxxxxxxxxxxx"
  ["source"]=> string(82) "xxxxxxxxxxxxxxxx"
  ["truncated"]=> bool(false)
  ["in_reply_to_status_id"]=> NULL
  ["in_reply_to_status_id_str"]=> NULL
  ["in_reply_to_user_id"]=> NULL
  ["in_reply_to_user_id_str"]=> NULL
  ["in_reply_to_screen_name"]=> NULL
  ["user"]=> array(38) {
    ["id"]=> float(0000000000)
    ["id_str"]=> string(10) "0000000000"
    ["name"]=> string(13) "xxx xxx"
    ["screen_name"]=> string(8) "xxxxxxxx"
    ["location"]=> string(27) "xxxxxxxxxxxxxxxx"
    ["url"]=> NULL
    ["description"]=> string(135) "xxxxxxxxxx"
    ["protected"]=> bool(false)
    ["verified"]=> bool(false)
    ["followers_count"]=> int(669)
    ["friends_count"]=> int(533)
    ["listed_count"]=> int(1)
    ["favourites_count"]=> int(2267)
    ["statuses_count"]=> int(3727)
    ["created_at"]=> string(30) "Fri Mar 20 09:23:52 +0000 2015"
    ["utc_offset"]=> NULL
    ["time_zone"]=> NULL
    ["geo_enabled"]=> bool(true)
    ["lang"]=> string(2) "ja"
    ["contributors_enabled"]=> bool(false)
    ["is_translator"]=> bool(false)
    ["profile_background_color"]=> string(6) "C0DEED"
    ["profile_background_image_url"]=> string(48) "xxxxxxxxxxxx"
    ["profile_background_image_url_https"]=> string(49) "xxxxxxxxxxxx"
    ["profile_background_tile"]=> bool(false)
    ["profile_link_color"]=> string(6) "1DA1F2"
    ["profile_sidebar_border_color"]=> string(6) "C0DEED"
    ["profile_sidebar_fill_color"]=> string(6) "DDEEF6"
    ["profile_text_color"]=> string(6) "333333"
    ["profile_use_background_image"]=> bool(true)
    ["profile_image_url"]=> string(74) "xxxxxxxxxx"
    ["profile_image_url_https"]=> string(75) "xxxxxxxxxx"
    ["profile_banner_url"]=> string(59) "xxxxxxxxxx"
    ["default_profile"]=> bool(true)
    ["default_profile_image"]=> bool(false)
    ["following"]=> NULL
    ["follow_request_sent"]=> NULL
    ["notifications"]=> NULL
  }
  ["geo"]=> NULL
  ["coordinates"]=> NULL
  ["place"]=> array(9) {
    ["id"]=> string(16) "5ab538af7e3d614b"
    ["url"]=> string(56) "https://api.twitter.com/1.1/geo/id/5ab538af7e3d614b.json"
    ["place_type"]=> string(4) "city"
    ["name"]=> string(16) "Yokohama City Asahi Ward"
    ["full_name"]=> string(23) "Kanagawa Yokohama City Asahi Ward"
    ["country_code"]=> string(2) "JP"
    ["country"]=> string(6) "Japan"
    ["bounding_box"]=> array(2) {
      ["type"]=> string(7) "Polygon"
      ["coordinates"]=> array(1) {
        [0]=> array(4) {
          [0]=> array(2) { [0]=> float(139.488892) [1]=> float(35.440878) }
          [1]=> array(2) { [0]=> float(139.488892) [1]=> float(35.506665) }
          [2]=> array(2) { [0]=> float(139.570535) [1]=> float(35.506665) }
          [3]=> array(2) { [0]=> float(139.570535) [1]=> float(35.440878) }
        }
      }
    }
    ["attributes"]=> array(0) { }
  }
  ["contributors"]=> NULL
  ["is_quote_status"]=> bool(false)
  ["retweet_count"]=> int(0)
  ["favorite_count"]=> int(0)
  ["entities"]=> array(4) {
    ["hashtags"]=> array(0) { }
    ["urls"]=> array(0) { }
    ["user_mentions"]=> array(0) { }
    ["symbols"]=> array(0) { }
  }
  ["favorited"]=> bool(false)
  ["retweeted"]=> bool(false)
  ["filter_level"]=> string(3) "low"
  ["lang"]=> string(2) "ja"
  ["timestamp_ms"]=> string(13) "1490589667759"
}
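For what it's worth, a dump like the one above can be produced by decoding the raw JSON inside enqueueStatus(), along these lines:

// Inside your consumer class: decode the raw JSON string and dump it
public function enqueueStatus($status)
{
    var_dump(json_decode($status, true)); // true = decode into an associative array
}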
Summary
Implementing OAuth authentication for the Twitter API used to be a pain; it's commonplace now, but back then there was a lot you had to build yourself, such as handling access tokens. Doing this from scratch would never be a five-minute job no matter how hard you tried, so it's nice to know there's a library that lets you get the data you want within about an hour.
That's all.