We'd like to introduce you to Phirehose, a library that makes it easy to use the Twitter Streaming API from PHP!

Hello,
I'm Mandai, a member of the Wild team in the development department.

I'm not sure whether Twitter is still growing or already well established, but I decided to try out the Twitter Streaming API.
Writing a program from scratch would be a pain, so I used a convenient library instead, and it turned out to be so easy that I'd like to share it with you.

Obtain an access token and access secret for the Twitter API

Nothing can get started without these, so it's best to obtain them early.
You can either create a dedicated account or use an existing user account.

If you log in to Twitter and open this page, you will see a link called "Create New App," so register your application from there.

The Streaming API has no limit on the number of times you can access it (or rather, since it's streaming, you just keep connecting and receiving data), so there's no need to worry about being banned for overdoing it.
However, if you retry repeatedly while the connection is failing, you may be blocked, so you need a program that handles reconnection and response monitoring.

Developing that yourself takes time, so this time I'll introduce a library that covers it for you.

 

Introducing Phirehose, which makes it easy to use the Streaming API

Phirehose is a PHP library for interacting with the Twitter Streaming API.

It provides classes that handle everything from authentication to connecting and retrieving data, so all you have to do is copy the sample, fill in your access key information, and run it from the console to start receiving data.

It's a public repository on GitHub, so you can start using it right away with a git clone.

# In a directory within the project
git clone https://github.com/fennb/phirehose.git
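
For reference, here is a minimal sketch of a filter consumer, loosely based on the sample scripts bundled with the library; the keys, tokens, and file paths below are placeholders, and names may differ slightly between versions, so check the bundled examples.

<?php
// Paths assume the repository was cloned into ./phirehose (adjust to your layout)
require_once('phirehose/lib/Phirehose.php');
require_once('phirehose/lib/OauthPhirehose.php');

// Placeholder credentials: fill in the values obtained from the Twitter app page
define('TWITTER_CONSUMER_KEY', 'your-consumer-key');
define('TWITTER_CONSUMER_SECRET', 'your-consumer-secret');
define('OAUTH_TOKEN', 'your-access-token');
define('OAUTH_SECRET', 'your-access-secret');

// The only thing you must implement is enqueueStatus(), which receives each raw JSON status
class FilterTrackConsumer extends OauthPhirehose
{
    public function enqueueStatus($status)
    {
        $data = json_decode($status, true);
        if (is_array($data) && isset($data['user']['screen_name'])) {
            print $data['user']['screen_name'] . ': ' . $data['text'] . "\n";
        }
    }
}

$sc = new FilterTrackConsumer(OAUTH_TOKEN, OAUTH_SECRET, Phirehose::METHOD_FILTER);
$sc->setTrack(array('php'));  // any keyword, just to get data flowing
$sc->consume();               // blocks and keeps the streaming connection open

Save it as, say, filter.php and run it from the console with php filter.php; matching tweets should start printing.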

 

That's all for setup, so next I'd like to go over the Streaming API endpoints and the options the library exposes for them.

 

Public Streams

Public Streams is an API that retrieves data from the entire Twitter timeline.
It has two endpoints:

 

POST statuses / filter

POST statuses/filter is defined as a POST endpoint, but it also accepts GET-style parameters, making it a fairly user-friendly API.

It narrows down the tweets to be received using three criteria: Twitter user IDs, keywords, and location information.

 

follow (user ID)

User IDs to filter on.
In Phirehose, these are passed as an array via the setFollow() method, as in the sketch below.
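
A minimal sketch on the consumer object from the earlier sample (the IDs below are made-up placeholders):

// Numeric Twitter user IDs (not screen names); placeholder values for illustration
$sc->setFollow(array(123456789, 987654321));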

 

track (keyword)

Keywords to filter on.
In Phirehose, pass them as an array via the setTrack() method, as in the sketch below.
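
Again a sketch on the same consumer (the keywords are arbitrary examples):

// Tweets matching any of these keywords will be delivered
$sc->setTrack(array('php', 'phirehose', 'streaming'));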

 

locations (location information)

Location information to filter on.
The area is specified as a rectangle defined by two points, the bottom-left and top-right corners, so if you want to cover only Japan you will need to combine several rectangles.
In Phirehose, these are passed as an array via the setLocations() method.

Note that the argument must always be a two-dimensional array: each rectangle is itself an array of four values, in this order:

  1. Bottom left (southwesternmost) longitude
  2. Bottom left (southwesternmost) latitude
  3. Upper right (northeasternmost) longitude
  4. Upper right (northeasternmost) latitude

Each inner array holds those four values in the order shown above, and the outer array holds one such rectangle per area.
So by specifying multiple rectangles you can build an area that covers all of Japan, or you can narrow the filter down to just a few major cities.
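
A sketch of what that looks like, using rough, approximate bounding boxes I made up for illustration (they only loosely cover Japan; compute proper ones for real use):

// Each inner array is one box: SW longitude, SW latitude, NE longitude, NE latitude
$sc->setLocations(array(
    array(129.0, 31.0, 142.0, 41.5),  // very rough box over Kyushu through Tohoku
    array(139.5, 41.3, 146.0, 45.8),  // very rough box over Hokkaido
));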

 

GET statuses / sample

GET statuses/sample is an API that retrieves a small, randomly selected sample of all tweets on Twitter.
It only returns roughly 1% of the full stream, but that is still an enormous volume of tweets, so be careful.

This endpoint has no options.
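
A sketch of switching over to the sample endpoint, reusing the consumer class from the earlier example (no filter options are set):

// METHOD_SAMPLE connects to GET statuses/sample instead of statuses/filter
$sc = new FilterTrackConsumer(OAUTH_TOKEN, OAUTH_SECRET, Phirehose::METHOD_SAMPLE);
$sc->consume();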

 

User Streams

This is an API that targets a single user and retrieves their timeline, profile updates, events, etc.
Streaming a large number of user accounts through this API could get connections from the same IP address restricted, so for that use case a different approach should be considered.

The User Streams API also gives you options for including replies and retweets related to the target user.

* In Phirehose, there is a dedicated method for the track option, but there does not appear to be one for with or replies. Since we did not use the User Streams API this time, we have not investigated how to set them, so the descriptions below are only a translation of the documentation provided by Twitter.

 

with (handling other accounts the user follows)

By default, "with=followings" is set, so data about the user and the users they follow will be included in the stream

"with=user" will only include data for the account user

 

replies (handling replies)

By default, only replies between users who follow each other are streamed, but you can receive all replies by specifying "replies=all".

 

track (additional tweets by keyword)

By setting track, you can additionally include tweets that match your keywords in the stream.
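
For what it's worth, here is a heavily hedged sketch of the track option on a user stream. As noted above, we did not test this, and the UserstreamPhirehose class name and constructor are assumptions based on the files shipped with the library, so check the bundled examples for your version.

// Assumption: UserstreamPhirehose (bundled with the library) connects to the User Streams host
class UserConsumer extends UserstreamPhirehose
{
    public function enqueueStatus($status)
    {
        var_dump(json_decode($status, true));  // tweets and events arrive on the same stream
    }
}

$us = new UserConsumer(OAUTH_TOKEN, OAUTH_SECRET, Phirehose::METHOD_USER);
$us->setTrack(array('php'));  // additional keyword-matched tweets
$us->consume();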

 

Site Streams

The Site Streams API is currently in beta (and has been for quite a while). Whereas the User Streams API covers only a single user's timeline, Site Streams combines the timelines of multiple users into a single stream.

Since it is a beta version, it comes with various limitations such as the following. Will it ever reach GA?

  • A single connection can stream the timelines (plus profile updates and other data) of up to 100 users; users are added and removed via Control Streams.
  • Open at most 25 new connections per second, and implement exponential back-off when errors occur from calling too aggressively (a minimal back-off sketch follows this list).
  • If you open more than about 1,000 connections, you need to coordinate testing and launch with the Twitter Platform team. (I looked up what the "Twitter Platform team" is but couldn't figure it out. A team inside Twitter, presumably? I'm not sure, so don't take this at face value.)
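
Phirehose already handles reconnection for the endpoints it supports, but as a generic illustration of the exponential back-off mentioned above, a minimal sketch looks like this (openStreamConnection() is a hypothetical stand-in, not part of any library):

// Generic exponential back-off sketch: wait progressively longer (up to a cap) between attempts
function openStreamConnection() {
    // Hypothetical stand-in: open the streaming connection here, block while it is
    // healthy, and return true if it ended normally or false if it failed.
    return false;
}

$attempt = 0;
$maxDelay = 320;  // cap in seconds; an arbitrary choice for this sketch
while (true) {
    if (openStreamConnection()) {
        $attempt = 0;                         // connection worked; reset the back-off
    } else {
        $attempt++;                           // connection failed; back off harder
    }
    sleep(min($maxDelay, pow(2, $attempt)));  // 2, 4, 8, ... seconds after repeated failures
}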

 

Sample of JSON data that can be obtained

A sample of the retrieved JSON data looks like this.
Each piece of data is quite large, and a variety of data can be retrieved.

array(25) {
  ["created_at"]=> string(30) "Mon Mar 27 04:41:07 +0000 2017"
  ["id"]=> float(0.000000000000E+17)
  ["id_str"]=> string(18) "0000000000000000"
  ["text"]=> string(78) "xxxxxxxxxxxxxxx"
  ["source"]=> string(82) "xxxxxxxxxxxxxxxx"
  ["truncated"]=> bool(false)
  ["in_reply_to_status_id"]=> NULL
  ["in_reply_to_status_id_str"]=> NULL
  ["in_reply_to_user_id"]=> NULL
  ["in_reply_to_user_id_str"]=> NULL
  ["in_reply_to_screen_name"]=> NULL
  ["user"]=> array(38) {
    ["id"]=> float(0000000000)
    ["id_str"]=> string(10) "0000000000"
    ["name"]=> string(13) "xxx xxx"
    ["screen_name"]=> string(8) "xxxxxxxx"
    ["location"]=> string(27) "xxxxxxxxxxxxxxxx"
    ["url"]=> NULL
    ["description"]=> string(135) "xxxxxxxxxx"
    ["protected"]=> bool(false)
    ["verified"]=> bool(false)
    ["followers_count"]=> int(669)
    ["friends_count"]=> int(533)
    ["listed_count"]=> int(1)
    ["favourites_count"]=> int(2267)
    ["statuses_count"]=> int(3727)
    ["created_at"]=> string(30) "Fri Mar 20 09:23:52 +0000 2015"
    ["utc_offset"]=> NULL
    ["time_zone"]=> NULL
    ["geo_enabled"]=> bool(true)
    ["lang"]=> string(2) "ja"
    ["contributors_enabled"]=> bool(false)
    ["is_translator"]=> bool(false)
    ["profile_background_color"]=> string(6) "C0DEED"
    ["profile_background_image_url"]=> string(48) "xxxxxxxxxxxx"
    ["profile_background_image_url_https"]=> string(49) "xxxxxxxxxxxx"
    ["profile_background_tile"]=> bool(false)
    ["profile_link_color"]=> string(6) "1DA1F2"
    ["profile_sidebar_border_color"]=> string(6) "C0DEED"
    ["profile_sidebar_fill_color"]=> string(6) "DDEEF6"
    ["profile_text_color"]=> string(6) "333333"
    ["profile_use_background_image"]=> bool(true)
    ["profile_image_url"]=> string(74) "xxxxxxxxxx"
    ["profile_image_url_https"]=> string(75) "xxxxxxxxxx"
    ["profile_banner_url"]=> string(59) "xxxxxxxxxx"
    ["default_profile"]=> bool(true)
    ["default_profile_image"]=> bool(false)
    ["following"]=> NULL
    ["follow_request_sent"]=> NULL
    ["notifications"]=> NULL
  }
  ["geo"]=> NULL
  ["coordinates"]=> NULL
  ["place"]=> array(9) {
    ["id"]=> string(16) "5ab538af7e3d614b"
    ["url"]=> string(56) "https://api.twitter.com/1.1/geo/id/5ab538af7e3d614b.json"
    ["place_type"]=> string(4) "city"
    ["name"]=> string(16) "Yokohama City Asahi Ward"
    ["full_name"]=> string(23) "Kanagawa Yokohama City Asahi Ward"
    ["country_code"]=> string(2) "JP"
    ["country"]=> string(6) "Japan"
    ["bounding_box"]=> array(2) {
      ["type"]=> string(7) "Polygon"
      ["coordinates"]=> array(1) {
        [0]=> array(4) {
          [0]=> array(2) { [0]=> float(139.488892) [1]=> float(35.440878) }
          [1]=> array(2) { [0]=> float(139.488892) [1]=> float(35.506665) }
          [2]=> array(2) { [0]=> float(139.570535) [1]=> float(35.506665) }
          [3]=> array(2) { [0]=> float(139.570535) [1]=> float(35.440878) }
        }
      }
    }
    ["attributes"]=> array(0) { }
  }
  ["contributors"]=> NULL
  ["is_quote_status"]=> bool(false)
  ["retweet_count"]=> int(0)
  ["favorite_count"]=> int(0)
  ["entities"]=> array(4) {
    ["hashtags"]=> array(0) { }
    ["urls"]=> array(0) { }
    ["user_mentions"]=> array(0) { }
    ["symbols"]=> array(0) { }
  }
  ["favorited"]=> bool(false)
  ["retweeted"]=> bool(false)
  ["filter_level"]=> string(3) "low"
  ["lang"]=> string(2) "ja"
  ["timestamp_ms"]=> string(13) "1490589667759"
}
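
Output like the above comes from simply decoding the raw JSON string handed to enqueueStatus(), e.g. var_dump(json_decode($status, true)). As a minimal sketch of pulling out a few of the fields shown in the dump (assuming $status holds one received status inside enqueueStatus()):

$data = json_decode($status, true);  // decode the raw JSON string into a PHP array
if (is_array($data) && isset($data['text'])) {
    printf("[%s] @%s: %s (%s)\n",
        $data['created_at'],
        $data['user']['screen_name'],
        $data['text'],
        isset($data['place']['full_name']) ? $data['place']['full_name'] : 'no place'
    );
}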

 

Summary

Implementing OAuth authentication for the Twitter API used to be a pain. It's commonplace now, but back then there was a lot you had to build yourself, such as handling access tokens, so it's nice to know there is a library that lets you get the data you want within an hour at most, if not in just five minutes, no matter how much you struggle.

That's all

If you found this article useful, please click [Like]!


About the author

Yoichi Bandai

My main job is developing web APIs for social games, but I'm also grateful to be able to do a variety of other work, including marketing.
My portrait rights within Beyond are CC0.