Stream Data from Twitter API with OAuth using Kettle


Streaming data from the Twitter API matters from a data analytics perspective. Getting the pulse of your user community on the web and across different geographies is important when making big decisions. Pentaho Kettle provides a few steps to read or stream data from Twitter. In fact, the PDI installation directory already contains a Twitter sample. That sample might not work, though, because of changes in the authentication system of the Twitter APIs: Twitter now uses OAuth for third-party access to its data.

So in this post I will share a few steps to stream Twitter data using OAuth:

STEP 1: Register an Application on Twitter (if you haven’t done so yet):

The very first step is to register an application on Twitter. Click on this link and register an application for yourself.

STEP 2: The Authentication details of the App:

Once you have registered your app on Twitter, you will see a few details for it. Check the images below:

[Screenshots: the application's details page, showing its keys and access tokens]

In the images above, my application name is EnigmaRishu, and Twitter provides various keys and access tokens for it. These keys and tokens are required in the request header when calling the Twitter API from PDI.

STEP 3: Building a Transformation:

Define Parameters: This Data Grid step is where we define all the parameters required as part of the authentication process. The parameters are documented on the Twitter developer site. The token and key values must be those of your own registered application, so they differ from user to user.
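To make the parameter list concrete, here is a minimal Python sketch of the OAuth 1.0a protocol parameters a request typically carries. It is an illustration and not the contents of the Data Grid step in the sample transformation; the function name `build_oauth_params` and the placeholder credentials are hypothetical. Note that `oauth_nonce` and `oauth_timestamp` are generated for each request rather than copied from the application page, and `oauth_signature` is computed afterwards from all the other values.

```python
import secrets
import time


def build_oauth_params(consumer_key, access_token):
    """Return the OAuth 1.0a parameters for one request (signature not yet added)."""
    return {
        "oauth_consumer_key": consumer_key,        # from your registered application
        "oauth_nonce": secrets.token_hex(16),      # random, unique per request
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(int(time.time())),  # seconds since the Unix epoch
        "oauth_token": access_token,               # from your registered application
        "oauth_version": "1.0",
    }


# Placeholder credentials (hypothetical); use the values of your own application.
params = build_oauth_params("YOUR_CONSUMER_KEY", "YOUR_ACCESS_TOKEN")
```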

Generate Header: This is a JavaScript step that includes code developed by Paul Johnston (a JavaScript implementation of the Secure Hash Algorithm, SHA-1) and by Netflix to handle the cryptographic signing of the request with the keys and tokens. All we need to do is pass the resulting secure data in the header along with the query.
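For readers who want to see what such a header-generation step computes, the following Python sketch follows the HMAC-SHA1 signing scheme of OAuth 1.0 (RFC 5849). It is not the JavaScript used in the transformation; the helper names, the endpoint URL, and all credential values are assumptions for illustration only.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode a value the way OAuth 1.0 requires (nothing left 'safe')."""
    return quote(str(value), safe="")


def sign_request(method, base_url, params, consumer_secret, token_secret):
    """Return the base64 HMAC-SHA1 signature over the OAuth signature base string."""
    # Encode every key and value, sort the pairs, then join them as k=v&k=v.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    base_string = "&".join([method.upper(), pct(base_url), pct(normalized)])
    # The signing key is the two secrets joined by '&'.
    key = f"{pct(consumer_secret)}&{pct(token_secret)}".encode("ascii")
    digest = hmac.new(key, base_string.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(oauth_params, signature):
    """Build the Authorization header value from the OAuth parameters and signature."""
    signed = dict(oauth_params, oauth_signature=signature)
    return "OAuth " + ", ".join(f'{pct(k)}="{pct(v)}"' for k, v in sorted(signed.items()))


# All values below are placeholders (hypothetical), fixed here so the example is repeatable.
oauth_params = {
    "oauth_consumer_key": "YOUR_CONSUMER_KEY",
    "oauth_nonce": "0123456789abcdef",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1450000000",
    "oauth_token": "YOUR_ACCESS_TOKEN",
    "oauth_version": "1.0",
}
query = {"q": "pentaho kettle"}
url = "https://api.twitter.com/1.1/search/tweets.json"  # assumed search endpoint

# The signature covers the OAuth parameters and the query parameters together.
signature = sign_request("GET", url, {**oauth_params, **query},
                         "YOUR_CONSUMER_SECRET", "YOUR_TOKEN_SECRET")
header = authorization_header(oauth_params, signature)
```

The query parameters are part of the signed data but are not repeated in the header; a mismatch between the signed URL or parameters and the ones actually sent is a common cause of signature errors.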

Calling Twitter REST Client: This is a REST Client step; we are actually using Twitter's REST API to read the tweets. The URL is the combination of the Twitter search URL and the query string. Check the PDI sample code (linked below) for more details.
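As a small illustration of how the search URL and query string combine, here is a Python sketch. The endpoint constant, the `count` parameter, and the function name `build_search_url` are assumptions made for the example; the transformation itself may build the URL differently.

```python
from urllib.parse import quote, urlencode

# Assumed v1.1 search endpoint; adjust to the URL your transformation uses.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(query, count=10):
    """Append a percent-encoded query string to the search URL."""
    # quote_via=quote encodes spaces as %20, the same encoding used when signing.
    return SEARCH_URL + "?" + urlencode({"q": query, "count": count}, quote_via=quote)


full_url = build_search_url("#kettle pdi")
# full_url is "https://api.twitter.com/1.1/search/tweets.json?q=%23kettle%20pdi&count=10"
```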

Output – Twitter Result: This holds the final search result, with all the matching tweets. The raw data is in JSON format. If you want to analyze the JSON further, use a step such as JSON Input to parse each section of the data.
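Outside of Kettle, the same kind of JSON can be picked apart in a few lines. The Python sketch below uses a tiny hand-written sample shaped like a search response (a `statuses` array whose items have `text` and `user.screen_name`); the sample data and the function name `extract_tweets` are hypothetical, and a real response carries many more fields.

```python
import json

# Hand-written sample (hypothetical), not real API output.
sample_response = """
{"statuses": [
  {"id_str": "1", "text": "Trying out Pentaho Kettle", "user": {"screen_name": "alice"}},
  {"id_str": "2", "text": "PDI and OAuth", "user": {"screen_name": "bob"}}
]}
"""


def extract_tweets(raw_json):
    """Return (screen_name, text) pairs from a search-style JSON response."""
    payload = json.loads(raw_json)
    return [(s["user"]["screen_name"], s["text"]) for s in payload.get("statuses", [])]


rows = extract_tweets(sample_response)
```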

That said, Twitter limits the streaming data for security reasons. More details are here in the official documentation.


PDI Sample code:

The sample PDI transformation explained above is in the GitHub repo here.

12 thoughts on “Stream Data from Twitter API with OAuth using Kettle”

  1. Hello Rishu,
    Thanks for putting this together and sharing! Unfortunately I keep getting errors like this (PDI 6.01, Sun JDK 7):

    2015/12/23 17:33:14 – Generate Header.0 – TypeError: Cannot call method “indexOf” of null (script#659)

    Any ideas on how to work around this? I’ve made the logging more verbose and added some additional debug logging which shows the URL is constructed correctly, but in some cases when the function ‘getBaseString’ is called, message.action is empty…

      1. I registered my app, but there are no values shown for oauth_nonce/signature/timestamp. I can’t see them in your screenshots either.

  2. Hi, @Rishu. I’m using Spoon 5.4.0.1 with api.twitter.com v1.1. The problem is 2016/06/28 13:10:55 – Calling Twitter REST Client.0 – Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.
    I’ve already generated the certificate for api.twitter.com and updated cacerts in JAVA_HOME. It seems to have had no effect on Spoon.
    Do you know why this is happening? Thank you so much.

  4. Hi Rishu,
    Thank you for this post!
    I’ve used your workflow to connect to the Magento 2 OAuth 1.0 API with small modifications, but unfortunately I’ve got an “Invalid signature” response; the same request and parameters work fine in the Firefox REST client…
    Did you face this problem during your build?
    Thanks, Thomas
