{"id":627,"date":"2015-04-22T13:04:39","date_gmt":"2015-04-22T13:04:39","guid":{"rendered":"http:\/\/www.jamesrising.net\/blog\/?p=627"},"modified":"2015-04-22T13:04:39","modified_gmt":"2015-04-22T13:04:39","slug":"scripts-for-twitter-data","status":"publish","type":"post","link":"http:\/\/www.jamesrising.net\/blog\/?p=627","title":{"rendered":"Scripts for Twitter Data"},"content":{"rendered":"<p>Twitter data&#8211; the endless stream of tweets, the user network, and the rise and fall of hashtags&#8211; offers a flood of insight into the minute-by-minute state of the society.  Or at least one self-selecting part of it.  A lot of people want to use it for research, and it turns out to be pretty easy to do so.<\/p>\n<p>You can either purchase twitter data, or collect it in real-time.  If you purchase twitter data, it&#8217;s all organized for you and available historically, but it basically isn&#8217;t anything that you can&#8217;t get yourself by monitoring twitter in real-time.  I&#8217;ve used <a href=\"https:\/\/gnip.com\/\">GNIP<\/a>, where the going rate was about $500 per million tweets in 2013.<\/p>\n<p>There are two main ways to collect data directly from twitter: &#8220;queries&#8221; and the &#8220;stream&#8221;. Queries let you get up to 1000 tweets at any point in time&#8211; whichever are the most recent tweets that match your search criteria. The stream continuously gives you a fraction of a percent of tweets, selected by your filtering criteria, which adds up very quickly.<\/p>\n<p>Scripts for both options are below, but you need to decide on the search\/streaming criteria. Typically, these are search terms and geographical constraints. See <a href=\"http:\/\/ift.tt\/1cOCOYO\">Twitter&#8217;s API documentation<\/a> to decide on your search options.<\/p>\n<p>Twitter uses an authentication system to identify both the individual collecting the data, and what tool is helping them do it.  
It is easy to register a new tool, whereby you pretend that you&#8217;re a startup with a great new app.  Here are the steps:<\/p>\n<ol>\n<li>Install Python&#8217;s twitter package, using &#8220;easy_install twitter&#8221; or &#8220;pip install twitter&#8221;.<\/li>\n<li>Create an app at <a href=\"http:\/\/ift.tt\/1oHSTpv\">http:\/\/ift.tt\/1oHSTpv<\/a>. Leave the callback URL blank, but fill in the rest.<\/li>\n<li>Set the CONSUMER_KEY and CONSUMER_SECRET in the code below to the values you get on the keys and access tokens tab of your app.<\/li>\n<li>Fill in the name of the application.<\/li>\n<li>Fill in any search terms or structured searches you like.<\/li>\n<li>If you&#8217;re using the downloaded scripts, which output data to a CSV file, change the output path (where it says &#8220;twitter\/us_&#8221;) to a directory of your choice.<\/li>\n<li>Run the script from your computer&#8217;s terminal (e.g., <tt>python search.py<\/tt>).<\/li>\n<li>The script will pop up a browser for you to log into twitter and accept permissions from your app.<\/li>\n<li>Get data.<\/li>\n<\/ol>\n<p>Here is what a simple script looks like:<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\">\nimport os, twitter\n\nAPP_NAME = &quot;Your app name&quot;\nCONSUMER_KEY = 'Your consumer key'\nCONSUMER_SECRET = 'Your consumer secret'\n\n# Do we already have a token saved?\nMY_TWITTER_CREDS = os.path.expanduser('~\/.class_credentials')\nif not os.path.exists(MY_TWITTER_CREDS):\n    # This will ask you to accept the permissions and save the token\n    twitter.oauth_dance(APP_NAME, CONSUMER_KEY, CONSUMER_SECRET,\n                        MY_TWITTER_CREDS)\n\n# Read the token\noauth_token, oauth_secret = twitter.read_token_file(MY_TWITTER_CREDS)\n\n# Open up an API object, with the OAuth token\napi = twitter.Twitter(api_version=&quot;1.1&quot;, auth=twitter.OAuth(oauth_token, oauth_secret, CONSUMER_KEY, CONSUMER_SECRET))\n\n# Perform our query\ntweets = 
api.search.tweets(q=&quot;risky business&quot;)\n\n# Print the first result that has text\nfor tweet in tweets['statuses']:\n    if 'text' not in tweet:\n        continue\n\n    print(tweet)\n    break\n<\/pre>\n<p>For automating twitter collection, I&#8217;ve put together scripts for queries (<tt>search.py<\/tt>), streaming (<tt>filter.py<\/tt>), and bash scripts that run them repeatedly (<tt>repsearch.sh<\/tt> and <tt>repfilter.sh<\/tt>).  <a href=\"http:\/\/ift.tt\/1G6EFqB\">Download the scripts<\/a>.<\/p>\n<p>To use the repetition scripts, first make them executable by running &#8220;<tt>chmod a+x repsearch.sh repfilter.sh<\/tt>&#8221;. Then run them by typing <tt>.\/repfilter.sh<\/tt> or <tt>.\/repsearch.sh<\/tt>.  Note that these will create many files over time, which you&#8217;ll have to merge together.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Twitter data&#8211; the endless stream of tweets, the user network, and the rise and fall of hashtags&#8211; offers a flood of insight into the minute-by-minute state of the society. Or at least one self-selecting part of it. 
A lot of people want to use it for research, and it turns out to be pretty easy &hellip; <a href=\"http:\/\/www.jamesrising.net\/blog\/?p=627\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Scripts for Twitter Data<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[10,14,15],"class_list":["post-627","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-ifttt","tag-james-rising","tag-pro"],"_links":{"self":[{"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/627","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=627"}],"version-history":[{"count":1,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/627\/revisions"}],"predecessor-version":[{"id":628,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/627\/revisions\/628"}],"wp:attachment":[{"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=627"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=627"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=627"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}