Category Archives: Software

Google Scholar Alerts to RSS: A punctuated equilibrium

If you’re like me, you have a pile of Google Scholar Alerts that you never manage to read. It’s a reflection of a more general problem: how do you find good articles, when there are so many articles to sift through?

I’ve recently started using Sux0r, a Bayesian filtering RSS feed reader. However, Google Scholar sends alerts to one’s email, and we’ll want to extract each paper as a separate RSS item.

alertemail

Here’s my process, and the steps for doing it yourself:

Google Scholar Alerts → IFTTT → Blogger → Perl → DreamHost → RSS → Bayesian Reader

  1. Create a Blogger blog that you will just use for Google Scholar Alerts: Go to the Blogger Home Page and follow the steps under “New Blog”.
  2. Sign up for IFTTT (if you don’t already have an account), and create a new recipe to post emails from scholaralerts-noreply@google.com to your new blog. The channel for the trigger is your email system (Gmail for me); the trigger is “New email in inbox from…”; the channel for the action is Blogger; and the title and labels can be whatever you want as along as the body is “{{BodyPlain}}” (which includes HTML).

    ifttttrigger

  3. Modify the Perl code below, pointing it to the front page of your new Blogger blog. It will return an RSS feed when called at the command line (perl scholar.pl).

    rssfeed

  4. Upload the Perl script to your favorite server (mine, http://existencia.org/, is powered by DreamHost.
  5. Point your favorite RSS reader to the URL of the Perl script as an RSS feed, and wait as the Google Alerts come streaming in!

Here is the code for the Alert-Blogger-to-RSS Perl script. All you need to do is fill in the $url line below.

#!/usr/bin/perl -w
use strict;
use CGI qw(:standard);

use XML::RSS; # Library for RSS generation
use LWP::Simple; # Library for web access

# Download the first page from the blog
my $url = "http://mygooglealerts.blogspot.com/"; ### <-- FILL IN HERE!
my $input = get($url);
my @lines = split /\n/, $input;

# Set up the RSS feed we will fill
my $rss = new XML::RSS(version => '2.0');
$rss->channel(title => "Google Scholar Alerts");

# Iterate through the lines of HTML
my $ii = 0;
while ($ii < $#lines) {
    my $line = $lines[$ii];
    # Look for a <h3> starting the entry
    if ($line !~ /^<h3 style="font-weight:normal/) {
        $ii = ++$ii;
        next;
    }

    # Extract the title and link
    $line =~ /<a href="([^"]+)"><font .*?>(.+)<\/font>/;
    my $title = $2;
    my $link = $1;

    # Extract the authors and publication information
    my $line2 = $lines[$ii+1];
    $line2 =~ /<div><font .+?>([^<]+?) - (.*?, )?(\d{4})/;
    my $authors = $1;
    my $journal = (defined $2) ? $2 : '';
    my $year = $3;

    # Extract the snippets
    my $line3 = $lines[$ii+2];
    $line3 =~ /<div><font .+?>(.+?)<br \/>/;
    my $content = $1;
    for ($ii = $ii + 3; $ii < @lines; $ii++) {
        my $linen = $lines[$ii];
        # Are we done, or is there another line of snippets?
        if ($linen =~ /^(.+?)<\/font><\/div>/) {
            $content = $content . '<br />' . $1;
            last;
        } else {
            $linen =~ /^(.+?)<br \/>/;
            $content = $content . '<br />' . $1;
        }
    }
    $ii = ++$ii;

    # Use the title and publication for the RSS entry title
    my $longtitle = "$title ($authors, $journal $year)";

    # Add it to the RSS feed
    $rss->add_item(title => $longtitle,
                   link => $link,
                   description => $content);
        
    $ii = ++$ii;
}

# Write out the RSS feed
print header('application/xml+rss');
print $rss->as_string;

In Sux0r, here are a couple of items form the final result:

sux0rfeed

Python SoundTouch Wrapper

SoundTouch is a very useful set of audio manipulation tools, with three powerful features:

  • Adjusting the pitch of a segment, without changing its tempo
  • Adjusting the tempo of a segment, without changing its pitch
  • Detecting the tempo of a segment, using beat detection

I used SoundTouch when I was developing CantoVario under the direction of Diana Dabby and using her algorithms for generating new music from existing music, using Lorenz attractors.  SoundTouch is a C++ library, but CantoVario was in python, so I built a wrapper for it.

Now you can use it too!  PySoundTouch, a python wrapper for the SoundTouch library is available on github!  It’s easy to use, especially with the super-cool AudioReader abstraction that I made with it.

AudioReader provides a single interface to any audio file (currently MP3, WAV, AIF, and AU files are supported).  Here’s an example of using AudioReader with the SoundTouch library:

# Open the file and convert it to have SoundTouch's required 2-byte samples
reader = AudioReader.open(srcpath)
reader2 = ConvertReader(reader, set_raw_width=2)

# Create the SoundTouch object and set the given shift
st = soundtouch.SoundTouch(reader2.sampling_rate(), reader2.channels())
st.set_pitch_shift(shift)

# Create the .WAV file to write the result to
writer = wave.open(dstpath, 'w')
writer.setnchannels(reader2.channels())
writer.setframerate(reader2.sampling_rate())
writer.setsampwidth(reader2.raw_width())

# Read values and feed them into SoundTouch
while True:
    data = reader2.raw_read()
    if not data:
        break

    print len(data)
    st.put_samples(data)

    while st.ready_count() > 0:
        writer.writeframes(st.get_samples(11025))

# Flush any remaining samples
waiting = st.waiting_count()
ready = st.ready_count()
flushed = ""

# Add silence until another chunk is pushed out
silence = array('h', [0] * 64)
while st.ready_count() == ready:
    st.put_samples(silence)

# Get all of the additional samples
while st.ready_count() > 0:
    flushed += st.get_samples(4000)

st.clear()

if len(flushed) > 2 * reader2.getnchannels() * waiting:
    flushed = flushed[0:(2 * reader2.getnchannels() * waiting)]

writer.writeframes(flushed)

# Clean up
writer.close()
reader2.close()