Githubify

April 12, 2021 · 8 min read

One of my favorite features of Spotify is getting to check out what all my friends are listening to. Not only does it expose me to some deep cuts I might not check out otherwise, but I get to see exactly when my friends are listening to their guilty pleasures. Yes, Jamie, I saw you listening to the certified platinum album "My World 2.0" for 3 hours yesterday. No hate - it's a good album. But, what a great feature to have! Thank you, Spotify. What would we do without you?

Now, I take pride in my music - I'll just come out and say it. I think I have decent music taste and music is a big part of my life. So, naturally, I feel like I gotta have my GitHub followers know what I am listening to as well. For this reason, I set out to make a bot that would keep track of what I was listening to on Spotify, and whenever that song changed, it would update my GitHub bio to reflect that. It's pretty simple really. I decided to write this in Python since I already had so much boilerplate sitting around to work with the Spotify API. The GitHub API was new to me, but luckily I was able to find this neat little library called PyGithub that wrapped up the GitHub API in easy Python classes and functions. I will say, this post wouldn't be very interesting if it weren't for the annoying authentication process we need to follow to get access tokens to our Spotify accounts.

Now, OAuth isn't a new concept. It's the industry-standard process by which we grant applications permission to get access to our data. However, often, modern web applications and infrastructures give us the ability to manually generate access tokens for development purposes. Twitter allows this. Reddit allows this. GitHub allows this. Spotify, however, does not. For this reason, in addition to the bot, I needed to make a mini web application that could let me run through the OAuth flow and get those tokens and stuff them into a database so that our bot could use them too. If you've made a web app before you probably know where this is going... I needed an API.
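For the curious, here is a minimal sketch of the two halves of that OAuth dance (Spotify's authorization code flow), using only the standard library. It is not the code from the project: the function names are made up for illustration, and the client ID, secret, and redirect URI are placeholders for whatever is registered in the Spotify developer dashboard. It only builds the requests, it doesn't send them.

```python
import base64
import secrets
from urllib.parse import urlencode

AUTHORIZE_URL = "https://accounts.spotify.com/authorize"
TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_authorize_url(client_id, redirect_uri, scope, state=None):
    """Return (url, state): the page the browser visits to grant access."""
    state = state or secrets.token_urlsafe(16)  # random value to guard against CSRF
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def build_token_request(code, client_id, client_secret, redirect_uri):
    """Return (url, headers, form_body) for swapping the code for tokens."""
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {credentials}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    })
    return TOKEN_URL, headers, body
```

The user approves the app at the first URL, Spotify redirects back with a `code` query parameter, and POSTing the second request returns JSON holding the access token and the refresh token. A scope like `user-read-currently-playing` should be enough for a bot that only reads the current song.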

I went on a few walks and pondered how I could set all this up, host it on Heroku, and make it extensible - all for free! No way was I going to shell out $7 a month for the dynos needed to run my dinky little bot. After some thought - here is what I came up with. And no, this definitely was not drawn just now on my MacBook using the trackpad.

[Figure: Githubify super advanced novel architecture]

It ain't much, but it's honest work. It's a standard modern web app architecture, but the front-end is served by the base route of the server since all it does is act as a gateway to get our Spotify tokens. And, as I mentioned before, I went with Heroku. It's easy. It's got a free tier. It comes with Postgres. It supports Docker natively. Leave me alone, alright. I won't harp on the app architecture since it really is quite simple. The server in the middle is just a Flask server that serves a React app at the base URL. It's got one "/tokens" endpoint to fetch and update my Spotify tokens... and that's it! Pretty simple really. Here is the code with all exposed routes:


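import os

from flask import jsonify, render_template, request

# app, db, and Tokens (the Flask app, the SQLAlchemy handle, and the token
# model) are set up elsewhere in the project.

@app.route("/")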
def base():
    return render_template('index.html')

@app.route("/tokens", methods=['GET', 'POST'])
def tokens():
    
    #
    # Check authentication
    #
    if request.headers.get('Auth') != os.environ.get('INTERNAL_TOKEN'):
        return jsonify({
            "message": "Invalid auth token."
        }), 401
    
    
    if request.method == 'GET':
        results = Tokens.query.all()
        tokens = {}
        
        for row in results:
            tokens[row.token] = row.value

        return jsonify({
            "tokens": tokens
        })
    
    if request.method == 'POST':
        # get the data from the UI
        data = request.get_json()['data']
        
        # update access token
        access_token = Tokens.query.get(1) # the access token ID
        access_token.value = data['access_token']
        db.session.commit()
        db.session.refresh(access_token)
        
        # update refresh token
        refresh_token = Tokens.query.get(2) # the refresh token ID
        refresh_token.value = data['refresh_token']
        db.session.commit()
        db.session.refresh(refresh_token)
        
        return jsonify({
            "message": "success"
        })

You can see there ain't much to her. The React app is built and served out at the base. Then I let the React app POST to the /tokens endpoint to stuff the tokens into the Postgres database once they are gathered by the UI (a GET on the same endpoint reads them back). You can also see I went with SQLAlchemy to manage the database because, well, it's just so damn convenient! The only thing that might seem out of place is this little bit of code right at the top of the /tokens endpoint:


    if request.headers.get('Auth') != os.environ.get('INTERNAL_TOKEN'):
        return jsonify({
            "message": "Invalid auth token."
        }), 401

I realized after I set everything up that since the Flask app is being served on Heroku, all the endpoints are exposed and technically anyone could read and write to the database if they really wanted to. That means anyone could get my Spotify tokens if they were so inclined. Not that they could actually do anything with them except farm my email address, but still, I felt like I should protect those - cuz why not? So, I manually generated a 32-character MD5-hashed "token" that I gave to both my UI and my Flask app as an environment variable. Now, any request to that /tokens endpoint needs to have "Auth: <MD5 Hash>" as a header - and it must match! Now only the UI can access that endpoint and we are secured!
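One wrinkle worth knowing about a plain `!=` check like the one above: if INTERNAL_TOKEN is ever unset and a request arrives with no Auth header, both sides are None, they compare equal, and the request sails through. Below is a small sketch of a stricter check. The helper names are hypothetical, and it swaps in `secrets.token_hex` to generate the token rather than hashing something with MD5; it produces the same 32 hex characters.

```python
import hmac
import secrets


def generate_internal_token():
    """Return 32 random hex characters, the same length as an MD5 hex digest."""
    return secrets.token_hex(16)


def is_authorized(header_value, expected_token):
    """Return True only when both values are present and identical."""
    if not header_value or not expected_token:
        # A missing header must never match a missing INTERNAL_TOKEN.
        return False
    # compare_digest takes the same time wherever the first mismatch occurs.
    return hmac.compare_digest(header_value.encode(), expected_token.encode())
```

In the route, the guard would then become something like `if not is_authorized(request.headers.get('Auth'), os.environ.get('INTERNAL_TOKEN')):` followed by the same 401 response.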

Finally, I created the bot itself. I leveraged Heroku's cool clock processes to get this done. It sort of acts as a worker - but is powered by the Python apscheduler library. I wrote a small little process for the bot to run through that checks for a song change and updates the bio accordingly. This process is run every 5 seconds. This interval was set completely arbitrarily. I just wanted something that wasn't crazy fast but also would detect changes pretty quickly. I've been running this thing for a few days and no hiccups so far! Here is the abstracted away code for the bot:


def cycle():
    sp_at = dbdriver.get_access_token(Tokens)
    sp_rt = dbdriver.get_refresh_token(Tokens)
    
    most_recent_song_uri = dbdriver.get_most_recent_song_URI(Spotify)
    
    bot = GithubifyBot(
        os.environ.get("GITHUB_ACCESS_TOKEN"), 
        sp_access_token=sp_at,
        sp_refresh_token=sp_rt
    )
    if bot._sp._refreshed:
        print('------> Refreshed tokens')
        dbdriver.update_tokens(
            Tokens,
            bot._sp._access_token,
            bot._sp._refresh_token
        )
    
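    try:
        current_track = bot.get_current_song()
    except AttributeError:
        # Without valid Spotify tokens there is nothing to compare this cycle.
        # Returning here also avoids a NameError on current_track below.
        print('------> Make sure to visit your app and authorize Spotify!')
        return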
    
    if current_track is not None:
        print('------> Listening to: ', bot._sp.uri_to_track_and_artist(current_track['item']['uri']))
        
        if current_track['item']['uri'] == most_recent_song_uri:
            print('------> No song change')
        else:
            dbdriver.update_most_recent_song(Spotify, current_track['item']['uri'])
            print('------> Song change... updating bio.')
            
            # update bio here with githubify bot
            bot.update_bio(current_track)
    else:
        print('------> No song playing')
    
    print('\n')

You can see that I'm using some prebuilt helpers here like dbdriver and GithubifyBot. I won't go into the details of those, but basically, this is what happens in the cycle:

  1. Fetch tokens from the database.
  2. Get the most recently played song from the database.
  3. Initialize the bot using the Spotify tokens and the GitHub token, which is an environment variable.
  4. Check if we used the refresh token to get a new access token. Update the database accordingly.
  5. If we are listening to a song, check if it's what was in the database.
  6. If it is -> Do nothing!
  7. Otherwise, update the database with this info and then update our GitHub bio.
  8. Repeat!
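Steps 5 through 7 boil down to one small decision plus one API call, which can be sketched on their own. This is an illustration, not the actual GithubifyBot internals: the function names, the bio wording, and the 160-character limit are all assumptions. The `user` argument is anything with an `edit(bio=...)` method, which is what PyGithub's `Github(token).get_user()` should hand back.

```python
BIO_LIMIT = 160  # assumption: GitHub's maximum bio length


def next_action(current_track, most_recent_uri):
    """Return 'idle', 'unchanged', or 'update' for one polling cycle."""
    # Treat both "no response" and "response without a track" as nothing playing.
    item = (current_track or {}).get("item")
    if not item:
        return "idle"
    return "unchanged" if item["uri"] == most_recent_uri else "update"


def format_bio(track_name, artist_name, limit=BIO_LIMIT):
    """Build the bio text, truncating with '...' if it would exceed the limit."""
    text = f"Listening to {track_name} by {artist_name} on Spotify"
    return text if len(text) <= limit else text[: limit - 3] + "..."


def update_bio(user, text):
    """Write the bio through any object exposing edit(bio=...)."""
    user.edit(bio=text)
```

Keeping the decision in a pure function like `next_action` makes it easy to test without touching Spotify or GitHub at all.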

It really is that simple.
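As for the clock process that drives the cycle every 5 seconds, a minimal sketch with APScheduler might look like the following. It assumes the APScheduler 3.x API and a hypothetical clock.py that a Procfile entry such as `clock: python clock.py` would start; the real project may wire this up differently.

```python
POLL_SECONDS = 5  # the interval mentioned in the post


def build_scheduler(job, seconds=POLL_SECONDS):
    """Return a blocking scheduler that runs `job` every `seconds` seconds."""
    # Third-party dependency (pip install apscheduler), imported lazily so this
    # module stays importable without it. Assumes the 3.x API.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler()
    scheduler.add_job(
        job,
        "interval",
        seconds=seconds,
        max_instances=1,  # never start a new cycle while one is still running
        coalesce=True,    # collapse a backlog of missed runs into a single run
    )
    return scheduler


def main(cycle):
    """Start polling; this call blocks until the process is stopped."""
    build_scheduler(cycle).start()
```

Calling `main(cycle)` at the bottom of clock.py would kick things off. With a 5 second interval, `max_instances=1` matters: a slow Spotify or GitHub response shouldn't cause two cycles to overlap and race on the database.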

There was still one last problem to solve, however. I'm using free-tier Heroku dynos. These actually go to sleep if unused for a certain period of time. That is - if the web process doesn't get any requests it shuts down until a request is made to it and it spins up. This - in turn - causes the bot itself in the clock process to sleep as well. Luckily some absolute gentleman made a website called Kaffeine that will let you schedule requests to be made to your Heroku web dynos to prevent them from sleeping! How convenient... how is Heroku okay with this? So, I submitted my web app, and ever since it's never gone to sleep! I've been testing the bot for a few days and it seems to work really well! Very happy with the results.
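There is no magic in what a keep-alive service like that does: it makes an ordinary HTTP request to the web dyno on a schedule, which counts as traffic and resets the idle timer. A hypothetical stand-in for a single ping (the `ping` name and its `opener` parameter are made up here so the function can be exercised without a network) could be as small as:

```python
import urllib.request


def ping(url, opener=urllib.request.urlopen, timeout=10):
    """Send one GET request to `url` and return the HTTP status code."""
    with opener(url, timeout=timeout) as response:
        return response.status
```

Run from any machine that stays awake, on an interval shorter than the dyno's idle timeout, this has the same effect as the hosted service.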

[Figure: GitHub profile with the updated bio. Caption: Future Nostalgia should have won album of the year!]

So, did I over-engineer this? Probably, yeah. Heroku is convenient and great to get things going fast, but it can come with some caveats and force you to design things in unique ways. For example, Heroku uses what they call an "ephemeral disk". Basically, within a 24-hour period, the dyno/server may randomly restart, and any files that aren't a part of the original container will get destroyed and reloaded. This means that I cannot use a local data storage system like SQLite or any other local storage files like .json or other plain text options, as they will get wiped out once the dyno refreshes. Any data that we want to persist across server refreshes must live outside the server code (i.e. a database). So, this is why we stored our Spotify tokens in their Postgres database. If I had set this up on AWS or Google Cloud Platform, I most certainly would have gone with SQLite since it comes with Python out of the box and is just so fast and easy. And I probably wouldn't have even used an ORM. But, at the end of the day, I think small projects like these are good for me to keep practicing my engineering capabilities from front to back and working with every part of the application architecture. It's good to make me explore some new options and be up to date with the most current tech and stacks.
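For comparison, here is roughly what the no-ORM SQLite route could look like on a host with a persistent disk. It is a sketch, not code from the project: the table layout mirrors the token/value pairs the /tokens endpoint returns, and the function names and the tokens.db filename are made up.

```python
import sqlite3


def open_store(path="tokens.db"):
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens (token TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def save_tokens(conn, access_token, refresh_token):
    """Insert or overwrite both tokens in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR REPLACE INTO tokens (token, value) VALUES (?, ?)",
            [("access_token", access_token), ("refresh_token", refresh_token)],
        )


def load_tokens(conn):
    """Return every stored token as a {name: value} dict."""
    return dict(conn.execute("SELECT token, value FROM tokens"))
```

On Heroku this file would live on the ephemeral disk and vanish on the next dyno restart, which is exactly the problem described above.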

Now, thanks to this bot, you can always check out my GitHub profile and see exactly what I am listening to! Or - you know - you could just give me a follow on Spotify ¯\_(ツ)_/¯.