Thursday, April 12, 2012

Memcaching the right stuff

I am hosting PewPew's scores on the Google App Engine. I chose the GAE because it was free for small websites that stay under a free quota. Unfortunately for me, they recently lowered the free quota, and I started having to pay the minimum fee of $9/month. It's not much, but I want PewPew to keep on working forever, and $9 per month forever amounts to an infinite amount of money, which I don't plan on having.

There were two areas where I was over the free quota: the database was too large (the free quota is 1GB), and I was using to much CPU for the database requests.

Reducing the database size

The database was taking more than 1GB because I never deleted any of the submitted scores/replay. I wrote an handler that erased the lowest scores. Erasing ~400k scores cost me approximately 4$.
If you have a database with tens of millions of entries, doing operations on them can become expensive!
The handler could only erase 1000 entries at a time before timing out, so I "automatized" its call.
Now, I only have ~100k entries left, taking 792 MB. Out of those, 491 MB are used by the replays, which amounts to 23.5 days of non-stop action.

Reducing the CPU usage

I thought optimizing the CPU usage was just going to be a matter of caching the queries. Turns out, there's a right way and a wrong way to do that.

Initially, the scores for the 8 game modes were generated this way:
    for level in levels:
      query_str = get_query(level)
      scores = db.GqlQuery(query_str)
      for entry in scores:
        #write the score, player_name

And to cache the results, I did this :
    for level in levels:
      query_str = get_query(level)
      scores = memcache.get(query_str)
      if scores is None:
        scores = db.GqlQuery(query_str)
        memcache.set(query_str, scores, 60 * 60 * 24)
      for entry in scores:
        #write the score, player_name


But doing this did not improve my CPU usage, and profiling my app showed that I was still doing a lot of database call, even though none appear in my code. I assume I was not caching the entries, only references to the results. Reading the entries still required doing some database calls:



The correct way to do things is to manually put in the cache the strings and numbers you need to store, not references to the entries in your database.
Once this was done, I got the expected 8 memcache calls and divided the response time by 40:


Now the players can get the scores instantly, and I am back to paying $0/forever.