ROR File Uploading Using EC2, S3, and MERB!
Here little Merbie Merbie…
I can’t help but think of a cute little furry animal when I say the word Merb, but don’t be fooled. These guys may be small and seemingly harmless, but they can kick Ruby on Rails’ butt in the file uploading department.
You see, RoR has this small problem: your Mongrel instances will get p0wned while an upload is happening. So if you have three people uploading to your three-Mongrel website, no other page request will be served until at least one of your Mongrels gets freed up again.
Enter Merb. Merb is a pocket framework. It is a project of Ezra Zygmuntowicz and was designed to be very lightweight. Part of his goal in designing it was to solve the Rails uploading problem. Merb will allow you to do multiple file uploads on one Mongrel instance. I’m not exactly sure how it works, but the Mongrel instances don’t get snatched while the user is uploading their 10 megabytes of pictures, freeing Mongrel up for serving other users.
On my start-up’s website, there will be a ton of photo uploading, which means a lot of image processing, which means extensive CPU, hard drive, and bandwidth requirements. This could be a nightmare for anyone on a tight budget. Thank goodness for the wonderful utility computing services of Amazon’s Elastic Compute Cloud (EC2) and Simple Storage Service (S3).
We realized that as we approached launch, we had better find a good solution for our dependence on user photo uploads.
Merb Meet EC2
Andrew and I decided to completely outsource our image processing and let our wonderful RoR server at SlingShot Hosting do what it does best: serve up pages. We were already using S3 for our photo storage, and it turns out that EC2 can save to S3 almost instantaneously. Now all I needed to do was figure out how to get file uploads over to EC2 and then get EC2 to redirect back to my SlingShot server.
Setting Up Merb
I downloaded Merb and got the sample application with it to understand the directory structure. Ezra did a great job of making it feel Rails-like. I copied my Ruby controller actions and the AWS::S3 API code over to Merb. I first tested all this locally on my box, as any programmer would do. Merb, by default, runs on port 4000. So in the POST request of my file upload, I changed the URL to “localhost:4000” and sent the file over. After the controller action was done, I used Merb’s redirect command and redirected back to localhost:3000 (where my Rails app was running).
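The round trip described above comes down to two URLs: the one the Rails form posts to and the one Merb redirects back to. Here is a minimal plain-Ruby sketch of how they might be built. It is not Merb code; the “/uploads” and “/photos” paths and the photo_id parameter are made-up names for illustration, and only the two ports come from the setup above.

```ruby
require "uri"

# Local addresses from the setup above: Rails on 3000, Merb on 4000.
RAILS_APP = "http://localhost:3000"
MERB_APP  = "http://localhost:4000"

# URL the Rails upload form would post to.
# The "/uploads" path is an assumption, not something Merb dictates.
def upload_form_action(path = "/uploads")
  URI.join(MERB_APP, path).to_s
end

# URL the Merb action would redirect to once the file is stored.
# "/photos" and photo_id are hypothetical names for illustration.
def redirect_after_upload(photo_id, path = "/photos")
  uri = URI.join(RAILS_APP, path)
  uri.query = URI.encode_www_form(photo_id: photo_id)
  uri.to_s
end
```

Passing an identifier back in the redirect is one simple way to let the Rails side know which upload just finished.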
Setting Up EC2
This part was sort of tricky. We signed up for an EC2 account over at amazonaws.com. We set up one of Amazon’s default EC2 images, which contained Fedora Core 4. We went on to install all the wonderful gems (including Merb) along with Apache 2 and other Linux goodies.
Ezra told me that he was able to deploy Merb with Capistrano. We tried to set this up, but I’m not familiar enough with Capistrano to get it working, so I gave up on that for now. Instead, I just used an svn+ssh command to export the code to my server and wrote a separate script to restart the Merbs along with Apache.
Each Merb process took about 30-40 megabytes of memory. With 1.5 GB of RAM on each EC2 instance, this means we could run well over 20 Merb instances without problems.
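For the curious, here is the back-of-the-envelope arithmetic behind that estimate as a small Ruby sketch. It assumes 1.5 GB means 1536 MB and ignores whatever the OS and Apache need, so treat the results as upper bounds rather than measurements.

```ruby
# Rough capacity estimate using the figures quoted above.
RAM_MB = 1536 # 1.5 GB, assuming 1 GB = 1024 MB

# How many processes of a given size fit into the available RAM.
def max_processes(ram_mb, process_mb)
  (ram_mb / process_mb).floor
end

worst_case = max_processes(RAM_MB, 40) # heaviest processes: 38 fit
best_case  = max_processes(RAM_MB, 30) # lightest processes: 51 fit

# Memory left over when running 20 processes at 40 MB each.
headroom_mb = RAM_MB - 20 * 40 # 736 MB
```

Even in the worst case, 20 Merbs use roughly half the box, which leaves room for the image processing itself.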
The Real Bottleneck
Many of you are probably thinking that having to send a file to S3 on every upload action could cause a bottleneck. Ah, but you are wrong! If you use any bandwidth monitoring utility, you will find that EC2 can send to S3 almost as fast as writing to a local hard drive, because it all stays on Amazon’s internal network. Plus, Amazon allots unlimited bandwidth between EC2 and S3, which makes for a great setup.
So what’s the bottleneck, you might ask? Well, the ~1.5 GHz processor will slow your uploads to a pathetic crawl when it has to process your images and resize them. The trick is to make this part of the upload process as light as possible. My good hosting company head guru, Charles Brian Quinn, told me to port over to Mini-Magick*. I noticed an approximate two-fold speed increase using MM, but the really good news is that my memory problems are virtually non-existent. I would recommend staying away from RMagick for a production site unless you want to run GC.start a lot, or you have a ton of memory, or you don’t have much traffic. In which case, you aren’t important and don’t matter, but I digress. =)
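To give a feel for what “light” means here, below is a minimal sketch that shells out to ImageMagick’s mogrify directly, which is roughly the kind of command-line call a wrapper like Mini-Magick makes for you. This is not Mini-Magick’s own API; the helper names and the example path are made up for illustration, and it assumes ImageMagick is installed on the box.

```ruby
# Builds the argument list for an in-place resize with ImageMagick's
# mogrify tool. "-resize WxH" fits the image inside the given box while
# keeping its aspect ratio.
def mogrify_resize_command(path, width, height)
  ["mogrify", "-resize", "#{width}x#{height}", path]
end

# Runs the resize in a separate process, so the pixel data never lives
# in the Ruby process's own memory. Returns true on success.
# Passing the arguments as a list avoids going through a shell.
def resize_in_place(path, width, height)
  system(*mogrify_resize_command(path, width, height))
end

# Example (hypothetical path):
#   resize_in_place("/tmp/upload.jpg", 800, 600)
```

Keeping the heavy lifting in a short-lived external process is a plausible reason the memory problems went away, though I haven’t measured that here.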
Conclusion and Testing
You may want to use a Linux utility like top to watch the merb-ies and the mogrify processes jump to the top of top (no pun intended) as users upload. Seebq (Charles Brian Quinn) has a great article on using httperf to test how many requests your Merb server can handle. I would suggest creating a test action in Merb that pulls an image off the EC2 hard drive, manipulates it the way you need, and then sends it to S3. Use a variety of picture sizes to understand how they affect the performance of Merb.
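A tiny timing harness along these lines might look like the sketch below. It uses Ruby’s standard Benchmark module; the time_by_size helper, the sample labels, and the file names are all hypothetical, and the block you pass in stands in for whatever resize-and-send-to-S3 work your test action does.

```ruby
require "benchmark"

# Times the given block once per sample and returns a hash of
# label => elapsed seconds. `samples` maps a label to whatever input
# the block needs, for example a path to a test image.
def time_by_size(samples)
  samples.each_with_object({}) do |(label, input), timings|
    timings[label] = Benchmark.realtime { yield input }
  end
end

# Example with hypothetical files and a hypothetical process_and_store:
#   timings = time_by_size("small" => "small.jpg", "large" => "large.jpg") do |path|
#     process_and_store(path)
#   end
#   timings.each { |label, secs| puts "#{label}: #{secs.round(2)}s" }
```

Running it across a few picture sizes should show how quickly processing time climbs as the images get bigger.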
Disclaimer: This method is one of the best bang-for-the-buck methods to outsource image uploading for a production site. It costs ZERO upfront dollars and you pay for use. And hey, if you got a problem with it, get some smart nerdy coder to duplicate the image and spread the load amongst multiple EC2 instances. Now where’s that coder? *cough* We are hiring. *cough*
* Mini-Magick – A Ruby wrapper for the ImageMagick command line. MiniMagick gives you access to all the command-line options of ImageMagick.


Great idea – love the redirect workaround – was wondering if you implemented this and if so how you dealt with session/authentication?
The session was dealt with via a callback to the main server using a simple REST call.
Good info. I’m wondering, though, about the part where you mention having multiple Merb instances per server (you mention 20). I thought that Merb was multi-threaded, so you wouldn’t need that many instances of it? (Except maybe to take advantage of multi-core CPUs.)
Another note on session handling between rails and merb on the same host — not so relevant to EC2 and this example, but for those of you wondering — probably the easiest way i have found to *share* session data between the apps is to use memcached.
Store your data inside memcached with the cookie/session key as the key. Once you’re inside Merb you can access it the same way by grabbing the cookie and fetching the values from memcached. It’s lightning fast. Make sure your Rails and Merb apps are on the same subdomain so you can access the cookies (use rewriting in Apache, nginx, lighty or whatever flavour your server is).
Awesome information, thanks for writing on it. I believe Zed Shaw, creator of Mongrel, has stated that image uploading in Mongrel is buffered, and therefore the request is only locked once the uploaded data is complete, and not for the entire request. Nonetheless, Merb looks very promising and I think it’s going to be a major player in the Ruby web world.
Great writeup. I just made the move from Rmagick to ImageScience (http://seattlerb.rubyforge.org/ImageScience.html), which cuts your image processing overhead dramatically, even over MiniMagick. Then you’ll be flyin on EC2!
where is the code man?
Who said file uploading is working in Merb…
Merb doesn’t have any file upload plugin that works perfectly.
Rails has good plugins for file uploading.
If you’re sure Merb works for file upload, send me the link and send me the code.
FYI, there is a dm-paperclip plugin that works just fine with Merb. I have successfully implemented a simple app with Merb handling multiple file uploads. Now I am trying to integrate the Merb app with my existing Rails app, and I’m having a hard time trying to use merb-auth to let users authenticate against the same MySQL db. Merb works just fine out of the box; I think the harder part (as it is for me at the moment) is integrating it painlessly with an existing Rails app.
Also take a look at mojo_magick (http://www.misuse.org/science/2008/01/30/mojomagick-ruby-image-library-for-imagemagick/) – it’s based on MiniMagick but doesn’t use temp files to process images, using memory only where possible (though ImageMagick can still swap to disk as needed). This results in faster file processing – radically faster when resizing lots of small images in place. Nice write up!
Good article. One question though: How did you manage to keep the session between rails and merb, to figure which user uploaded which image?
nevermind… the comments covered that