Lessons Learned in Lambda
We recently finished a project that uses AWS Lambda. The project needed to crunch large sets of numbers in a small amount of time, and Lambda turned out to be a good fit for exactly that. It did involve some trial and error, though. Here are a few lessons we learned along the way.
Don’t Fail Silently
Runtime exceptions can be difficult to hunt down manually in Lambda’s CloudWatch log streams. At a minimum you will want to set up CloudWatch alerts to notify you when an exception occurs. On their own, though, alerts only tell you that an error occurred, not where it occurred or why.
For detailed error visibility, you can configure an external service like Honeybadger or Rollbar to capture errors and send notifications. In our case we already leveraged Honeybadger for the Ruby on Rails front-end, so we configured the Lambda application to send its exceptions to that service.
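As a rough illustration of the idea, a handler can be wrapped so that every unhandled exception is logged with its traceback and passed to a reporting function before being re-raised. This is a minimal sketch, not the code from our project: `report_errors` and the `notify` callable are hypothetical names, and `notify` stands in for whatever call your error service's client library provides.

```python
import functools
import logging
import traceback

logger = logging.getLogger(__name__)


def report_errors(notify):
    """Build a decorator that reports unhandled handler exceptions.

    `notify` is a hypothetical callable taking (exception, context_dict).
    In a real app it would forward to your error-tracking service.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            try:
                return handler(event, context)
            except Exception as exc:
                # Log the full traceback so it also lands in the log stream.
                logger.error("Unhandled exception:\n%s", traceback.format_exc())
                # Send the exception plus the triggering event to the service.
                notify(exc, {"event": event})
                # Re-raise so Lambda still records the invocation as failed.
                raise
        return wrapper
    return decorator
```

Decorating a handler with `@report_errors(some_notify_function)` leaves its return value untouched on success, and re-raising keeps Lambda's own error metrics accurate.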
Constraints Are Everywhere
Constraints lurk around every corner when building Lambda applications. We ended up butting heads with several of them along the way.
Lambda limits the size of a deployment package to 50 MB. That may seem like a high ceiling, but it’s easy to bump into that constraint if you’re not diligent about your dependency size.
Developers can build Lambda applications in Java, Node.js, Python, and C#. We chose to use Python because it gave us access to number-crunching libraries like NumPy and SciPy. However, when we added SciPy as a pip dependency, it bloated the deployment package size, putting it well beyond the 50 MB size constraint.
Our solution was to copy the handful of functions we were using from SciPy directly into our package code. That’s not the cleanest solution, but it slimmed our deployment package down considerably.
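A small check in the build step can catch this kind of bloat before a deploy fails. The sketch below assumes the deployment package is a zip file on disk; the function names are hypothetical, and the 50 MB figure is the limit mentioned above.

```python
import os
import zipfile

MAX_PACKAGE_BYTES = 50 * 1024 * 1024  # the 50 MB limit described above


def check_package_size(zip_path, limit=MAX_PACKAGE_BYTES):
    """Return (fits, size_in_bytes) for the zip file at zip_path."""
    size = os.path.getsize(zip_path)
    return size <= limit, size


def largest_entries(zip_path, count=10):
    """Return the `count` largest archive entries as (name, compressed_bytes)."""
    with zipfile.ZipFile(zip_path) as archive:
        entries = [(info.filename, info.compress_size) for info in archive.infolist()]
    # Biggest first, so the heaviest dependencies are easy to spot.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries[:count]
```

Running `largest_entries` on a package that fails the size check shows which dependency files take up the most room.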
Lambda functions limit how much data you can pass into them. Functions invoked as RequestResponse (blocking Lambda function calls) allow up to 6 MB of data in the request payload. Functions invoked as Event (asynchronous Lambda function calls) allow only 128 KB.
Initial iterations of our app passed results from one Lambda function as arguments to the next. However, that broke down quickly in cases where we used Event invocations. The 128 KB limit was too small for the megabytes of data we were trying to pass from one function to the next.
To avoid this constraint, we ended up moving calculated results into Redis via ElastiCache. Then we refactored our Lambda functions to accept Redis keys as arguments, and retrieve the data from Redis with those keys instead of passing the data in directly.
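The pattern looks roughly like the sketch below. It is an illustration under stated assumptions rather than our production code: `InMemoryStore` is a hypothetical stand-in that exposes only `set` and `get` (the same two method names a Redis client offers), the handler names and the `results_key` field are made up, and the store is passed in as an extra argument purely so the example runs without a Redis server.

```python
import json
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a Redis client; only set() and get() are used."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def store_results(store, results):
    """Serialize results, save them under a fresh key, and return the key."""
    key = "results:" + uuid.uuid4().hex
    store.set(key, json.dumps(results))
    return key


def producer_handler(event, context, store):
    """Compute results and return only a small key, not the data itself."""
    results = [n * n for n in event["numbers"]]
    return {"results_key": store_results(store, results)}


def consumer_handler(event, context, store):
    """Look up the data by key instead of receiving it in the payload."""
    raw = store.get(event["results_key"])
    if raw is None:
        raise KeyError("no results stored under " + event["results_key"])
    return sum(json.loads(raw))
```

The payload handed to the next function is a few dozen bytes regardless of how large the results are, which keeps it well under the Event invocation limit.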
Being aware of Lambda’s constraints ahead of time will save you headaches in the long run. Take them into consideration when designing an application for Lambda.
Build for the Right Architecture
We develop on macOS machines at Collective Idea, and this caused trouble when deploying any dependencies that are not written in pure Python. Both NumPy and SciPy leverage native C binaries to optimize crunching large sets of numbers. Binaries built on macOS are incompatible with Lambda, which requires binaries compiled for Amazon Linux. A developer must build these libraries on an EC2 instance running Amazon Linux to use them on Lambda. This was a significant slowdown to our deployment workflow.
Thankfully, some popular binary Python libraries are already pre-compiled for Amazon Linux and packaged as pip dependencies in the lambda-packages project. Using this project, we could run Python binary code compiled for macOS during development and, by swapping out the requirements.txt file, create a working production deployment package that ran in Lambda.
Optimize for Concurrency
Lambda’s sweet spot is performing small, short-running tasks. That’s why the limit of a running Lambda function is five minutes. Lambda functions running longer than five minutes halt, and it’s the application’s responsibility to respond to the shutdown.
Since our application needed to process a large set of small calculations in a short amount of time, we opted to execute these small calculations in parallel.
Our first attempt at executing Lambda functions concurrently was to invoke a Lambda function for each atomic set of calculations that we needed to process. A single Lambda function would get triggered through a RESTful API as our starting point, and it spawned as many Lambda functions as there were atomic calculations to execute. Our thinking was that executing each atomic calculation on its own concurrent Lambda function would minimize the time taken to calculate the entire set.
Instead, the initial Lambda function, which spawned a Lambda function for each atomic calculation, took so long to work through the aggregate set that it consistently hit the five-minute timeout.
Additionally, the technique caused Lambda to drop calculations because it limits the number of concurrently running Lambda functions to 100. Once our app hit that ceiling, Lambda simply skipped any additional calls to execute new Lambda functions.
This method also wasted money. Each atomic calculation’s runtime of 50 ms was shorter than Lambda’s minimum billable runtime of 100 ms. The application was leaving money in the cloud.
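To put rough numbers on that, here is a small back-of-the-envelope calculation. It assumes billing rounds each invocation up to the next multiple of 100 ms (the text above only states the 100 ms minimum), that calculations inside a batch run one after another, and it ignores startup overhead. The function names and the 10,000-calculation workload are hypothetical.

```python
import math

BILLING_INCREMENT_MS = 100  # assumption: runtime is rounded up to 100 ms steps


def billed_ms(runtime_ms, increment=BILLING_INCREMENT_MS):
    """Round a runtime up to the next billing increment (at least one)."""
    return max(1, math.ceil(runtime_ms / increment)) * increment


def total_billed_ms(num_calcs, calc_ms, batch_size):
    """Total billed time when calculations are grouped batch_size per invocation."""
    full_batches, remainder = divmod(num_calcs, batch_size)
    total = full_batches * billed_ms(batch_size * calc_ms)
    if remainder:
        # The last, smaller batch is billed separately.
        total += billed_ms(remainder * calc_ms)
    return total


one_per_function = total_billed_ms(10_000, 50, batch_size=1)   # 1,000,000 ms
batched_by_100 = total_billed_ms(10_000, 50, batch_size=100)   # 500,000 ms
```

Under these assumptions, running one 50 ms calculation per invocation bills twice as much time as the work actually takes, while batches of 100 bill only for time spent calculating.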
To eliminate the dropped calculations and make better use of our billable execution time, we tried a different approach. We still wanted concurrency, so we combined concurrent Lambda functions with threading inside each function.
We grouped the atomic calculations into batches of around 100 and ran each batch in its own thread. That extended the runtime of each Lambda function to around seven seconds, but reduced the number of concurrently running Lambda functions by a factor of 100. This effectively raised the amount of data that we could process without dropping calculations. We also broke parts of the initial Lambda function into smaller steps, called in sequence, to further reduce its runtime.
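The batching step can be sketched with the standard library alone. This is an illustration of the general technique, not our exact implementation: the function names, the `calculate` callable, and the `max_workers` default are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batch(batch, calculate):
    """Run every calculation in one batch sequentially."""
    return [calculate(item) for item in batch]


def process_in_batches(items, calculate, batch_size=100, max_workers=8):
    """Run each batch on a worker thread and return results in input order."""
    batches = chunk(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields batch results in the same order the batches were given.
        batch_results = pool.map(lambda batch: run_batch(batch, calculate), batches)
        return [result for batch in batch_results for result in batch]
```

For example, `process_in_batches(list(range(250)), lambda x: x * 2)` splits the input into batches of 100, 100, and 50 and returns the doubled values in their original order.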
Planning ahead will save you headaches, especially if you are new to Lambda. Understanding the behaviors and constraints of Lambda will help you work efficiently and deliver your product on time.
This blog post was also co-authored by Kyle Magnuson.
Kyle was born and raised in Chicago. He moved to Holland after graduating from Hope College with a BS in Computer Science and a BA in Management. Kyle first became interested in programming after taking engineering classes in high school, and enjoyed working on Android and iOS apps for Hope College.
He is excited to live and work in Holland for the good beaches, good beer, and good people. In his free time, he enjoys playing golf, tennis, and guitar.