Aaron Stahl
AOL
Lead Systems Engineer
Aaron Stahl is a Lead Systems Engineer in AOL's Advertising Technical Operations group, supporting AOL Platforms. Over the past 10 years, he has worked in various capacities and now leads the daily technical operations for the ONE by AOL: Display real-time and static bidding systems. While at AOL, Aaron has worked to develop architectures that are scalable and efficient and that meet the demands of an ever-growing business. Most recently, Aaron led the migration of AOL's RTB system into the public cloud. Outside of work, Aaron enjoys travel, building Lego sets with his kids, and spending time with his family.
Elastically Scaling Your Legacy

This talk covers AOL Platforms' (AOLP) experience bringing legacy applications to Amazon Web Services in an elastically scaling fashion. In our presentation, we will talk about how we moved the AOLP Real-Time Bidding (RTB) application stack from AOL datacenters into AWS regions across the globe.

Topics covered:

  • Operational solutions to cloud infrastructure problems
  • Infrastructure-as-code: full deployments via CloudFormation
  • Blue/Green deployments of full stack
  • Retrofitting service discovery into legacy stacks
  • Self-healing infrastructure
  • Possible live demo of RTB capacity buildout

Overview:

  • What is RTB? 22B real transactions/day (not talking CDN traffic here, but actual server requests)
  • Why is this a good match for elastic scaling?
  • What is our tech stack? Why couldn't we just use auto-scaling groups? Old static architecture from AOL Datacenters
  • Why were we doing this migration? Time-to-market for capacity increases, ease of regional deployments, cost savings with elastic capacity.

Evolved implementation:

  • Phase 1 - Basic CloudFormation stacks
  • Phase 2 - Full AWS API management
  • Phase 3 - Increasingly autonomous system blocks for capacity
  • Phase 4 - Service discovery, full automation

Lessons learned:

  • What is CloudFormation good for? What isn't it?
  • The AWS API - 1000 ways to die
  • Scaling logic for multi-tiered systems
  • The Need for Speed - how long until more capacity comes online?
  • Working within constraints - what to do when you have no dev resources?

Presented with Brian Reavey