We are the Capacity Management team in Amazon Elastic Compute Cloud (Amazon EC2). Amazon EC2 has revolutionized the way we obtain computing resources. Instead of acquiring hardware in-house, companies can now run their systems in Amazon EC2 reliably, scale up and down dynamically, and pay only for the capacity they actually use.
The Capacity Management team’s job is to satisfy customers’ dynamic, often unpredictable computing needs through both online capacity management and offline capacity analysis and planning. Our challenge is to make the cloud appear infinitely elastic and instantly scalable, yet cost-effective for the company.
As a member of the Capacity Management team, you will be at the forefront of this transformational technology, working with leading companies in this space and with the Amazon engineers who are developing the capability. You will be surrounded by people who are smart, passionate about cloud computing, and convinced that world-class support is critical to customer success. You will provide capacity support to a global list of customers who are building mission-critical applications on top of EC2, and to internal teams for new product launches.
Every day will bring new and exciting challenges on the job while you:
- Learn and use groundbreaking technologies to allocate computing resources to the right places at the right time.
- Coordinate with Business Dev/Sales, Capacity Procurement and other internal teams to address customers’ capacity needs.
- Support capacity needs for new product launches.
- Leverage your extensive customer support experience to give feedback to the engineering team, and provide input (and potentially assistance) in developing tools that improve our customers’ experience with our services.
- Take full ownership of customer issues until they are resolved.
- Drive technical innovation and efficiency in infrastructure operations via automation.
- Create processes that enhance operational workflow and provide positive customer impact.
- Design systems management solutions using automation and self-repair rather than relying on alarming and human intervention.
- Dive deep to resolve problems at their root, looking for failure patterns amenable to long-term solutions via simplification and automation.
- Recognize and adopt best practices in documentation, testing, security, operational support at scale, and efficient use of resources.
- Develop appropriate metrics that demonstrate improvements in operational efficiency.