Amazon Web Services is a dynamic and rapidly growing business within Amazon.com. We are building some of the largest and most complex distributed systems in the world, and we need world-class people to help us implement and operate them.
We provide organizations with building block web services that allow them to innovate faster and operate their software more cost-effectively. These services-in-the-cloud include on-demand compute capacity, storage, content delivery, querying of structured data, message queuing, and more.
The AWS Identity & Access team is building and delivering the next generation of cloud computing security that supports public AWS offerings like S3, EC2, and CloudFront. We are inventing new ways of building massively scalable distributed security systems involving identity management, federation, web services security, single sign-on, and much more. We enable our customers to control some of the most sensitive secrets on the Internet.
We have high standards for our computer systems as well as our employees: our systems are highly secure, highly reliable, and highly available, and they must function at massive scale; our employees are super smart, driven to serve customers, and fun to work with. The successful Support Engineer does much more than plug computers together or track changes. They are instrumental in deploying, operating, and scaling a massive always-on distributed system that is core to all of AWS. We are looking for a seasoned Support Engineer to join our energetic, fast-moving, and passionate team.
You should have or be most of the following:
- Experience running and maintaining a 24x7 Internet-oriented production environment, preferably one spanning multiple data centers and hundreds of machines
- Demonstrable expertise in specifying, designing, and/or implementing system health tools, performance monitoring tools, and software management tools for 24x7 environments
- A solid grasp of networking fundamentals, preferably including hands-on experience with load balancers, switches, routers, etc.
- Familiarity with the challenges surrounding efficient operations and failure mode analysis in large, complex distributed systems
You will be expected to deliver on the following in your first six to twelve months on the job:
- Write and review accurate and complete support procedures, system documentation, and issue tracking entries
- Monitor and troubleshoot Unix/Linux systems running AWS Identity and Access software
- Deploy new systems and software, or scale existing ones, using automated build and deployment tools
- Configure monitoring and alarming for new and existing systems using automated configuration tools
- Provide 24x7 on-call support during assigned periods