Senior Manager, Developer Operations

Amazon Corporate LLC
Systems, Quality, & Security Engineering
The AWS Developer and Management Tools organization is looking for a strong operational leader to head up its Availability, Operations, and Reliability team.

In this role, you would lead the systems and support engineering teams for a large, geographically distributed software engineering organization that is responsible for many widely adopted and critical AWS products, including AWS CodeCommit, AWS CodeDeploy, and AWS CodePipeline. The team is also responsible for the core development tools used by all of Amazon’s software engineering teams, including Kindle, Echo, Prime Video, Prime Air, AWS, and many more. In this role, you would be building multiple strong, operationally minded engineering teams, driving scalable mechanisms for automation, outage prevention, and root cause resolution, and leading large-scale programs around system availability, security, cost reduction, and operational efficiency.

A B.S. in Computer Science or five years’ equivalent experience in a large-scale enterprise environment is required. Experience in systems and network administration is also highly recommended. The successful candidate will have a proven track record of delivering complex projects, including autonomously coordinating and driving issues to resolution using excellent project management skills.

Basic Qualifications

You should have most of the following:

• Experience running and maintaining a 24x7 Internet-oriented production environment, preferably spanning multiple data centers and at least hundreds of machines
• Demonstrable expertise in specifying, designing, and/or implementing system health tools, performance monitoring tools, and software management tools for 24x7 environments
• A solid grasp of networking fundamentals
• Deep knowledge of one or more Linux distributions
• Software development experience in systems management and automation using Perl, Python, Ruby, and/or Java
• Familiarity with the challenges surrounding efficient operations and failure mode analysis in large, complex distributed systems

You will be expected to deliver on these kinds of things in the first six to twelve months on the job:
• Through participation in all phases of the development of a large distributed system, provide hardware, manageability, operability and performance perspectives across multiple AWS services
• Define and/or refine hardware requirements and selected designs, balancing raw up-front dollar cost with operability and TCO, from the data center infrastructure up
• Specify and participate in the development and delivery of operability-related features such as system health monitoring, diagnostics, repair, and other self-healing automation
• Develop new, or extend existing, application and system management tools and processes that reduce manual effort and increase overall efficiency
• Adapt and improve operations management systems and processes to accommodate rapid and increasing growth in systems and traffic
• Participate in the design and execution of production acceptance tests and new hardware evaluations
• Maintain fleet inventory management, including producing, maintaining, and evolving capacity plans for various components
• Monitor the health of the fleet, automating system health, maintenance tasks, and reporting systems as needed
• Manage directly assigned tasks and on-call duties gracefully

Primary Qualifications
• BS Computer Science or other technical degree and/or related experience
• 5+ years of *NIX system administration experience
• Development experience in Perl, Python, and Java

Preferred Qualifications

• Experience with systems management or monitoring software
• Experience deploying or developing automation or monitoring frameworks
• Experience with very large distributed systems such as large-scale distributed database systems, storage farms, and/or horizontally scaled request processing fleets
• Experience with hardware load balancer administration, network optimization, or other demonstrable TCP-level work

Amazon is an Equal Opportunity-Affirmative Action Employer – Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation.