Are you a technical leader with the drive and passion to help us make game-changing innovations in cloud technologies? Do you enjoy supporting and building high-performance distributed systems, and solving complex technical challenges in this space? Do you also have a track record of building strong operational excellence teams and inspiring them to create simple and effective solutions for customers?
Do you want to lead a team of support and systems engineers who manage critical infrastructure for internal and external customers of Amazon Web Services?
Amazon Web Services' Global Cloud Initiatives team is looking for a highly motivated leader to own and grow a Virginia-based team of highly skilled Support Engineers. This team is responsible for deploying and operating an “air-gapped” private community cloud for critical workloads.
As the leader, you will interact with the AWS GCI leadership team and operational leaders across AWS to understand how best to position the AWS Intelligence Induction team to deliver on the current and future needs of each service team and help the AWS organization continue to scale. You will be responsible for creating and delivering recruiting and training plans that closely measure and monitor all critical KPIs to ensure that the induction team is on track to support the hiring demand from all service owners and provide AWS-trained talent.
This role consists of owning and growing a team of skilled Systems Engineers providing 24x7 operational support for critical software infrastructure components upon which internal and external customers rely. You will own and actively manage hiring and developing the best talent for this team, while keeping a high bar for the team consistent with Amazon leadership principles.
- You will need to quickly adapt to new development environments and changing business requirements, learn new systems, and find creative and scalable solutions to difficult problems.
- Recruiting and Hiring: You will take the lead in hiring quality personnel who not only fit the needs of the current organization but also will allow the team to scale with platform and service growth. You will coordinate with Amazon recruiting staff to evaluate potential candidates, participate in initial phone screens and provide relevant guidance and feedback during on-site interview loops. You will also be responsible for ensuring that proper training takes place for all new hires.
- Operational Excellence: As a manager within the Amazon Web Services team you will be expected to drive operational excellence in everything we do. This includes creating sane processes and procedures, then automating them to improve efficiency in our day-to-day tasks and projects.
- You will own the relationship with your customers and advocate with your software development stakeholders on their behalf to help shape the roadmap of their products.
- You will need to have the technical chops to advise and coach your team through the many challenges they will need to overcome to drive continuous improvement for their customers.
- You will need to develop and continuously build upon the vision for your team, motivating your engineers in their relentless pursuit of excellence on behalf of your customers.
The AWS team is building and delivering the next generation of cloud computing that supports public AWS offerings like S3, EC2, and CloudFront. We are innovating new ways of building massively scalable distributed systems.
We have high standards for our computer systems as well as our employees: our systems are highly secure, highly reliable, highly available, and must function at massive scale; our employees are super smart, driven to serve customers, and fun to work with. On a “typical” day, support engineers might deep dive to root cause a customer issue, investigate why a metric is trending the wrong way, consult with the top engineers at Amazon, or discuss radical new approaches to automate operational issues. We are looking for a seasoned Support Engineer to join our energetic, fast-moving and passionate team. This is an opportunity to operate and engineer systems on a massive scale, and to gain top-notch experience in cloud computing. You'll be surrounded by people who are wickedly smart, passionate about cloud computing, and believe that world class service is critical to customer success. You'll become a master at AWS Services platform diagnosis, response, measurement, and automation. You will design and build the operational scalability that sustains the platform's insane growth. You will measure your success and it will be visible.
You should have or be most of the following:
- Experience running and maintaining a 24x7 Internet-oriented production environment, preferably across multiple data centers, involving (preferably) hundreds of machines
- Demonstrable expertise around specifying, designing, and/or implementing system health, performance monitoring tools, and software management tools for 24x7 environments
- A solid grasp of networking fundamentals, preferably including hands-on experience with load balancers, switches, routers, etc.
- Familiar with the challenges surrounding efficient operations and failure mode analysis in large complex distributed systems
- Ability to serve as the team's technical leader on AWS-supported services, setting an example for other Support Engineers to emulate, and to act as a subject matter expert for one or more AWS services in the Core Engineering team
You will be expected to deliver on these kinds of things in the first six to twelve months on the job:
- Participate in all phases of the development of a large distributed system, providing hardware, manageability, operability, and performance perspectives on all aspects of the system
- Define and/or refine hardware requirements and selected designs, balancing raw up-front dollar cost with operability and TCO, from the data center infrastructure up
- Specify and participate in the development and delivery of operability-related features such as system health monitoring, diagnostics, repair, and other self-healing automation
- Develop new, or extend existing, application and system management tools and processes that reduce manual effort and increase overall efficiency
- Adapt and improve operations management systems and processes to accommodate rapid and increasing growth in systems and traffic
- Participate in the design and execution of production acceptance tests and new hardware evaluations
- Maintain fleet inventory management, including producing, maintaining, and evolving capacity plans for various components
- Monitor the health of the fleet, automating system health, maintenance tasks, and reporting systems as needed
- Perform various system maintenance tasks (your hands get dirty here), including configuration of new machines
- Manage directly assigned tasks and on-call duties gracefully
Successful candidates will join a world-class engineering team, provide troubleshooting and operations support, and innovate to replace operational tasks with scripts and code.