Systems Engineer

US-WA-Seattle
2 months ago
Job ID
543616
Amazon Web Services, Inc.
Position Category
Operations, IT, & Support Engineering

Job Description

AWS Tagging is a set of highly available and scalable services that help customers organize, manage, and control resources across the entire AWS enterprise. We operate some of the largest distributed systems in the world, and we need smart people to help us make and keep them excellent.
To assist with that mission, Amazon Web Services is seeking an experienced DevOps Engineer to work as part of our engineering team in Seattle, Washington. Are you excited about complex automation, self-healing fleets, tight monitoring, and health checks? Then we should talk!
We are looking for a seasoned DevOps Engineer to join our energetic, fast-moving, and passionate team. The ideal candidate will have experience and a talent for solving complex problems of scalability and availability in massively distributed systems, working as an integral part of the engineering team as a DevOps engineer responsible for automation at scale.
This is a unique opportunity to join a fast-paced team, help drive design decisions for every feature and help us in deploying, operating, monitoring and further scaling a massive always-on distributed system that is core to all of AWS.
In this role, you will:
  • Have significant impact on the design, development, testing, deployment, and operation of these services end-to-end.
  • Draw from your deep and broad technical expertise to hire and mentor engineers, complete hands-on technical work and provide leadership on complex operational and design issues.
  • Be responsible for delivering automation for some of our most strategic technical projects in AWS and work on systems at the cutting edge of distributed storage and database technologies.
  • Have a significant bottom-line impact on our business and competitive position by accelerating expansion to new AWS regions.
  • Lead the running and maintenance of a 24x7 Internet-oriented production environment, preferably spanning multiple data centers and thousands of machines.
  • Drive specifying, designing, and implementing system health, performance monitoring tools, and software management tools for 24x7 environments.
  • Be responsible for solving challenges surrounding efficient operations and failure mode analysis in large, complex distributed systems.
  • Develop and improve existing application and system management tools and processes that reduce manual efforts and increase overall efficiency.
  • Participate in the design and execution of production acceptance tests and new hardware evaluations.
  • Monitor the health of the fleet, automating system health, maintenance tasks, and reporting systems as needed.

Basic Qualifications

  • Bachelor's Degree in Computer Science or Information Systems, or equivalent
  • 3+ years of relevant engineering experience
  • 2+ years developing systems management and administration automation in Perl, Python, Ruby, or Java
  • Experience running and maintaining a 24x7 Internet-scale production environment, across multiple data centers, involving (preferably) hundreds of machines.
  • Demonstrable expertise around specifying, designing, and/or implementing deployment, system health, performance monitoring tools, and software management tools for 24x7 environments.
  • Deep knowledge of Linux operating system fundamentals and a good understanding of networking concepts, load balancers, and performance tuning.
  • Strong knowledge and experience with a compiled language.
  • Excellent written and verbal communication skills to facilitate efficient and effective interaction with peers and customers.

Preferred Qualifications

The following qualifications are preferred:
  • Experience using AWS is a plus
  • Experience with agile software development practices
  • Experience with service-oriented architecture and web services
  • Experience with very large, high-throughput distributed systems
  • Able to prioritize and perform in complex, fast-paced situations