The Amazon Finance Technology (FinTech) team is looking for a Sr. Systems Engineer to drive growth and stability in our financial systems platforms, which run on Linux. You will support critical business functions for customers across the world while meeting high uptime SLAs and ensuring robust system performance. You will discover innovative ways to automate and scale our infrastructure as we expand globally. You will work together with the Hyperion Financial Systems and Oracle E-Business Suite teams, with plenty of opportunities to learn and grow.
You’re perfect if you possess that rare mix of depth in Development, Networking, Systems Engineering, and Customer Obsession. You’re right for the job if you're comfortable working deep in operating systems, networking, and distributed architectures. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization, and organization. In addition to providing top-tier management and support of FinTech’s vast infrastructure, Sr. Systems Engineers are expected to develop best practices, refine operational procedures, and constantly think proactively and innovatively.
You should have or be most of the following:
- Experience running and maintaining a 24x7 Internet-oriented production environment, preferably one spanning multiple data centers and thousands of servers.
- Demonstrable expertise in specifying, designing, and/or implementing system health tools, performance monitoring tools, and software management tools for 24x7 environments.
- A solid grasp of networking fundamentals, preferably including hands-on experience with load balancers, switches, routers, etc.
- Familiarity with the challenges surrounding efficient operations and failure mode analysis in large, complex distributed systems.
You will be expected to deliver on the following in your first six to twelve months on the job:
- Develop new, or extend existing, application and system management tools and processes that reduce manual effort and increase overall efficiency.
- Through participation in all phases of the development of a large distributed system, provide hardware, manageability, operability, and performance perspectives on all aspects of a newly architected platform and potentially its dependencies.
- Define and/or refine hardware requirements and selected designs, balancing raw up-front dollar cost with operability and TCO, from the data center infrastructure up.
- Specify and participate in the development and delivery of operability-related features such as system health monitoring, diagnostics, repair, and other self-healing automation.
- Adapt and improve operations management systems and processes to accommodate rapid and increasing growth in systems and traffic.
- Participate in the design and execution of production acceptance tests and new hardware evaluations.
- Maintain fleet inventory management, including producing, maintaining, and evolving capacity plans for various components.
- Monitor the health of the fleet, automating system health checks, maintenance tasks, and reporting as needed.
- Perform various system maintenance tasks, including configuration of new machines.
- Manage directly assigned tasks and on-call duties gracefully.