About the job
Platform Engineering Lead
Key Roles and Responsibilities:
Technical Leadership and Strategy: Lead the development and execution of the organization's platform engineering strategy. Provide technical leadership and guidance to the platform engineering team, ensuring alignment with business goals.
Team Management: Manage and mentor a team of platform engineers, fostering a collaborative and high-performance team culture. Conduct regular performance assessments, set goals, and provide constructive feedback.
Infrastructure Architecture: Oversee the design and implementation of scalable and reliable technology platforms. Collaborate with cross-functional teams to understand business requirements and contribute to architectural decisions.
Cloud Platform Management: Provide expert-level guidance in managing and optimizing cloud resources on platforms such as AWS, Azure, or Google Cloud. Implement and enforce advanced best practices for cloud architecture, security, and cost optimization.
Automation: Lead the development and maintenance of advanced automation scripts using tools like Ansible, Terraform, or similar technologies.
Orchestration Excellence: Implement robust orchestration processes to automate complex deployment and configuration workflows.
Security and Compliance Leadership: Establish and enforce comprehensive security measures and compliance standards for infrastructure components. Lead the organization's efforts in maintaining a secure and compliant technology environment.
Collaboration and Stakeholder Management: Collaborate with senior leadership, product teams, and other stakeholders to align platform engineering initiatives with business objectives. Act as a key liaison between the platform engineering team and other business units.
Incident Response: Lead incident response activities, participate in complex troubleshooting efforts, and contribute to the resolution of critical system issues.
Problem Resolution: Proactively identify and address potential challenges to ensure system stability.
Application Lifecycle Management: Lead the creation and maintenance of processes, procedures, and governance. Ensure compliance requirements are met. Continuously maintain and improve the environment. Initiate or participate in the evaluation of new solutions. Ensure that all solutions run on supported versions, that software is fully licensed, and that upgrades are planned in good time.
Collaboration: Collaborate with various teams (cross-functional, software, platform, security, data, architecture and DevSecOps) to ensure successful delivery of outcomes.
Application Performance Monitoring: Improve existing alerts and reports, and implement additional alerts and reports where appropriate, so that all critical systems and processes are monitored for availability and reliability. Ensure issues are identified and attended to promptly so that system consumers are not impacted.
Qualifications:
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
10+ years of experience in platform engineering, system administration, or a related role.
Proven experience managing and leading a team of platform engineers.
Expert proficiency in scripting languages (e.g., Python, Bash) and extensive experience with automation tools.
In-depth knowledge of cloud platforms (AWS, Azure, or Google Cloud) and advanced cloud services.
Strong expertise in containerization technologies (e.g., Docker, Kubernetes).
Proven leadership skills with the ability to guide technical teams and collaborate with other stakeholders.
Preferred Skills:
Certification in cloud technologies, platform engineering, or related domains.
Experience with advanced CI/CD pipelines, infrastructure as code, and version control systems (e.g., Git).
Deep understanding of cybersecurity principles and practices.
Advanced knowledge of networking concepts and protocols.
Strong SRE skills.
Experience in the use and management of vendor-packaged software, including patching and upgrades.
Experience with monitoring and observability architecture and tools, including deployment and maintenance of multi-vendor tool suite environments and compliance with published SLAs. Tools include, but are not limited to, Dynatrace, Canary Checker, Loki, Grafana, Prometheus, and CloudWatch.
Experience working with application and engineering teams to develop requirements that define monitoring and interpret alerting, notification, and escalation requirements for managing the end-user experience, assist with fault isolation, and deliver proactive environment health management analysis and reporting.