Programmers Force is a product-driven software company that has excelled in the field of Artificial Intelligence and Machine Learning since 2016. The company was founded by a team of visionary entrepreneurs who have led its operations across software development, data science, DevOps, system architecture, big data processing, and blockchain-based application development. We take pride in our diverse workforce, with talent coming from top institutions in Pakistan and abroad. Our vision is to create innovative and intelligent business solutions through the development of smart web and mobile applications, with a mission to support global industries in their day-to-day business challenges. Our specialised teams possess tacit knowledge of high-tech systems that enables us to reach businesses from more than 200 countries worldwide. This is just the beginning for us! We are in search of talented candidates with technical expertise who can add value to our fast-paced and work-intensive environment.
Tack of Programmers Force
Not only the way out but the best way out! No rather, no “one or two” but a must for all. Win-Win is the goal.
Programmers Force is looking for a Site Reliability Engineer (SRE). The SRE is responsible for meeting the agreed-upon SLOs for the Enterprise Imaging systems in their area of responsibility. They plan all maintenance and deployment work, improve the tools and procedures needed to operate the systems, and carry out troubleshooting, root cause analysis, and some complex maintenance tasks themselves.
Roles & Responsibilities:
- Reduce toil, implement identified improvement opportunities, and handle minor enhancements and non-ticketed activity
- Define and monitor service level metrics, including incident management KPIs such as MTTD, MTTR, MTBF, MTTF, unavailability rate, and incident count
- Create rules that optimise incident response based on metrics, streamline alert flows, and improve collaboration and communication across squads
- Proactively identify issues that might disrupt the service in production
- Route incoming service requests to the appropriate support groups through Jira
- Create and maintain alerts
- Handle change validation and change planning requests
- Assist business stakeholders in determining SLOs or adjusting threshold limits
- Manage demand and capacity, and correct SLI/SLO threshold limits where needed
- Gather and analyse metrics from both operating systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplift
- Balance feature development speed and reliability using well-defined service level objectives and indicators (SLOs, SLIs)
- Debug production issues across services and levels of the stack
- Monitor and audit production operations and infrastructure-related policies
Education & Experience Requirements:
- Bachelor’s Degree in Software Engineering, Computer Science or related field
- Software engineering and task automation skills with Bash and Python
- Familiarity with the Agile software development lifecycle
- Deep background in Linux systems and engineering
- Experience supporting web applications running on Java / Apache / Tomcat in a live production environment
- Prior experience with DevOps tools (Git, GitLab)
- Production-At-Scale support background in a heavily microservice-based world
- Hands-on engineering and ops expertise in containerization (Docker, Kubernetes/EKS, CNI, and Ingress networking)
- Experience working with relational and NoSQL databases such as PostgreSQL and MongoDB, and with SQL
- Exposure to application development, web UI (design and development), JSON, application architecture
- Strong experience with observability tools (logging/APM) such as Datadog, CloudWatch, and PagerDuty
- Experience in analysing and troubleshooting large-scale distributed systems
- Expertise in cloud-native monitoring tools such as Grafana, Kibana, and Prometheus
Benefits:
- Lunch on the House
- Flexible Working Hours
- Payment for Overtime
- Annual Leave
- Enjoy your weekend, we work on weekdays only
- Health Insurance
- Life Insurance
- Provident Fund
- Advance Salary
- Family Care
- Family Treat
- Personal Loan
- In-House Training
- Surprise Gifts & Performance-based Bonuses
- Performance-based salary increment and promotion
- Gym & Indoor Gaming – Perfect balance between work and play
- Opportunity to engage in frequent local and international trips
- Child Education
- Marriage Allowance
- Maternity Allowance
- Home Allowance
- Hostel Allowance
- Travel Allowance
- Personal Growth – Learn the best from the best