Salesforce.com, Inc Lead Site Reliability Engineer in Hyderabad, India
Job Category: Products and Technology
Lead - Site Reliability Engineer
Salesforce is seeking an engineering candidate to join the Site Reliability organization in Hyderabad. Working closely with counterparts in the Infrastructure and R&D organizations, this organization provides a global team of engineers monitoring cloud service availability and ready to swiftly repair any service-impacting issues. Seven days a week, 24 hours a day, in a follow-the-sun model, the Site Reliability team keeps the Salesforce cloud and our customers protected.
As a member of the Site Reliability team, you will be responsible for the primary task of detecting and resolving incidents within minutes. This objective is met by monitoring the services, reacting to problems, and proactively addressing issues before they affect performance or availability.
When not fighting fires, the team is responsible for fire prevention through monitoring, automation, self-healing and resiliency initiatives, destructive testing, and game day exercises. The incumbent in this role would demonstrate a strong focus on tactical operations, as well as large-scale production engineering and orchestration.
Role Description:
Being available to discuss and resolve technical issues and escalations with other technical staff as required.
Working with the team to ensure that technical skills are maintained and that the technical competence of the SME/SRs continues to develop in line with company requirements through training and personal development.
Keep the customer-facing services available at top performance by maintaining the constant health of the supporting systems.
Incident Management - act in key support roles during major incidents (e.g. Sev0, Sev1) and participate in the technical review of each incident for problem management
Problem Management - populate and participate in RCAs and hand them off to the Global Solutions team
Ensuring that work carried out by the Site Reliability team is executed in such a way as to comply with the company’s internal compliance policy and directives
Work with and lead other members of the team in staying on top of key industry innovation and technology, and assist in team development and growth
Identifying work opportunities and preparing or assisting with the preparation of technical proposals as required
Ability to operate in a high-pressure environment, troubleshoot complex issues quickly, and successfully handle multiple priorities
Work to automate detection and resolution of recurring issues in the production environment
As a lead, you will be responsible for managing the team and helping the team with leading issues.
You will be expected to communicate with other infrastructure teams to resolve problems.
Acting as a technical mentor/coach to all Site Reliability team members.
Systems engineering experience in an enterprise-scale internet service engineering or support role
Expertise in TCP/IP related technologies (networking protocols, network programming, etc.)
Expertise in CLI enterprise support of Unix variants (Linux/Solaris/BSD) as well as strong Linux/UNIX knowledge with significant exposure to Red Hat Enterprise Linux and Solaris
Strong understanding of monitoring implementations and administration
Strong communication skills (Written and Oral)
Past experience in Incident Management and good understanding of ITIL service operations
Experience in working in a 24/7 team managing large data centers
BS or higher degree in Computer Science or Electrical Engineering plus relevant job-related experience
Perl/Python/BASH scripting experience
Prior Chef/Puppet or automated deployment experience
Experience in supporting and maintaining monitoring and alerting systems
Experience supporting and troubleshooting relational databases and distributed platforms
Experience in supporting and maintaining Java applications
Experience in Docker orchestration and management.
Hands-on experience configuring and managing AWS (Amazon Web Services) using the CLI/SDKs
Experience managing systems monitoring and alerts.
Experience with JVM optimization and Java server technologies like Tomcat or Jetty
Salesforce.com and Salesforce.org are Equal Employment Opportunity and Affirmative Action Employers. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Headhunters and recruitment agencies may not submit resumes/CVs through this Web site or directly to managers. Salesforce.com and Salesforce.org do not accept unsolicited headhunter and agency resumes. Salesforce.com and Salesforce.org will not pay fees to any third-party agency or company that does not have a signed agreement with Salesforce.com or Salesforce.org.
Salesforce, the Customer Success Platform and world's #1 CRM, empowers companies to connect with their customers in a whole new way. We are the fastest growing of the top 10 enterprise software companies, the World's Most Innovative Company according to Forbes, and one of Fortune's 100 Best Companies to Work For six years running. The growth, innovation, and Aloha spirit of Salesforce are driven by our incredible employees who thrive on delivering success for our customers while also finding time to give back through our 1/1/1 model, which leverages 1% of our time, equity, and product to improve communities around the world. Salesforce is a team sport, and we play to win. Join us!