We are seeking a Senior Cloud Production Support Engineer with a solid technical foundation, an analytical mindset that brings critical thinking to problem solving, experience leading in production environments and, most importantly, a strong desire to learn and evolve every aspect of our AWS serverless platform in support of both our internal engineering team and our enterprise customers.
Join our startup journey working remotely, within a 4-hour time zone difference of our Sydney headquarters, collaborating with our global team on a long-term contract.
With our recent funding round taking our seed funding to more than $4 million, and a $2 million government/industry grant won for our new Biopass biometric multi-factor authentication product on top of our recent $2 million innovation grant, the future is looking very exciting for us here at truuth, and we are now scaling up our team.
Our website www.truuth.id charts our two-year journey to date, in which we launched our digital identity services platform (a multi-tenant AWS serverless SaaS offering). Our Verification of Identity (‘KYC’) solution is now used by large Australian enterprises such as Australia Finance Group and Macquarie/NuMobile.
We are seeking candidates who understand the huge opportunity that joining a 'scale-up' startup provides to learn a wide range of technologies and skills from a passionate and highly skilled team. It's a fantastic opportunity to accelerate your career. You will be first 'in the door' to set up our production support function for our initial customers, and you will build out a high-performing team.
If you are passionate about cloud computing, believe that digital identity is a key enabler of online security, innovation and convenience, and believe that world-class support is critical to customer success, then you will be a great fit!
You will establish processes and build up a team to deliver across the following areas:
Application Support & Monitoring:
- Monitor infrastructure, servers, middleware, databases, and batch jobs.
- Respond promptly to service requests from business-partner-facing support teams, Operations, risk/control partners, etc.
- Troubleshoot environment, data control and operational issues.
- Create and maintain documentation to ensure knowledge accessibility.
- Automate and streamline processes using scripts and scheduling tools.
- Liaise with internal/external business and technical partners.
- Provide ad hoc and on-demand reports.
- Perform timely escalation of critical issues and proactively identify patterns of recurring issues to improve production stability.
- Lead problem resolution, conduct root cause analysis, and establish processes that help prevent incidents.
- Participate in the Incident and Problem Management processes as a resolver accountable for root cause analysis, resolution and reporting.
- Ensure that all production changes are processed according to Change Management policies and procedures.
- Ensure that appropriate levels of Quality Assurance have been met for all new and existing products.
- Support sustained resiliency, disaster recovery and high availability events.
- Help the development team set up monitoring and close the gaps in the current monitoring setup.
- Play a key part in setting up reporting, and be a key contributor to the Monitor -> Report -> Improve principle.
- Coordinate incident management to ensure appropriate coverage.
- Facilitate calls, coordination and communications during critical outages.
- Handle call documentation, queue management and ticket analysis via the Production Assurance process.
- Maintain an end-to-end view of issues to ensure objectivity.
- Influence senior team members to ensure timely resolution of incidents.
- Participate in RCA (root cause analysis) activities on client-impacting incidents, ensure they are executed, and ensure action items are assigned and completed.
- Provide expertise and support during critical incidents, interfacing with product owners to better manage the message.
- Coordinate and lead the resolution of chronic issues.
Hygiene and Capacity Maintenance:
- Work proactively to ensure all services meet company standards for uptime, patch level, etc.
- Perform capacity planning for applications: estimate and analyse growth rates of vital infrastructure components and add capacity proactively as required.
Know Your Application:
- Understand the code, workflow and business usage of each application.
- Understand the database component of each application.
- Understand how seasonality affects critical applications.
- Document known errors and play an important role in knowledge transfer to the development team.
- Reduce escalations to Level 3 through incremental learning about the applications.
Enterprise Customer Onboarding & Support:
You will ensure a seamless onboarding experience for our enterprise customers and champion the rectification of any issues they encounter.
Skills & Experience:
- A minimum of 6 years of relevant Information Technology experience.
- Able to provide 24/7 on-call support.
- Proven experience in incident/problem management, with a good understanding of at least one of the tools used for this purpose.
- Understanding of SRE concepts and proven experience in automation or application development in any programming language.
- Solid technical skills, including knowledge of client-server technology, networking basics and database technology, and an end-to-end understanding of microservices and event-driven architecture.
- Good understanding of cloud concepts.
- Good understanding of AWS technologies and infrastructure such as AWS Lambda, Step Functions, S3, DynamoDB, Cognito, ECS, Fargate, API Gateway, VPCs, subnets, CloudFront, etc.
- Good understanding of database technologies such as MongoDB and SQL databases.
- Good understanding of monitoring tools.
- Excellent communication skills, both verbal and written, with the ability to lead/manage conference calls.
- Comfortable providing clear problem descriptions and guidance to business users in a time critical environment.
- Proactive, with a strong bias for action, natural inquisitiveness and a drive for continuous improvement of practices and processes.
- Excellent influence, negotiation and presentation skills.
- Solid understanding of the major functionality bundled into a release, both from a technology and business point of view.
- Strong knowledge of relevant applications and development life cycles.
- Experience working with geographically distributed and culturally diverse work-groups.
- Strong desire to learn new technology.
- Ability to work both independently as a self-starter and within a team.
- Previous experience with SaaS and multi-tenant solutions will be highly regarded.
Come and join a fun team and help us make the world a safer place.