Cloud Support Engineer

trUUth
( IT / Development )
asiaremotejobs.com  Remote (Asia | APAC Time Zone Permitted)

Job Type : Contract
Experience : 3 to 5 years
Education : Bachelor Degree

Job Detail

We are seeking a Senior Cloud Production Support Engineer with a solid technical foundation, an analytical mindset, demonstrated critical thinking in problem solving, leadership experience in production environments and, most importantly, a strong desire to learn and evolve all aspects of our AWS serverless platform in support of both our internal engineering team and our enterprise customers.

Join our startup journey, working remotely within a 4-hour time zone difference of our Sydney headquarters and collaborating with our global team on a long-term contract.

Our recent funding round took our seed funding to more than $4 million, and we have won a $2 million government/industry grant for our new Biopass biometric multi-factor authentication product, adding to our recent $2 million innovation grant. The future is looking very exciting for us here at truuth, and we are now scaling up our team.

Our website, www.truuth.id, shows our 2-year journey to date, in which we launched our digital identity services platform (a multi-tenant AWS serverless SaaS offering). Our Verification of Identity (‘KYC’) solution is now used by large Australian enterprises such as Australia Finance Group and Macquarie/NuMobile.

We are seeking candidates who understand the huge opportunity that joining a 'scale up' startup provides to learn a wide range of technologies & skills from a passionate & highly skilled team - it's a fantastic opportunity to accelerate your career. You will be first 'in the door' to set up our production support function for our initial customers, and you will build out a high-performing team.

If you are passionate about cloud computing, believe that digital identity is a key enabler for online security, innovation & convenience, and believe that world-class support is critical to customer success, then you will be a great fit!

Key Responsibilities

You will establish processes and build up a team to deliver across the following areas:

Application Support & Monitoring:

  • Monitor infrastructure, servers, middleware, databases, and batch jobs.
  • Respond promptly to service requests from business-partner-facing support teams, Operations, Risk/Control partners, etc.
  • Troubleshoot environment, data control and operational issues.
  • Create and maintain documentation to ensure knowledge accessibility.
  • Automate and streamline processes using scripts and scheduling tools.
  • Liaise with internal/external business and technical partners.
  • Provide ad hoc and on-demand reports.
  • Perform timely escalation of critical issues and proactively identify patterns of recurring issues to improve production.
  • Lead problem resolution, conduct root cause analysis, and establish processes that help prevent incidents.
  • Participate in the Incident and Problem Management processes as a resolver accountable for root cause analysis, resolution and reporting.
  • Ensure that all production changes are processed according to Change Management policies and procedures.
  • Ensure that appropriate levels of Quality Assurance have been met for all new and existing products.
  • Support Sustained Resiliency, Disaster Recovery, and High Availability events.
  • Help the development team set up monitoring and bridge gaps in the current monitoring setup.
  • Play a key part in setting up reporting and in applying the Monitor -> Report -> Improve principle.

Incident Management:

  • Coordinate incident management to ensure appropriate coverage.
  • Facilitate calls, coordinate and communicate during critical outage situations.
  • Handle call documentation, queue management and ticket analysis via the Production Assurance process.
  • Maintain an end-to-end view of issues for objectivity.
  • Influence senior team members to ensure timely resolution of incidents.

Problem Management:

  • Participate in RCA (root cause analysis) activities on client-impacting incidents and ensure that they are executed and that action items are assigned and completed.
  • Provide expertise and support during critical incidents, interfacing with product owners to better manage communications.
  • Lead and coordinate the resolution of chronic issues.

Hygiene and Capacity Maintenance:

  • Work proactively to make sure all services meet company standards for uptime, patch level, etc.
  • Carry out capacity planning for applications: estimate and analyse growth rates of vital infrastructure components and add capacity proactively as and when required.

Know Your Application:

  • Understand the application code, workflow and business usage of the application.
  • Understand the DB component of the application.
  • Understand how seasonality affects critical applications.
  • Document known errors and play an important role in knowledge transfer to the development team.
  • Reduce escalations to Level 3 based on incremental learning about applications.

Enterprise customer onboarding & support:

You will ensure a seamless onboarding experience for our enterprise customers & champion the resolution of any issues they encounter.

Qualifications

  • Minimum 6 years of relevant Information Technology experience.
  • Should be able to provide 24/7 on-call support.
  • Proven experience in incident/problem management with a good understanding of any of the tools used for this purpose.
  • Understanding of SRE concepts and proven experience working on automation or application development using any programming language.
  • Solid technical skills including knowledge of client-server technology, networking basics, database technology, and an end-to-end understanding of microservices and event-driven architecture.
  • Good understanding of cloud concepts.
  • Good understanding of AWS technologies and infrastructure such as AWS Lambda, Step Functions, S3, DynamoDB, Cognito, ECS, Fargate, API Gateway, VPC, Subnets, CloudFront, etc.
  • Good understanding of database technologies like MongoDB and SQL.
  • Good understanding of monitoring tools.
  • Excellent communication skills, both verbal and written, with the ability to lead/manage conference calls.
  • Comfortable providing clear problem descriptions and guidance to business users in a time-critical environment.
  • Proactive and naturally inquisitive, with a strong bias for action and for continuous improvement of practices and processes.
  • Excellent influence, negotiation and presentation skills.
  • Solid understanding of the major functionality bundled into a release, both from a technology and business point of view.
  • Strong knowledge of relevant applications and development life cycles.
  • Experience working with geographically distributed and culturally diverse work-groups.
  • Strong desire to learn new technology.
  • Ability to work independently as a self-starter, and within a team environment.
  • Previous experience with SaaS and multi-tenant solutions will be highly regarded.

Come and join a fun team and help us make the world a safer place.
