Principal Software Engineer
AT&T / DIRECTV
2004 – 2024 · El Segundo, CA
- Provided strategic leadership for the design, implementation, and lifecycle management of the NFL OTT API ecosystem, ensuring secure, scalable, and reliable digital commerce experiences.
- Oversaw the end-to-end integration of authentication (OAuth), eligibility, offer management, cart, prospect, and order APIs, aligning technical delivery with business objectives and compliance requirements.
- Directed cross-functional teams in the development and maintenance of customer-facing endpoints, ensuring seamless user journeys from eligibility assessment through order completion.
- Established and enforced API security, session management, and data privacy standards, including token-based authentication and sensitive payment data handling.
- Drove continuous improvement in API error handling, monitoring, and transaction traceability to optimize operational efficiency and customer satisfaction.
- Collaborated with business stakeholders to define requirements and prioritize enhancements for NFL OTT digital product offerings.
- Ensured robust documentation, version control, and change management practices for all API modules to support internal and external developer enablement.
- Led incident response and root cause analysis for API-related issues, implementing preventative measures and communicating impacts to executive leadership.
- Fostered a culture of innovation and agility, leveraging emerging technologies and best practices to enhance the NFL OTT digital platform’s competitive advantage.
- Designed and implemented cloud-based CI/CD pipelines, automating deployments across hybrid environments using Jenkins, GitLab CI/CD, and Azure DevOps.
- Developed an AI-powered self-healing system using Camunda, reducing incident resolution time from hours to minutes and saving $10K per event in mitigation costs.
- Led the containerization of microservices, deploying applications using Docker, Kubernetes, and Helm to improve scalability and fault tolerance.
- Automated infrastructure provisioning with Terraform and Ansible, reducing manual operations and improving system consistency.
- Implemented security best practices for DevOps workflows, integrating SAST/DAST and vulnerability scanning tools (including SonarQube and Trivy) into pipelines.
- Optimized log monitoring and alerting with the ELK stack, Prometheus, and Grafana, reducing MTTR (Mean Time to Resolution) for incidents.
- Led an API team that drove API adoption, converting web services to expose assets through APIs.
- Developed services to deactivate and recycle improperly deactivated access cards to prevent automatic reinstatement.
- Conceptualized and designed an AIOps application for AI/ML-driven detection and automated recovery, enhancing service delivery and optimizing resource utilization. The solution monitored operational databases and application health through a customizable dashboard integrated with ServiceNow, using Nagios, Dynatrace, AppDynamics, and Prometheus for metrics collection and ELK and Splunk for logging.
- Automated remediation to reduce potential performance impacts.
- Avoided over 80% of outage minutes on average, exceeding the original target of a 30–50% reduction in outage minutes.
- Demonstrated a 10% decrease in outage minutes at the Observability and Notifications level (saving $700,000) and a 60% decrease at the Automated Diagnostics & Recovery level (saving $6,000,000), per the 2021 vs. 2022 impact assessment.
- Developed and maintained infrastructure as code (IaC) scripts using Terraform for Azure, ensured software compliance, managed Docker images, and automated security vulnerability remediation using Azure DevOps Pipelines.