Mark crossed the Atlantic from London to NYC through a full-blown snowstorm just to don a Gandalf-grade beard in person.
He commandeered the opening slot like a benevolent wizard of uptime to unpack the origin story of the assignment, trace the architectural lineage of the idea, and outline the deployment plan for making the very first DevOps Not Dead an event worthy of a permanent entry in the changelog.
This talk explores how agentic AI can be applied to build enterprise-ready CI/CD pipelines that are faster, safer, and easier to operate at scale. Chinmay will walk through how autonomous agents can make decisions across build, test, and deployment workflows, reducing manual intervention while improving reliability and governance. Attendees will gain a practical view of how AI-driven automation fits into modern CI/CD platforms and what it takes to move from experimental pipelines to production-grade delivery systems.
For the past 10 years, I’ve created live-coded music and audio-reactive visual art under the name Messica Arson. This talk will cover my tools, my creative process for sound and visuals, and how I’ve built reliable artistic systems.
For this talk, I’ll focus on points of failure, lessons learned, and practical solutions, ranging from troubleshooting scripts to touring checklists for venues with varying constraints and internet access concerns.
Whether you’ve coded on stage or deployed to production at 2 a.m., this talk shows how to design systems that withstand pressure and unpredictability.
Standard runbook automation often fails because it lacks context. When an incident strikes, the missing link is the "reasoning" that connects a spike in latency to a specific commit or infrastructure change. Join this session to see how AI SRE agents are bridging that gap using agentic workflows. We will break down the stack required for autonomous resolution:
* Context Injection: How agents ingest observability data and Git history.
* Reasoning Loops: Moving from "if-this-then-that" to LLM-driven diagnostics.
* Safe Remediation: Implementing guardrails and human-in-the-loop checkpoints.
See how ilert is evolving the SRE toolkit to ensure that by the time a human logs in, the service is already restored.
J.R.R. Tolkien was a master of presenting real-world situations in fantastical settings and making them both understandable and solvable. For many of us, Tolkien gave us the tools and the courage to conquer both the imagined fears and all-too-real obstacles standing in our way during childhood and adolescence.
BUT... If we take the time to really listen, Tolkien's Legendarium contains relevant and compelling lessons for seasoned I.T. practitioners. Seen in the right light (not just the setting sun of Durin's Day) we can find models for handling all types of technical challenges.
In this talk, I'll speak of far-sighted elves, underestimated hobbits, mysterious wizards, and of course of rings (Tolkien rather than token). But in the process I'll share a dragon's hoard of insights for the problems we find in our middleware, rather than Middle-earth.
This talk explores how AI is collapsing the traditional software development lifecycle and breaking long-held linear DevOps assumptions, pushing meaningful work earlier through AI-enabled prototyping and tighter collaboration across development, operations, and business teams. Drawing from real leadership lessons, it covers what actually reduced rework, what failed in practice, and how teams can adapt safely as these shifts accelerate.
As AI agents increasingly act autonomously in production systems, DevOps expands into evaluation, observability, and all of the above. This talk explores how to observe, improve, and debug AI agents in ways we did not have to think about for standard software services. I'll cover AI agent eval, monitoring, sandboxing best practices, and failure modes unique to agent-driven systems. Because when your software makes decisions on its own, someone still needs to be watching (it's agents all the way down).
A language is more than just the syntax and semantic rules of the words themselves. It also encompasses the shared culture of the speakers. With the proliferation of programming languages as well as the deeply held cultural beliefs of the community, it's easy to see that learning DevOps is like trying to learn a foreign language.
I will review five foundational hypotheses from the field of Second Language Acquisition and relate these hypotheses back to the world of DevOps. DevOps practitioners, trainers, tool builders, and learners should all come away with useful insights to apply to their practice.
DORA is the largest and longest-running research program of its kind, seeking to understand the capabilities that drive software delivery and operations performance. Our latest report is the 2025 State of AI-assisted Software Development, which reveals that AI’s primary role is as an amplifier, magnifying an organization’s existing strengths and weaknesses. The greatest returns on AI investment come not from the tools themselves, but from a strategic focus on the underlying organizational system. This year’s research also introduces the new DORA AI Capabilities Model, which identifies the key technical and cultural practices that are proven to amplify the positive impacts of AI on performance.
We built an AI FinOps platform in 2 weeks with Kiro and AWS Bedrock that found $2M in annual AWS waste, without buying expensive tools. 13 automated scanners, gamified leaderboards, AI recommendations, and one-click cleanup. I'll show you exactly how we did it and how you can too.
AI systems are rapidly evolving from assistive tools into autonomous decision-makers operating in production environments. Yet most organizations still evaluate AI success primarily through accuracy, latency, or model performance, often overlooking reliability, failure modes, and human trust.
In this talk, I will introduce AI Reliability Engineering (AIRE), an emerging discipline that applies Site Reliability Engineering principles to AI-driven systems. I will explore how AI fails differently than traditional software, why AI-related incidents often go undetected, and how foundational SRE concepts such as SLOs, error budgets, observability, and graceful degradation must evolve to support reliable AI workloads at scale.
In every fantasy world, chaos erupts when realms drift apart - and modern platforms are no different. Development, staging, and production environments often diverge, creating configuration drift, broken releases, and late-night firefighting.
This talk explores how GitOps becomes the single source of truth, binding all environments under one declarative contract. We’ll walk through practical patterns for environment overlays, progressive promotion, and policy enforcement, ensuring that what is tested in one realm is exactly what reaches the final battlefield: production.
Dependency Dragons and Code Goblins are the hidden security threats buried in builds, originating from unvetted third-party dependencies and insecure code. Dependency Dragons lurk inside third-party libraries, while Code Goblins arise from insecure coding practices.
18:00
Council of the Pint (Happy Hour - by Imply)
Main lobby
18:30
Wrap up
Scan each other's QR codes & head to a nearby pub!
For the past 10 years, I’ve created live-coded music and audio-reactive visual art under the name Messica Arson. This talk will cover my tools, my creative process for sound and visuals, and how I’ve built reliable artistic systems.
For this talk, I’ll focus on points of failure, lessons learned, and practical solutions, ranging from troubleshooting scripts to touring checklists for venues with varying constraints and internet access concerns.
Whether you’ve coded on stage or deployed to production at 2 a.m., this talk shows how to design systems that withstand pressure and unpredictability.
Bio
Jessica Garson is a Developer Relations leader with 15 years of experience, having held senior roles at Elastic and Twitter. She is active in the Python, JavaScript, developer tooling, and creative coding communities, and has spoken globally at conferences including PyCon, Write the Docs, and PyOhio.
Based in New York City, Jessica has been part of the creative coding scene since 2017 and runs RAID, an art and technology meetup. Under the name Messica Arson, she creates noise music and visual art using modular synthesizers and live coding.
Birol Yildiz
ilert
Building AI SRE Agents for Autonomous Recovery
Abstract
Standard runbook automation often fails because it lacks context. When an incident strikes, the missing link is the "reasoning" that connects a spike in latency to a specific commit or infrastructure change. Join this session to see how AI SRE agents are bridging that gap using agentic workflows. We will break down the stack required for autonomous resolution:
* Context Injection: How agents ingest observability data and Git history.
* Reasoning Loops: Moving from "if-this-then-that" to LLM-driven diagnostics.
* Safe Remediation: Implementing guardrails and human-in-the-loop checkpoints.
See how ilert is evolving the SRE toolkit to ensure that by the time a human logs in, the service is already restored.
Bio
Birol Yildiz is the Co-founder and CEO of ilert, adeptly steering the company with a rare combination of technical and product expertise. His prior experience includes a significant role as Chief Product Owner for Big Data products at REWE Digital. With a strong foundation in computer science, Birol bridges the gap between developer and product strategist, constantly striving to innovate and provide customer-centric solutions at ilert.
Leon Adato
Cribl
What Tolkien Taught Me About Being an SRE
Abstract
J.R.R. Tolkien was a master of presenting real-world situations in fantastical settings and making them both understandable and solvable. For many of us, Tolkien gave us the tools and the courage to conquer both the imagined fears and all-too-real obstacles standing in our way during childhood and adolescence.
BUT... If we take the time to really listen, Tolkien's Legendarium contains relevant and compelling lessons for seasoned I.T. practitioners. Seen in the right light (not just the setting sun of Durin's Day) we can find models for handling all types of technical challenges.
In this talk, I'll speak of far-sighted elves, underestimated hobbits, mysterious wizards, and of course of rings (Tolkien rather than token). But in the process I'll share a dragon's hoard of insights for the problems we find in our middleware, rather than Middle-earth.
Bio
In my sordid career, I have been an actor, bug exterminator and wild-animal remover (nothing crazy like pumas or wildebeests. Just skunks, snakes, and raccoons.), electrician, carpenter, stage-combat instructor, ASL interpreter, and Sunday school teacher. Oh, yeah, I've also worked with computers.
While my first keyboard was an IBM Selectric, and my first digital experience was on an Atari 400, my professional work in tech started in 1989 (when you got Windows 286 for free on twelve 5¼” disks when you bought Excel 1.0). Since then I've worked as a classroom instructor, courseware designer, helpdesk operator, desktop support staff, sysadmin, network engineer, and software distribution technician.
Then, about 25 years ago, I got involved with monitoring. I've worked with a wide range of tools: Tivoli, BMC, OpenView, janky Perl scripts, Nagios, SolarWinds, DOS batch files, Zabbix, Grafana, New Relic, and other assorted nightmare fuel. I've designed solutions for companies that were modest (~10 systems), significant (5,000 systems), and ludicrous (250,000 systems). In that time, I've learned a lot about monitoring and observability in all its many and splendid forms.
Sadio Jonas
The Academy For AI Strategy
Leading the Future of DevOps with AI
Abstract
This talk explores how AI is collapsing the traditional software development lifecycle and breaking long-held linear DevOps assumptions, pushing meaningful work earlier through AI-enabled prototyping and tighter collaboration across development, operations, and business teams. Drawing from real leadership lessons, it covers what actually reduced rework, what failed in practice, and how teams can adapt safely as these shifts accelerate.
Bio
Sadio Jonas is a 20-year technology veteran, Fractional AI Executive, AI innovator, and the Founder and Director of AI Strategy at AI Vantage Consulting, the first consulting firm built natively for the Generative AI era. She is also the Founder of The Academy for AI Strategy, focused on AI skills for visionary leadership in the "AI Augmented Era." Among other transformation efforts, she has led an enterprise-wide change to DevOps in the Financial Services Sector.
She partners with organizations and their leadership teams to navigate enterprise-wide AI transformation with clarity, confidence, and a distinctly human-centered approach.
Diamond Bishop
Datadog
Who Watches the Watchmen - Running Production Ready AI Agents
Abstract
As AI agents increasingly act autonomously in production systems, DevOps expands into evaluation, observability, and all of the above. This talk explores how to observe, improve, and debug AI agents in ways we did not have to think about for standard software services. I'll cover AI agent eval, monitoring, sandboxing best practices, and failure modes unique to agent-driven systems. Because when your software makes decisions on its own, someone still needs to be watching (it's agents all the way down).
Bio
Diamond Bishop is a Director of Engineering and AI at Datadog, where he leads applied AI teams and experimental product efforts focused on AI agents and developer productivity. A longtime techno-optimist, he has spent over 15 years building AI and ML systems across startups and large platforms, including Amazon, Meta, and AWS. Diamond is the co-founder of Augmend, an AI company acquired by Datadog, and splits his time between New York and San Francisco thinking about how AI can meaningfully extend human capability.
Josh Lee
Altinity
DevOps is a Foreign Language (or Why There Are No Junior SREs)
Abstract
A language is more than just the syntax and semantic rules of the words themselves. It also encompasses the shared culture of the speakers. With the proliferation of programming languages as well as the deeply held cultural beliefs of the community, it's easy to see that learning DevOps is like trying to learn a foreign language.
I will review five foundational hypotheses from the field of Second Language Acquisition and relate these hypotheses back to the world of DevOps. DevOps practitioners, trainers, tool builders, and learners should all come away with useful insights to apply to their practice.
Bio
Josh is a seasoned software developer with over a decade of experience, specializing in a broad range of topics including operations, observability, agile methodologies, and accessibility. His passion for technology is matched by his enthusiasm for sharing knowledge through public speaking. Currently, Josh serves as a Senior Developer Advocate for Altinity, where he creates educational content on ClickHouse and OpenTelemetry, and he is a contributor to the OpenTelemetry project.
James Brookbank
Google
Latest DORA research on AI-assisted Software Development
Abstract
DORA is the largest and longest-running research program of its kind, seeking to understand the capabilities that drive software delivery and operations performance. Our latest report is the 2025 State of AI-assisted Software Development, which reveals that AI’s primary role is as an amplifier, magnifying an organization’s existing strengths and weaknesses. The greatest returns on AI investment come not from the tools themselves, but from a strategic focus on the underlying organizational system. This year’s research also introduces the new DORA AI Capabilities Model, which identifies the key technical and cultural practices that are proven to amplify the positive impacts of AI on performance.
Bio
James Brookbank is a cloud solutions architect manager at Google based in NYC. Solution architects help make cloud easier for Google’s customers by solving complex technical problems and providing expert architectural guidance. Before joining Google, James worked at a number of large enterprises with a focus on IT infrastructure and financial services.
Nishkarsh Raj
StatusNeo
Q the Savings: How We Built a $2M/Year FinOps Platform in 2 Weeks
Abstract
We built an AI FinOps platform in 2 weeks with Kiro and AWS Bedrock that found $2M in annual AWS waste, without buying expensive tools. 13 automated scanners, gamified leaderboards, AI recommendations, and one-click cleanup. I'll show you exactly how we did it and how you can too.
Bio
Nishkarsh is a DevSecOps expert and an International GitHub Star. Nishkarsh is an ardent supporter of open-source, GitHub, DevEx, and DevOps. Nishkarsh serves as StatusNeo Inc.'s Principal Evangelist & Consultant. Over the years, he has been actively GitHubbing and contributing to open-source. By giving talks at conferences, organizing meetups, and encouraging people to take on the #100DaysofCode challenge, he has encouraged many brilliant minds to embark on their journeys in open-source projects and preach the significance of collaboration to aspiring developers.
Akash Thakur
Cognizant
AI Reliability Engineering (AIRE): Building Systems We Can Actually Trust
Abstract
AI systems are rapidly evolving from assistive tools into autonomous decision-makers operating in production environments. Yet most organizations still evaluate AI success primarily through accuracy, latency, or model performance, often overlooking reliability, failure modes, and human trust.
In this talk, I will introduce AI Reliability Engineering (AIRE)—an emerging discipline that applies Site Reliability Engineering principles to AI-driven systems. I will explore how AI fails differently than traditional software, why AI-related incidents often go undetected, and how foundational SRE concepts such as SLOs, error budgets, observability, and graceful degradation must evolve to support reliable AI workloads at scale.
Bio
Akash Thakur is a Site Reliability Engineering leader and IT Architect with 17+ years of experience modernizing mission-critical systems across finance, healthcare, and the public sector. He currently serves as an SRE Architect at Cognizant, where he leads automation-first SRE and AI-driven resilience initiatives for Fortune 500 enterprises. He writes and speaks on the evolving intersection of SRE and AI infrastructure.
Avinash Sabat
Synechron / UBS
One Repo to Rule Them All: GitOps Across the Realms of Dev, Staging, and Production
Abstract
In every fantasy world, chaos erupts when realms drift apart - and modern platforms are no different. Development, staging, and production environments often diverge, creating configuration drift, broken releases, and late-night firefighting.
This talk explores how GitOps becomes the single source of truth, binding all environments under one declarative contract. We’ll walk through practical patterns for environment overlays, progressive promotion, and policy enforcement — ensuring that what is tested in one realm is exactly what reaches the final battlefield: production.
Bio
Avinash Sabat is a Principal Cloud DevOps Engineer with over 14 years of experience designing, automating, and scaling infrastructure in the financial services sector. He specializes in Kubernetes, Azure, AWS, and GitLab CI/CD, with deep expertise in Infrastructure as Code using Terraform and Helm. Avinash has built enterprise-grade platforms that improve application deployment velocity, system resilience, and automation efficiency.
Mahender Mangalasri
Cognizant
Slaying Dragons & Goblins: Securing the Code in the DevSecOps Dungeon
Abstract
Dependency Dragons and Code Goblins are the hidden security threats buried in builds, originating from unvetted third-party dependencies and insecure code. Dependency Dragons lurk inside third-party libraries, while Code Goblins arise from insecure coding practices.
Bio
Mahender is a cybersecurity consultant, currently working as an AppSec manager with a global consulting firm. He has over a decade of experience in securing web applications, APIs, and databases across diverse enterprises and lines of business. He is a Senior IEEE member, ISC2 NJ Chapter member, mentor, and speaker.
Mark Pawlikowski
DND Organizer
Opening note: Was this a furry convention all along?
Abstract
Mark crossed the Atlantic from London to NYC through a full-blown snowstorm just to don a Gandalf-grade beard in person.
He commandeered the opening slot like a benevolent wizard of uptime to unpack the origin story of the assignment, trace the architectural lineage of the idea, and outline the deployment plan for making the very first DevOps Not Dead an event worthy of a permanent entry in the changelog.
Chinmay Gaikwad
Harness
Building Enterprise-ready CI/CD Pipelines using Agentic AI
Abstract
This talk explores how agentic AI can be applied to build enterprise-ready CI/CD pipelines that are faster, safer, and easier to operate at scale. Chinmay will walk through how autonomous agents can make decisions across build, test, and deployment workflows, reducing manual intervention while improving reliability and governance. Attendees will gain a practical view of how AI-driven automation fits into modern CI/CD platforms and what it takes to move from experimental pipelines to production-grade delivery systems.
Bio
Chinmay is the Director of Product Marketing at Harness. He has experience in developer tools, cloud, and data center technologies. Previously, he worked at companies such as Intel, IBM, and early-stage startups, focusing on application security, observability, and Kubernetes. In his free time, he loves traveling and exploring new restaurants.