<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[benroberts.io]]></title><description><![CDATA[A blog for all things Azure! Whether you're a developer or IT pro, explore tips, trends, and hands-on guides to optimize and transform your workflow with Microsoft Azure's powerful capabilities. Join me while I learn new features and technologies from code to infrastructure.]]></description><link>https://benroberts.io</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1727490663733/066fe0bc-b8d9-4cfd-9f48-88c85ada3b7a.png</url><title>benroberts.io</title><link>https://benroberts.io</link></image><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 15:59:20 GMT</lastBuildDate><atom:link href="https://benroberts.io/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[The Path to Success for Governing AI Agents with Microsoft Entra Agent ID]]></title><description><![CDATA[Introduction
Most organizations do not wake up one morning and decide to run an agent fleet. It happens in increments. A Copilot appears to summarize meetings. A bot gets introduced to triage requests]]></description><link>https://benroberts.io/the-path-to-success-for-governing-ai-agents-with-microsoft-entra-agent-id</link><guid isPermaLink="true">https://benroberts.io/the-path-to-success-for-governing-ai-agents-with-microsoft-entra-agent-id</guid><category><![CDATA[Entra ID]]></category><category><![CDATA[IAM]]></category><category><![CDATA[identity and access management ]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Tue, 31 Mar 2026 23:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/66f75313534fa6a33c25d941/22d9ed71-f4b3-487b-865b-1e635ac3769c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Introduction</h2>
<p>Most organizations do not wake up one morning and decide to run an agent fleet. It happens in increments. A Copilot appears to summarize meetings. A bot gets introduced to triage requests. A workflow assistant starts opening cases after hours. An autonomous agent begins moving across APIs, files, and approvals with just enough independence to be useful. Then another team builds one. A vendor brings one. A low-code platform makes one easy to publish. Soon enough, you've inadvertently deployed the software equivalent of Talkie Toaster, relentlessly asking your APIs if they'd like any toast, or perhaps a toasted teacake, while consuming valuable resources. What looked like a handful of helpful automation becomes a population of nonhuman actors that sign in, request tokens, call resources, and accumulate access. Before you know it, your tenant is drifting like the Jupiter Mining Corporation ship Red Dwarf, overrun by unmanaged Skutters while the AI effectively operates with the IQ of a PE teacher.</p>
<p>That is exactly why Microsoft Entra Agent ID matters. In preview, and currently available through Microsoft Agent 365 in the Frontier program, it gives organizations a formal identity model for AI agents instead of forcing them into a patchwork of human accounts and long-lived application identities. Microsoft is not just adding a new object type. It is building a governance model around agent identity blueprints, blueprint principals, agent identities, and agent users, then layering sponsorship, access packages, Conditional Access, ID Protection, and monitoring on top.</p>
<p>The practical question, then, is not whether agent identities matter (spoiler alert: they do), it's how to adopt them successfully without creating new sprawl, new blind spots, and a new category of orphaned access. The best way to think about Microsoft Entra Agent ID is as a path to success. Not a single feature, but a sequence.</p>
<h2>The Tikka to Ride: Agent 365 and the Frontier Program</h2>
<p>Before we start building our mechanoid empire, let's talk prerequisites. You can't just flip a switch and get Agent ID today—it requires authorization higher than a Space Corps Directive. To get started, your tenant must be part of the <a href="https://adoption.microsoft.com/copilot/frontier-program/">Frontier preview program</a> and you have to agree to the Agent 365 terms of service. You also need at least one license of Microsoft 365 Copilot in the tenant to enable Agent 365.</p>
<p>To turn the key, an admin needs to head over to the Microsoft 365 admin center, navigate to <strong>Copilot &gt; Settings</strong>, find <strong>Copilot Frontier</strong> under User access, and grant access to the specific users or groups looking to pilot the future.</p>
<img src="https://learn.microsoft.com/en-us/microsoft-agent-365/media/overview/mac-steps.png" alt="" style="display:block;margin:0 auto" />

<h2>The Path to Success</h2>
<h3>Laying the Foundation: The Agent Blueprint Model</h3>
<p>The first successful move is to resist the temptation to create agents one by one with ad hoc settings. If you build them like snowflakes, they'll melt into a management puddle—or worse, you'll end up with a Series 4000 mechanoid obsessed with ironing everything in your tenant. Microsoft’s model starts with the agent identity blueprint for a reason. Every agent identity in a tenant is created from a blueprint, and that blueprint defines the shared characteristics of the agent type. Microsoft documents properties such as description, app roles, verified publisher, and authentication-related settings like optional claims as part of that common definition. The blueprint also carries the credentials used to request tokens for the agent identities it creates, which means the authentication model is set at the class-of-agent level rather than reinvented for every instance.</p>
<p>That is where categories, metadata, and auth method become operational decisions instead of afterthoughts. If an organization wants a category for customer support agents, another for finance automation, and another for internal copilots, the blueprint is the right boundary. If you want those categories to carry meaningful metadata about purpose, publisher, capabilities, and expected use, the blueprint is where you set that baseline. If you want the authentication method to align with environment and risk, this is where Microsoft’s guidance becomes especially important. For Azure-hosted agents, managed identity is the strongest production pattern. For other software-hosted agents, federated identity credentials are the modern choice. Certificates and client secrets exist, but Microsoft is clear that they are less aligned with current security best practice.</p>
<p>This matters because standardized provisioning is one of the main governance wins in Agent ID. A blueprint is not just a template. It is also a control surface. Microsoft explicitly notes that policies applied to a blueprint can affect every agent identity created from it, and that disabling a blueprint prevents its child agent identities from authenticating. That makes the blueprint model the right place to define consistent categories, metadata, and authentication posture before the fleet grows beyond what any team can reason about manually.</p>
<h3>Putting Humans in Charge: Sponsorship and Accountability</h3>
<p>Once the blueprint model exists, the next step is to make sure every agent is tied to accountable humans (preferably someone more responsible than Dave Lister). This is where many automation stories fail. The technology works, the business value shows up, and six months later nobody can answer who owns the thing or who should approve more access for it. Microsoft Entra Agent ID tries to close that gap by treating sponsors and owners as core governance constructs rather than optional documentation.</p>
<p>Microsoft’s governance guidance is direct on this point. Sponsors are the human users accountable for lifecycle and access decisions. Owners are the technical administrators responsible for operational management. In practice, that gives enterprises a clean division between business accountability and technical custody. More importantly, Microsoft has built lifecycle features around it. Sponsors can request access on behalf of an agent identity. Sponsors receive notifications as time-bound access assignments approach expiration. If sponsorship changes are required, Lifecycle Workflows can notify managers and cosponsors so accountability does not quietly disappear when people move roles or leave the organization.</p>
<p>This is where lifecycle approvals, renewals, and access escalations stop being improvised in email threads. If an agent needs broader access, the sponsor can act as the human checkpoint. If access is nearing expiry, the sponsor can renew it within policy or allow it to lapse. If an escalation is needed, the request can route through configured approval stages instead of becoming a permanent privilege grant that nobody revisits. For agent fleets, that human-in-the-loop model is not bureaucratic overhead. It is the thing that keeps nonhuman identities from becoming unmanaged business risk.</p>
<img src="https://learn.microsoft.com/en-us/entra/id-governance/media/manage-agent-sponsors/sponsor-workflow-tasks.png#lightbox" alt="" style="display:block;margin:0 auto" />

<h3>Bundling the Keys: Access Packages for Agents</h3>
<p>After accountability is in place, access itself needs structure. This is where entitlement management becomes the turning point. Microsoft Entra now supports access packages for agent identities, and this is arguably one of the most important capabilities in the entire story because it turns access from scattered assignments into a governed product.</p>
<p>Microsoft documents that access packages for agent identities can include security group memberships, Microsoft Entra roles, and OAuth API permissions, including application permissions to target APIs. That means organizations can build a reusable access pattern for an entire class of agents rather than granting rights one object at a time. A support-agent package might include membership in a bounded security group and a specific API permission. A finance automation package might include a time-bound directory role and group-based access to tightly controlled systems. An agent can request a package programmatically, a sponsor can request it on the agent’s behalf, or an administrator can assign it directly.</p>
<img src="https://learn.microsoft.com/en-us/entra/id-governance/media/entitlement-management-access-package-create/requests.png" alt="" style="display:block;margin:0 auto" />

<p>What makes this powerful is not just the packaging. It is the policy around the package. Microsoft’s access package model lets the admin define who can request access, how many approval stages apply, who the approvers are, how long the assignment lasts, and whether extension is allowed. That is the entitlement-based governance pattern enterprises have needed for agent fleets. The access is intentional, auditable, and time-bound. It is also worth being precise about the limits. Microsoft notes that agent identities and service principals cannot be added through these access packages to application roles, SAP roles, or SharePoint Online site roles, so those should not be described as supported package targets. The supported story today is group memberships, allowed Entra roles, and API permissions.</p>
<p>There is an even more important constraint sitting underneath that packaging model: not every role or permission can be granted to an agent in the first place. Microsoft maintains a specific list of Microsoft Entra roles allowed for agents and separately blocks a set of high-risk Microsoft Graph permissions for agent identities and blueprints, including examples such as <code>Application.ReadWrite.All</code>, <code>RoleManagement.ReadWrite.All</code>, <code>User.ReadWrite.All</code>, and <code>Directory.AccessAsUser.All</code>. The reason is straightforward and worth stating plainly. High-privilege directory roles and tenant-wide control permissions assume a human administrator exercising deliberate judgment. An autonomous or semi-autonomous agent with those privileges could delete users, alter security settings, or escalate access at machine speed. Microsoft’s design is intentionally restrictive so that agent authorization defaults to least privilege instead of turning every helpful assistant into a tiny, tireless super-admin.</p>
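<p>As a trivial illustration of that deny-by-default posture, a provisioning script could screen a requested permission set against the blocked examples before attempting any grant. A minimal sketch; the list below contains only the examples named above (the authoritative list lives in Microsoft's documentation), and the helper logic is hypothetical:</p>
<pre><code class="language-powershell"># Examples of Graph permissions Microsoft blocks for agent identities (non-exhaustive).
$blockedForAgents = @(
    "Application.ReadWrite.All",
    "RoleManagement.ReadWrite.All",
    "User.ReadWrite.All",
    "Directory.AccessAsUser.All"
)

# Hypothetical pre-deployment check: fail fast if a requested set includes a blocked permission.
$requested  = @("User.Read.All", "Application.ReadWrite.All")
$violations = $requested | Where-Object { $_ -in $blockedForAgents }
if ($violations) { throw "Blocked for agents: $($violations -join ', ')" }
</code></pre>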
<h3>Giving Agents a Desk, Not Just a Badge: Agent Users and Work IQ</h3>
<p>Not every extension story stops with an agent identity acting like an application. Sometimes an agent needs to participate as a digital worker with user-shaped capabilities. That is where the agent user pattern becomes important. Microsoft’s agent user model gives an agent a dedicated nonhuman user account that is linked one-to-one with its parent agent identity. That user account is optional, not automatic, and should be created only when the agent truly needs to act in contexts where a user identity is required.</p>
<p>This distinction matters because an agent user is not just "an app with a mailbox." Microsoft documents it as a constrained user identity that receives user-type tokens, can be added to groups, can be licensed for Microsoft 365 resources, and can participate in collaboration scenarios, while still being prevented from behaving like a normal interactive human account. It cannot have passwords or passkeys, it cannot break away from its parent identity, and it cannot take on privileged admin roles. In other words, you get user-context capability without abandoning the nonhuman security model.</p>
<p>That opens up a useful design pattern for extending agents with Work IQ. Work IQ MCP servers give agents access to grounded Microsoft 365 context and deterministic tools across mail, calendar, Teams, SharePoint, OneDrive, Word, and user profile data. For many scenarios, delegated or On-Behalf-Of access is enough. But the agent user pattern becomes compelling when the agent must persist as a long-lived teammate rather than merely borrowing a person’s context for a moment.</p>
<p>Think about a few practical scenarios. A service desk agent acting as a round-the-clock digital employee might need its own mailbox, Teams presence, and membership in a support operations group so it can triage inbound issues, summarize the case history with Work IQ Mail, inspect meeting context with Work IQ Calendar, and post updates into the right Teams channel. A project coordinator agent might need to appear as a stable participant in recurring planning meetings, access collaboration spaces that are only exposed to user identities, and use Work IQ User, Calendar, and Teams to keep schedules, participant context, and follow-up actions aligned. A research or account-planning agent might use Work IQ Copilot, SharePoint, and OneDrive to gather organizational context, files, and prior conversations while retaining a persistent user-shaped identity that other systems can recognize as part of the working team.</p>
<p>The key point is that Agent ID governance and Agent 365 extensibility are not competing stories. They reinforce each other. Agent identities give you the control plane for lifecycle, access, and policy. Agent users give you a constrained pattern for user-centric collaboration scenarios. Work IQ gives those governed identities useful, grounded context and tools. Put together, they allow an agent to be more than a background daemon without letting it wander the tenant like a caffeinated intern with global admin. 🤪</p>
<h3>Setting the Bouncers: Conditional Access for Agents</h3>
<p>With provisioning and entitlement in place, the next step is to make the access decision itself more intelligent. Microsoft Entra Conditional Access for Agent ID brings that control to token acquisition by agent identities and agent users. This is where the phrase “adaptive risk evaluation before token issuance” becomes more than marketing language. It's essentially putting a bouncer at the door of your data, ensuring that merely shouting "Smoke me a kipper, I'll be back for breakfast!" isn't enough to secure an access token.</p>
<p>Microsoft’s documentation is explicit that Conditional Access applies when an agent identity or agent user requests a token for a resource. It does not apply when the blueprint acquires a token for Microsoft Graph to create agent identities or agent users, and it does not apply to the intermediate token exchange step itself. That nuance matters because it tells architects where enforcement actually sits. The control point is the moment an instantiated agent identity or agent user is trying to obtain access to a protected resource.</p>
<img src="https://learn.microsoft.com/en-us/entra/identity/conditional-access/media/agent-id/conditional-access-agent-settings.png#lightbox" alt="" style="display:block;margin:0 auto" />

<p>From there, Microsoft gives admins several ways to shape policy. Policies can target all agent identities, specific agents by object ID, agents grouped by blueprint, or agents selected through custom security attributes. Policies can also target all resources, all agent resources, or specific resources. Conditions include agent risk, and access controls can block token issuance. Microsoft even documents a concrete pattern for approval-aware policy design: assign custom security attributes such as an approval status to agents or blueprints, assign corresponding attributes to resources, and then use Conditional Access to block all agents except those reviewed and approved for that kind of resource.</p>
<p>That is an important shift. Conditional Access is no longer just a broad perimeter control for users and apps. In the Agent ID model, it becomes a way to express that a newly created or unreviewed agent should not be able to get a token to sensitive resources until it has passed the organization’s governance checkpoints. That is the practical meaning of agent-context-aware policy.</p>
<h3>Preventing the Zombie Apocalypse: Lifecycle Automation</h3>
<p>Once agents can be created and granted access safely, the next challenge is to make sure they do not keep that access forever. Lifecycle governance is where most identity programs prove whether they are durable, and the same is true for agent identity. A forgotten, over-permissioned agent is the IT equivalent of a dormant volcano—if left unchecked, eventually someone is going to have to say, "Everybody's dead, Dave."</p>
<p>Microsoft’s guidance already provides several lifecycle levers. Access package assignments can expire automatically. Sponsors can renew them or let them lapse. Agent owners and sponsors can disable agent identities through the My Account experience. Blueprint principals can be viewed and managed in the Entra admin center, where their linked identities, permissions, owners, sponsors, audit logs, and sign-in logs are visible. Microsoft also documents that blueprint principals can be disabled, and that disabling the blueprint prevents child agent identities from authenticating. On the extreme end, a blueprint principal can be removed from a tenant, and the blueprint documentation also notes that associated identities and users should be removed before deleting the blueprint itself.</p>
<img src="https://learn.microsoft.com/en-us/entra/includes/media/entitlement-management-lifecycle-policy/expiration.png" alt="" style="display:block;margin:0 auto" />

<p>This is also the right place to bring in access reviews, but carefully. Microsoft’s agent-specific governance guidance is strongest around access packages, sponsor approvals, expirations, and administrative actions. Microsoft’s broader access review documentation, however, makes it clear that access reviews can be used to re-certify access to groups, enterprise applications, role assignments, and access package assignments. So the most accurate way to frame lifecycle reviews for agents is this: review the groups, applications, roles, and access package assignments that agent identities depend on, and use the results to remove unneeded access automatically where the integration supports it. That keeps the lifecycle story factual while still supporting the “review, remove access, revoke, deactivate” progression.</p>
<p>In practical terms, a mature lifecycle motion looks like this: review the package assignments and dependent resource access on a schedule, let expired assignments revoke access automatically, disable the agent identity if the agent should stop operating, remove unnecessary assignments and memberships, and disable the blueprint principal when the whole class of agent should no longer authenticate. That is how you keep the directory from becoming a graveyard of forgotten automation.</p>
<h3>Keeping an Eye on the Machines: Monitoring and Guardrails</h3>
<p>The final step is to accept that governance is not complete just because an agent was onboarded correctly. Agents must remain observable, and guardrails must hold when behavior changes. When things go sideways, you need a warning system significantly more helpful than Holly simply announcing, "Emergency. There's an emergency going on." This is where Microsoft Entra ID Protection, sign-in telemetry, audit logs, and network-level controls come together.</p>
<p>Microsoft Entra ID Protection for agents is designed to spot behaviors that fall outside the normal baseline for an agent. The current documented detections include unfamiliar resource access, sign-in spikes, failed access attempts, delegated sign-in on behalf of a risky user, and threat-intelligence-backed suspicious activity. Admins can investigate risky agents, confirm compromise, dismiss false positives, or disable the agent entirely. That last point is important for the guardrail around compromised credentials. The most accurate way to say it is not that Entra blocks “compromised credentials” directly, but that it can flag risky or confirmed-compromised agents and feed that signal into Conditional Access policies that block high-risk agents before they obtain tokens for resources.</p>
<img src="https://learn.microsoft.com/en-us/entra/id-protection/media/concept-risky-agents/risky-agents-report.png#lightbox" alt="" style="display:block;margin:0 auto" />

<img src="https://learn.microsoft.com/en-us/entra/id-protection/media/concept-risky-agents/risky-agent-details.png#lightbox" alt="" style="display:block;margin:0 auto" />

<p>Microsoft also provides observability through sign-in logs and audit logs. For Conditional Access troubleshooting, admins can filter sign-in events by agent type. For blueprint principals, the Entra admin center exposes linked identities, status, permissions, audit logs, and sign-in logs. Inside Microsoft Agent 365, the registry and reporting model broadens that visibility further by giving IT and security teams a unified view of agents across supported platforms.</p>
<p>Then there are the network guardrails. Microsoft’s current Global Secure Access documentation is most explicit for Copilot Studio agent traffic, where tenant-level baseline profiles can enforce web content filtering, threat-intelligence filtering, and file filtering once traffic is forwarded through the service. Even where those network controls are product-scoped today, the design intent is clear: agent governance is not just identity issuance and access approval, it is also safe resource boundaries. The enterprise goal is to ensure an agent cannot freely roam to any API, any connector, or any destination simply because it can technically make a call. Safe boundaries come from a combination of least privilege, resource targeting, risk-based policy, and network restriction.</p>
<h2>Conclusion</h2>
<p>That is why the path to success matters. Microsoft Entra Agent ID is not valuable simply because it introduces new object types. Its value comes from the sequence it enables. First, define the blueprint model so agents are categorized and provisioned consistently. Then tie each agent to accountable humans. Govern access through packages instead of ad hoc grants. Make token issuance conditional on risk and approval state. Run lifecycle processes that can renew, expire, disable, and remove what is no longer needed. Finally, monitor what agents do and enforce guardrails when behavior moves out of bounds.</p>
<p>For organizations trying to govern AI copilots, automation bots, and autonomous agents seriously, that sequence is much closer to an operating model than a feature checklist. It reduces attack surface. It creates a consistent baseline. It improves visibility and compliance. Most importantly, it keeps the agent conversation anchored in identity engineering rather than novelty. In the Frontier-era Microsoft Agent 365 story, Microsoft Entra Agent ID is starting to look like the control plane enterprises will need if they want agent adoption to scale without losing control of who, or what, is acting inside the tenant. 🚀</p>
<h2>References</h2>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/">Microsoft Entra Agent ID documentation</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-professional/microsoft-entra-agent-identities-for-ai-agents">What is Microsoft Entra Agent ID?</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-platform/what-is-agent-id-platform">What is Microsoft agent identity platform</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-platform/agent-blueprint">Agent identity blueprints in Microsoft Entra Agent ID</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-platform/create-blueprint">Create an agent identity blueprint</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-platform/what-is-agent-id">What are agent identities</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-platform/agent-identities">Agent identities in Microsoft Entra Agent ID</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-platform/agent-users">Agent users in Microsoft Entra Agent ID</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-platform/manage-agent-blueprint">View and manage agent identity blueprints in your tenant</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/identity/conditional-access/agent-id">Conditional Access for Agent ID (Preview)</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/id-governance/agent-id-governance-overview">Governing Agent Identities (Preview)</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-professional/authorization-agent-id">Authorization in Microsoft Entra Agent ID</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-professional/agent-access-packages">Access packages for Agent identities</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/agent-id/identity-platform/agent-users">The agent's user account in Microsoft Entra Agent ID</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/id-governance/access-reviews-overview">What are access reviews?</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/id-governance/access-reviews-application-preparation">Prepare for an access review of users' access to an application</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/id-governance/agent-sponsor-tasks">Agent identity sponsor tasks in Lifecycle Workflows (Preview)</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/entra/id-protection/concept-risky-agents">ID Protection for agents (Preview)</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/microsoft-agent-365/overview">Overview of Microsoft Agent 365</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/microsoft-agent-365/tooling-servers-overview">Work IQ MCP overview (preview)</a>.</p>
<p><a href="https://learn.microsoft.com/en-us/microsoft-agent-365/admin/capabilities-entra">Protect agent identities with Microsoft Entra</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Microsoft Entra Group Source of Authority Conversion with Cloud Sync Writeback]]></title><description><![CDATA[This post explores a specific hybrid identity scenario: taking a group synchronized from Active Directory Domain Services (AD DS), changing its Source of Authority (SOA) to Microsoft Entra, and then u]]></description><link>https://benroberts.io/microsoft-entra-group-source-of-authority-conversion-with-cloud-sync-writeback</link><guid isPermaLink="true">https://benroberts.io/microsoft-entra-group-source-of-authority-conversion-with-cloud-sync-writeback</guid><category><![CDATA[Entra ID]]></category><category><![CDATA[Powershell]]></category><category><![CDATA[identity-management]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Fri, 13 Mar 2026 20:57:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/66f75313534fa6a33c25d941/1e4410bd-fcf5-4e6a-9130-b85b8a9e7b20.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This post explores a specific hybrid identity scenario: taking a group synchronized from Active Directory Domain Services (AD DS), changing its Source of Authority (SOA) to Microsoft Entra, and then using Microsoft Entra Cloud Sync to write the group back to AD DS. The critical requirement in this pattern is configuring the writeback to reconnect to the <em>original</em> AD group rather than creating a duplicate object in a default Organizational Unit (OU).</p>
<p>The standard capability to convert SOA for a group is well-documented, but combining it with Cloud Sync custom attribute mapping to seamlessly overwrite the existing AD group requires bringing together several different configuration concepts.</p>
<p>This post assumes that fundamental hybrid infrastructure is already established. AD DS, Microsoft Entra ID, identity synchronization, and Cloud Sync are deployed and functioning. The target group must already exist in AD and synchronize to Microsoft Entra.</p>
<h2>Scenarios / use cases</h2>
<p>Below are practical reasons why organizations utilize this conversion pattern:</p>
<p>- <strong>Legacy application authorization:</strong> On-premises applications that depend on LDAP queries or Kerberos tokens need security groups to stay in AD DS.</p>
<p>- <strong>Modernizing access governance:</strong> By changing the group's source of authority to Entra, administrators can use Entra ID Governance capabilities such as access packages, access reviews, and self-service management, which are difficult or impossible to deliver efficiently in AD DS.</p>
<p>- <strong>Shifting the management plane:</strong> Rebuilding permission models across legacy applications is often not feasible. By transitioning group SOA to the cloud and mirroring the result backward, Entra becomes the control plane, while AD functions merely as a projection layer.</p>
<p>It is important to note that this is not a dual-write sync. Once a group's source of authority is moved to the cloud, any direct modifications to the on-premises AD group are treated as temporary and overwritten during the next provisioning cycle.</p>
<h2>Prerequisites</h2>
<p>Before converting the source of authority, ensure the following requirements are met:</p>
<p>- <strong>Supported sync client versions:</strong> Ensure Entra Connect Sync is running version 2.5.76.0 or later, and Cloud Sync is running version 1.1.1370.0 or later. For group writeback to AD DS, the Cloud Sync provisioning agent must be at version 1.1.3730.0 or later.</p>
<p>- <strong>Active Directory schema validation:</strong> The AD schema requires the <code>msDS-ExternalDirectoryObjectId</code> attribute, which is included by default in Windows Server 2016 and newer.</p>
<p>- <strong>Microsoft Graph permissions:</strong> Changing the group's source of authority requires the <code>Group-OnPremisesSyncBehavior.ReadWrite.All</code> permission. For delegated workflows, the least-privileged administrative role required is Hybrid Identity Administrator.</p>
<p>- <strong>Universal group scope:</strong> The existing AD group should be set to Universal scope before the conversion.</p>
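<p>As a quick pre-flight on those prerequisites, both the group's scope and the Graph permission can be validated up front. A minimal sketch, assuming the ActiveDirectory and Microsoft Graph PowerShell modules are installed and using the lab group from the examples later in this post:</p>
<pre><code class="language-powershell"># Confirm the target is a Universal security group before the conversion.
Get-ADGroup -Identity "GroupSOADemo" |
    Select-Object Name, GroupCategory, GroupScope

# Connect to Microsoft Graph with the permission required for the SOA change.
Connect-MgGraph -Scopes "Group-OnPremisesSyncBehavior.ReadWrite.All"
</code></pre>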
<h2>Preserving the existing AD identity</h2>
<p>Since the goal is to map the cloud-managed group back to its original AD location, preservation steps must happen prior to the conversion. The existing distinguished name (DN) of the AD group must be mapped into Entra.</p>
<p>The most common enterprise pattern is that inbound synchronization still runs through Microsoft Entra Connect Sync, while Cloud Sync is introduced later for the Entra-to-AD writeback leg. In that model, the DN should be preserved in a tenant-scoped directory extension that Entra Connect Sync owns through its <code>Tenant Schema Extension App</code>. If the inbound leg is already running on Cloud Sync, the equivalent extension can instead be created on <code>CloudSyncCustomExtensionsApp</code>. Cloud Sync group writeback can consume extension attributes from either supported application, but Entra Connect Sync should not be described as exporting directly into a CloudSync-managed extension.</p>
<p>The following Microsoft Graph PowerShell example demonstrates creating the extension on <code>CloudSyncCustomExtensionsApp</code>, which is the pattern Microsoft documents for Cloud Sync-native extension mapping:</p>
<pre><code class="language-powershell">$tenantId = (Get-MgOrganization).Id
$app = Get-MgApplication -Filter "identifierUris/any(uri:uri eq 'API://$tenantId/CloudSyncCustomExtensionsApp')"
if (-not $app) {
    $app = New-MgApplication -DisplayName "CloudSyncCustomExtensionsApp" -IdentifierUris "API://$tenantId/CloudSyncCustomExtensionsApp"
}

$sp = Get-MgServicePrincipal -Filter "AppId eq '$($app.AppId)'"
if (-not $sp) {
    $sp = New-MgServicePrincipal -AppId $app.AppId
}

New-MgApplicationExtensionProperty -ApplicationId $app.Id -Name "GroupDN" -DataType "String" -TargetObjects Group
</code></pre>
<p>After creating or identifying the extension, the next step is to populate it. The objective is the same regardless of sync client: take the current AD distinguished name and copy it into an Entra extension attribute on the synchronized group before the SOA change.</p>
<h3>Common enterprise approach: Entra Connect Sync inbound</h3>
<p>If your tenant still uses Entra Connect Sync for the AD-to-Entra path, the cleanest supported pattern avoids custom synchronization rules entirely. Because converting a group's Source of Authority is typically a one-off transition for each group rather than a continuously synchronized state, you do not need a permanent script or custom rule bridging <code>distinguishedName</code> to an extension attribute.</p>
<p>A practical pattern is:</p>
<p>1. Pick an unused on-premises group attribute that can safely hold the original DN as a single string value (e.g., <code>extensionAttribute15</code>).</p>
<p>2. Populate that on-premises attribute with the group's current DN statically, just once, before the SOA change.</p>
<p>3. Use the Entra Connect wizard to enable that attribute for Group directory extensions. (Let the built-in sync rules flow the attribute natively).</p>
<p>4. Run the required import/sync steps so Entra Connect creates and populates the generated extension property in Entra.</p>
<p>5. Verify the value in Entra before changing <code>isCloudManaged</code>.</p>
<p>A short worked example from my lab looked like this:</p>
<p>- On-premises source attribute: <code>extensionAttribute15</code> on the group object</p>
<p>- Value stored before conversion (one-off manual population): <code>CN=GroupSOADemo,OU=Groups,DC=contoso,DC=com</code></p>
<p>- Entra Connect wizard action: enable <code>extensionAttribute15</code> for Group directory extensions</p>
<p>- Resulting Entra attribute after sync: <code>extension_&lt;TenantSchemaExtensionAppId&gt;_extensionAttribute15</code></p>
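<p>Populating the holding attribute for that example is a one-off operation. A minimal sketch with the ActiveDirectory module:</p>
<pre><code class="language-powershell"># Copy the group's current DN into the unused holding attribute, once, before the SOA change.
$adGroup = Get-ADGroup -Identity "GroupSOADemo"
Set-ADGroup -Identity $adGroup -Replace @{ extensionAttribute15 = $adGroup.DistinguishedName }
</code></pre>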
<p>Custom rules in the Synchronization Rules Editor are optional here, and only needed if you decide to derive the DN value dynamically for ongoing synchronization. If you treat the extension attribute as a static, one-time payload populated before the cutover, the native directory extensions feature handles the rest without any complex custom rules.</p>
<p>After synchronization completes, verify that the generated Entra extension contains the full DN. That is the value later consumed by the Cloud Sync <code>ParentDistinguishedName</code> and <code>CN</code> expressions.</p>
<p>Running <code>Get-MgGroup -GroupId &lt;groupId&gt; -Property *</code> will show the on-prem extension properties in the <code>AdditionalProperties</code> collection, which is often easier to parse than the more complex <code>$expand=extensions</code> syntax. The key point is confirming that the extension contains the full DN before proceeding with the SOA switch.</p>
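<p>For example, a check along those lines might look like the following sketch (the extension name is the generated form from this lab; substitute your tenant's value):</p>
<pre><code class="language-powershell">$groupId = "&lt;groupId&gt;"
$extName = "extension_&lt;TenantSchemaExtensionAppId&gt;_extensionAttribute15"

# The generated extension property surfaces in the AdditionalProperties collection.
$group = Get-MgGroup -GroupId $groupId -Property "id,displayName,$extName"
$group.AdditionalProperties[$extName]   # expect the full DN, e.g. CN=GroupSOADemo,OU=Groups,DC=contoso,DC=com
</code></pre>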
<img src="https://cdn.hashnode.com/uploads/covers/66f75313534fa6a33c25d941/e7f46b71-c45a-463d-95ee-7cf18e0b9ca2.png" alt="" style="display:block;margin:0 auto" />

<h3>Cloud Sync inbound alternative</h3>
<p>If the inbound leg is already on Cloud Sync, the workflow is simpler:</p>
<p>- Add or confirm an inbound attribute mapping for the group object that sources the AD distinguished name.</p>
<p>- Target the tenant-scoped extension attribute created on <code>CloudSyncCustomExtensionsApp</code>, such as <code>extension_&lt;appIdWithoutHyphens&gt;_GroupDN</code>.</p>
<p>- Run a sync cycle and wait for the group object in Entra to update.</p>
<p>- Confirm the value on the target group before changing <code>isCloudManaged</code>.</p>
<p>Validation matters here because the writeback configuration later depends on this value being correct. A simple validation approach is to query the group through Microsoft Graph and confirm the extension property contains the full original DN, for example <code>CN=GroupSOADemo,OU=Groups,DC=contoso,DC=com</code>. If the value is missing, truncated, or reflects a stale OU path, Cloud Sync will not have enough information to match the original object reliably.</p>
<p>For example, after the inbound mapping runs, retrieve the group and inspect the extension property:</p>
<pre><code class="language-plaintext">GET https://graph.microsoft.com/v1.0/groups/{groupId}?\(select=id,displayName&amp;\)expand=extensions
</code></pre>
<p>Depending on the client you use, it can be easier to request the specific extension property directly through Microsoft Graph PowerShell or Graph Explorer and verify that the stored DN exactly matches the current on-premises group DN. In an Entra Connect Sync-based deployment, this extension name will usually be in the form <code>extension_&lt;TenantSchemaExtensionAppId&gt;_&lt;AttributeName&gt;</code>. In a Cloud Sync-based deployment, it will typically be <code>extension_&lt;CloudSyncCustomExtensionsAppId&gt;_&lt;AttributeName&gt;</code>. That one check establishes that Entra now has a durable copy of the AD location data needed for the later <code>ParentDistinguishedName</code> and <code>CN</code> mappings.</p>
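<p>A sketch of that direct property check with <code>Invoke-MgGraphRequest</code>, again substituting your generated extension name:</p>
<pre><code class="language-powershell">$extName = "extension_&lt;CloudSyncCustomExtensionsAppId&gt;_GroupDN"
$uri = "https://graph.microsoft.com/v1.0/groups/$groupId" + '?$select=id,displayName,' + $extName

# Invoke-MgGraphRequest returns a hashtable by default, so the extension can be read by key.
$result = Invoke-MgGraphRequest -Method GET -Uri $uri
$result[$extName]   # must exactly match the current on-premises group DN
</code></pre>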
<h2>Converting Source of Authority</h2>
<p>The source-of-authority switch is performed against the Microsoft Graph API.</p>
<p>First, retrieve the current synchronization state:</p>
<pre><code class="language-http">GET https://graph.microsoft.com/v1.0/groups/{groupId}/onPremisesSyncBehavior?$select=isCloudManaged
</code></pre>
<p>For standard synced AD groups, the <code>isCloudManaged</code> property will evaluate to <code>false</code>. Next, issue a <code>PATCH</code> request to flip management to the cloud:</p>
<pre><code class="language-powershell">Invoke-MgGraphRequest -Method PATCH `
    -Uri "https://graph.microsoft.com/v1.0/groups/$groupId/onPremisesSyncBehavior" `
    -Body @{ isCloudManaged = $true }
</code></pre>
<p>Subsequent <code>GET</code> requests will show <code>isCloudManaged</code> as <code>true</code> and <code>onPremisesSyncEnabled</code> as <code>null</code>. At this operational boundary, the group becomes fully editable within Microsoft Entra. However, the legacy application integration depends on completing the writeback setup.</p>
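<p>A quick verification sketch using the same session:</p>
<pre><code class="language-powershell"># isCloudManaged should now return true.
Invoke-MgGraphRequest -Method GET `
    -Uri "https://graph.microsoft.com/v1.0/groups/$groupId/onPremisesSyncBehavior?`$select=isCloudManaged"

# onPremisesSyncEnabled on the group object itself should now come back null (empty output).
(Get-MgGroup -GroupId $groupId -Property "id,onPremisesSyncEnabled").OnPremisesSyncEnabled
</code></pre>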
<img src="https://cdn.hashnode.com/uploads/covers/66f75313534fa6a33c25d941/5e52bed9-d4e5-46aa-9728-d46c6e02c137.png" alt="" style="display:block;margin:0 auto" />

<p>You can also verify the change in the Entra portal. The group will lose its "Synchronized from on-premises Active Directory" status and the "Source" field will update to "Cloud". The critical part of this pattern is that the original AD group is not orphaned or duplicated, but rather becomes cloud-managed while retaining its original identity and location in AD. However, it will be flagged as excluded in Entra Connect Sync when the source of authority has changed.</p>
<p>Before:</p>
<img src="https://cdn.hashnode.com/uploads/covers/66f75313534fa6a33c25d941/e67f6f8f-b75e-4e1e-ae3f-11441b051b94.png" alt="" style="display:block;margin:0 auto" />

<p>After:</p>
<img src="https://cdn.hashnode.com/uploads/covers/66f75313534fa6a33c25d941/c452c66e-e8d8-49ea-9393-1a109bdff3e3.png" alt="" style="display:block;margin:0 auto" />

<h2>Configuring Cloud Sync Writeback</h2>
<p>The final component uses the preserved DN established in the prerequisites for the Entra-to-AD provisioning job.</p>
<p>In the Cloud Sync attribute mapping for groups, expression-based mappings must be configured for the <code>ParentDistinguishedName</code> and <code>CN</code> target attributes. The mapping definitions strip the <code>CN=</code> portion to identify the parent OU path for the <code>ParentDistinguishedName</code>, while separately extracting the <code>CN</code> string to specify the target group name.</p>
<p>Microsoft's documented pattern uses an extension name such as <code>extension_&lt;AppIdWithoutHyphens&gt;_GroupDistinguishedName</code>. In an enterprise tenant using Entra Connect Sync to flow an existing on-premises attribute (like <code>extensionAttribute15</code>), the stored DN resides on the <code>Tenant Schema Extension App</code> with a generated name like <code>extension_&lt;TenantSchemaExtensionAppId&gt;_extensionAttribute15</code>. Substitute your exact generated extension attribute name in the expressions below. The key point is that both expressions must reference the same stored DN value.</p>
<p>Use the following expression for <code>ParentDistinguishedName</code>:</p>
<pre><code class="language-text">IIF(
  IsPresent([extension_&lt;TenantSchemaExtensionAppId&gt;_extensionAttribute15]),
  Replace(
    Mid(
      Mid(
        Replace([extension_&lt;TenantSchemaExtensionAppId&gt;_extensionAttribute15], "\,", , , "\2C", , ),
        InStr(Replace([extension_&lt;TenantSchemaExtensionAppId&gt;_extensionAttribute15], "\,", , , "\2C", , ), ",", , ),
        9999
      ),
      2,
      9999
    ),
    "\2C", , , ",", ,
  ),
  "&lt;Existing ParentDistinguishedName&gt;"
)
</code></pre>
<p>This expression does two things. If the extension is populated, it removes the leading <code>CN=</code> segment and returns only the parent DN path. If the extension is empty, it falls back to the default target OU that you specify in the mapping.</p>
<p>Use the following expression for <code>CN</code>:</p>
<pre><code class="language-text">IIF(
  IsPresent([extension_&lt;TenantSchemaExtensionAppId&gt;_extensionAttribute15]),
  Replace(
    Replace(
      Replace(
        Word(Replace([extension_&lt;TenantSchemaExtensionAppId&gt;_extensionAttribute15], "\,", , , "\2C", , ), 1, ","),
        "CN=", , , "", ,
      ),
      "cn=", , , "", ,
    ),
    "\2C", , , ",", ,
  ),
  Append(Append(Left(Trim([displayName]), 51), "_"), Mid([objectId], 25, 12))
)
</code></pre>
<p>This expression extracts the first DN component, removes the <code>CN=</code> prefix, and restores any escaped commas that were temporarily converted during parsing. If the extension is not present, the fallback generates a deterministic CN from the group display name and part of the object ID.</p>
<p>Worked example:</p>
<p>- Stored extension value: <code>CN=GroupSOADemo,OU=Groups,DC=contoso,DC=com</code></p>
<p>- Resolved <code>ParentDistinguishedName</code>: <code>OU=Groups,DC=contoso,DC=com</code></p>
<p>- Resolved <code>CN</code>: <code>GroupSOADemo</code></p>
<p>This is the intended outcome of the two mappings together. The first expression removes the leading common name component and preserves the remaining OU and domain path. The second expression extracts only the common name so Cloud Sync can target the original group name in the original container.</p>
<p>This ensures Cloud Sync derives the proper container and naming convention to match the original AD object path, rather than generating an arbitrary object. When properly scoped and configured, the provisioning logs should indicate a match and update against the pre-existing target group, confirming the expected behavior.</p>
<p>The next step is to run the Cloud Sync provisioning job and monitor the logs for the expected match and update operations against the original AD group. If the expressions are correct and the extension contains the right DN, you should see a successful update rather than a creation of a new object in the default OU.</p>
<img src="https://cdn.hashnode.com/uploads/covers/66f75313534fa6a33c25d941/95f52a1f-e62b-4d71-a357-f23db06825c5.png" alt="" style="display:block;margin:0 auto" />

<p>You can perform an additional validation step by adding or removing a member from the group in Entra and confirming the change is reflected in AD DS after provisioning runs. This confirms that the writeback is functioning end-to-end and that the original group is now being managed from the cloud.</p>
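<p>A minimal sketch of that end-to-end check (the user object ID is a placeholder):</p>
<pre><code class="language-powershell"># Add a test member to the cloud-managed group in Entra.
New-MgGroupMember -GroupId $groupId -DirectoryObjectId "&lt;userId&gt;"

# After the next provisioning cycle, confirm the member arrived in AD DS.
Get-ADGroupMember -Identity "GroupSOADemo" | Select-Object Name, SamAccountName
</code></pre>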
<img src="https://cdn.hashnode.com/uploads/covers/66f75313534fa6a33c25d941/07209798-733e-4633-a4ab-201685b1ea28.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/66f75313534fa6a33c25d941/39ff8be5-3357-45c4-8957-4c5cc21c3555.png" alt="" style="display:block;margin:0 auto" />

<h2>Troubleshooting</h2>
<p>If your provisioning log shows <code>HybridSynchronizationActiveDirectoryInvalidGroupType</code>, the expression mappings are usually not the problem. That error normally means the matched on-premises target group is not a supported writeback target. In practice, recheck that the original AD object is a standard non-mail-enabled Security group, that its scope is Universal before the SOA cutover, and that it is not still carrying Exchange-style mail-enabled group characteristics from an earlier lifecycle.</p>
<p>A quick validation checklist for the original AD group is:</p>
<p>- <code>GroupCategory = Security</code></p>
<p>- <code>GroupScope = Universal</code></p>
<p>- Not a Mail-Enabled Security Group or Distribution List</p>
<p>- No Exchange dependency that keeps the group mail-enabled when Cloud Sync tries to update it</p>
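<p>Those checks translate into a short sketch with the ActiveDirectory module:</p>
<pre><code class="language-powershell"># A populated mail or proxyAddresses value usually indicates a mail-enabled group,
# which is not a supported writeback target.
Get-ADGroup -Identity "GroupSOADemo" -Properties mail, proxyAddresses |
    Select-Object Name, GroupCategory, GroupScope, mail, proxyAddresses
</code></pre>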
<h2>Important caveats and limitations</h2>
<p>- <strong>Mail-enabled groups and writeback:</strong> While Mail-Enabled Security Groups (MESGs) and Distribution Lists (DLs) can have their Source of Authority converted to the cloud (for management via Exchange Online), they are not supported for Cloud Sync group writeback to AD DS. If provisioning logs show <code>HybridSynchronizationActiveDirectoryInvalidGroupType</code>, this unsupported target-group state is one of the first things to verify. The writeback pattern detailed in this post applies only to standard Security Groups.</p>
<p>- <strong>Entra Connect Sync customization boundaries:</strong> If you preserve the DN by using Entra Connect Sync, use custom synchronization rules with higher precedence than the defaults and avoid directly editing Microsoft out-of-box rules. Microsoft also documents that directory extensions owned by Entra Connect should be managed through the Entra Connect-supported model, because cloning or manually re-pointing directory extension rules can create upgrade and synchronization issues.</p>
<p>- <strong>Nested groups:</strong> Group SOA does not apply recursively. For nested synced groups to transition to cloud management, administrators must convert them iteratively, typically beginning at the lowest level of the hierarchy.</p>
<p>- <strong>Cloud-only users:</strong> Group provisioning to AD DS handles only member references with valid on-premises identity anchors. For hybrid groups containing both cloud-only and synchronized user accounts, Cloud Sync writes back the synchronized identities and skips cloud-only references.</p>
<p>- <strong>Prohibited local modifications:</strong> Post-conversion, the on-premises copy is no longer the source of truth. Any direct changes made against AD DS will be overwritten silently when the background provisioning system next executes.</p>
<p>- <strong>Scale constraints:</strong> For the Cloud Sync group provisioning job, <code>Selected security groups</code> is the recommended scope. There are scale boundaries documented regarding maximum groups, total processing memberships, and maximum membership per individual group (capped at 50,000 users).</p>
<p>- <strong>Rollback operations:</strong> Source of authority can be reverted by running a <code>PATCH</code> request setting <code>isCloudManaged</code> to <code>false</code>. The rollback process completes only when the directory sync client evaluates and re-assumes ownership on the next iteration. It is critical to sever cloud reference dependencies, such as clearing cloud-only members or removing associated Access Packages, prior to initiating a rollback.</p>
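<p>For reference, the rollback call mirrors the original conversion. A sketch:</p>
<pre><code class="language-powershell"># Revert the source of authority to on-premises. Clear cloud-only dependencies
# (cloud-only members, access packages) before running this.
Invoke-MgGraphRequest -Method PATCH `
    -Uri "https://graph.microsoft.com/v1.0/groups/$groupId/onPremisesSyncBehavior" `
    -Body @{ isCloudManaged = $false }
</code></pre>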
<h2>Architecture</h2>
<p>The following diagram illustrates the complete logical workflow.</p>
<img src="https://cdn.hashnode.com/uploads/covers/66f75313534fa6a33c25d941/ee22066d-8517-4d8e-8303-3e747cc44c1f.png" alt="" style="display:block;margin:0 auto" />

<h2>Conclusion</h2>
<p>Converting a group's Source of Authority from Active Directory to Microsoft Entra, combined with Cloud Sync writeback mapping, provides a precise bridge between overlapping architectures. Administrators can transition group management, approvals, and dynamic requirements to Entra ID while safely preserving the exact group structure that existing LDAP or Kerberos applications rely on.</p>
<p>Once you have validated the core pipeline of storing the <code>GroupDN</code> and applying custom Cloud Sync expressions, there are several ways to scale or apply this model:</p>
<p>- <strong>Connect Group SOA to Access Packages:</strong> With management shifted to Entra ID, these groups are eligible for Microsoft Entra ID Governance. This means you can wrap the group in an Access Package, enabling self-service requests and automated access reviews for legacy apps without writing custom code or deploying complex on-premises identity managers.</p>
<p>- <strong>Implement AD DS Minimization:</strong> Evaluate whether certain applications have modernized entirely to SAML or OpenID Connect. If they no longer require the AD group for authorization, you can flip their group SOA to the cloud and simply skip the Cloud Sync writeback step. Over time, this steadily reduces reliance on AD DS.</p>
<p>- <strong>Address Exchange Dependencies:</strong> Distribution Lists (DL) and Mail-Enabled Security Groups (MESG) synced from on-premises Exchange can also have their SOA converted. While these cannot be directly managed by Microsoft Graph, converting them enables management through Exchange Online PowerShell. From there, you can upgrade non-nested DLs into modern Microsoft 365 Groups for richer collaboration.</p>
<p>- <strong>Convert Groups Systematically:</strong> For nested groups, begin converting from the lowest level of the hierarchy, as the SOA switch does not apply recursively.</p>
<h2>References</h2>
<p>1. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/concept-group-source-of-authority-guidance">Guidance for using Group Source of Authority (SOA)</a></p>
<p>2. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/how-to-group-source-of-authority-configure">Configure Group Source of Authority (SOA)</a></p>
<p>3. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/concept-source-of-authority-overview">Embrace cloud-first posture: Convert Group Source of Authority to the cloud</a></p>
<p>4. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/cloud-sync/tutorial-group-provisioning">Tutorial - Provision groups to Active Directory Domain Services by using Microsoft Entra Cloud Sync</a></p>
<p>5. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/group-writeback-cloud-sync">Group writeback with Microsoft Entra Cloud Sync</a></p>
<p>6. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/cloud-sync/custom-attribute-mapping">Cloud sync directory extensions and custom attribute mapping</a></p>
<p>7. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/cloud-sync/how-to-attribute-mapping">Attribute mapping - Active Directory to Microsoft Entra ID</a></p>
<p>8. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/cloud-sync/reference-expressions">Writing expressions for attribute mappings in Microsoft Entra ID</a></p>
<p>9. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/cloud-sync/how-to-expression-builder">Expression builder with cloud sync</a></p>
<p>10. <a href="https://learn.microsoft.com/en-us/graph/api/resources/onpremisessyncbehavior">onPremisesSyncBehavior resource type</a></p>
<p>11. <a href="https://learn.microsoft.com/en-us/graph/api/onpremisessyncbehavior-get">Get onPremisesSyncBehavior</a></p>
<p>12. <a href="https://learn.microsoft.com/en-us/graph/api/onpremisessyncbehavior-update">Update onPremisesSyncBehavior</a></p>
<p>13. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/connect/how-to-connect-sync-feature-directory-extensions">Microsoft Entra Connect Sync: Directory extensions</a></p>
<p>14. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/connect/how-to-connect-sync-change-the-configuration">Microsoft Entra Connect Sync: Make a change to the default configuration</a></p>
<p>15. <a href="https://learn.microsoft.com/en-us/entra/identity/hybrid/connect/how-to-connect-sync-best-practices-changing-default-configuration">Microsoft Entra Connect Sync: Best practices for changing the default configuration</a></p>
]]></content:encoded></item><item><title><![CDATA[Enhance Entra Identity Governance with Azure Event Grid]]></title><description><![CDATA[Microsoft Graph change notifications are a great building block, but the “classic” model (hosting a public HTTPS webhook) tends to get uncomfortable fast and won't make you any friends in the Cybersecurity or Networking/Gateway teams to manage the in...]]></description><link>https://benroberts.io/enhance-entra-identity-governance-with-azure-event-grid</link><guid isPermaLink="true">https://benroberts.io/enhance-entra-identity-governance-with-azure-event-grid</guid><category><![CDATA[Azure]]></category><category><![CDATA[Azure Functions]]></category><category><![CDATA[Powershell]]></category><category><![CDATA[Entra ID]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Sun, 01 Feb 2026 13:00:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767066485720/0c099036-6fc8-465a-88ef-18c707d5523c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Microsoft Graph change notifications are a great building block, but the “classic” model (hosting a public HTTPS webhook) tends to get uncomfortable fast and won't win you any friends on the Cybersecurity or Networking/Gateway teams who have to manage the internet-exposed endpoint. 🤣</p>
<p>Microsoft Graph also supports delivering these change events <em>through Azure Event Grid</em> (via <strong>Event Grid Partner Events</strong>). Instead of posting directly to your webhook, Graph publishes events into an Event Grid <strong>partner topic</strong> in your Azure subscription. From there, you use normal Event Grid <strong>event subscriptions</strong> to route events to whatever you want 🎉. You still need to handle at-least-once delivery and subscription lifecycle events, but you get Azure-native routing, filtering, monitoring, and dead-lettering without putting an HTTP endpoint on the public internet.</p>
<p>This project uses Microsoft Graph to deliver events into <strong>Azure Event Grid Partner Events</strong>, and from there I route them into <strong>Azure Functions (PowerShell)</strong> with proper buffering, idempotency, dead-lettering, and subscription lifecycle handling.</p>
<p>In this post I detail an identity governance scenario (joiner/mover/leaver-style signals on <code>users</code>), but the pattern is useful anywhere you want “near real-time” identity change pipelines: birthright automation, security reactions, audit/enrichment streams into SIEM, or ops workflows that open tickets and notify owners.</p>
<p>A side benefit of this design is that it removes the temptation to “paper over” eventual consistency and transient failures with arbitrary <code>Start-Sleep</code> calls and retry loops sprinkled throughout your business logic. Instead of making the application guess when Graph has “settled” (or how long downstream systems need), you let the platform absorb the timing variability: Event Grid provides durable at-least-once delivery, the Function handler turns each event into a queued work item, and workers process asynchronously with explicit retry and dead-letter semantics. If a dependency is temporarily unavailable (Graph throttling, directory propagation delays, downstream API outages), you can retry in a controlled way (with backoff, visibility timeouts, and poison-message handling) without blocking an HTTP request, holding open connections, or burning compute on sleeps.</p>
<p>Because events can be duplicated and delivered out of order, the pipeline also makes idempotency a first-class concern: a stable dedupe key is recorded once, and replays become a no-op instead of “retry storms” that require defensive delays. The result is fewer brittle timing hacks in the app, clearer operational behavior (you can see retries and failures as messages), and a system that degrades predictably when the real world is slow—without turning every caller into a hand-rolled queue processor.</p>
<p>This pattern is particularly well-suited to hybrid identity environments where you're waiting for on-prem changes to sync up to Entra ID via Entra Connect/Cloud Sync. By leveraging Event Grid's reliable delivery and Azure Functions' asynchronous processing, you can effectively manage the inherent delays in synchronization without complicating your application logic.</p>
<p>Other possible scenarios include:</p>
<ul>
<li><p>User risk event processing</p>
</li>
<li><p>Privileged role assignment changes</p>
</li>
<li><p>Group membership changes</p>
</li>
<li><p>Application credential changes</p>
</li>
<li><p>And more (whatever Entra publishes through Event Grid partners!)</p>
</li>
</ul>
<h2 id="heading-what-about-entra-id-governance-lifecycle-workflows">What about Entra ID Governance Lifecycle Workflows?</h2>
<p>Compared to Entra ID Governance Lifecycle Workflows, this pattern trades “built-in product workflow” for “programmable event pipeline”. The big wins are flexibility (any Graph/API call or downstream integration), near real-time reactions (seconds or minutes instead of scheduled runs), richer idempotency/retry/dead-letter control, and easier extension beyond the lifecycle catalog (tickets, SIEM enrichment, custom audits). The downsides: you own more of the engineering surface area (deployments, monitoring, versioning, security reviews), you must correctly manage and periodically re-validate Graph permissions and consent, and you’re responsible for guardrails and supportability that Lifecycle Workflows provide out of the box (scoping, approvals/notifications, reporting, and the “it’s supported by Microsoft” operational expectations).</p>
<h2 id="heading-the-demo-environment">The Demo Environment</h2>
<p>The promise is simple: I run one deployment script and end up with a working pipeline where Graph events land in my Function App reliably, with dead-lettering and lifecycle handling.</p>
<p>First the infrastructure is deployed using Bicep templates in <code>infra/</code>. The core resources are:</p>
<ul>
<li><p><strong>Azure Function App</strong> (PowerShell) with managed identity</p>
</li>
<li><p><strong>Storage Account</strong> with queues (work items + lifecycle) and table (dedupe keys)</p>
</li>
<li><p><strong>Event Grid Partner Configuration</strong> authorizing Microsoft Graph</p>
</li>
<li><p><strong>App Insights</strong> for logging and monitoring</p>
</li>
</ul>
<p>Next, the deployment scripts in <code>scripts/</code> wire everything up. They create and assign a user-assigned managed identity, assign Azure RBAC and Graph app roles, deploy the Bicep templates, zip-deploy the Function code, create the Graph subscription that delivers to Event Grid, activate the partner topic, and finally create the event subscription from the partner topic to the Function app using <strong>CloudEvents 1.0</strong> as the delivery schema.</p>
<p>Inside the Function App I treat incoming events as “messages,” not “requests”: I dedupe them using <strong>Table Storage</strong> (Entra ID auth, no keys) and buffer them through <strong>Storage Queues</strong> so downstream workers can run independently. Lifecycle events are handled too: the subscription gets reauthorized and renewed so the demo doesn’t silently die.</p>
<p>Finally, the event subscription is configured with <strong>dead-lettering</strong> to blob storage using a <strong>user-assigned managed identity</strong> (so there are no storage keys sitting in configuration).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767067010990/7b1bde11-6c13-42f2-8406-a3d965bda503.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-a-guided-tour-of-the-architecture">A guided tour of the architecture</h2>
<p>Here’s the complete flow at a glance. I’ll walk it left-to-right in the next sections.</p>
<pre><code class="lang-mermaid">flowchart LR
  Graph[Microsoft Graph&lt;br&gt;change notifications] --&gt;|Partner Events| PC[Event Grid&lt;br&gt;Partner Configuration]
  PC --&gt; PT[Event Grid&lt;br&gt;Partner Topic]

  PT --&gt; ES[Event Subscription&lt;br&gt;CloudEvents 1.0]
  ES --&gt;|invoke| GEH[Azure Function&lt;br&gt;**GovernanceEventHandler**]

  GEH --&gt;|dedupe key| Dedupe[Table Storage&lt;br&gt;**DedupeKeys**]
  GEH --&gt;|enqueue| WorkQ[Storage Queue&lt;br&gt;**workitems**]
  GEH --&gt;|enqueue lifecycle| LifeQ[Storage Queue&lt;br&gt;**lifecycle**]

  WorkQ --&gt; BW[Azure Function&lt;br&gt;**BirthrightWorker**]
  LifeQ --&gt; SLW[Azure Function&lt;br&gt;**SubscriptionLifecycleWorker**&lt;br&gt;reauthorize + renew]

  ES --&gt;|dead-letter| DL[Blob Storage&lt;br&gt;**dead-letter** container]

  UAMI[User-assigned&lt;br&gt;managed identity] --&gt; GEH
  UAMI --&gt; BW
  UAMI --&gt; SLW
  UAMI --&gt; Dedupe
  UAMI --&gt; WorkQ
  UAMI --&gt; LifeQ
  UAMI --&gt; DL
</code></pre>
<h2 id="heading-repo-map-where-to-look-first">Repo map (where to look first)</h2>
<p>I split the repo into three chunks: infrastructure (<code>infra/</code>), orchestration scripts (<code>scripts/</code>), and the Function App (<code>src/FunctionApp/</code>). That mirrors how I think about the system: “build the platform”, “wire it up”, then “process events”.</p>
<p>And the directory structure looks like:</p>
<pre><code class="lang-text">project-eventgrid-partnerconfiguration
├── infra
│   ├── main.bicep
│   ├── link.bicep
│   └── parameters.dev.bicepparam
├── scripts
│   ├── Deploy-Infrastructure.ps1
│   ├── Deploy-FunctionCode.ps1
│   ├── New-GraphUsersSubscriptionToEventGrid.ps1
│   ├── Activate-EventGridPartnerTopic.ps1
│   ├── Grant-GraphAppRolesToManagedIdentity.ps1
│   └── Set-Policy.ps1
└── src
    └── FunctionApp
        ├── GovernanceEventHandler
        ├── BirthrightWorker
        ├── SubscriptionLifecycleWorker
        ├── Modules
        │   └── GovernanceAutomation
        │       ├── GovernanceAutomation.psm1
        │       └── GovernanceAutomation.psd1
        ├── policy
        │   └── policy.json
        ├── host.json
        ├── local.settings.example.json
        ├── profile.ps1
        └── requirements.psd1
</code></pre>
<h2 id="heading-whats-where-and-what-it-does">What’s where (and what it does)</h2>
<h3 id="heading-infrastructure">Infrastructure</h3>
<p>I intentionally split IaC into two stages.</p>
<p><code>infra/main.bicep</code> is “core infra only”: it creates the Function App, storage (queues + dedupe table), diagnostics, and the Event Grid partner configuration authorization. The Function runs under a <strong>user-assigned managed identity (UAMI)</strong>.</p>
<p><code>infra/link.bicep</code> is the second stage that I run <em>after</em> Graph has created the partner topic. It attaches the UAMI to the partner topic (while preserving the partner-provided <code>properties.source</code>), creates dead-letter storage + RBAC, and then creates the partner topic → Function event subscription using <code>CloudEventSchemaV1_0</code>.</p>
<p><code>infra/parameters.dev.bicepparam</code> is just the “comfortable defaults” file for <code>main.bicep</code>.</p>
<h3 id="heading-scripts">Scripts</h3>
<p>The scripts are the glue that makes this feel like “one command deploy” instead of “seven manual steps.”</p>
<p><code>scripts/Deploy-Infrastructure.ps1</code> is the main entry point: it creates/uses the UAMI, assigns Azure RBAC, assigns Graph <strong>application</strong> roles to the managed identity (defaults include <code>User.ReadWrite.All</code>, <code>Directory.Read.All</code>, and <code>Group.ReadWrite.All</code>), deploys <code>main.bicep</code>, deploys the function code, bootstraps Graph → Event Grid, activates the partner topic, and then deploys <code>link.bicep</code>. It also makes re-runs sane by picking a unique partner topic name when the deterministic name already exists.</p>
<p>If you want to focus on individual steps, <code>scripts/Deploy-FunctionCode.ps1</code> does the zip deploy for <code>src/FunctionApp</code>, <code>scripts/New-GraphUsersSubscriptionToEventGrid.ps1</code> creates the Graph subscription (and supports <code>-UseAzCliGraphToken</code> so you don’t need the Microsoft Graph PowerShell module), and <code>scripts/Activate-EventGridPartnerTopic.ps1</code> flips the partner topic into the active state so events will flow.</p>
<h3 id="heading-function-code">Function code</h3>
<p>I treat the Function App as a small event-processing system.</p>
<p><code>src/FunctionApp/GovernanceEventHandler/run.ps1</code> is the Event Grid trigger. It normalizes the incoming payload (Graph partner events arrive as <strong>CloudEvents 1.0</strong>), generates a stable dedupe key, records idempotency in Table Storage (managed identity auth), and then enqueues a work item. Lifecycle notifications get routed to a dedicated queue.</p>
<p>The heavy lifting lives in <code>src/FunctionApp/Modules/GovernanceAutomation/GovernanceAutomation.psm1</code>: schema normalization, stable dedupe keys, Table Storage idempotency, and the Graph lifecycle calls (reauthorize + renew).</p>
<p>From there, <code>src/FunctionApp/SubscriptionLifecycleWorker/run.ps1</code> processes lifecycle messages and validates <code>clientState</code> (via <code>GRAPH_CLIENT_STATE</code>) before calling Graph, and <code>src/FunctionApp/BirthrightWorker/run.ps1</code> processes work items and applies birthright group assignments for <strong>newly created users</strong>.</p>
<p>Important Graph nuance: for <code>users</code> subscriptions, user creation shows up as an <code>updated</code> notification (Graph does not emit a <code>created</code> changeType for user resources). To avoid accidentally touching every user update, the birthright worker gates “new user” using <code>createdDateTime</code> proximity to the event time.</p>
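<p>Here’s roughly what that gate can look like (an illustrative sketch only; <code>$user</code> is assumed to come from a Graph lookup that selected <code>createdDateTime</code>, and <code>$workItem.EventTime</code> stands in for the CloudEvents timestamp carried on the work item):</p>
<pre><code class="lang-powershell"># Illustrative "new user" gate (variable names are assumptions, not the
# repo's exact code). Only treat the user as new if the event fired within
# a short window of the account's creation time.
$window  = [timespan]::FromMinutes(10)   # tune to your delivery latency
$created = [datetimeoffset]$user.createdDateTime
$age     = [datetimeoffset]$workItem.EventTime - $created

$isNewUser = ($age -ge [timespan]::Zero) -and ($age -le $window)
if (-not $isNewUser) {
    Write-Host 'Skipping: updated notification for an existing user.'
    return
}
</code></pre>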
<p>The policy itself is in <code>src/FunctionApp/policy/policy.json</code>.</p>
<h4 id="heading-how-policyjson-drives-decisions">How <code>policy.json</code> drives decisions</h4>
<p>I wanted the “governance logic” to be editable without touching code, so the Function reads a JSON policy file and uses it to decide whether it should only <em>detect</em> and log something, or whether it should eventually <em>remediate</em>.</p>
<p>At the top level, the policy has a <code>mode</code> (for example <code>detect</code>) and a <code>version</code>. In <code>detect</code> mode, the workers still process events and emit useful logs, but they’re intentionally conservative about making changes.</p>
<p>There are three main ideas inside the policy:</p>
<ol>
<li><strong>Birthright assignments</strong></li>
</ol>
<p>The <code>birthrights</code> block is how I model “when a user appears (or changes), what should they get by default?”</p>
<p>In the current policy file, <code>birthrights.mode</code> is set to <code>remediate</code>, and the default assignment matches <code>userType: "Member"</code> and adds the user to a specific Entra security group:</p>
<pre><code class="lang-json"><span class="hljs-string">"birthrights"</span>: {
  <span class="hljs-attr">"enabled"</span>: <span class="hljs-literal">true</span>,
  <span class="hljs-attr">"mode"</span>: <span class="hljs-string">"remediate"</span>,
  <span class="hljs-attr">"assignments"</span>: [
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"Default-Birthright"</span>,
      <span class="hljs-attr">"when"</span>: { <span class="hljs-attr">"userType"</span>: <span class="hljs-string">"Member"</span> },
      <span class="hljs-attr">"addToGroups"</span>: [<span class="hljs-string">"928bd0ce-8abc-43dd-94a0-d350fe49e991"</span>]
    }
  ]
}
</code></pre>
<p>The <code>BirthrightWorker</code> reads this policy and, for newly created users only, calls Microsoft Graph to add the user as a member of each configured group.</p>
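<p>Under the hood that’s one Graph call per group. A hedged sketch of the request shape (raw REST; <code>$groupId</code>, <code>$userId</code>, and <code>$graphToken</code> are assumed to already be in hand, and retries are omitted):</p>
<pre><code class="lang-powershell"># Add the new user as a member of a birthright group. This mirrors the
# documented Graph "add member" call; error handling omitted for brevity.
$body = @{
    '@odata.id' = "https://graph.microsoft.com/v1.0/directoryObjects/$userId"
} | ConvertTo-Json

Invoke-RestMethod -Method Post `
    -Uri "https://graph.microsoft.com/v1.0/groups/$groupId/members/`$ref" `
    -Headers @{ Authorization = "Bearer $graphToken" } `
    -ContentType 'application/json' `
    -Body $body
</code></pre>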
<ol start="2">
<li><strong>Safety rails (break-glass + allow lists)</strong></li>
</ol>
<p>The <code>breakGlass</code> section is where I list accounts that should be treated as special (for example, never auto-remove, never auto-disable). The <code>allowLists</code> section is the opposite: known-good groups/apps/roles that the automation can ignore or treat as explicitly permitted.</p>
<ol start="3">
<li><strong>Rules: match an event, then choose an action</strong></li>
</ol>
<p>The <code>rules</code> array is the “if this, then that” part. Each rule has <code>criteria</code> that matches against the normalized work item (for example <code>eventTypeStartsWith</code>, <code>subjectContains</code>, and a higher-level <code>condition</code> like <code>UserDisabled</code>). When a rule matches, the <code>action</code> describes what to do. In the sample policy there’s a “leaver” style rule that, when a user is disabled, would run steps like <code>RevokeSessions</code> and <code>RemoveFromHighRiskGroups</code>.</p>
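<p>Putting those field names together, a rule might look something like this (an illustrative sketch assembled from the names above, not a verbatim copy of the repo’s sample policy; the exact schema is whatever <code>policy.json</code> defines):</p>
<pre><code class="lang-json">{
  "name": "Leaver-RevokeAccess",
  "criteria": {
    "eventTypeStartsWith": "Microsoft.Graph.UserUpdated",
    "subjectContains": "Users/",
    "condition": "UserDisabled"
  },
  "action": {
    "steps": ["RevokeSessions", "RemoveFromHighRiskGroups"]
  }
}
</code></pre>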
<p>The important part is that this policy file is read in the context of the event pipeline: Graph emits change events, the handler normalizes/dedupes/queues them, and the worker uses <code>policy.json</code> to decide what it would do next.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>This is a PowerShell-first repo. I run it with PowerShell 7.4+ and Azure CLI (<code>az</code>) authenticated via <code>az login</code>. You also need enough Azure permissions to create resources in your target subscription/resource group.</p>
<p>On the Entra side, you need permission to create Microsoft Graph subscriptions (delegated), and admin consent permissions to assign Microsoft Graph application roles to the managed identity.</p>
<p>For birthright group assignment (adding users to groups), the managed identity needs (at minimum) a Graph application permission that can add group members:</p>
<ul>
<li><code>GroupMember.ReadWrite.All</code> (least-privilege for add/remove members)</li>
</ul>
<p>This repo’s deployment script uses <code>Group.ReadWrite.All</code> by default (broader, but simple) plus user/directory read permissions so the worker can query <code>createdDateTime</code> for the new-user gate.</p>
<hr />
<h2 id="heading-the-one-command-deployment">The one-command deployment</h2>
<p>This is the command I use when I want complete setup without thinking too hard about the order of operations:</p>
<pre><code class="lang-powershell">pwsh ./scripts/Deploy<span class="hljs-literal">-Infrastructure</span>.ps1 `
  <span class="hljs-literal">-SubscriptionId</span> &lt;sub&gt; `
  <span class="hljs-literal">-ResourceGroupName</span> &lt;rg&gt; `
  <span class="hljs-literal">-Location</span> &lt;location&gt;
</code></pre>
<p>What it does (in order):</p>
<ol>
<li><p>Creates (or reuses) a user-assigned managed identity (UAMI)</p>
</li>
<li><p>Assigns Azure RBAC on the resource group (so the deployment can create resources)</p>
</li>
<li><p>Assigns Graph app roles to that identity (defaults include <code>User.ReadWrite.All</code>, <code>Directory.Read.All</code>, and <code>Group.ReadWrite.All</code>)</p>
</li>
<li><p>Deploys <code>infra/main.bicep</code></p>
</li>
<li><p>Zip deploys the Function code</p>
</li>
<li><p>Creates a Graph subscription that delivers to Event Grid (partner topic)</p>
</li>
<li><p>Activates the partner topic</p>
</li>
<li><p>Deploys <code>infra/link.bicep</code> to create the partner topic → function event subscription (with dead-lettering)</p>
</li>
</ol>
<p>Notes for re-runs:</p>
<p>If the partner topic name already exists, the script automatically chooses a new unique one. If you want full control (for example, because you’re integrating into another environment), you can pass <code>-PartnerTopicName &lt;partnerTopic&gt;</code>.</p>
<p>Similarly, a new user-assigned managed identity (UAMI) is created by default on each deployment. You can pass in <code>-BootstrapUserAssignedIdentityName '&lt;name&gt;'</code> to reuse an identity from a previous deployment (or one created outside this project).</p>
<h2 id="heading-how-i-sanity-check-it-events-dedupe-lifecycle">How I sanity-check it (events + dedupe + lifecycle)</h2>
<p>To verify everything is working, create a new user in your Entra tenant. In the <code>GovernanceEventHandler</code> invocation logs you should see a received event with CloudEvents fields like <code>type</code>, <code>subject</code>, and <code>time</code>. You should also see a corresponding event in the <code>BirthrightWorker</code> logs showing the work item being dequeued with the same <code>correlationId</code> and event details.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767067544734/f693b9c6-8b6c-4354-a953-ede22ad55435.png" alt class="image--center mx-auto" /></p>
<p>For a newly created user (within the configured window), you should see a log like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767067088850/53523b39-1a26-4f62-9aef-b09ec7abdda4.png" alt class="image--center mx-auto" /></p>
<p>If Graph sends multiple events, the dedupe processing logs look like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767067273500/60d78bd8-9ac4-4adb-b592-3863b383baa5.png" alt class="image--center mx-auto" /></p>
<p>For a normal user update or change event, you should see a log like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767071164828/bb213fc2-3479-454b-93f5-7140246ed876.png" alt class="image--center mx-auto" /></p>
<p>And when Graph sends lifecycle notifications, you'll see logs for reauthorization and renewal (for example, <code>Graph subscription reauthorized</code> and <code>Graph subscription renewed</code>).</p>
<hr />
<h2 id="heading-the-bits-worth-stealing">The bits worth stealing</h2>
<p>The two-stage IaC split (<code>main.bicep</code> + <code>link.bicep</code>) is a useful pattern when working with Event Grid partner topics, because you often need to create the partner topic first (via Graph) before you can attach identities and event subscriptions.</p>
<p>I also like the security posture here. Dead-lettering uses <code>deadLetterWithResourceIdentity</code>, so there are no storage keys in app settings. For idempotency I store a hashed dedupe key in Table Storage using Entra auth and treat HTTP 409 as “already processed.” And because Graph partner events arrive as CloudEvents 1.0, I normalize them into a stable work-item shape before queuing.</p>
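<p>The 409 trick is simple enough to show. A rough sketch against the Table service REST API (assumptions: <code>$token</code> was issued for the <code>https://storage.azure.com/</code> audience via the managed identity, and <code>$tableUri</code> points at the <code>DedupeKeys</code> table):</p>
<pre><code class="lang-powershell"># Try to insert the dedupe key; Table Storage enforces uniqueness on
# PartitionKey + RowKey, so a 409 Conflict means "already processed".
$entity  = @{ PartitionKey = 'dedupe'; RowKey = $dedupeKeyHash } | ConvertTo-Json
$headers = @{
    Authorization  = "Bearer $token"
    Accept         = 'application/json;odata=nometadata'
    'x-ms-version' = '2020-08-04'   # a version that supports Entra ID auth
}
try {
    Invoke-RestMethod -Method Post -Uri $tableUri -Headers $headers `
        -ContentType 'application/json' -Body $entity
    $firstDelivery = $true
}
catch {
    if ($_.Exception.Response.StatusCode.value__ -eq 409) {
        $firstDelivery = $false   # duplicate delivery: safe no-op
    }
    else { throw }
}
</code></pre>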
<p>Finally, the lifecycle worker is what makes this demo feel “production-ish”: it reauthorizes and renews subscriptions so the pipeline keeps working across longer-lived test runs.</p>
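<p>For orientation, the two Graph calls behind that worker are small. A hedged sketch (assuming <code>$graphToken</code> and <code>$subscriptionId</code> are already in hand; the repo’s module wraps these with <code>clientState</code> validation):</p>
<pre><code class="lang-powershell">$headers = @{ Authorization = "Bearer $graphToken" }

# Reauthorize: proves the subscriber is still alive and authorized.
Invoke-RestMethod -Method Post -Headers $headers `
    -Uri "https://graph.microsoft.com/v1.0/subscriptions/$subscriptionId/reauthorize"

# Renew: push the expiry out again (maximum lifetime varies by resource type).
$body = @{
    expirationDateTime = (Get-Date).ToUniversalTime().AddHours(24).ToString('o')
} | ConvertTo-Json
Invoke-RestMethod -Method Patch -Headers $headers -ContentType 'application/json' `
    -Uri "https://graph.microsoft.com/v1.0/subscriptions/$subscriptionId" -Body $body
</code></pre>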
<hr />
<h2 id="heading-lessons-learned-so-you-dont-hit-the-same-walls">Lessons learned (so you don’t hit the same walls)</h2>
<p>I tripped over a few sharp edges building this.</p>
<p>Partner topics are <strong>partner-created</strong>; if you try to “just create them in Bicep” without the partner metadata, it fails. Even when you’re updating a partner topic, you have to preserve partner-provided fields like <code>properties.source</code>.</p>
<p>I also hit an ARM circular dependency when I tried to “read the existing partner topic source” and “update partner topic identity” in the same template. The pragmatic fix was to pass <code>partnerTopicSource</code> in from the script.</p>
<p>Re-runs matter a lot for demos: if you re-run Graph bootstrap with the same partner topic name and it already exists, Graph can fail. Picking a unique name on re-run makes the whole thing repeatable.</p>
<p>On the Event Grid side, target validation happens up-front, so the Function needs to exist before creating the event subscription. And in Functions, PowerShell queue triggers can hand payloads in different shapes (string/base64/object), so defensive parsing is worth the effort.</p>
<p>Finally, RBAC is non-negotiable for the data plane. Use <strong>Storage Queue Data Contributor</strong> for queues; I tried least-privilege roles like <code>Storage Queue Data Message Sender</code> and <code>Storage Queue Data Message Processor</code>, but they fell short. Use <strong>Storage Table Data Contributor</strong> for tables, and <strong>Storage Blob Data Contributor</strong> for the dead-letter container.</p>
<p>On the Graph side, group membership writes require group permissions (<code>GroupMember.ReadWrite.All</code> or broader). A common failure mode is granting only user permissions (like <code>User.ReadWrite.All</code>) and then getting a 403 when trying to add the user to a group.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Using Event Grid partner events to receive Microsoft Graph change notifications is a powerful pattern that avoids the need to host public webhooks. By combining Event Grid with Azure Functions, managed identities, and proper idempotency and lifecycle handling, you can build robust identity event pipelines that are secure and maintainable. This project provides a solid foundation to get started, and I encourage you to adapt and extend it for your own identity governance scenarios. 🚀</p>
<h2 id="heading-references">References</h2>
<ul>
<li><p>My Repo: <a target="_blank" href="https://github.com/broberts23/vsCode/tree/main/project-eventgrid-partnerconfiguration">vsCode/project-eventgrid-partnerconfiguration at main · broberts23/vsCode</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/broberts23/vsCode/tree/main/project-eventgrid-partnerconfiguration">Microsoft Graph API change events through Azure Event Grid overview</a></p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/event-grid/partner-events-overview">Event Grid partner events documentation</a></p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/event-grid/cloud-event-schema">CloudEvents 1.0 schema reference for Event Grid</a></p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/event-grid/delivery-and-retry#use-managed-identities-to-access-azure-storage">Use managed identities to access Azure Storage from Azure Event Grid</a></p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/graph/webhooks-lifecycle">Microsoft Graph subscription lifecycle management</a></p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/event-grid/manage-event-delivery">Set dead-letter location and retry policy</a></p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/event-grid/subscribe-to-graph-api-events?tabs=powershell#renew-a-microsoft-graph-api-subscription">Renew a Microsoft Graph API subscription</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Building a Secure Password Reset API with Azure Functions, Easy Auth, and LDAPS]]></title><description><![CDATA[Introduction
In my previous infrastructure blog, we built a disposable Active Directory lab: a domain controller in its own subnet, a function app in another, Key Vault for secrets, and just enough networking glue to make it feel like a real hybrid e...]]></description><link>https://benroberts.io/building-a-secure-password-reset-api-with-azure-functions-easy-auth-and-ldaps</link><guid isPermaLink="true">https://benroberts.io/building-a-secure-password-reset-api-with-azure-functions-easy-auth-and-ldaps</guid><category><![CDATA[Azure]]></category><category><![CDATA[automation]]></category><category><![CDATA[Powershell]]></category><category><![CDATA[Active Directory]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Wed, 14 Jan 2026 13:00:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766206703231/14522a71-e1d3-4ce1-9158-5cc17fb41f19.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>In my previous infrastructure blog, we built a disposable Active Directory lab: a domain controller in its own subnet, a function app in another, Key Vault for secrets, and just enough networking glue to make it feel like a real hybrid environment.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://benroberts.io/zero-to-dc-building-a-disposable-active-directory-lab-in-azure-with-bicep-and-powershell">https://benroberts.io/zero-to-dc-building-a-disposable-active-directory-lab-in-azure-with-bicep-and-powershell</a></div>
<p> </p>
<p>This post is the other half of the story: the <strong>PowerShell 7.4 Azure Function</strong> that accepts authenticated requests, authorizes them with role claims, and resets passwords in on-prem AD over <strong>LDAPS</strong>.</p>
<p>The goal is intentionally boring: one HTTP POST, a strong password comes back, and the AD change happens safely and repeatably.</p>
<h2 id="heading-where-this-pattern-fits">Where This Pattern Fits</h2>
<p>This “thin API + strong platform auth + pinned LDAPS” pattern is useful anywhere you want a controlled bridge between cloud automation and a directory that still lives behind private networking.</p>
<p>Here are a few practical scenarios:</p>
<ol>
<li><p><strong>ITSM-driven onboarding (e.g., ServiceNow)</strong></p>
<ul>
<li><p>A user onboarding workflow runs in an ITSM tool and, at the right step, calls <code>POST /api/ResetUserPassword</code> using client credentials.</p>
</li>
<li><p>The ITSM app registration gets a tightly scoped role (for example, only the password reset role), and the function enforces that role claim.</p>
</li>
<li><p>The generated password can be handed off to the next step (securely) or used to set an initial password before the user is prompted to change it at first sign-in.</p>
</li>
</ul>
</li>
<li><p><strong>Scheduled service account password rotation</strong></p>
<ul>
<li><p>A rotation service (another API, a runbook, or a GitHub Actions workflow) calls the endpoint on a schedule for a known set of accounts.</p>
</li>
<li><p>Because the function pulls its bind credential and LDAPS pinning material from Key Vault, you can rotate the function’s own dependencies independently.</p>
</li>
<li><p>The caller can store the new password in the system that actually consumes it (Key Vault secret, configuration store, etc.) and trigger downstream restarts.</p>
</li>
</ul>
</li>
<li><p><strong>Internal admin portal / delegated operations</strong></p>
<ul>
<li><p>A small internal portal can call the function on behalf of authorized operators.</p>
</li>
<li><p>The portal becomes the UX layer, while the function remains the audited, least-privileged “knife switch” that performs the directory operation.</p>
</li>
</ul>
</li>
</ol>
<h2 id="heading-architecture-at-a-glance">Architecture at a Glance</h2>
<p>Here are the moving parts that matter for the function app itself:</p>
<pre><code class="lang-mermaid">flowchart TB
      subgraph Azure["Azure Resource Group"]
            FA["Function App (Windows)
PowerShell 7.4"]
            KV["Key Vault
Secrets"]
            LA["App Insights / Log Analytics"]
      end

      subgraph VNet["VNet"]
            DC["Domain Controller
LDAPS :636"]
      end

      Caller["Calling App
(Entra client credentials)"] --&gt;|Bearer token| FA
      FA --&gt;|Managed Identity| KV
      FA --&gt;|LDAPS &lt;br&gt;pinned cert + hostname validation | DC
      FA --&gt; LA
</code></pre>
<p>Two design choices shape almost everything:</p>
<ol>
<li><p><strong>Authentication is delegated to the platform</strong> (App Service Authentication aka “Easy Auth”).</p>
</li>
<li><p><strong>Directory operations are done via LDAPS</strong> using .NET LDAP APIs, with strict TLS validation.</p>
</li>
</ol>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along end-to-end you’ll need:</p>
<ul>
<li>An Entra ID app registration for the API, with role assignments for callers.</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766204868932/ddbf48f8-f08b-4082-a416-a88bc3753250.png" alt class="image--center mx-auto" /></p>
<ul>
<li>App Service Authentication enabled for the Function App (configured in IaC).</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766205288094/36dbabfb-e0a9-44fd-b590-2514c02ca026.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766205347011/f9ebc365-d36f-4594-af6a-7b7d38cef541.png" alt class="image--center mx-auto" /></p>
<ul>
<li>A domain controller reachable from the Function App via VNet integration.</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766205459919/68b4b398-669a-4555-891a-cc13abb3a48d.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Two Key Vault secrets:</p>
<ul>
<li><code>ENTRA-PWDRESET-RW</code> (JSON containing username/password)</li>
<li><code>LDAPS-Certificate-CER</code> (the domain controller’s public cert, base64)</li>
</ul>
</li>
</ul>
<h2 id="heading-the-request-walkthrough">The Request Walkthrough</h2>
<p>Let’s walk the request the same way the runtime sees it.</p>
<h3 id="heading-step-1-the-request-arrives-but-your-code-doesnt-validate-the-jwt">Step 1: The request arrives (but your code doesn’t validate the JWT)</h3>
<p>The caller sends <code>Authorization: Bearer ...</code>.</p>
<p>Before PowerShell starts, <strong>Easy Auth</strong> validates the token:</p>
<ul>
<li><p>Signature + issuer via OIDC metadata (<code>.../{tenantId}/v2.0</code>).</p>
</li>
<li><p><code>exp</code> / <code>nbf</code> timing.</p>
</li>
<li><p>Audience (<code>aud</code>). In this project the allowed audiences include both:</p>
<ul>
<li><p>the plain client id, and</p>
</li>
<li><p><code>api://{clientId}</code></p>
</li>
</ul>
</li>
</ul>
<p>If validation fails, Easy Auth returns <strong>401</strong> and the function never runs.</p>
<h3 id="heading-step-2-easy-auth-injects-the-principal">Step 2: Easy Auth injects the principal</h3>
<p>On success, Easy Auth injects <code>X-MS-CLIENT-PRINCIPAL</code> (base64 JSON). The function decodes it with:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$principal</span> = <span class="hljs-built_in">Get-ClientPrincipal</span> <span class="hljs-literal">-HeaderValue</span> <span class="hljs-variable">$Request</span>.Headers[<span class="hljs-string">'X-MS-CLIENT-PRINCIPAL'</span>]
</code></pre>
<p>That gives us a consistent claim set without having to do token cryptography in PowerShell.</p>
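<p>The decode itself is tiny. A minimal sketch of what it amounts to (this approximates <code>Get-ClientPrincipal</code>; the real helper lives in <code>PasswordResetHelpers.psm1</code>):</p>
<pre><code class="lang-powershell"># Sketch only: decode the base64 JSON that Easy Auth injects after it has
# already validated the JWT. The result carries an array of typ/val claims.
function Get-ClientPrincipal {
    param([string]$HeaderValue)
    if ([string]::IsNullOrWhiteSpace($HeaderValue)) { return $null }
    $json = [System.Text.Encoding]::UTF8.GetString(
        [System.Convert]::FromBase64String($HeaderValue))
    $json | ConvertFrom-Json
}
</code></pre>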
<h3 id="heading-step-3-authorization-is-a-role-claim-check">Step 3: Authorization is a role claim check</h3>
<p>The function enforces a single rule: the caller must have the required role (from <code>REQUIRED_ROLE</code>).</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$hasRole</span> = <span class="hljs-built_in">Test-RoleClaim</span> <span class="hljs-literal">-Principal</span> <span class="hljs-variable">$principal</span> <span class="hljs-literal">-RequiredRole</span> <span class="hljs-variable">$env:REQUIRED_ROLE</span>
</code></pre>
<p>No role claim → <strong>403</strong>.</p>
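<p>And the check itself stays a one-screen function. A hedged sketch (assuming Easy Auth claims arrive as objects with <code>typ</code> and <code>val</code> properties, which is what the module’s <code>Test-RoleClaim</code> reportedly handles):</p>
<pre><code class="lang-powershell"># Sketch of the role check; the real helper also handles both the 'roles'
# and 'role' claim type names, mirrored here.
function Test-RoleClaim {
    param($Principal, [string]$RequiredRole)
    $roles = @($Principal.claims |
        Where-Object { $_.typ -in 'roles', 'role' } |
        ForEach-Object val)
    $roles -contains $RequiredRole
}
</code></pre>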
<h3 id="heading-step-4-parse-the-body-and-choose-the-target-user">Step 4: Parse the body and choose the target user</h3>
<p>The request body is intentionally small:</p>
<pre><code class="lang-json">{ <span class="hljs-attr">"samAccountName"</span>: <span class="hljs-string">"jdoe"</span> }
</code></pre>
<p>If <code>samAccountName</code> is missing → <strong>400</strong>.</p>
<h3 id="heading-step-5-fetch-secrets-with-managed-identity">Step 5: Fetch secrets with Managed Identity</h3>
<p>At this point we have an authorized request, but we still need two things:</p>
<ul>
<li><p><strong>AD service account credential</strong> (from Key Vault)</p>
</li>
<li><p><strong>LDAPS certificate pinning material</strong> (from Key Vault)</p>
</li>
</ul>
<p>The function app uses its <strong>system-assigned managed identity</strong> to call Key Vault. Secrets are cached per runspace inside the helper module, so normal traffic doesn’t hammer Key Vault.</p>
<p>If the LDAPS certificate secret is missing or empty, the function fails fast with <strong>500</strong> (that’s a misconfiguration we don’t want to “best-effort” our way through).</p>
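<p>For reference, the raw protocol underneath those helpers is small. A sketch (this mirrors rather than reproduces <code>Get-ManagedIdentityAccessToken</code> and <code>Get-KeyVaultSecretValue</code>; <code>$vaultName</code> is a placeholder):</p>
<pre><code class="lang-powershell"># Get a Key Vault token from the App Service / Functions managed identity
# endpoint (IDENTITY_ENDPOINT + X-IDENTITY-HEADER, api-version 2019-08-01).
$resource = [uri]::EscapeDataString('https://vault.azure.net')
$tokenUri = "$($env:IDENTITY_ENDPOINT)?resource=$resource&amp;api-version=2019-08-01"
$token = (Invoke-RestMethod -Uri $tokenUri `
    -Headers @{ 'X-IDENTITY-HEADER' = $env:IDENTITY_HEADER }).access_token

# Read a secret via the Key Vault REST API ($vaultName is illustrative).
$secretUri = "https://$vaultName.vault.azure.net/secrets/ENTRA-PWDRESET-RW?api-version=7.4"
$credJson = (Invoke-RestMethod -Uri $secretUri `
    -Headers @{ Authorization = "Bearer $token" }).value
</code></pre>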
<h2 id="heading-the-ldaps-story-strict-no-hostname-bypass">The LDAPS Story (Strict, No Hostname Bypass)</h2>
<p>Resetting passwords over LDAP is the part that tends to get hand-waved with “just trust the cert.” This project goes the other direction.</p>
<p>The function resets passwords over <strong>LDAPS</strong> using <code>System.DirectoryServices.Protocols.LdapConnection</code>, and validates the server certificate in two ways:</p>
<ol>
<li><p><strong>Certificate pinning</strong>: the presented server cert thumbprint must match the pinned cert retrieved from Key Vault.</p>
</li>
<li><p><strong>Hostname validation</strong>: the cert must match the domain controller hostname (SAN/CN checks).</p>
</li>
</ol>
<p>This keeps TLS strict without requiring the Function App sandbox to write to any certificate store. (On Windows-hosted Functions, opening cert stores for write is commonly blocked.)</p>
<p>Before attempting the TLS handshake, the code also performs a quick TCP preflight to port 636. That makes “network unreachable” failures look different from “TLS validation failed” failures, which is invaluable when debugging.</p>
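<p>Here’s roughly what that looks like with <code>System.DirectoryServices.Protocols</code> (a simplified sketch; <code>$pinnedCert</code> is assumed to be the <code>X509Certificate2</code> parsed from the Key Vault secret, and the real module walks SANs and handles wildcards rather than the CN shortcut shown here):</p>
<pre><code class="lang-powershell"># Quick TCP preflight so "network unreachable" reads differently from a
# TLS validation failure.
[System.Net.Sockets.TcpClient]::new($dcFqdn, 636).Dispose()

$ldap = [System.DirectoryServices.Protocols.LdapConnection]::new("${dcFqdn}:636")
$ldap.SessionOptions.SecureSocketLayer = $true
$ldap.SessionOptions.VerifyServerCertificate = {
    param($connection, $certificate)
    $server = [System.Security.Cryptography.X509Certificates.X509Certificate2]::new($certificate)
    # 1) Pinning: must be exactly the cert we fetched from Key Vault.
    # 2) Hostname: simplified to a CN check here; the module checks SANs first.
    ($server.Thumbprint -eq $pinnedCert.Thumbprint) -and
    ($server.Subject -match [regex]::Escape("CN=$dcFqdn"))
}
</code></pre>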
<h2 id="heading-generating-and-returning-the-password">Generating and Returning the Password</h2>
<p>The function generates a password with <code>New-SecurePassword</code> (length default 16, with required character classes), converts it to <code>SecureString</code> for the directory operation, and returns the plain text password in the response body.</p>
<p>The important operational rule is: <strong>no password is written to logs</strong>. The only place the generated password exists is in memory during that request and in the HTTPS response to an authorized caller.</p>
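<p>On the generation side, the contract fits in a few lines. A sketch, not the module’s actual implementation (note that <code>Get-SecureRandom</code> ships with PowerShell 7.4, which this project targets):</p>
<pre><code class="lang-powershell"># Illustrative generator: default length 16, at least one character from
# each required class, cryptographically random, then shuffled.
function New-SecurePassword {
    param([int]$Length = 16)
    $classes = 'abcdefghijkmnopqrstuvwxyz',
               'ABCDEFGHJKLMNPQRSTUVWXYZ',
               '23456789',
               '!@#$%^*-_=+'
    # One guaranteed character per class...
    $chars = foreach ($class in $classes) {
        $class[(Get-SecureRandom -Maximum $class.Length)]
    }
    # ...then fill the remainder from the combined alphabet.
    $all = -join $classes
    $chars += 1..($Length - $chars.Count) | ForEach-Object {
        $all[(Get-SecureRandom -Maximum $all.Length)]
    }
    # Shuffle so the class-guaranteed characters aren't always first.
    -join ($chars | Sort-Object { Get-SecureRandom })
}
</code></pre>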
<h2 id="heading-hosting-and-scaling-notes">Hosting and Scaling Notes</h2>
<p>This function app runs on <strong>Elastic Premium (EP1) on Windows</strong>, because VNet integration is a core requirement for reaching the domain controller.</p>
<p>Concurrency is tuned with:</p>
<ul>
<li><p><code>FUNCTIONS_WORKER_PROCESS_COUNT=2</code></p>
</li>
<li><p><code>PSWorkerInProcConcurrencyUpperBound=10</code></p>
</li>
</ul>
<p>Those settings let a single app instance handle multiple requests in parallel while keeping directory operations responsive.</p>
<h2 id="heading-where-the-logic-lives">Where the Logic Lives</h2>
<p>The entrypoint is intentionally small: it validates the request shape, checks role claims, and orchestrates calls into a helper module.</p>
<p>The heavy lifting lives in <code>PasswordResetHelpers</code>:</p>
<ul>
<li><p><code>Get-ClientPrincipal</code> and <code>Test-RoleClaim</code> (authorization)</p>
</li>
<li><p><code>Get-FunctionAdServiceCredential</code> (Key Vault + MI)</p>
</li>
<li><p><code>Get-FunctionLdapsCertificateBase64</code> (pinned cert from Key Vault)</p>
</li>
<li><p><code>Set-ADUserPassword</code> (LDAPS user lookup + unicodePwd modify)</p>
</li>
</ul>
<p>Keeping the LDAPS plumbing in one place made it much easier to iterate on TLS validation without turning <code>run.ps1</code> into a wall of LDAP code.</p>
<h2 id="heading-how-the-pieces-fit-together-in-the-repo">How the Pieces Fit Together in the Repo</h2>
<p>The function app is intentionally small: one HTTP-triggered endpoint, one helper module, and a profile script for worker initialization.</p>
<p>Here’s the layout under <code>project-functionapp-roles/FunctionApp</code>:</p>
<pre><code class="lang-text">FunctionApp/
      host.json
      local.settings.json               # local-only settings (not deployed)
      profile.ps1                       # runs once per worker instance
      requirements.psd1                 # managed dependencies
      ResetUserPassword/
            function.json                   # httpTrigger + http output binding
            run.ps1                         # endpoint handler
            PasswordResetHelpers.psm1       # core logic (auth parsing, Key Vault, LDAPS)
            PasswordResetHelpers.psd1       # module manifest
</code></pre>
<p>One detail that’s easy to miss: <code>function.json</code> uses <code>"authLevel": "anonymous"</code> because authentication is handled by Easy Auth <em>before</em> PowerShell runs.</p>
<h2 id="heading-the-startup-hook-profileps1">The Startup Hook: profile.ps1</h2>
<p>Azure Functions loads <code>profile.ps1</code> <strong>once per PowerShell worker instance</strong> (think “once per worker process,” not once per request). In this project it does three things:</p>
<ol>
<li><p>Sets strict error behavior (<code>Set-StrictMode -Version Latest</code>, <code>$ErrorActionPreference = 'Stop'</code>).</p>
</li>
<li><p>Detects the Managed Identity endpoint variables (<code>IDENTITY_ENDPOINT/IDENTITY_HEADER</code>, with fallback to legacy <code>MSI_*</code>).</p>
</li>
<li><p>Optionally “warms” secrets by retrieving:</p>
<ul>
<li><p>the AD service account secret (currently <code>ENTRA-PWDRESET-RW</code>), and</p>
</li>
<li><p>the LDAPS public cert (<code>LDAPS-Certificate-CER</code>).</p>
</li>
</ul>
</li>
</ol>
<p>The request path in <code>run.ps1</code> <strong>does not depend</strong> on these global variables; it retrieves secrets on-demand through the helper module and caches per runspace. You can think of <code>profile.ps1</code> as a worker-initialization script and (optionally) an early warning system if Managed Identity / Key Vault access is broken.</p>
<h2 id="heading-the-helper-module-passwordresethelperspsm1">The Helper Module: PasswordResetHelpers.psm1</h2>
<p><code>PasswordResetHelpers.psm1</code> is where the “real work” lives. Each function is small on purpose, so you can test and reason about the behavior in isolation.</p>
<ul>
<li><p><code>Get-ManagedIdentityAccessToken</code> - Calls the App Service / Functions Managed Identity endpoint (new <code>IDENTITY_*</code> or legacy <code>MSI_*</code>) and returns an access token for a given resource.</p>
</li>
<li><p><code>Get-KeyVaultSecretValue</code> - Uses Managed Identity to fetch a secret value from Key Vault via the REST API.</p>
</li>
<li><p><code>Get-FunctionAdServiceCredential</code> - Builds a <code>PSCredential</code> either from local env vars (<code>AD_SERVICE_USERNAME</code>/<code>AD_SERVICE_PASSWORD</code>) or from Key Vault (<code>ENTRA-PWDRESET-RW</code>). It also fixes the common JSON-backslash issue (<code>DOMAIN\svc</code>) before parsing.</p>
</li>
<li><p><code>Get-FunctionLdapsCertificateBase64</code> - Retrieves and caches <code>LDAPS-Certificate-CER</code> (base64). This is the pinning material used to validate the DC’s LDAPS certificate.</p>
</li>
<li><p><code>Get-ClientPrincipal</code> - Decodes the <code>X-MS-CLIENT-PRINCIPAL</code> header (base64 JSON) injected by Easy Auth, returning a PowerShell object with the caller’s claims.</p>
</li>
<li><p><code>Test-RoleClaim</code> - Scans the decoded principal for the required role (handles both <code>roles</code> and <code>role</code> claim types).</p>
</li>
<li><p><code>New-SecurePassword</code> - Generates a random password (default length 16) with required character classes.</p>
</li>
<li><p><code>Test-LdapsTcpConnectivity</code> - Performs a quick TCP connect check to <code>host:636</code> so network problems are easier to distinguish from TLS/cert validation problems.</p>
</li>
<li><p><code>ConvertFrom-LdapsCertificateBase64</code> - Parses the pinned certificate from base64, accepting either DER bytes or PEM text.</p>
</li>
<li><p><code>Get-CertificateDnsNames</code> - Extracts DNS names from the certificate (SANs first, with CN as fallback).</p>
</li>
<li><p><code>Test-CertificateMatchesHostName</code> - Validates that the certificate names match the domain controller hostname, including wildcard handling.</p>
</li>
<li><p><code>New-LdapsConnection</code> - Creates an LDAPS <code>LdapConnection</code>, enables SSL, and attaches a strict <code>VerifyServerCertificate</code> callback that enforces: 1) thumbprint pinning to the Key Vault cert, and 2) hostname validation.</p>
</li>
<li><p><code>Get-ADUserDistinguishedName</code> - Searches AD over LDAPS to find the user DN by <code>sAMAccountName</code>.</p>
</li>
<li><p><code>Set-ADUserPassword</code> - Uses LDAPS to modify <code>unicodePwd</code> for the target user (via <code>ModifyRequest</code>). This is the core “reset” operation (see the sketch just after this list).</p>
</li>
</ul>
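<p>The <code>unicodePwd</code> write has one famous quirk: Active Directory expects the new password as a <em>quoted</em> string encoded UTF-16LE. A minimal sketch of that core operation (assuming <code>$ldap</code> is the pinned LDAPS connection from above and <code>$userDn</code> came from the DN lookup):</p>
<pre><code class="lang-powershell"># AD's unicodePwd attribute requires the literal value "NewPassword",
# surrounding double quotes included, encoded as UTF-16LE bytes.
$bytes = [System.Text.Encoding]::Unicode.GetBytes('"' + $newPassword + '"')

$mod = [System.DirectoryServices.Protocols.DirectoryAttributeModification]::new()
$mod.Name      = 'unicodePwd'
$mod.Operation = [System.DirectoryServices.Protocols.DirectoryAttributeOperation]::Replace
[void]$mod.Add($bytes)

$request = [System.DirectoryServices.Protocols.ModifyRequest]::new($userDn, $mod)
[void]$ldap.SendRequest($request)
</code></pre>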
<h2 id="heading-the-endpoint-runps1">The Endpoint: run.ps1</h2>
<p><code>run.ps1</code> is intentionally written as a single guided flow (not a pile of helper functions). Conceptually, it’s a pipeline:</p>
<ol>
<li><p><strong>Validate request envelope</strong></p>
<ul>
<li>Requires <code>X-MS-CLIENT-PRINCIPAL</code> and checks required env vars (<code>REQUIRED_ROLE</code>, <code>DOMAIN_CONTROLLER_FQDN</code>, <code>DOMAIN_NAME</code>).</li>
</ul>
</li>
<li><p><strong>Decode principal + authorize</strong></p>
<ul>
<li><code>Get-ClientPrincipal</code> → <code>Test-RoleClaim</code> → return <code>401/403</code> early if needed.</li>
</ul>
</li>
<li><p><strong>Parse and validate request body</strong></p>
<ul>
<li>Handles both string JSON and already-deserialized bodies, then requires <code>samAccountName</code>.</li>
</ul>
</li>
<li><p><strong>Load secrets needed for the operation</strong></p>
<ul>
<li><p><code>Get-FunctionAdServiceCredential</code> for the bind credential.</p>
</li>
<li><p><code>Get-FunctionLdapsCertificateBase64</code> for the pinned cert (required).</p>
</li>
</ul>
</li>
<li><p><strong>Generate a password and apply it over LDAPS</strong></p>
<ul>
<li><p><code>New-SecurePassword</code> generates the value returned to the caller.</p>
</li>
<li><p><code>Set-ADUserPassword</code> performs the reset over LDAPS.</p>
</li>
</ul>
</li>
<li><p><strong>Return the response (with security headers)</strong></p>
<ul>
<li>Responds <code>200</code> with <code>{ samAccountName, password, resetTime, message }</code> and <code>Cache-Control: no-store</code> to reduce accidental caching.</li>
</ul>
</li>
</ol>
<h2 id="heading-the-test-driver-test-functionappwithtokenps1">The Test Driver: Test-FunctionAppWithToken.ps1</h2>
<p>The <code>scripts/Test-FunctionAppWithToken.ps1</code> script is designed to <strong>simulate a real calling application</strong>. It uses the same client credentials flow your automation, portal, or service would use in production.</p>
<p>What it does:</p>
<ol>
<li><p>Requests an access token from the Entra v2 token endpoint:</p>
<ul>
<li><code>https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token</code></li>
</ul>
</li>
<li><p>Uses the <code>.default</code> scope for your API:</p>
<ul>
<li><code>scope=api://{ApiAppId}/.default</code></li>
</ul>
</li>
<li><p>Calls the function endpoint with <code>Authorization: Bearer {token}</code>.</p>
</li>
<li><p>Sends a JSON body that includes <code>samAccountName</code> (derived from <code>UserPrincipalName</code>). The current function only requires <code>samAccountName</code>; extra fields in the test payload are ignored.</p>
</li>
</ol>
<p>Example usage:</p>
<pre><code class="lang-powershell">./scripts/<span class="hljs-built_in">Test-FunctionAppWithToken</span>.ps1 \
      <span class="hljs-literal">-ClientId</span> <span class="hljs-string">"&lt;client-app-id&gt;"</span> \
      <span class="hljs-literal">-ClientSecret</span> <span class="hljs-string">"&lt;client-secret&gt;"</span> \
      <span class="hljs-literal">-TenantId</span> <span class="hljs-string">"&lt;tenant-id&gt;"</span> \
      <span class="hljs-literal">-ApiAppId</span> <span class="hljs-string">"&lt;api-app-id&gt;"</span> \
      <span class="hljs-literal">-FunctionAppUrl</span> <span class="hljs-string">"https://&lt;functionapp&gt;.azurewebsites.net"</span> \
      <span class="hljs-literal">-UserPrincipalName</span> <span class="hljs-string">"testuser1@contoso.com"</span> \
      <span class="hljs-literal">-NewPassword</span> <span class="hljs-string">"IgnoredByCurrentAPI123!"</span>
</code></pre>
<p>It also prints key token claims (<code>aud</code>, <code>iss</code>, roles) so when auth breaks you can quickly see whether you’re dealing with an audience mismatch, issuer mismatch, or missing role assignment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766203123384/ef5426ae-318b-46c6-b313-f44ce09640c9.png" alt class="image--center mx-auto" /></p>
<p>To generate an app registration and secret for the calling app, the <code>scripts/Create-ClientAppRegistration.ps1</code> script can help.</p>
<p>Don't forget to grant admin consent for the API permissions after creating the app registration!</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>This project looks small on the surface—one endpoint that resets a password—but it only stays “boring” because the hard parts are handled deliberately.</p>
<ul>
<li><p><strong>Easy Auth</strong> takes care of token validation so the function can focus on business logic.</p>
</li>
<li><p><strong>Authorization</strong> is reduced to a single, auditable decision: “does the caller have the role?”</p>
</li>
<li><p><strong>Key Vault + Managed Identity</strong> keeps credentials and pinning material out of code and out of deployment scripts.</p>
</li>
<li><p><strong>LDAPS with strict certificate pinning and hostname validation</strong> makes the directory operation secure without relying on fragile trust-store customization.</p>
</li>
</ul>
<p>The result is an API you can demo, redeploy, and troubleshoot confidently: when it fails, it fails for reasons you can explain—and when it succeeds, it does exactly one thing, safely. 🚀</p>
<h2 id="heading-quick-reference">Quick Reference</h2>
<ul>
<li><p>Endpoint: <code>POST /api/ResetUserPassword</code></p>
</li>
<li><p>Auth: Easy Auth (Entra ID v2 issuer) + role claim check</p>
</li>
<li><p>Required request body: <code>{ "samAccountName": "..." }</code></p>
</li>
<li><p>Key Vault secrets: <code>ENTRA-PWDRESET-RW</code>, <code>LDAPS-Certificate-CER</code></p>
</li>
<li><p>Directory transport: LDAPS on <code>:636</code> with certificate pinning + hostname validation</p>
</li>
</ul>
<hr />
<p><strong>Built with</strong>: PowerShell 7.4 • Azure Functions • Easy Auth • Key Vault (Managed Identity) • LDAPS</p>
]]></content:encoded></item><item><title><![CDATA[Workload Identity Risk and Remediation with Microsoft Graph & PowerShell]]></title><description><![CDATA[Introduction
Here's a problem you might not be tracking as closely as you should: every workload identity in your Entra ID tenant—the service principals and app registrations powering your automation, CI/CD pipelines, and third-party integrations—is ...]]></description><link>https://benroberts.io/workload-identity-risk-and-remediation-with-microsoft-graph-and-powershell</link><guid isPermaLink="true">https://benroberts.io/workload-identity-risk-and-remediation-with-microsoft-graph-and-powershell</guid><category><![CDATA[Powershell]]></category><category><![CDATA[Entra ID]]></category><category><![CDATA[github-actions]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Wed, 31 Dec 2025 13:00:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763886568800/3ee8fda1-debc-4185-acda-1fa870a720f9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>Here's a problem you might not be tracking as closely as you should: every workload identity in your Entra ID tenant—the service principals and app registrations powering your automation, CI/CD pipelines, and third-party integrations—is a potential security incident waiting to happen.</p>
<p>I'm not talking about user accounts. Those get MFA, Conditional Access policies, password expiry rules, and constant security team scrutiny. But workload identities? They tend to live in the shadows. Someone creates an app registration, generates a client secret, drops it into a GitHub secret or Azure Key Vault, and... that's it. No rotation schedule. No expiry monitoring. No privilege reviews. That secret could have been exfiltrated months ago and you'd never know until it's used to ransomware your tenant.</p>
<p>The wake-up call for most organizations comes from one of three places: an Entra ID Protection alert flagging a risky service principal (yes, that's a thing now), a penetration test report showing credential sprawl, or an audit finding that half your apps have <code>Directory.ReadWrite.All</code> when they only needed <code>User.Read</code>. By then you're playing catch-up.</p>
<p>This article walks through a practical solution I built to get ahead of that problem: a PowerShell 7.4 toolkit that discovers, triages, and remediates workload identity risk using Microsoft Graph. The philosophy is simple—stop rotating secrets and start eliminating them. Use federated credentials (OIDC workload identity) wherever possible, short-lived certificates where federation isn't an option, and actually monitor what your workload identities are doing.</p>
<p>If you're following Microsoft's guidance on <a target="_blank" href="https://learn.microsoft.com/entra/fundamentals/configure-security#protect-identities-and-secrets">protecting identities and secrets</a>, this toolkit gives you the automation to make it real.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p><strong>Licensing requirements:</strong> The toolkit's discovery features work with any Entra ID tenant and require no additional licenses—the read-only scan operations use standard Microsoft Graph permissions available to all organizations. However, several of the scenarios and enforcement capabilities discussed in this article require premium licensing:</p>
<ul>
<li><p><strong>Microsoft Entra ID P2 or Microsoft Entra ID Governance</strong> — Required to access Identity Protection risk detections for workload identities (the <code>risky-service-principals.json</code> and <code>risky-service-principal-triage.json</code> artifacts). Basic risk visibility (limited reporting details) is available without premium licenses, but full risk details and risk-based actions require a premium subscription.</p>
</li>
<li><p><strong>Microsoft Entra Workload Identities Premium</strong> — Required to create or modify Conditional Access policies scoped to service principals, to use risk-based Conditional Access conditions for workload identities, and to conduct access reviews of service principals in Privileged Identity Management. You can view, start a trial, and acquire licenses at <a target="_blank" href="https://portal.azure.com/#view/Microsoft_Azure_ManagedServiceIdentity/WorkloadIdentitiesBlade">https://portal.azure.com/#view/Microsoft_Azure_ManagedServiceIdentity/WorkloadIdentitiesBlade</a>.</p>
</li>
<li><p><strong>Access Reviews for service principals</strong> — Requires both Workload Identities Premium and an ID Governance or ID P2 license.</p>
</li>
</ul>
<p>The core scanning and reporting functionality—credential inventory, privileged role enumeration, high-privilege app permissions, and consent settings—operates without premium licenses. You can run the full scan, generate all artifacts, and build remediation plans using the free tier. Premium licensing becomes necessary when you move from visibility to enforcement (Conditional Access) or governance automation (PIM access reviews, advanced Identity Protection actions).</p>
<p>For more information, see:</p>
<ul>
<li><p>Microsoft Entra Workload ID licensing: <a target="_blank" href="https://www.microsoft.com/security/business/identity-access/microsoft-entra-workload-identities">https://www.microsoft.com/security/business/identity-access/microsoft-entra-workload-identities</a></p>
</li>
<li><p>Microsoft Entra ID Governance licensing: <a target="_blank" href="https://learn.microsoft.com/en-us/entra/id-governance/licensing-fundamentals">https://learn.microsoft.com/en-us/entra/id-governance/licensing-fundamentals</a></p>
</li>
<li><p>Conditional Access for workload identities: <a target="_blank" href="https://learn.microsoft.com/en-us/entra/identity/conditional-access/workload-identity">https://learn.microsoft.com/en-us/entra/identity/conditional-access/workload-identity</a></p>
</li>
</ul>
<h2 id="heading-what-this-toolkit-does">What This Toolkit Does</h2>
<p>The <code>WorkloadIdentityTools</code> module is designed to answer a few critical questions about your Entra ID workload identities:</p>
<p><strong>Which secrets are ticking time bombs?</strong> It inventories every credential across your app registrations—client secrets, certificates, federated credentials—and flags the ones that are long-lived (over 180 days) or nearing expiry (under 30 days). You get a risk score for each app so you can prioritize what to fix first.</p>
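<p>To give a flavor of what that discovery amounts to, here’s a sketch using the Microsoft Graph PowerShell SDK directly (not the toolkit’s own cmdlets; the thresholds mirror the ones above):</p>
<pre><code class="lang-powershell"># Flag client secrets that live too long (over 180 days) or expire soon
# (under 30 days). Illustrative only; the toolkit adds risk scoring on top.
Connect-MgGraph -Scopes 'Application.Read.All'
$now = [datetime]::UtcNow

Get-MgApplication -All | ForEach-Object {
    foreach ($cred in $_.PasswordCredentials) {
        if (-not ($cred.StartDateTime -and $cred.EndDateTime)) { continue }
        $lifetimeDays = ($cred.EndDateTime - $cred.StartDateTime).TotalDays
        $daysLeft     = ($cred.EndDateTime - $now).TotalDays
        if ($lifetimeDays -gt 180 -or $daysLeft -lt 30) {
            [pscustomobject]@{
                App          = $_.DisplayName
                KeyId        = $cred.KeyId
                LifetimeDays = [math]::Round($lifetimeDays)
                DaysLeft     = [math]::Round($daysLeft)
            }
        }
    }
}
</code></pre>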
<p><strong>Who has the keys to the kingdom?</strong> It enumerates which service principals hold privileged directory roles or dangerous Microsoft Graph permissions like <code>Directory.ReadWrite.All</code> or <code>Application.ReadWrite.All</code>. If you've got a CI/CD pipeline with Global Administrator, you'll know about it.</p>
<p><strong>What's your consent posture?</strong> It pulls your tenant's authorization policy settings to show whether users can consent to apps, whether admin consent workflows are enabled, and who's allowed to create new app registrations. These are the knobs that control how workload identities proliferate in your environment.</p>
<p><strong>Are any of your service principals already flagged as risky?</strong> Entra ID Protection now tracks risky workload identities (currently in beta). The toolkit pulls those signals and generates a triage report showing which service principals are at risk, confirmed compromised, or dismissed.</p>
<p><strong>How do I actually fix this?</strong> It includes remediation helpers to create federated credentials (the secretless pattern for GitHub Actions, Azure workload identity federation, etc.) and rotate to short-lived certificates when federation isn't possible.</p>
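<p>For the federation path, the create operation is a single Graph call. A hedged example for a GitHub Actions trust (the repo and branch values are placeholders, and <code>$appObjectId</code> is assumed to be the app registration’s directory object id; cmdlet from the Microsoft.Graph.Applications module):</p>
<pre><code class="lang-powershell"># Create a federated identity credential so GitHub Actions can sign in
# without a client secret. Subject format: repo:ORG/REPO:ref:REF.
Connect-MgGraph -Scopes 'Application.ReadWrite.All'

New-MgApplicationFederatedIdentityCredential -ApplicationId $appObjectId -BodyParameter @{
    name      = 'github-main'
    issuer    = 'https://token.actions.githubusercontent.com'
    subject   = 'repo:contoso/my-repo:ref:refs/heads/main'   # placeholder repo
    audiences = @('api://AzureADTokenExchange')
}
</code></pre>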
<p>All of this data gets written to machine-readable JSON and CSV artifacts, so you can feed it into dashboards, SIEM systems, or just open it in Excel and start making a plan. The centerpiece is <code>Scan-And-Report.ps1</code>, a one-shot script that runs the full discovery sweep and drops everything into an <code>./out/</code> folder.</p>
<h2 id="heading-how-the-repository-is-organized">How the Repository is Organized</h2>
<p>The project follows a standard PowerShell module layout. Everything lives under <code>project-workload-identity/</code>: the <code>scripts/</code> folder has the standalone scan script and dependency installer, <code>src/WorkloadIdentityTools/</code> contains the module itself (manifest, loader, public cmdlets, private helpers), and <code>tests/Unit/</code> has Pester tests to validate the module loads correctly and the risk scoring logic works as expected. The <code>README.md</code> is your quick-start guide if you just want to run a scan; this blog post is the deep dive explaining why it exists and how it works under the hood.</p>
<h2 id="heading-when-youd-use-this">When You'd Use This</h2>
<p>Let's say you're a security engineer at a company that's been using Entra ID (formerly Azure AD) for a few years. You've got hundreds of app registrations scattered across dev, staging, and production. Some were created by developers who've since left. Some were generated by automated deployment scripts. A few are still using client secrets that were pasted into wikis during hackathons.</p>
<p><strong>Scenario 1: The credential audit nobody wants to do manually.</strong> Your CISO wants a report on every long-lived secret in the tenant. You could click through the portal for hours, or you could run <code>Scan-And-Report.ps1</code> and get a CSV with every credential, its age, and a risk score. Now you've got actionable data to build a migration roadmap.</p>
<p><strong>Scenario 2: Identity Protection started alerting on a risky service principal.</strong> You're used to handling risky users, but this is new. What does a risky service principal even mean? Run the toolkit's risky workload identity triage and you'll see which apps are flagged, their risk levels, and recommendations for whether to confirm them as compromised or investigate further.</p>
<p><strong>Scenario 3: You're migrating from secrets to workload identity federation.</strong> GitHub Actions supports OIDC, and you want to stop storing <code>AZURE_CLIENT_SECRET</code> in every repository. Use the toolkit to inventory which apps are still using secrets, create federated credentials for the GitHub issuer, and track adoption over time by rerunning scans.</p>
<p><strong>Scenario 4: The compliance team needs evidence.</strong> Auditors want proof that you're monitoring privileged app permissions and consent policies. The JSON artifacts the toolkit generates are timestamped, machine-readable compliance evidence. Drop them in an S3 bucket or Azure Storage container and you've got an audit trail.</p>
<p><strong>Scenario 5: You want this in CI/CD.</strong> Run the scan nightly as a GitHub Actions workflow, publish the results as job summary markdown, and upload artifacts. If a new high-risk app appears, you'll see it in the workflow run without having to remember to check manually.</p>
<h2 id="heading-why-not-just-rotate-secrets-faster">Why Not Just Rotate Secrets Faster?</h2>
<p>I know what you're thinking: "We already have a secret rotation platform. Why not just rotate secrets every 90 days and call it done?"</p>
<p>Because rotation is a band-aid. It assumes the secret is the inevitable part of the design and tries to minimize exposure windows. But here's the thing—secrets don't have to exist at all.</p>
<p><strong>Federated credentials</strong> (OIDC workload identity) and <strong>managed identities</strong> eliminate static secrets completely. Instead of storing a password-equivalent that could leak, you configure trust relationships. GitHub Actions proves it's running in your repository by presenting a signed OIDC token; Entra ID validates the token and issues a short-lived access token. No secret ever hits your CI/CD environment.</p>
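<p>To make this concrete, here's roughly what that trust configuration looks like with the Graph PowerShell SDK—a minimal sketch, assuming an existing app registration whose object ID is in <code>$appObjectId</code>; the org/repo values are placeholders:</p>
<pre><code class="lang-powershell">Import-Module Microsoft.Graph.Applications

# Trust tokens minted by GitHub Actions for main-branch runs in myorg/myrepo.
$fic = @{
    Name      = 'github-actions-main'
    Issuer    = 'https://token.actions.githubusercontent.com'
    Subject   = 'repo:myorg/myrepo:ref:refs/heads/main'
    Audiences = @('api://AzureADTokenExchange')
}
New-MgApplicationFederatedIdentityCredential -ApplicationId $appObjectId -BodyParameter $fic
</code></pre>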
<p>The advantages are huge:</p>
<ul>
<li><p><strong>No exfiltration risk.</strong> There's nothing static to steal. An attacker would have to compromise the OIDC issuer itself (GitHub, Azure, AWS), which is significantly harder than grabbing a secret from a Key Vault or environment variable.</p>
</li>
<li><p><strong>No rotation schedules.</strong> Tokens are issued on-demand and expire in minutes or hours. You never have to coordinate "rotate this secret across 12 environments by Friday."</p>
</li>
<li><p><strong>Faster incident response.</strong> If a workload identity is compromised, you revoke the trust relationship in Entra ID. You don't have to hunt down every place a secret was copied.</p>
</li>
<li><p><strong>Better CI ergonomics.</strong> Developers don't need to know about secrets at all. They just configure the OIDC subject claim and it works.</p>
</li>
<li><p><strong>Policy enforcement at the identity layer.</strong> Conditional Access policies can apply to service principals. You can block access from untrusted networks or at elevated risk levels without touching secret storage.</p>
</li>
</ul>
<p>Where federation isn't an option—legacy systems, third-party integrations that don't support OIDC—short-lived certificates are the next best thing. A cert with a 30-day lifetime that auto-rotates is orders of magnitude safer than a 2-year client secret that someone pasted into Slack.</p>
<p>This toolkit exists to accelerate that transition: discover where secrets still exist, generate migration recommendations, and provide helpers to create federated credentials or rotate to short-lived certs. The goal isn't "rotate faster"; it's "remove the secret."</p>
<p><strong>Layer adaptive controls:</strong> Once high-risk apps are identified, enforce Conditional Access for workload identities (requires Workload Identities Premium licensing) to block risky service principals based on location or risk signals. Pair this with Continuous Access Evaluation (CAE) so revocations—service principal disable, deletion, or risk escalation—take effect immediately without waiting for token expiry. Validate outcomes via the Service Principal sign-in logs to confirm that enforcement is working as expected. The toolkit's <code>risky-service-principals.json</code> and <code>privileged-roles.json</code> artifacts provide the candidate list for scoping these policies.</p>
<h2 id="heading-how-it-works-under-the-hood">How It Works Under the Hood</h2>
<p>The module is built on PowerShell 7.4 with strict mode, <code>[CmdletBinding()]</code> attributes, and proper parameter validation. It's designed to be testable (Pester tests with mocks) and pipeable (objects, not formatted text).</p>
<p><strong>The Microsoft Graph PowerShell SDK</strong> is the engine. The v1.0 cmdlets handle apps, authorization policies, and role enumeration. For risky workload identities—which are currently in preview—it uses the beta endpoints. That means you need the <code>Microsoft.Graph.Beta.*</code> modules for some operations, and those APIs could change before they go GA. The toolkit makes that boundary explicit: cmdlets that touch beta data have <code>Beta</code> in the name (like <code>Get-WiBetaRiskyServicePrincipal</code>), and the documentation warns you upfront.</p>
<p><strong>Authentication is context-aware.</strong> The <code>Connect-WiGraph</code> wrapper handles two modes: interactive delegated scopes for local runs (you authenticate as yourself and get prompted for consent), and automatic environment variable detection for CI/CD. When the toolkit sees <code>AZURE_CLIENT_ID</code>, <code>AZURE_TENANT_ID</code>, and <code>AZURE_FEDERATED_TOKEN_FILE</code> in the environment (which <code>azure/login</code> in GitHub Actions sets), it switches to <code>Connect-MgGraph -EnvironmentVariable</code> and authenticates as the pipeline's service principal with no interaction required. Local runs are unaffected—you just pass scopes like normal.</p>
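<p>The detection logic is simple enough to sketch (an assumed shape, not the module's exact source—<code>$TenantId</code> and <code>$Scopes</code> stand in for the wrapper's parameters):</p>
<pre><code class="lang-powershell"># CI/CD: azure/login has already provisioned OIDC federation details as env vars.
if ($env:AZURE_CLIENT_ID -and $env:AZURE_TENANT_ID -and $env:AZURE_FEDERATED_TOKEN_FILE) {
    Connect-MgGraph -EnvironmentVariable
}
else {
    # Local run: interactive delegated auth with whatever scopes were requested.
    Connect-MgGraph -TenantId $TenantId -Scopes $Scopes
}
</code></pre>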
<p><strong>Discovery is read-only by default.</strong> When you run <code>Scan-And-Report.ps1</code>, it calls a series of cmdlets that enumerate applications, credentials, role assignments, app permissions, consent settings, and risky service principals. All of this uses the minimum Graph scopes required: <code>Application.Read.All</code>, <code>Directory.Read.All</code>, <code>Policy.Read.All</code>, <code>IdentityRiskyServicePrincipal.Read.All</code>. Nothing gets modified unless you explicitly invoke a remediation cmdlet.</p>
<p><strong>Remediation cmdlets require higher privileges and support</strong> <code>-WhatIf</code>. Creating a federated credential needs <code>Application.ReadWrite.All</code>. Confirming a service principal as compromised needs <code>IdentityRiskyServicePrincipal.ReadWrite.All</code> and the Security Administrator role. Both scenarios use approved verbs (<code>New-WiFederatedCredential</code>, <code>Set-WiRiskyServicePrincipalCompromised</code>) with <code>SupportsShouldProcess</code>, so you can test with <code>-WhatIf</code> before committing changes.</p>
<p><strong>Everything outputs structured data.</strong> The scan script writes JSON and CSV files to <code>./out/</code>. Each artifact is timestamped and includes metadata like the tenant ID and when the scan ran. You can parse these with <code>ConvertFrom-Json</code>, load them into pandas, or push them to Azure Monitor Logs. There's no proprietary format—just standards.</p>
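<p>For example, pulling the high-risk rows out of an artifact takes a couple of lines (assuming the JSON deserializes to an array of credential records with the property names shown later in this post):</p>
<pre><code class="lang-powershell"># Load the credential inventory artifact and keep the high-risk entries.
$inventory = Get-Content ./out/credential-inventory.json -Raw | ConvertFrom-Json
$inventory |
    Where-Object { $_.RiskLevel -eq 'High' } |
    Select-Object DisplayName, CredentialType, DaysUntilExpiry
</code></pre>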
<p><strong>Future enhancement:</strong> Optional CAE token capability detection and recommendation flags (proposed column <code>SupportsCae</code> in the credential inventory) would enable prioritization of workloads for real-time enforcement. Applications that send the <code>xms_cc=cp1</code> claim in their token requests receive CAE-enabled long-lived tokens (24 hours) subject to instant revocation events—a powerful upgrade over traditional 1-hour token lifetimes.</p>
<h2 id="heading-the-discovery-side">The Discovery Side</h2>
<p>The heavy lifting happens in a handful of cmdlets that map directly to Microsoft Graph queries:</p>
<p><code>Get-WiApplicationCredentialInventory</code> pulls every app registration in the tenant and iterates over its <code>passwordCredentials</code>, <code>keyCredentials</code>, and <code>federatedIdentityCredentials</code>. For each credential, it calculates how long it's been active and how long until it expires. Long-lived (&gt;180 days) or near-expiry (&lt;30 days) credentials get flagged with a risk score. The output includes recommendations: "migrate to federated credential" for secrets, "shorten lifetime" for long-lived certs.</p>
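<p>The scoring rule itself is deliberately simple—something like this sketch, where <code>$cred</code> stands for one entry from <code>passwordCredentials</code> or <code>keyCredentials</code> (not the module's exact code):</p>
<pre><code class="lang-powershell"># Flag one credential against the thresholds described above.
$now = Get-Date
$lifetimeDays = ($cred.EndDateTime - $cred.StartDateTime).TotalDays
$daysToExpiry = ($cred.EndDateTime - $now).TotalDays

$riskReasons = @()
if ($lifetimeDays -gt 180) { $riskReasons += 'Long-lived credential' }
if ($daysToExpiry -lt 30)  { $riskReasons += 'Near expiry' }
</code></pre>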
<p><code>Get-WiServicePrincipalPrivilegedAssignments</code> enumerates service principals with directory role assignments. It calls <code>Get-MgDirectoryRole</code> to list all roles, then fetches their members and filters for service principals. If you've got a CI pipeline with Global Administrator or an integration app with Privileged Role Administrator, it shows up here.</p>
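<p>The underlying Graph pattern looks roughly like this (the <code>@odata.type</code> check on raw member objects is an assumption about the response shape):</p>
<pre><code class="lang-powershell"># Enumerate activated directory roles and keep only service principal members.
Get-MgDirectoryRole | ForEach-Object {
    $role = $_
    Get-MgDirectoryRoleMember -DirectoryRoleId $role.Id -All |
        Where-Object { $_.AdditionalProperties['@odata.type'] -eq '#microsoft.graph.servicePrincipal' } |
        ForEach-Object {
            [pscustomobject]@{
                RoleName         = $role.DisplayName
                ServicePrincipal = $_.AdditionalProperties['displayName']
            }
        }
}
</code></pre>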
<p><code>Get-WiHighPrivilegeAppPermissions</code> looks for applications holding dangerous Graph permissions—things like <code>Directory.ReadWrite.All</code>, <code>Application.ReadWrite.All</code>, <code>RoleManagement.ReadWrite.Directory</code>. These are the permissions that let an app modify users, create new apps, or assign roles. You'd be surprised how many apps have these permissions "just in case."</p>
<p><code>Get-WiTenantConsentSettings</code> fetches the authorization policy (<code>Get-MgPolicyAuthorizationPolicy</code>) and extracts the consent knobs: whether users can consent to apps, whether admin consent workflows are enabled, who can create apps, and whether email verification is required. This is the posture that controls how workload identities proliferate in your tenant.</p>
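<p>A condensed version of that lookup, using property names from the Graph <code>authorizationPolicy</code> resource:</p>
<pre><code class="lang-powershell"># Read the tenant-wide authorization policy and surface the consent posture.
$policy = Get-MgPolicyAuthorizationPolicy
[pscustomobject]@{
    UserConsentPolicies  = $policy.DefaultUserRolePermissions.PermissionGrantPoliciesAssigned
    UsersCanCreateApps   = $policy.DefaultUserRolePermissions.AllowedToCreateApps
    AdminConsentWorkflow = (Get-MgPolicyAdminConsentRequestPolicy).IsEnabled
}
</code></pre>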
<p><strong>Classification &amp; Attributes:</strong> Beyond discovery, you can map discovered apps to custom security attributes (e.g., <code>RiskTier</code>, <code>RemediationPhase</code>, <code>DataSensitivity</code>) using Microsoft Graph PowerShell. Custom security attributes in Entra ID enable filtered views and targeted policy scope—for example, applying stricter Conditional Access policies to apps tagged with <code>DataSensitivity=High</code> or tracking migration progress with <code>RemediationPhase=InProgress</code>. While the toolkit doesn't automatically assign these attributes, the credential inventory and high-privilege permissions data provide the inputs for classification decisions. See <a target="_blank" href="https://learn.microsoft.com/en-us/entra/identity/enterprise-apps/custom-security-attributes-apps">https://learn.microsoft.com/en-us/entra/identity/enterprise-apps/custom-security-attributes-apps</a> for implementation guidance.</p>
<p><code>Get-WiBetaRiskyServicePrincipal</code> and <code>Get-WiBetaRiskyServicePrincipalHistory</code> hit the Identity Protection beta endpoints to pull risky workload identities. These are service principals that Microsoft's risk detection systems have flagged—maybe they authenticated from an anonymous IP, or their credentials showed up in a breach, or there was anomalous sign-in behavior. The triage report (<code>Get-WiRiskyServicePrincipalTriageReport</code>) aggregates the distribution by risk level and risk state so you can see at a glance how many are at risk vs. confirmed compromised vs. dismissed.</p>
<h2 id="heading-the-remediation-side">The Remediation Side</h2>
<p>Once you've identified problems, the toolkit provides helpers to fix them:</p>
<p><code>New-WiFederatedCredential</code> creates a federated identity credential on an app registration. You specify the issuer (like <a target="_blank" href="https://token.actions.githubusercontent.com"><code>https://token.actions.githubusercontent.com</code></a> for GitHub Actions), the subject (like <code>repo:myorg/myrepo:ref:refs/heads/main</code>), and optionally the audience. The credential is created immediately and you can start using OIDC tokens to authenticate. No secret required.</p>
<p><code>Add-WiApplicationCertificateCredential</code> generates a short-lived certificate credential. By default it's valid for 90 days, but you can configure shorter lifetimes. The cmdlet returns the certificate with private key so you can store it in Key Vault or another secure location. This is the fallback for systems that can't do federation.</p>
<p><code>Set-WiRiskyServicePrincipalCompromised</code> and <code>Clear-WiRiskyServicePrincipalRisk</code> are the approved-verb wrappers around the Identity Protection risk action APIs. If a service principal is flagged and you've confirmed it's compromised (maybe you found the secret in a public repo), you mark it as compromised so Microsoft's signals improve. If it's a false positive, you dismiss the risk. Both cmdlets support <code>-WhatIf</code> so you can preview the action before committing.</p>
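<p>Typical usage looks something like this (the <code>-ServicePrincipalId</code> parameter name is illustrative—check the cmdlet help for the exact signature):</p>
<pre><code class="lang-powershell"># Preview the state change first, then rerun without -WhatIf to commit it.
Set-WiRiskyServicePrincipalCompromised -ServicePrincipalId $spId -WhatIf
Clear-WiRiskyServicePrincipalRisk -ServicePrincipalId $spId -WhatIf
</code></pre>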
<p>All of these require elevated permissions: <code>Application.ReadWrite.All</code> for credential changes, <code>IdentityRiskyServicePrincipal.ReadWrite.All</code> plus the Security Administrator role for risk actions. The module won't prompt you for consent on the fly—you need to authenticate with those scopes upfront.</p>
<p><strong>Post-remediation governance:</strong> After addressing immediate risks, seed Privileged Identity Management (PIM) recurring access reviews from the <code>privileged-roles.json</code> and <code>high-privilege-app-permissions.json</code> artifacts. Access reviews for service principals require Workload Identities Premium plus ID Governance licensing. Generate a CSV of candidate service principals with their role assignments, last sign-in dates, and permission counts, then import that scope into PIM to establish quarterly or semi-annual entitlement hygiene reviews. This closes the loop from discovery → remediation → ongoing governance. See <a target="_blank" href="https://learn.microsoft.com/en-us/entra/id-governance/privileged-identity-management/pim-create-roles-and-resource-roles-review">https://learn.microsoft.com/en-us/entra/id-governance/privileged-identity-management/pim-create-roles-and-resource-roles-review</a> for setup guidance.</p>
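<p>Seeding that scope can be as simple as flattening the artifact into a CSV (the property names here are assumptions based on the artifact descriptions later in this post):</p>
<pre><code class="lang-powershell"># Flatten the privileged-roles artifact into a PIM access-review candidate list.
$roles = Get-Content ./out/privileged-roles.json -Raw | ConvertFrom-Json
$roles |
    Select-Object ServicePrincipalId, DisplayName, RoleName |
    Export-Csv ./out/access-review-candidates.csv -NoTypeInformation
</code></pre>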
<h2 id="heading-running-it-yourself">Running It Yourself</h2>
<p>The quickest way to see what you're dealing with is to run the scan locally. Here's what that looks like:</p>
<p>First, install the Microsoft Graph PowerShell modules:</p>
<pre><code class="lang-powershell">./project<span class="hljs-literal">-workload</span><span class="hljs-literal">-identity</span>/scripts/<span class="hljs-built_in">Install-Dependencies</span>.ps1
</code></pre>
<p>This installs the Graph SDK and sets the PowerShell Gallery as trusted if it isn't already. You only need to do this once per machine.</p>
<p>Now import the module and authenticate:</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">Import-Module</span> ./project<span class="hljs-literal">-workload</span><span class="hljs-literal">-identity</span>/src/WorkloadIdentityTools/WorkloadIdentityTools.psd1
<span class="hljs-built_in">Connect-WiGraph</span> <span class="hljs-literal">-TenantId</span> <span class="hljs-string">'your-tenant-id'</span>
</code></pre>
<p>By default, <code>Connect-WiGraph</code> requests delegated scopes for discovery (<code>Application.Read.All</code>, <code>Directory.Read.All</code>, etc.). You'll get an OAuth prompt asking for consent. Approve it and you're connected.</p>
<p>Now run the inventory:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$inventory</span> = <span class="hljs-built_in">Get-WiApplicationCredentialInventory</span> <span class="hljs-literal">-All</span>
<span class="hljs-variable">$inventory</span> | <span class="hljs-built_in">Where-Object</span> { <span class="hljs-variable">$_</span>.RiskLevel <span class="hljs-operator">-eq</span> <span class="hljs-string">'High'</span> } | <span class="hljs-built_in">Format-Table</span> DisplayName, CredentialType, DaysUntilExpiry, RiskReasons
</code></pre>
<p>This returns every high-risk credential in your tenant—secrets that are ancient, certificates about to expire, apps that should have migrated to federation months ago. The <code>RiskReasons</code> column tells you why it's flagged.</p>
<p>If you want to check risky service principals (beta), reconnect with the Identity Protection scope:</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">Connect-WiGraph</span> <span class="hljs-literal">-Scopes</span> <span class="hljs-selector-tag">@</span>(<span class="hljs-string">'IdentityRiskyServicePrincipal.Read.All'</span>) <span class="hljs-literal">-TenantId</span> <span class="hljs-string">'your-tenant-id'</span>
<span class="hljs-variable">$triage</span> = <span class="hljs-built_in">Get-WiRiskyServicePrincipalTriageReport</span>
<span class="hljs-variable">$triage</span>.Distribution.ByRiskLevel | <span class="hljs-built_in">Format-Table</span>
</code></pre>
<p>This shows how many service principals are at each risk level and what states they're in. If you see any confirmed compromised, those need immediate action.</p>
<p>For a full scan with all the artifacts, just run:</p>
<pre><code class="lang-powershell">./project<span class="hljs-literal">-workload</span><span class="hljs-literal">-identity</span>/scripts/Scan<span class="hljs-operator">-And</span><span class="hljs-literal">-Report</span>.ps1
</code></pre>
<p>The script connects, runs every discovery cmdlet, and writes JSON and CSV files to <code>./out/</code>. You can open those in Excel, load them into Power BI, or push them to your SIEM.</p>
<h2 id="heading-testing-in-a-dev-tenant">Testing in a Dev Tenant</h2>
<p>If you're running this in a brand-new dev tenant that doesn't have a ton of workload identities yet, the scan results might be underwhelming. You'll see a handful of app registrations, maybe none with high-risk credentials, and probably zero risky service principals because your tenant hasn't been around long enough for Identity Protection to build a baseline.</p>
<p>That's where the lab seeding scripts come in. Run:</p>
<pre><code class="lang-powershell">./scripts/Bootstrap<span class="hljs-literal">-WiLab</span>.ps1 <span class="hljs-literal">-TenantId</span> <span class="hljs-string">'your-dev-tenant-id'</span>
</code></pre>
<p>This creates a set of <code>wi-lab-*</code> apps and service principals that cover the interesting cases: long-lived secrets, near-expiry secrets, certificate credentials, federated-only identities, and a few with high-privilege permissions. The script is idempotent—if you run it twice, it'll reuse the existing apps and update credentials as needed.</p>
<p>Now rerun the scan:</p>
<pre><code class="lang-powershell">./scripts/Scan<span class="hljs-operator">-And</span><span class="hljs-literal">-Report</span>.ps1
</code></pre>
<p>You'll see the lab identities show up in <code>credential-inventory.json</code>, <code>high-privilege-app-permissions.json</code>, and <code>privileged-roles.json</code>. This gives you realistic data to experiment with—try creating a federated credential on one of the secret-based apps, or rotate a certificate, and see how the scan results change.</p>
<p>When you're done, clean up:</p>
<pre><code class="lang-powershell">./scripts/Cleanup<span class="hljs-literal">-WiLab</span>.ps1 <span class="hljs-literal">-TenantId</span> <span class="hljs-string">'your-dev-tenant-id'</span>
</code></pre>
<p>This removes all the <code>wi-lab-*</code> identities. Use <code>-WhatIf</code> first if you want to preview what it's going to delete.</p>
<p><strong>Important: Do not run these scripts in production.</strong> They're designed for non-production tenants where you can safely create and delete app registrations. The bootstrap script doesn't manipulate Identity Protection risk state directly (you can't forge risk detections anyway), so the risky service principals report will stay empty in a fresh dev tenant—and that's fine.</p>
<h2 id="heading-hooking-it-into-cicd">Hooking It Into CI/CD</h2>
<p>The real power comes from running this continuously. You want to catch new high-risk apps the day they're created, not during the next quarterly audit.</p>
<p>Here's how you set that up in GitHub Actions. First, create a service principal in Entra ID and configure a federated credential for your GitHub repo (see <code>New-WiFederatedCredential</code> or do it in the portal). Grant it the Graph application permissions it needs: <code>Application.Read.All</code>, <code>Directory.Read.All</code>, <code>Policy.Read.All</code>, <code>IdentityRiskyServicePrincipal.Read.All</code>. Make sure those permissions are admin-consented.</p>
<p>Add the service principal's client ID and your tenant ID as GitHub repository secrets:</p>
<ul>
<li><p><code>AZURE_CLIENT_ID</code></p>
</li>
<li><p><code>AZURE_TENANT_ID</code></p>
</li>
<li><p><code>AZURE_SUBSCRIPTION_ID</code> (required by <code>azure/login</code>)</p>
</li>
<li><p><code>WI_SCAN_TENANT_ID</code> (can be the same as <code>AZURE_TENANT_ID</code>)</p>
</li>
</ul>
<p>Now create a workflow file (<code>.github/workflows/workload-identity-scan.yml</code>) that looks like this:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">Workload</span> <span class="hljs-string">Identity</span> <span class="hljs-string">Scan</span>
<span class="hljs-attr">on:</span>
  <span class="hljs-attr">schedule:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">cron:</span> <span class="hljs-string">"0 2 * * *"</span> <span class="hljs-comment"># Daily at 2 AM UTC</span>
  <span class="hljs-attr">workflow_dispatch:</span>

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">scan:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">permissions:</span>
      <span class="hljs-attr">id-token:</span> <span class="hljs-string">write</span>
      <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v3</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Azure</span> <span class="hljs-string">Login</span> <span class="hljs-string">(OIDC)</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/login@v1</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">client-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_CLIENT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">tenant-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_TENANT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">subscription-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_SUBSCRIPTION_ID</span> <span class="hljs-string">}}</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Install</span> <span class="hljs-string">Dependencies</span>
        <span class="hljs-attr">shell:</span> <span class="hljs-string">pwsh</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">./project-workload-identity/scripts/Install-Dependencies.ps1</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Run</span> <span class="hljs-string">Scan</span>
        <span class="hljs-attr">shell:</span> <span class="hljs-string">pwsh</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">WI_SCAN_TENANT_ID:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.WI_SCAN_TENANT_ID</span> <span class="hljs-string">}}</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">./project-workload-identity/scripts/Scan-And-Report.ps1</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Render</span> <span class="hljs-string">HTML</span> <span class="hljs-string">Report</span>
        <span class="hljs-attr">shell:</span> <span class="hljs-string">pwsh</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">./project-workload-identity/scripts/Write-ScanReport.ps1</span> <span class="hljs-string">-OutputFolder</span> <span class="hljs-string">./out</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Publish</span> <span class="hljs-string">Report</span> <span class="hljs-string">Summary</span>
        <span class="hljs-attr">shell:</span> <span class="hljs-string">pwsh</span>
        <span class="hljs-attr">env:</span>
          <span class="hljs-attr">REPORT_PATH:</span> <span class="hljs-string">./out/workload-identity-report.html</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">./project-workload-identity/scripts/Publish-ScanReportSummary.ps1</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Upload</span> <span class="hljs-string">Artifacts</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/upload-artifact@v3</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">name:</span> <span class="hljs-string">wi-scan-artifacts</span>
          <span class="hljs-attr">path:</span> <span class="hljs-string">./out/</span>
</code></pre>
<p>The <code>azure/login</code> step sets the <code>AZURE_CLIENT_ID</code>, <code>AZURE_TENANT_ID</code>, and <code>AZURE_FEDERATED_TOKEN_FILE</code> environment variables. When the scan script calls <code>Connect-WiGraph</code>, it detects those variables and uses <code>Connect-MgGraph -EnvironmentVariable</code> to authenticate as the service principal—no client secret required.</p>
<p>The workflow runs nightly, scans the tenant, generates an HTML report, publishes a summary to the GitHub Actions job summary (so you can see highlights without downloading artifacts), and uploads the full JSON/CSV artifacts for later analysis.</p>
<p>If a new high-risk app appears, you'll see it in the next morning's workflow run. If someone creates a service principal with Global Administrator, it shows up in <code>privileged-roles.json</code>. If Identity Protection flags a risky workload identity, it's in the triage report. All without manual intervention.</p>
<p><strong>Optionally generate</strong> <code>conditional-access-candidates.json</code>: Extend the scan to produce a list of service principal object IDs exceeding defined risk thresholds (e.g., high-risk credentials + privileged roles, or confirmed risky status from Identity Protection). This artifact can drive out-of-band policy provisioning—import the object IDs into a Conditional Access policy scoped to block access from untrusted locations or at elevated risk levels. The candidates file becomes a living policy scope that updates nightly as new high-risk apps are discovered.</p>
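<p>A sketch of that extension—the thresholds and property names are illustrative:</p>
<pre><code class="lang-powershell"># Union privileged and risky service principals into a policy-scope artifact.
$privileged = Get-Content ./out/privileged-roles.json -Raw | ConvertFrom-Json
$risky      = Get-Content ./out/risky-service-principals.json -Raw | ConvertFrom-Json

@($privileged.ServicePrincipalId) + @($risky.ServicePrincipalId) |
    Sort-Object -Unique |
    ConvertTo-Json |
    Set-Content ./out/conditional-access-candidates.json
</code></pre>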
<h2 id="heading-what-you-get-out">What You Get Out</h2>
<p>Every scan writes a set of artifacts to <code>./out/</code>. Here's what each one contains:</p>
<p><code>credential-inventory.json</code> / <code>.csv</code> — Every credential across all app registrations: when it was created, when it expires, its type (secret/cert/federated), risk level, and recommended actions. This is your starting point for building a remediation roadmap.</p>
<p><code>privileged-roles.json</code> — Service principals with directory role assignments. If a CI pipeline has Global Administrator or an integration app has Privileged Role Administrator, it's in here.</p>
<p><code>high-privilege-app-permissions.json</code> — Applications holding dangerous Graph permissions like <code>Directory.ReadWrite.All</code>, <code>Application.ReadWrite.All</code>, or <code>RoleManagement.ReadWrite.Directory</code>. These are the apps that could wreak havoc if compromised.</p>
<p><code>consent-settings.json</code> — Your tenant's authorization policy: whether users can consent to apps, whether admin consent workflows are enabled, who can create apps. This is the posture that controls how workload identities proliferate.</p>
<p><code>risky-service-principals.json</code> — Identity Protection's list of risky workload identities (beta). Includes risk level, risk state (at risk, confirmed compromised, dismissed), and when the risk was detected.</p>
<p><code>risky-service-principal-triage.json</code> — Aggregated summary of risky service principals: how many at each risk level, distribution by state, recommendations for action.</p>
<p><code>scan-summary.json</code> — High-level counts: total apps, total credentials, how many are high-risk, how many are federated, etc. This is useful for dashboards or executive summaries.</p>
<p><code>workload-identity-report.html</code> — An HTML dashboard that presents all of the above in a human-readable format. Open it in a browser and you've got a visual overview with sortable tables and color-coded risk levels.</p>
<p>Demo HTML Report - PowerShell:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763790183680/63d8f6b2-3e79-4953-9c39-28834d010c58.png" alt class="image--center mx-auto" /></p>
<p>Demo HTML Report - GitHub Actions:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763790222752/078f7ed9-e909-4737-a138-b90ff1bf0196.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763790231788/1c66d735-d501-48bd-83a7-fc199b0f0dc6.png" alt class="image--center mx-auto" /></p>
<p>All JSON files include metadata (tenant ID, scan timestamp) and follow a consistent schema. You can load them into Power BI, push them to Azure Monitor Logs, or just <code>ConvertFrom-Json</code> and analyze them in PowerShell.</p>
<h2 id="heading-what-this-isnt">What This Isn't</h2>
<p>Let's be clear about scope. This toolkit is designed to discover workload identity risks and provide remediation helpers—it's not trying to be a full-blown identity governance platform.</p>
<p><strong>It's not a SIEM pipeline.</strong> The artifacts are JSON and CSV files you can push to your SIEM, but the toolkit itself doesn't handle log ingestion, correlation, alerting, or retention policies. You'll need to wire that up yourself.</p>
<p><strong>It's not a Conditional Access automation tool.</strong> You can use the privileged app permission data to inform CA policy design, but the toolkit doesn't create or modify Conditional Access policies. That's a separate problem domain.</p>
<p><strong>It's not doing ML-based risk enrichment.</strong> The risk scoring for credentials is rule-based (age &gt; 180 days = long-lived, expiry &lt; 30 days = near-expiry). Identity Protection's risky service principal detections come from Microsoft's ML models, but this toolkit just surfaces them—it doesn't extend or retrain the models.</p>
<p><strong>It's not bundling interactive dashboards.</strong> The HTML report is a standalone file you can open in a browser, but it's not a live dashboard with drill-downs and refresh buttons. If you want that, load the JSON artifacts into Power BI or Grafana.</p>
<p>The goal is to give you the raw material—discovery data, remediation helpers, structured output—so you can build the governance workflow that fits your organization. The toolkit handles the hard part (querying Graph, scoring risk, generating artifacts); you handle the integration layer.</p>
<h2 id="heading-security-and-governance-notes">Security and Governance Notes</h2>
<p>A few things to keep in mind as you use this toolkit:</p>
<p><strong>The artifacts don't contain secrets, but they do reveal privilege posture.</strong> The credential inventory includes metadata (credential IDs, start/end dates) but not the actual secrets or private keys. However, someone with access to these artifacts can see which apps have high-risk credentials, privileged role assignments, or dangerous permissions. Treat the output files accordingly—don't drop them in a public S3 bucket.</p>
<p><strong>Beta APIs are subject to change.</strong> The risky workload identity endpoints are currently in preview. Microsoft could change the schema, retire properties, or move things to v1.0 with breaking changes. Always test in a non-production tenant first, and expect to update the toolkit when those APIs go GA.</p>
<p><strong>Least privilege applies to the scanner too.</strong> The scan runs with read-only Graph permissions by default. Don't grant it <code>Application.ReadWrite.All</code> or <code>Directory.ReadWrite.All</code> unless you're actively using the remediation cmdlets. If you're running this in CI, consider using a separate service principal for discovery vs. remediation, so the nightly scan can't accidentally modify your tenant.</p>
<p><strong>Use the artifacts for compliance evidence.</strong> The JSON files are timestamped and include the tenant ID. If you need to prove to auditors that you're monitoring workload identity risk, archive these artifacts in Azure Storage or an S3 bucket with immutability policies. Now you've got a tamper-evident audit trail.</p>
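<p>With the Az.Storage module, archiving a run is a few lines. A sketch, assuming the account and container already exist with an immutability policy configured:</p>
<pre><code class="lang-powershell"># Upload every artifact from this run under a date-stamped prefix.
$ctx = New-AzStorageContext -StorageAccountName 'auditevidence01' -UseConnectedAccount
Get-ChildItem ./out -File | ForEach-Object {
    Set-AzStorageBlobContent -File $_.FullName `
        -Container 'wi-scan-evidence' `
        -Blob ('{0:yyyy-MM-dd}/{1}' -f (Get-Date), $_.Name) `
        -Context $ctx
}
</code></pre>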
<p><strong>Integrate with access reviews.</strong> If you're using Entra ID Governance access reviews for service principals, you can use the privileged role and high-privilege permission data from this toolkit to seed the review scope. Export the JSON, identify the high-risk apps, and kick off targeted reviews for those identities.</p>
<p><strong>Enable CAE for eligible workload identities.</strong> Applications accessing Microsoft Graph can opt into Continuous Access Evaluation by requesting tokens with the <code>xms_cc=cp1</code> claim. This enables 24-hour long-lived tokens subject to instant revocation on disable, delete, or risk state changes. Monitor revocations in the Service Principal sign-in logs—look for the "Continuous access evaluation" field and verify that blocked sessions show appropriate failure reasons. Track an adoption metric (percentage of high-privilege principals covered by Conditional Access policies) to measure your enforcement posture over time. Note: Conditional Access for workload identities requires Workload Identities Premium licensing and applies only to single-tenant service principals (managed identities and multi-tenant apps are excluded).</p>
<h2 id="heading-references">References</h2>
<p>PowerShell / Testing:</p>
<ul>
<li>Pester overview: <a target="_blank" href="https://learn.microsoft.com/powershell/scripting/testing/overview?view=powershell-7.4">https://learn.microsoft.com/powershell/scripting/testing/overview?view=powershell-7.4</a></li>
</ul>
<p>Graph SDK (v1.0):</p>
<ul>
<li><p>Connect-MgGraph: <a target="_blank" href="https://learn.microsoft.com/powershell/microsoftgraph/authentication/connect-mggraph?view=graph-powershell-1.0">https://learn.microsoft.com/powershell/microsoftgraph/authentication/connect-mggraph?view=graph-powershell-1.0</a></p>
</li>
<li><p>Get-MgApplication: <a target="_blank" href="https://learn.microsoft.com/powershell/module/microsoft.graph.applications/get-mgapplication?view=graph-powershell-1.0">https://learn.microsoft.com/powershell/module/microsoft.graph.applications/get-mgapplication?view=graph-powershell-1.0</a></p>
</li>
<li><p>New-MgApplicationFederatedIdentityCredential: <a target="_blank" href="https://learn.microsoft.com/powershell/module/microsoft.graph.applications/new-mgapplicationfederatedidentitycredential?view=graph-powershell-1.0">https://learn.microsoft.com/powershell/module/microsoft.graph.applications/new-mgapplicationfederatedidentitycredential?view=graph-powershell-1.0</a></p>
</li>
<li><p>Add-MgApplicationKey: <a target="_blank" href="https://learn.microsoft.com/powershell/module/microsoft.graph.applications/add-mgapplicationkey?view=graph-powershell-1.0">https://learn.microsoft.com/powershell/module/microsoft.graph.applications/add-mgapplicationkey?view=graph-powershell-1.0</a></p>
</li>
<li><p>Get-MgPolicyAuthorizationPolicy: <a target="_blank" href="https://learn.microsoft.com/powershell/module/microsoft.graph.identity.signins/get-mgpolicyauthorizationpolicy?view=graph-powershell-1.0">https://learn.microsoft.com/powershell/module/microsoft.graph.identity.signins/get-mgpolicyauthorizationpolicy?view=graph-powershell-1.0</a></p>
</li>
</ul>
<p>Graph Beta (Risky Workload Identities):</p>
<ul>
<li><p>List risky SPs: <a target="_blank" href="https://learn.microsoft.com/en-us/graph/api/identityprotectionroot-list-riskyserviceprincipals?view=graph-rest-beta">https://learn.microsoft.com/en-us/graph/api/identityprotectionroot-list-riskyserviceprincipals?view=graph-rest-beta</a></p>
</li>
<li><p>Risk history: <a target="_blank" href="https://learn.microsoft.com/en-us/graph/api/riskyserviceprincipal-list-history?view=graph-rest-beta">https://learn.microsoft.com/en-us/graph/api/riskyserviceprincipal-list-history?view=graph-rest-beta</a></p>
</li>
<li><p>Confirm compromised: <a target="_blank" href="https://learn.microsoft.com/en-us/graph/api/riskyserviceprincipal-confirmcompromised?view=graph-rest-beta">https://learn.microsoft.com/en-us/graph/api/riskyserviceprincipal-confirmcompromised?view=graph-rest-beta</a></p>
</li>
<li><p>Dismiss risk: <a target="_blank" href="https://learn.microsoft.com/en-us/graph/api/riskyserviceprincipal-dismiss?view=graph-rest-beta">https://learn.microsoft.com/en-us/graph/api/riskyserviceprincipal-dismiss?view=graph-rest-beta</a></p>
</li>
</ul>
<h2 id="heading-where-to-go-from-here">Where to Go From Here</h2>
<p>Workload identity risk isn't a "rotate secrets faster" problem. It's a "stop using secrets" problem. And until you get there, it's a "know what you have" problem.</p>
<p>This toolkit gives you visibility into the current state: which apps are using ancient secrets, which service principals have privileged access, which identities are already flagged by Identity Protection. That's the baseline. From there, you build a remediation roadmap: migrate GitHub Actions to OIDC federation, rotate long-lived certificates to short-lived ones, remove excessive Graph permissions, revoke privileged roles that aren't actively used.</p>
<p>The key is making this continuous. Run the scan nightly in CI/CD. Publish the results to your security team's dashboard. Alert on new high-risk apps. Track migration progress over time. When the next penetration test happens, you'll have months of audit data showing you've been actively managing workload identity risk—not just reacting to findings.</p>
<p>Extend the artifacts into whatever system you already use: Power BI for executive dashboards, Azure Monitor Logs for alerting, ServiceNow for ticketing, Jira for remediation tracking. The JSON schema is stable and documented; you're not locked into a proprietary format.</p>
<p>The end goal is a tenant where standing secrets don't exist, privileged assignments are time-bound and deliberate, and compliance evidence is generated automatically. That's achievable today with the tools Entra ID already provides—federated credentials, short-lived certificates, managed identities, Conditional Access for service principals, Identity Protection for workload identities. This project just gives you the automation to make it practical.</p>
<p>Start with a scan. See what you're dealing with. Build a plan. Automate the remediation. Repeat.</p>
<p>The blueprint is here. The rest is execution. 🚀</p>
<p>For all the files and code snippets mentioned, check out my <a target="_blank" href="https://github.com/broberts23/vsCode/tree/main/project-workload-identity">GitHub repository</a>.</p>
<p>License: MIT</p>
]]></content:encoded></item><item><title><![CDATA[Zero-to-DC: Building a Disposable Active Directory Lab in Azure with Bicep and PowerShell]]></title><description><![CDATA[Introduction
This all started from a very practical problem: I wanted a disposable, repeatable Active Directory lab I could spin up for demos, tutorials, and future blog posts (stay tuned). I needed something I could tear down and rebuild without fea...]]></description><link>https://benroberts.io/zero-to-dc-building-a-disposable-active-directory-lab-in-azure-with-bicep-and-powershell</link><guid isPermaLink="true">https://benroberts.io/zero-to-dc-building-a-disposable-active-directory-lab-in-azure-with-bicep-and-powershell</guid><category><![CDATA[Powershell]]></category><category><![CDATA[Azure]]></category><category><![CDATA[Bicep]]></category><category><![CDATA[Active Directory]]></category><category><![CDATA[automation]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Sun, 14 Dec 2025 13:00:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764465107561/493d6aa8-b8cd-4fad-b65f-f0c867b6f1fe.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>This all started from a very practical problem: I wanted a <strong>disposable, repeatable Active Directory lab</strong> I could spin up for demos, tutorials, and future blog posts (stay tuned). I needed something I could tear down and rebuild without fear—no hand-configured servers, no “click-next-until-it-works” wizards.</p>
<p>The goal was simple: run one command, get a fully functional AD domain in Azure; run another, and everything disappears again. Under the hood, though, that meant solving some surprisingly gnarly problems: secure credential handling, domain controller promotion across reboots, post-configuration of service accounts and test users, and enough logging that I could actually debug it at 2 a.m. when something went sideways. 🤣</p>
<p>This post walks through how that infrastructure works: the <strong>Bicep template</strong> that defines the environment, the <code>Deploy-Complete.ps1</code> script that orchestrates it, and the <strong>PowerShell that runs inside the VM</strong> to actually stand up and configure the domain controller.</p>
<h2 id="heading-architecture-at-a-glance">Architecture at a Glance</h2>
<p>Before diving into the code, here's a bird's-eye view of what gets deployed:</p>
<pre><code class="lang-mermaid">flowchart TB
    subgraph Azure["Azure Resource Group"]
        subgraph VNet["Virtual Network (10.0.0.0/16)"]
            subgraph DCSub["DC Subnet (10.0.1.0/24)"]
                DC["Domain Controller VM
                (Windows Server 2022)"]
            end
            subgraph FuncSub["Function App Subnet (10.0.2.0/24)"]
                FA["Function App
                (PowerShell 7.4)"]
            end
        end
        KV["Key Vault
        (Secrets)"]
        LA["Log Analytics
        + App Insights"]
        ST["Storage Account"]
    end

    FA --&gt;|Managed Identity| KV
    FA --&gt;|LDAP/Password Reset| DC
    FA --&gt; LA
    FA --&gt; ST
</code></pre>
<p>The domain controller sits in its own subnet, the function app in another. Key Vault holds the service account credentials, and everything feeds telemetry into Log Analytics. Simple, but it took some iteration to get the automation right.</p>
<p>One of those iterations was networking. The Function App isn’t automatically “in your VNet,” and for this scenario that matters: the API needs a private path to the DC (LDAPS on 636) and it needs to resolve the DC’s internal name reliably.</p>
<p>So the infrastructure doesn’t just deploy a VNet—it makes the Function App a first-class citizen inside it: a delegated subnet for VNet integration, routing that keeps traffic on the private path, and DNS that intentionally points the function at the domain controller.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before you can deploy this lab, make sure you have:</p>
<ul>
<li><p>An <strong>Azure subscription</strong> with at least Contributor rights on the target resource group</p>
</li>
<li><p><strong>PowerShell 7.4+</strong> installed locally (the deployment script requires it)</p>
</li>
<li><p>The <strong>Az PowerShell module</strong> (<code>Install-Module Az -Scope CurrentUser</code>)</p>
</li>
<li><p>The <strong>Bicep CLI</strong> (<a target="_blank" href="https://learn.microsoft.com/azure/azure-resource-manager/bicep/install">install instructions</a>)</p>
</li>
<li><p>A clone of the repo with the parameter files configured for your environment</p>
</li>
</ul>
<p>With those in place, you're ready to go.</p>
<h2 id="heading-the-blueprint-bicep-as-our-architect">The Blueprint: Bicep as Our Architect</h2>
<p>I started with Bicep, Azure's declarative language for infrastructure. Instead of clicking through endless Azure Portal screens, I wrote a template that describes everything:</p>
<ul>
<li><p>The domain controller VM, with spot pricing for cost savings (because why pay more for test/dev?)</p>
</li>
<li><p>A dedicated VNet, subnets, and NSGs to keep traffic locked down</p>
</li>
<li><p>Key Vault for secrets, Log Analytics for diagnostics, and all the glue that ties it together</p>
</li>
</ul>
<p>The core of the VM definition in <code>infra/main.bicep</code> looks like this:</p>
<pre><code class="lang-powershell">resource dcVm <span class="hljs-string">'Microsoft.Compute/virtualMachines@2025-04-01'</span> = <span class="hljs-keyword">if</span> (deployDomainController) {
    name: dcVmName
    location: location
    tags: tags
    properties: {
        hardwareProfile: {
            vmSize: <span class="hljs-string">'Standard_D2s_v3'</span>
        }
        priority: <span class="hljs-string">'Spot'</span>
        evictionPolicy: <span class="hljs-string">'Deallocate'</span>
        billingProfile: {
            maxPrice: <span class="hljs-literal">-1</span>
        }
        osProfile: {
            computerName: take(dcVmName, <span class="hljs-number">15</span>)
            adminUsername: vmAdminUsername
            adminPassword: vmAdminPassword
            windowsConfiguration: {
                enableAutomaticUpdates: true
                provisionVMAgent: true
                timeZone: <span class="hljs-string">'UTC'</span>
            }
        }
        // ... storageProfile, networkProfile, diagnosticsProfile ...
    }
}
</code></pre>
<p>The <code>priority: 'Spot'</code> and <code>evictionPolicy: 'Deallocate'</code> combination gives me a cost‑effective lab that can be evicted without data loss, because the actual AD database lives on a managed data disk. The <code>maxPrice: -1</code> setting tells Azure “go up to the regular pay‑as‑you‑go price if needed,” which keeps the lab resilient without me micromanaging spot pricing.</p>
<p>Around that VM, the Bicep file provisions:</p>
<ul>
<li><p>A VNet with separate subnets for the domain controller and the function app</p>
</li>
<li><p>A Network Security Group that allows the handful of ports AD really needs</p>
</li>
<li><p>A Key Vault with a JSON secret that stores the service account credentials</p>
</li>
<li><p>A Log Analytics workspace and Application Insights for observability</p>
</li>
</ul>
<p>Every parameter—domain name, NetBIOS name, admin credentials—can be set at deploy time. If you want to change the environment from dev to prod, it’s just a flag. The Bicep file is our single source of truth, and it’s versioned right alongside the rest of the repo.</p>
<p>And because the function app needs to reach the DC over the VNet, the template also bakes in a couple of important network choices:  </p>
<ul>
<li><p>The Function App runs on <strong>Elastic Premium (EP1)</strong> (Consumption can’t do VNet integration)</p>
</li>
<li><p>The Function App is integrated with the <code>FunctionAppSubnet</code> and configured to route outbound traffic via the VNet</p>
</li>
<li><p>DNS for the Function App is pointed at the domain controller, so <code>dcname.contoso.local</code> resolves the same way it would from a domain-joined machine</p>
</li>
</ul>
<p>If you want to go deeper on Bicep itself, the official docs are here:</p>
<ul>
<li><p>Azure Bicep overview: https://learn.microsoft.com/azure/azure-resource-manager/bicep/overview</p>
</li>
<li><p>Bicep deployment with CLI/PowerShell: https://learn.microsoft.com/azure/azure-resource-manager/bicep/deploy-cli</p>
</li>
</ul>
<h2 id="heading-orchestration-deploy-completeps1-the-maestro">Orchestration: Deploy-Complete.ps1, the Maestro</h2>
<p>Once the infrastructure is described, I need to bring it to life. That's where <code>scripts/Deploy-Complete.ps1</code> comes in. This PowerShell script isn’t just a wrapper—it’s the conductor of our deployment symphony.</p>
<p>At the top, it does some basic but important hygiene:</p>
<pre><code class="lang-powershell"><span class="hljs-comment">#!/usr/bin/env pwsh</span>
<span class="hljs-comment">#Requires -Version 7.4</span>

<span class="hljs-built_in">Set-StrictMode</span> <span class="hljs-literal">-Version</span> Latest
<span class="hljs-variable">$ErrorActionPreference</span> = <span class="hljs-string">'Stop'</span>
</code></pre>
<p><code>Set-StrictMode</code> and <code>$ErrorActionPreference = 'Stop'</code> ensure that typos and unexpected errors don’t silently slip by—exactly what you want in an automation script that’s going to manipulate infrastructure. From there, it validates prerequisites (Az module, Bicep CLI, parameter files) and makes sure you’re logged in with <code>Connect-AzAccount</code>.</p>
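<p>The guard clauses amount to something like this (an assumed shape, not the script verbatim):</p>
<pre><code class="lang-powershell"># Fail fast if the tooling or session state isn't ready.
if (-not (Get-Module -ListAvailable -Name Az.Resources)) {
    throw 'Az PowerShell module not found. Run: Install-Module Az -Scope CurrentUser'
}
if (-not (Get-Command bicep -ErrorAction SilentlyContinue)) {
    throw 'Bicep CLI not found on PATH.'
}
if (-not (Get-AzContext)) {
    Connect-AzAccount | Out-Null
}
</code></pre>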
<p>The real fun starts when you enable the domain controller flag. The script wires up <strong>secure parameters</strong> for the VM admin and service account passwords, then calls into Bicep using <code>New-AzResourceGroupDeployment</code>:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$deployment</span> = <span class="hljs-built_in">New-AzResourceGroupDeployment</span> <span class="hljs-selector-tag">@</span>{
    ResourceGroupName     = <span class="hljs-variable">$ResourceGroupName</span>
    TemplateFile          = <span class="hljs-variable">$bicepFile</span>
    TemplateParameterFile = <span class="hljs-variable">$parametersFile</span>
    vmAdminPassword       = <span class="hljs-variable">$VmAdminPassword</span>
    serviceAccountPassword = <span class="hljs-variable">$ServiceAccountPassword</span>
}
</code></pre>
<p>Once the infrastructure is laid down, <code>Deploy-Complete.ps1</code> moves into a second phase: in‑guest configuration. If you’ve asked for a domain controller, the script waits for the VM to be running, then uses <strong>Azure VM Run Command</strong> to invoke <code>Bootstrap-ADDSDomain.ps1</code> on the VM itself:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$bootstrapScript</span> = <span class="hljs-built_in">Get-Content</span> (<span class="hljs-built_in">Join-Path</span> <span class="hljs-variable">$scriptDir</span> <span class="hljs-string">'Bootstrap-ADDSDomain.ps1'</span>) <span class="hljs-literal">-Raw</span>
<span class="hljs-variable">$plainPassword</span>   = [<span class="hljs-type">System.Net.NetworkCredential</span>]::new(<span class="hljs-string">''</span>, <span class="hljs-variable">$VmAdminPassword</span>).Password
<span class="hljs-variable">$passwordBase64</span>  = [<span class="hljs-type">Convert</span>]::ToBase64String([<span class="hljs-type">System.Text.Encoding</span>]::UTF8.GetBytes(<span class="hljs-variable">$plainPassword</span>))

<span class="hljs-variable">$null</span> = <span class="hljs-built_in">Invoke-AzVMRunCommand</span> <span class="hljs-literal">-AsJob</span> `
    <span class="hljs-literal">-ResourceGroupName</span> <span class="hljs-variable">$ResourceGroupName</span> `
    <span class="hljs-literal">-VMName</span>            <span class="hljs-variable">$VmName</span> `
    <span class="hljs-literal">-CommandId</span>         <span class="hljs-string">'RunPowerShellScript'</span> `
    <span class="hljs-literal">-ScriptString</span>      <span class="hljs-variable">$bootstrapScript</span> `
    <span class="hljs-literal">-Parameter</span>         <span class="hljs-selector-tag">@</span>{
        DomainName                  = <span class="hljs-variable">$DomainName</span>
        DomainNetBiosName           = <span class="hljs-variable">$DomainNetBiosName</span>
        SafeModeAdminPasswordBase64 = <span class="hljs-variable">$passwordBase64</span>
    }
</code></pre>
<p>The important detail here is the <strong>Base64 encoding</strong> of the Safe Mode password. Passing <code>SecureString</code> values across the Run Command boundary can lead to mangled data; by converting to UTF‑8 bytes, then to Base64, then reversing that inside the VM, I keep the password intact without logging it or exposing it in plain text outside controlled boundaries.</p>
<p>Azure Run Command docs if you want to explore this pattern yourself: https://learn.microsoft.com/azure/virtual-machines/run-command</p>
<h2 id="heading-the-heartbeat-domain-controller-promotion">The Heartbeat: Domain Controller Promotion</h2>
<p>Promoting a Windows Server to a domain controller is a delicate dance. The script inside the VM, <code>scripts/Bootstrap-ADDSDomain.ps1</code>, handles the choreography.</p>
<p>First, it formats the attached data disk and mounts it to a stable drive letter so the AD database and SYSVOL have a predictable home. Then it installs the <strong>Active Directory Domain Services</strong> role and related management tools. Only after that does it generate the actual promotion script.</p>
<p>The interesting bit is how the promotion is launched:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$promotionScript</span> = <span class="hljs-string">@'
Start-Transcript -Path "C:\temp\Promote-ADDSForest-Transcript.log" -Append

Import-Module ADDSDeployment -ErrorAction Stop

$passwordPlain = [System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String('$SafeModeAdminPasswordBase64'))
$passwordSec   = ConvertTo-SecureString -String $passwordPlain -AsPlainText -Force

Install-ADDSForest -DomainName '$DomainName' -DomainNetbiosName '$DomainNetBiosName' `
    -SafeModeAdministratorPassword $passwordSec -Force:$true -NoRebootOnCompletion:$false

Stop-Transcript
'@</span>

<span class="hljs-built_in">Set-Content</span> <span class="hljs-literal">-Path</span> <span class="hljs-string">'C:\temp\Promote-ADDSForest.ps1'</span> <span class="hljs-literal">-Value</span> <span class="hljs-variable">$promotionScript</span> <span class="hljs-literal">-Encoding</span> UTF8
<span class="hljs-built_in">Start-Process</span> <span class="hljs-literal">-FilePath</span> <span class="hljs-string">'powershell.exe'</span> <span class="hljs-literal">-ArgumentList</span> <span class="hljs-string">'-NoProfile'</span>,<span class="hljs-string">'-ExecutionPolicy'</span>,<span class="hljs-string">'Bypass'</span>,<span class="hljs-string">'-File'</span>,<span class="hljs-string">'C:\temp\Promote-ADDSForest.ps1'</span> <span class="hljs-literal">-WindowStyle</span> <span class="hljs-keyword">Hidden</span>
</code></pre>
<p>Two key design choices here:</p>
<ol>
<li><p>The promotion runs in a <strong>detached Windows PowerShell 5.1 process</strong>. That means when the VM reboots, Azure Run Command doesn’t get “stuck” waiting on a process that no longer exists.</p>
</li>
<li><p>The Safe Mode password is reconstructed from Base64 and converted to <code>SecureString</code> <strong>inside</strong> the VM, which avoids all the strange quoting and whitespace issues you get when trying to pass complex strings across layers.</p>
</li>
</ol>
<p>If you’re curious about the cmdlet doing the heavy lifting, <code>Install-ADDSForest</code> is documented here: https://learn.microsoft.com/powershell/module/addsdeployment/install-addsforest</p>
<h2 id="heading-detecting-the-reboot-smarter-than-powerstate">Detecting the Reboot: Smarter Than PowerState</h2>
<p>After the promotion, the VM reboots. But how do I know when it's back and ready? Early on, I tried watching the VM's <strong>PowerState</strong>, and it worked—until it didn't. There were edge cases where the VM reported <code>running</code> but AD services weren't ready yet.</p>
<p>The final approach lives in <code>Invoke-DomainControllerPostConfig</code> inside <code>Deploy-Complete.ps1</code>. It asks the VM for its boot time via CIM and watches for that value to change:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$bootTimeScript</span> = <span class="hljs-string">'Get-CimInstance -ClassName Win32_OperatingSystem | Select-Object -ExpandProperty LastBootUpTime'</span>
<span class="hljs-variable">$bootResult</span>     = <span class="hljs-built_in">Invoke-AzVMRunCommand</span> <span class="hljs-literal">-ResourceGroupName</span> <span class="hljs-variable">$ResourceGroupName</span> <span class="hljs-literal">-VMName</span> <span class="hljs-variable">$VmName</span> <span class="hljs-literal">-CommandId</span> <span class="hljs-string">'RunPowerShellScript'</span> <span class="hljs-literal">-ScriptString</span> <span class="hljs-variable">$bootTimeScript</span>

<span class="hljs-variable">$initialBootTime</span> = [<span class="hljs-built_in">DateTime</span>]::Parse((<span class="hljs-variable">$bootResult</span>.Value[<span class="hljs-number">0</span>].Message).Trim())

<span class="hljs-keyword">while</span> (<span class="hljs-variable">$elapsed</span> <span class="hljs-operator">-lt</span> <span class="hljs-variable">$timeout</span>) {
    <span class="hljs-variable">$currentBootResult</span> = <span class="hljs-built_in">Invoke-AzVMRunCommand</span> <span class="hljs-literal">-ResourceGroupName</span> <span class="hljs-variable">$ResourceGroupName</span> <span class="hljs-literal">-VMName</span> <span class="hljs-variable">$VmName</span> <span class="hljs-literal">-CommandId</span> <span class="hljs-string">'RunPowerShellScript'</span> <span class="hljs-literal">-ScriptString</span> <span class="hljs-variable">$bootTimeScript</span> <span class="hljs-literal">-ErrorAction</span> SilentlyContinue

    <span class="hljs-keyword">if</span> (<span class="hljs-operator">-not</span> <span class="hljs-variable">$currentBootResult</span>) {
        <span class="hljs-variable">$rebootWindowObserved</span> = <span class="hljs-variable">$true</span>
    }
    <span class="hljs-keyword">else</span> {
        <span class="hljs-variable">$currentBootTime</span> = [<span class="hljs-built_in">DateTime</span>]::Parse((<span class="hljs-variable">$currentBootResult</span>.Value[<span class="hljs-number">0</span>].Message).Trim())

        <span class="hljs-keyword">if</span> (<span class="hljs-variable">$initialBootTime</span> <span class="hljs-operator">-and</span> <span class="hljs-variable">$currentBootTime</span> <span class="hljs-operator">-and</span> (<span class="hljs-variable">$currentBootTime</span> <span class="hljs-operator">-ne</span> <span class="hljs-variable">$initialBootTime</span>)) {
            <span class="hljs-variable">$rebootDetected</span> = <span class="hljs-variable">$true</span>
        }
        <span class="hljs-keyword">elseif</span> (<span class="hljs-variable">$rebootWindowObserved</span> <span class="hljs-operator">-and</span> <span class="hljs-variable">$currentBootTime</span>) {
            <span class="hljs-variable">$rebootDetected</span> = <span class="hljs-variable">$true</span>
        }
    }

    <span class="hljs-keyword">if</span> (<span class="hljs-variable">$rebootDetected</span>) { <span class="hljs-keyword">break</span> }
    <span class="hljs-built_in">Start-Sleep</span> <span class="hljs-literal">-Seconds</span> <span class="hljs-variable">$checkInterval</span>
}
</code></pre>
<p>Only after a reboot is confidently detected does the script run a second health check:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$testScript</span> = <span class="hljs-string">'try { Import-Module ActiveDirectory -ErrorAction Stop; Get-ADDomain -ErrorAction Stop | Out-Null; exit 0 } catch { exit 1 }'</span>
<span class="hljs-variable">$testResult</span> = <span class="hljs-built_in">Invoke-AzVMRunCommand</span> <span class="hljs-literal">-ResourceGroupName</span> <span class="hljs-variable">$ResourceGroupName</span> <span class="hljs-literal">-VMName</span> <span class="hljs-variable">$VmName</span> <span class="hljs-literal">-CommandId</span> <span class="hljs-string">'RunPowerShellScript'</span> <span class="hljs-literal">-ScriptString</span> <span class="hljs-variable">$testScript</span> <span class="hljs-literal">-ErrorAction</span> SilentlyContinue
</code></pre>
<p>When <code>Get-ADDomain</code> finally succeeds, I know that AD Web Services are up and the domain controller is genuinely ready for the next phase.</p>
<h2 id="heading-post-configuration-making-ad-useful">Post-Configuration: Making AD Useful</h2>
<p>With the domain up, I run another script, <code>scripts/Configure-ADPostPromotion.ps1</code>, via Run Command. This is where the environment turns from “just a domain” into something that’s actually useful.</p>
<p>The script waits for AD Web Services to be available, then creates an <strong>Organizational Unit</strong> and a <strong>service account</strong> that will later be used by a function app. It also creates a handful of test users so you can validate the end‑to‑end reset flow.</p>
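<p>The OU step itself is only a couple of cmdlets. Here is a minimal sketch; the <code>ServiceAccounts</code> OU name is illustrative rather than the repo’s exact naming:</p>
<pre><code class="lang-powershell"># Create the OU if it is missing; Get-ADOrganizationalUnit throws when the DN does not exist.
Import-Module ActiveDirectory

$domainDn = (Get-ADDomain).DistinguishedName
$ouPath   = "OU=ServiceAccounts,$domainDn"   # illustrative name

try {
    $null = Get-ADOrganizationalUnit -Identity $ouPath
}
catch {
    New-ADOrganizationalUnit -Name 'ServiceAccounts' -Path $domainDn
}
</code></pre>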
<p>And because the whole point of this lab is “real-world enough to be interesting,” it also wires up <strong>LDAPS</strong> on the domain controller. That ended up being one of the most important (and most finicky) pieces of the entire build.</p>
<p>The heart of the service account creation looks like this:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$serviceAccountPasswordSecure</span> = <span class="hljs-built_in">ConvertTo-SecureString</span> <span class="hljs-literal">-String</span> <span class="hljs-variable">$ServiceAccountPassword</span> <span class="hljs-literal">-AsPlainText</span> <span class="hljs-literal">-Force</span>

<span class="hljs-variable">$serviceAccountParams</span> = <span class="hljs-selector-tag">@</span>{
    Name                 = <span class="hljs-variable">$ServiceAccountName</span>
    SamAccountName       = <span class="hljs-variable">$ServiceAccountName</span>
    UserPrincipalName    = <span class="hljs-string">"<span class="hljs-variable">$ServiceAccountName</span>@<span class="hljs-variable">$DomainName</span>"</span>
    AccountPassword      = <span class="hljs-variable">$serviceAccountPasswordSecure</span>
    Enabled              = <span class="hljs-variable">$true</span>
    PasswordNeverExpires = <span class="hljs-variable">$true</span>
    CannotChangePassword = <span class="hljs-variable">$true</span>
    Path                 = <span class="hljs-variable">$ouPath</span>
    Description          = <span class="hljs-string">'Service account for Azure Function App password reset operations'</span>
}

<span class="hljs-built_in">New-ADUser</span> @serviceAccountParams
</code></pre>
<p>If the account already exists, the script switches to an update path:</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">Set-ADAccountPassword</span> <span class="hljs-literal">-Identity</span> <span class="hljs-variable">$ServiceAccountName</span> <span class="hljs-literal">-NewPassword</span> <span class="hljs-variable">$serviceAccountPasswordSecure</span> <span class="hljs-literal">-Reset</span>
</code></pre>
<p>Finally, it grants the service account <strong>reset password rights</strong> at the domain level by editing the ACL and adding an access rule for the “Reset Password” extended right. This lines up perfectly with the least‑privilege expectations of the function app.</p>
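<p>The grant itself is a little .NET ACL plumbing against the <code>AD:</code> drive. A rough sketch of the pattern follows; the two GUIDs are the well-known identifiers for the “Reset Password” extended right and the <code>user</code> object class:</p>
<pre><code class="lang-powershell"># Allow the service account to reset passwords on user objects anywhere in the domain.
Import-Module ActiveDirectory

$resetPasswordGuid = [Guid]'00299570-246d-11d0-a768-00aa006e0529'   # "Reset Password" extended right
$userClassGuid     = [Guid]'bf967aba-0de6-11d0-a285-00aa003049e2'   # user object class

$svc      = Get-ADUser -Identity $ServiceAccountName
$domainDn = (Get-ADDomain).DistinguishedName
$acl      = Get-Acl -Path "AD:\$domainDn"

$rule = [System.DirectoryServices.ActiveDirectoryAccessRule]::new(
    [System.Security.Principal.SecurityIdentifier]$svc.SID,
    [System.DirectoryServices.ActiveDirectoryRights]::ExtendedRight,
    [System.Security.AccessControl.AccessControlType]::Allow,
    $resetPasswordGuid,
    [System.DirectoryServices.ActiveDirectorySecurityInheritance]::Descendents,
    $userClassGuid
)

$acl.AddAccessRule($rule)
Set-Acl -Path "AD:\$domainDn" -AclObject $acl
</code></pre>
<p>Scoping the rule to descendant <code>user</code> objects keeps the account from accumulating anything broader than the one right it needs.</p>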
<h3 id="heading-networking-getting-the-function-app-to-see-the-dc">Networking: Getting the Function App to “See” the DC</h3>
<p>Once the pieces were in place, the actual connectivity requirement was simple: the Function App must be able to reach the DC’s private IP on the right ports, and it must be able to resolve internal AD DNS names.</p>
<p>The part that surprised me was how much of that comes down to DNS behavior.</p>
<p>In <code>infra/main.bicep</code>, the Function App is integrated into the VNet subnet and configured with a couple of app settings that make it behave like a VNet-attached workload:</p>
<ul>
<li><code>WEBSITE_VNET_ROUTE_ALL = 1</code> routes outbound traffic through the VNet (so LDAPS to the DC stays on the private path).</li>
<li><code>WEBSITE_DNS_SERVER = 10.0.1.4</code> points the Function App at the DC for DNS, so it can resolve the AD domain and the DC’s FQDN.</li>
</ul>
<p>But if the Function App uses the DC for DNS, the DC now has to resolve public names too—especially during authentication flows, where the Function App may need to reach Entra ID endpoints.</p>
<p>That’s why <code>Configure-ADPostPromotion.ps1</code> configures DNS forwarders on the DC. The default forwarder is Azure’s virtual DNS IP <code>168.63.129.16</code>, which keeps internal resolution authoritative on the DC while still allowing lookups for everything else.</p>
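<p>On the DC, that configuration is a single cmdlet from the DnsServer module; a sketch:</p>
<pre><code class="lang-powershell"># Forward anything the DC is not authoritative for to Azure's recursive resolver.
Import-Module DnsServer
Add-DnsServerForwarder -IPAddress 168.63.129.16 -PassThru
</code></pre>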
<h3 id="heading-ldaps-giving-the-dc-a-real-tls-identity">LDAPS: Giving the DC a Real TLS Identity</h3>
<p>LDAP over SSL (port 636) isn’t something you “turn on” with a checkbox. On Windows Server, AD DS will only present LDAPS if the DC has a usable certificate in the <strong>Local Machine Personal</strong> store.</p>
<p>Here’s the pattern that ended up being reliable and repeatable:</p>
<ol>
<li><p><strong>Generate a self-signed LDAPS certificate during deployment</strong></p>
<ul>
<li><p><code>Deploy-Complete.ps1</code> generates a cert using OpenSSL with the right constraints (see the sketch after this list):</p>
<ul>
<li><p><code>extendedKeyUsage = serverAuth</code></p>
</li>
<li><p>SANs for the DC FQDN (and a wildcard for the domain)</p>
</li>
</ul>
</li>
<li><p>It exports two forms:</p>
<ul>
<li><p>a <strong>PFX</strong> (private key included) for the domain controller</p>
</li>
<li><p>a <strong>DER</strong> <code>.cer</code> (public key only) that’s easy for clients to parse</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Install the PFX on the domain controller</strong></p>
<ul>
<li><p><code>Configure-ADPostPromotion.ps1</code> imports the PFX into <code>Cert:\LocalMachine\My</code>.</p>
</li>
<li><p>It performs basic sanity checks (private key present + “Server Authentication” EKU) and then does a quick <code>Test-NetConnection localhost -Port 636</code> to see if LDAPS is listening.</p>
</li>
</ul>
</li>
<li><p><strong>Publish the public cert for the client to pin</strong></p>
<ul>
<li><p>The public <code>.cer</code> is stored in Key Vault as <code>LDAPS-Certificate-CER</code>.</p>
</li>
<li><p>The Function App retrieves this value at runtime and uses it for <strong>strict certificate pinning + hostname validation</strong>.</p>
</li>
</ul>
</li>
</ol>
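<p>For reference, the generation in step 1 boils down to a handful of OpenSSL calls. A sketch under assumed names (the <code>dc01.contoso.local</code> names and the PFX password are placeholders, and <code>-addext</code> needs OpenSSL 1.1.1 or later):</p>
<pre><code class="lang-powershell"># Self-signed cert with serverAuth EKU and SANs, then the two export formats.
openssl req -x509 -newkey rsa:2048 -sha256 -days 365 -nodes `
  -keyout ldaps.key -out ldaps.crt `
  -subj '/CN=dc01.contoso.local' `
  -addext 'subjectAltName=DNS:dc01.contoso.local,DNS:*.contoso.local' `
  -addext 'extendedKeyUsage=serverAuth'

# PFX (private key included) for the DC; DER .cer (public only) for the client to pin.
openssl pkcs12 -export -out ldaps.pfx -inkey ldaps.key -in ldaps.crt -passout 'pass:PlaceholderPfxPassword'
openssl x509 -in ldaps.crt -outform der -out ldaps.cer
</code></pre>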
<p>That pinning step is the key architectural shift: rather than trying to “teach” the Function App host to trust a private PKI (which is often blocked in sandboxed hosting), the function validates the server cert directly during the LDAPS handshake. No hostname bypass, no trust-store hacks—just strict TLS with a pinned identity.</p>
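<p>From the function’s side, the handshake looks roughly like this. It is a sketch rather than the repo’s exact code: <code>$derBytes</code>, <code>$dcFqdn</code>, and <code>$cred</code> stand in for values pulled from Key Vault and app settings:</p>
<pre><code class="lang-powershell"># Pinned LDAPS bind: accept only the exact certificate exported at deployment time.
Add-Type -AssemblyName System.DirectoryServices.Protocols

# $derBytes would come from the Key Vault secret, e.g. [Convert]::FromBase64String($cerSecret)
$pinned = [System.Security.Cryptography.X509Certificates.X509Certificate2]::new($derBytes)
$server = [System.DirectoryServices.Protocols.LdapDirectoryIdentifier]::new($dcFqdn, 636)
$conn   = [System.DirectoryServices.Protocols.LdapConnection]::new($server)

$conn.SessionOptions.SecureSocketLayer = $true
$conn.SessionOptions.VerifyServerCertificate = {
    param($connection, $certificate)
    ([System.Security.Cryptography.X509Certificates.X509Certificate2]$certificate).Thumbprint -eq $pinned.Thumbprint
}

$conn.AuthType   = [System.DirectoryServices.Protocols.AuthType]::Basic
$conn.Credential = $cred   # a System.Net.NetworkCredential for the service account
$conn.Bind()               # throws if pinning or authentication fails
</code></pre>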
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764465234241/03f31d35-c756-4d52-8b7f-072b33b9f312.png" alt class="image--center mx-auto" /></p>
<p>If you want to see the building blocks behind these cmdlets, the official docs are a great reference:</p>
<ul>
<li><p><code>New-ADUser</code>: https://learn.microsoft.com/powershell/module/activedirectory/new-aduser</p>
</li>
<li><p><code>New-ADOrganizationalUnit</code>: https://learn.microsoft.com/powershell/module/activedirectory/new-adorganizationalunit</p>
</li>
<li><p><code>Set-ADAccountPassword</code>: https://learn.microsoft.com/powershell/module/activedirectory/set-adaccountpassword</p>
</li>
</ul>
<h2 id="heading-lessons-learned-and-a-few-battle-scars">Lessons Learned (and a Few Battle Scars)</h2>
<p>I didn't get it right the first time. Not even close. Here are some of the detours I took along the way.</p>
<h3 id="heading-the-securestring-saga">The SecureString Saga</h3>
<p>My first attempt passed <code>SecureString</code> values directly through Azure Run Command parameters. That worked fine locally, but when the script ran inside the VM, the password came out garbled—random whitespace, truncated characters, the works. After way too much debugging, I landed on the Base64 approach: convert the password to UTF-8 bytes, encode as Base64, pass that string, then decode and convert back to <code>SecureString</code> inside the VM. It's an extra step, but it's reliable across every boundary.</p>
<h3 id="heading-scheduled-tasks-a-dead-end">Scheduled Tasks: A Dead End</h3>
<p>Originally, the promotion script created a Windows Scheduled Task to run <code>Install-ADDSForest</code>. The idea was that the task would survive the reboot and finish the job. In practice, it was flaky—sometimes the task fired, sometimes it didn't, and debugging was a nightmare. Switching to a detached <code>powershell.exe</code> process (launched via <code>Start-Process</code>) was simpler and worked every time. The promotion runs, the VM reboots, and Run Command just times out gracefully while the real work continues in the background.</p>
<h3 id="heading-powerstate-lies-sometimes">PowerState Lies (Sometimes)</h3>
<p>I wasted hours watching the VM's <code>PowerState</code> flip to <code>running</code> and assuming AD was ready. It wasn't. The VM was up, but AD Web Services were still initializing. The fix was to query the VM's <strong>boot time</strong> via CIM and wait for it to actually change after the reboot. Even better, I treat Run Command failures during the reboot window as a signal that the VM is mid-restart. Once boot time updates and <code>Get-ADDomain</code> succeeds, I know the domain is genuinely online.</p>
<h3 id="heading-log-everything-to-disk">Log Everything to Disk</h3>
<p>This sounds obvious, but it saved me more than once. Every script writes progress and errors to <code>C:\temp</code>. When something breaks, I can RDP in, open the logs, and see exactly where the process stopped. Transcripts, timestamps, and descriptive messages—don't skip them.</p>
<h3 id="heading-ldaps-certificates-are-picky-and-schannel-will-tell-you">LDAPS Certificates Are Picky (and Schannel Will Tell You)</h3>
<p>The fastest way to lose an afternoon is to generate a certificate that looks “fine” but isn’t usable by Schannel. If the certificate doesn’t have a private key, doesn’t include the right EKU, or doesn’t match the hostname the client is connecting to, LDAPS can fail with confusing symptoms.</p>
<p>The fix was to treat the certificate like a real production TLS credential: SANs, proper EKU, and a private key that actually imports correctly. Once that’s in place, LDAPS becomes boring—in the best way.</p>
<h2 id="heading-the-end-result-automated-auditable-and-secure">The End Result: Automated, Auditable, and Secure</h2>
<p>Now, spinning up a new AD environment is as simple as running a script, and the whole build completes in under 15 minutes. Every step is automated, every credential is handled securely, and every log is there if you need it. The infrastructure is ready for my password reset API, and I can redeploy, tear down, or troubleshoot with confidence.</p>
<p>If you’re building hybrid cloud solutions, don’t underestimate the value of good orchestration and clear logging. It’s the difference between a fragile dev/lab environment and a production-ready system.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764466831532/2f30782b-2f11-4d68-8f91-88124229b62b.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-how-the-pieces-fit-together-in-the-repo">How the Pieces Fit Together in the Repo</h2>
<p>If you want to explore or reuse this setup, here’s the key structure of the repo:</p>
<pre><code class="lang-text">project-functionapp-roles/
    infra/
        main.bicep                  # All Azure resources, including the domain controller VM
        parameters.dev.json         # Parameter file for deployments
    scripts/
        Deploy-Complete.ps1         # Orchestrates deployment, promotion, and post-config
        Bootstrap-ADDSDomain.ps1    # Runs inside the VM to install AD DS and promote to DC
        Configure-ADPostPromotion.ps1 # Creates OU, service account, and test users
</code></pre>
<p>You can clone the repo, tweak the parameters, and use it as your own disposable AD lab.</p>
<p>Azure PowerShell docs for some of the commands used here:</p>
<ul>
<li><p><code>New-AzResourceGroupDeployment</code>: https://learn.microsoft.com/powershell/module/az.resources/new-azresourcegroupdeployment</p>
</li>
<li><p><code>Get-AzVM</code> and <code>Invoke-AzVMRunCommand</code>: https://learn.microsoft.com/powershell/module/az.compute/invoke-azvmruncommand</p>
</li>
</ul>
<h2 id="heading-tearing-it-all-down">Tearing It All Down</h2>
<p>The whole point of a disposable lab is that you can throw it away when you're done. Since everything lives in a single resource group, cleanup is one command:</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">Remove-AzResourceGroup</span> <span class="hljs-literal">-Name</span> rg<span class="hljs-literal">-pwdreset</span><span class="hljs-literal">-dev</span> <span class="hljs-literal">-Force</span>
</code></pre>
<p>That deletes the VM, disks, VNet, Key Vault, and everything else in one shot. No orphaned resources, no lingering costs. If you're doing iterative development or writing your own blog posts, this is the reset button.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>At the end of the day, this whole setup is about confidence. I can spin up a full AD lab, break it in interesting ways, write about what I learned, and then throw it away—all without touching a production environment or repeating the same manual steps.</p>
<p>If you want to follow along or adapt it for your own scenarios, all of the code in this post lives in my GitHub repository:</p>
<ul>
<li>GitHub repo: https://github.com/broberts23/vsCode/project-functionapp-roles</li>
</ul>
<p>Clone it, customize the parameters, and you’ve got your own disposable AD playground ready for experiments, demos, and (in future posts) the password reset API that sits on top of it. 🚀</p>
<hr />
<p><strong>Built with</strong>: Bicep • PowerShell 7.4 &amp; 5.1 • Azure Run Command • Key Vault • Log Analytics</p>
<p><strong>Ready for</strong>: Real-world hybrid deployments, not just lab experiments.</p>
]]></content:encoded></item><item><title><![CDATA[A Guide to Deploying Self‑Hosted GitHub Runners on Azure Container Apps]]></title><description><![CDATA[Introduction
This post is a practical deep dive into running self‑hosted GitHub Actions runners on Azure Container Apps (ACA). The goal: ephemeral, on‑demand compute that scales up only when there are GitHub workflow jobs in the queue and scales to z...]]></description><link>https://benroberts.io/a-guide-to-deploying-selfhosted-github-runners-on-azure-container-apps</link><guid isPermaLink="true">https://benroberts.io/a-guide-to-deploying-selfhosted-github-runners-on-azure-container-apps</guid><category><![CDATA[Powershell]]></category><category><![CDATA[Entra ID]]></category><category><![CDATA[github-actions]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Sun, 30 Nov 2025 13:00:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763886433421/6627fe51-a96b-457d-8473-dbeabf9c4748.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>This post is a practical deep dive into running self‑hosted GitHub Actions runners on Azure Container Apps (ACA). The goal: ephemeral, on‑demand compute that scales up only when there are GitHub workflow jobs in the queue and scales to zero when idle. You’ll see how the resources fit together (Log Analytics, Container Apps environment, Azure Container Registry), how the runner containers register with GitHub, how workflows target these runners, and how KEDA automatically scales job executions based on load.</p>
<p>If you maintain repositories that need custom tooling, access to private networks, or predictable performance characteristics, self‑hosted runners on ACA jobs give you serverless elasticity with infrastructure‑as‑code repeatability.</p>
<p>Key references:</p>
<ul>
<li><p>Jobs in Azure Container Apps: https://learn.microsoft.com/azure/container-apps/jobs?tabs=azure-cli</p>
</li>
<li><p>Containers in Azure Container Apps (registries, managed identities): https://learn.microsoft.com/azure/container-apps/containers?tabs=bicep</p>
</li>
<li><p>Tutorial (GitHub pivot): https://learn.microsoft.com/azure/container-apps/tutorial-ci-cd-runners-jobs?tabs=bicep&amp;pivots=container-apps-jobs-self-hosted-ci-cd-github-actions</p>
</li>
<li><p>KEDA GitHub runner scaler: https://keda.sh/docs/latest/scalers/github-runner/</p>
</li>
</ul>
<h2 id="heading-project-summary">Project summary</h2>
<p>The repo is intentionally split into two experiences:</p>
<ul>
<li><p><code>README.md</code> is a quick-start guide: copy a parameters file, build an image, run one script, and point a workflow at the labels.</p>
</li>
<li><p><code>blog.md</code> (this post) is the design notebook: why the Bicep looks the way it does, why KEDA is wired this way, and what trade-offs I made around identity, networking, and security.</p>
</li>
</ul>
<p>At a resource level, the composed Bicep template brings up everything this pattern needs in one shot: a Log Analytics workspace wired into a Container Apps managed environment, an Azure Container Registry (ACR) with managed identity pull configured, and a virtual network with a delegated subnet and network security group to keep traffic on your own address space. On top of that it defines the Azure Container Apps event‑driven job that runs the GitHub Actions runner image, registers with your repo or organization, and exits when the workflow completes, plus a KEDA <code>github-runner</code> scaler that watches your GitHub queue and decides when to trigger executions. The repo also includes a small container build scaffold (<code>Dockerfile.github</code>, <code>github-actions-runner/entrypoint.sh</code>, and <code>scripts/Build-GitHubRunnerImage.ps1</code>) that packages the upstream runner release (<code>v2.329.0</code> at https://github.com/actions/runner/releases/tag/v2.329.0) with the extra tooling you are likely to need.</p>
<p>When the deployment finishes, it returns the environment ID, job ID, managed identity principal IDs, and ACR metadata so you can plug those identifiers into dashboards or additional automation without hunting for them in the portal.</p>
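<p>Those outputs stay queryable on the deployment itself, so you can pull them again later; for example (the resource group and deployment name below are placeholders):</p>
<pre><code class="lang-powershell"># Read the Bicep outputs back out of the completed deployment.
az deployment group show `
  --resource-group 'rg-github-runner-dev' `
  --name 'main' `
  --query properties.outputs
</code></pre>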
<h2 id="heading-repository-structure-relevant-parts">Repository structure (relevant parts)</h2>
<pre><code class="lang-plaintext">project-github-runner/
├─ Dockerfile.github             # Dockerfile for the GitHub Actions runner image
├─ github-actions-runner/
│  └─ entrypoint.sh              # Container entrypoint configuring the runner
├─ README.md                     # Deployment guide and architecture overview
├─ infra/
│  ├─ parameters.sample.json     # Template for parameter overrides
│  ├─ main.bicep                 # Composes workspace, environment, ACR, and the job
│  ├─ containerapps/             # Bicep modules (job, env, registry)
│  └─ network/
│     └─ vnet.bicep              # Virtual network, delegated subnet, and NSG for Container Apps
└─ scripts/
  ├─ ContainerApps-Deploy-Job.ps1        # Creates Container Apps job via Azure CLI
  ├─ ContainerInit.ps1                   # Optional container init hook to install modules
  ├─ Create-FederatedCredential.ps1      # Adds Entra federated credential for GitHub OIDC
  ├─ Build-GitHubRunnerImage.ps1         # Builds/pushes the Docker image defined in Dockerfile.github
  └─ Deploy-GitHubRunner.ps1             # End-to-end deployment wrapper using Bicep
</code></pre>
<h2 id="heading-architecture-overview">Architecture overview</h2>
<p>At a high level, the moving pieces line up in a pretty natural way. An Azure Monitor Log Analytics workspace captures container logs, scale events, and job execution history at the environment level. That is why <code>infra/main.bicep</code> computes the workspace name from <code>baseName</code> and location and then passes both the workspace ID and a shared key into the managed environment module. A single Container Apps environment then hosts the job and acts as the security and logging boundary for CI workloads—anything that can touch internal build systems or package feeds lives in this environment or its peered VNets.</p>
<p>Network-wise, a dedicated virtual network and delegated subnet isolate the environment, and a network security group lets you enforce egress controls or lock down ingress. This trades a bit of deployment complexity for the ability to keep runners off the public internet. The runner image is pulled from ACR (or another registry), and I default to managed identity authentication so there are no registry passwords or admin credentials in the deployment. The Azure Container Apps job has the system-assigned identity enabled with ACR pull rights and, by default, the deployment also creates and attaches a user-assigned managed identity that you can reuse across deployments or share with other workloads.</p>
<p>Secrets finish the picture. GitHub App metadata (App ID and installation ID) arrives via GitHub environment variables and the private key arrives as an environment secret. The bootstrap pipeline passes those into the Bicep deployment, which copies the PEM into Azure Key Vault and wires a secret reference into the Container Apps job so both the runner bootstrap and KEDA scaler can authenticate without hard-coding secrets in Bicep. On top of that, the KEDA scaler queries GitHub for queued jobs using either PAT or GitHub App credentials; when the queue length exceeds your target, KEDA triggers ACA job executions and each execution runs one ephemeral runner container that exits when the workflow completes.</p>
<p>Flow:</p>
<ol>
<li><p>A GitHub workflow queues a job targeting your self‑hosted labels.</p>
</li>
<li><p>The KEDA scaler polls the GitHub API, scoped to your org/repo and labels, and detects queued work.</p>
</li>
<li><p>ACA starts N job executions (bounded by min/max per interval). Each execution creates a pod with the runner container.</p>
</li>
<li><p>The runner registers with GitHub, picks up the job, executes steps, and exits.</p>
</li>
<li><p>ACA marks the execution complete. With no pending work, subsequent polls result in zero executions and the job scales to zero.</p>
</li>
</ol>
<p>The rest of this post walks through the Bicep and runner image design in more detail, explaining why certain patterns look “heavier” than a minimal sample but pay off in day‑2 operations.</p>
<h2 id="heading-demo-github-actions-pipeline">Demo GitHub Actions pipeline</h2>
<p>To make the experience tangible, the repository includes <code>.github/workflows/demo-self-hosted-runner.yml</code>. When you dispatch this workflow from the <strong>Actions</strong> tab it spins up a simple matrix of nine parallel jobs, each targeting the <code>self-hosted</code> and <code>azure-container-apps</code> labels so work naturally lands on the Container Apps runners created by the Bicep deployment. The matrix uses <code>max-parallel: 9</code> so all jobs can run at once, which forces the KEDA scaler to scale the job up to nine containers (bounded by <code>maxExecutions</code>) and gives you something interesting to watch.</p>
<p>Inside each job, the steps record the runner host name, verify that PowerShell 7.4 is available, and simulate a short workload so the container stays alive long enough for you to observe behavior. When you trigger the workflow, GitHub places nine jobs in the queue (your <code>runnerLabels</code> parameter must match the labels in the workflow), the scaler sees the queue depth (with <code>targetWorkflowQueueLength</code> defaulting to <code>1</code>), and schedules executions in the Container Apps environment. Each execution registers an ephemeral runner, processes its slice of the matrix, and exits. You can monitor progress with <code>az containerapp job execution list --name &lt;jobName&gt; --resource-group &lt;rg&gt;</code>, drill into individual executions with <code>az containerapp job execution show --name &lt;executionName&gt; --job-name &lt;jobName&gt; --resource-group &lt;rg&gt;</code>, and use Log Analytics to look at job output.</p>
<p>In practice this little pipeline becomes a “smoke test” for the whole setup: it exercises scaling, confirms network access, and gives you telemetry before you trust the pattern with production workloads.</p>
<h2 id="heading-container-image-build-workflow">Container image build workflow</h2>
<p>The runner image is built locally from <code>Dockerfile.github</code>, which uses the official GitHub Actions runner base image (<code>ghcr.io/actions/actions-runner:2.329.0</code>) and adds an entrypoint script to handle registration. The <code>scripts/Build-GitHubRunnerImage.ps1</code> PowerShell script simplifies building and pushing the image to your ACR:</p>
<pre><code class="lang-powershell">./scripts/Build<span class="hljs-literal">-GitHubRunnerImage</span>.ps1 `
  <span class="hljs-literal">-ImageTag</span> <span class="hljs-string">"<span class="hljs-variable">$ACR_NAME</span>.azurecr.io/github-actions-runner:2.329.0"</span> `
  <span class="hljs-literal">-Push</span>
</code></pre>
<p>The script wraps <code>docker build</code>/<code>docker push</code>, allowing you to target alternative architectures or versions by adjusting parameters. Authenticate to your registry (for example, <code>az acr login --name $ACR_NAME</code>) before pushing. Because the Bicep deployment provisions the Azure Container Registry and grants the job managed identity <code>AcrPull</code>, update <code>containerImage</code> in your Bicep parameters to reference the new tag and redeploy.</p>
<h2 id="heading-networking-uplift-and-vnet-integration">Networking uplift and VNet integration</h2>
<p>Enterprise deployments often come with private networking, static routing, or traffic-inspection requirements. The <code>network/vnet.bicep</code> module provisions a virtual network, delegated subnet, and optional custom NSG rules so the Container Apps environment can attach directly to your address space. During deployment, the main Bicep template passes the subnet resource ID to the environment module, enabling <a target="_blank" href="https://learn.microsoft.com/en-us/azure/container-apps/custom-virtual-networks?tabs=bicep">VNet integration</a>.</p>
<p>Key parameters are exposed to keep address planning flexible:</p>
<ul>
<li><p><code>virtualNetworkAddressPrefix</code> and <code>containerAppsSubnetPrefix</code> define the VNet and infrastructure subnet CIDR blocks.</p>
</li>
<li><p><code>platformReservedCidr</code>, <code>platformReservedDnsIp</code>, and <code>dockerBridgeCidr</code> surface the optional networking ranges documented in <a target="_blank" href="https://learn.microsoft.com/en-us/azure/container-apps/vnet-custom?tabs=bash#networking-parameters">Networking parameters for Container Apps environments</a>, preventing conflicts with peered VNets.</p>
</li>
<li><p><code>internalEnvironment</code> toggles whether the managed environment is created as internal-only, removing the public VIP. See https://learn.microsoft.com/azure/container-apps/networking?tabs=bash#accessibility-level.</p>
</li>
</ul>
<p>By managing the VNet resources in Bicep you can apply Azure Policy, configure diagnostics, and enforce NSG rules consistently. Downstream workloads—build agents, package feeds, artifact stores—can live in the same VNet or attached spokes, enabling end-to-end private networking without exposing the runners to the public internet.</p>
<p>I deliberately stopped short of deploying auxiliary perimeter services (Azure Firewall, Private Endpoints, or private DNS zones) in this repo to keep the focus on the runner pattern itself. Those pieces tend to be very environment‑specific. The VNet module is designed so you can plug those in later without re‑architecting the job.</p>
<h2 id="heading-how-github-runners-work-in-aca-jobs-endtoend">How GitHub runners work in ACA jobs (end‑to‑end)</h2>
<h3 id="heading-1-resource-creation-bicep">1) Resource creation (Bicep)</h3>
<p>The main template (<code>infra/main.bicep</code>) composes a few focused modules:</p>
<ul>
<li><p><code>logAnalytics.bicep</code> provisions a workspace (retention configurable) and returns its IDs.</p>
</li>
<li><p><code>managedEnvironment.bicep</code> creates the Container Apps environment and connects it to Log Analytics.</p>
</li>
<li><p><code>containerRegistry.bicep</code> creates ACR and assigns <code>AcrPull</code> to a supplied principal.</p>
</li>
<li><p><code>githubRunnerJob.bicep</code> defines the event‑driven job (<code>Microsoft.App/jobs</code>) and KEDA scale rules.</p>
</li>
</ul>
<p>The interesting parts are the glue and the identity wiring in <code>main.bicep</code>. For example, runner configuration URLs are normalized up front:</p>
<pre><code class="lang-powershell">var githubApiUrlNormalized = endsWith(githubApiUrl, <span class="hljs-string">'/'</span>)
  ? substring(githubApiUrl, <span class="hljs-number">0</span>, max(length(githubApiUrl) - <span class="hljs-number">1</span>, <span class="hljs-number">0</span>))
  : githubApiUrl

var githubServerUrlNormalized = endsWith(githubServerUrl, <span class="hljs-string">'/'</span>)
  ? substring(githubServerUrl, <span class="hljs-number">0</span>, max(length(githubServerUrl) - <span class="hljs-number">1</span>, <span class="hljs-number">0</span>))
  : githubServerUrl
</code></pre>
<p>Normalizing here avoids subtle bugs later (for example double slashes in REST URLs or mismatched hostnames between the runner and the scaler) and keeps the job module simpler. Similarly, the template computes the runner registration URL and token endpoint based on the <code>githubRunnerScope</code> parameter:</p>
<pre><code class="lang-powershell">var githubRunnerUrl = githubRunnerScope == <span class="hljs-string">'org'</span>
  ? <span class="hljs-string">'${githubServerUrlNormalized}/${githubOwner}'</span>
  : (githubRunnerScope == <span class="hljs-string">'ent'</span>
      ? <span class="hljs-string">'${githubServerUrlNormalized}/enterprises/${githubOwner}'</span>
      : <span class="hljs-string">'${githubServerUrlNormalized}/${githubOwner}/${githubRepo}'</span>)

var githubRegistrationTokenApiUrl = githubRunnerScope == <span class="hljs-string">'org'</span>
  ? <span class="hljs-string">'${githubApiUrlNormalized}/orgs/${githubOwner}/actions/runners/registration-token'</span>
  : (githubRunnerScope == <span class="hljs-string">'ent'</span>
      ? <span class="hljs-string">'${githubApiUrlNormalized}/enterprises/${githubOwner}/actions/runners/registration-token'</span>
      : <span class="hljs-string">'${githubApiUrlNormalized}/repos/${githubOwner}/${githubRepo}/actions/runners/registration-token'</span>)
</code></pre>
<p>This gives you one template that works for repo, org, or enterprise‑scoped runners without changing the job module.</p>
<h3 id="heading-2-deploying-the-job">2) Deploying the job</h3>
<p>In day‑to‑day use, the deployment flow is straightforward. You create or reuse a resource group and deploy <code>infra/main.bicep</code> with parameters for your GitHub owner and repository (or org scope), the container image you want to use for the runner (an ACR path or external registry), and your scaling preferences such as <code>minExecutions</code>, <code>maxExecutions</code>, <code>pollingInterval</code>, and <code>targetWorkflowQueueLength</code>. Network and identity parameters control how the job reaches ACR and any internal resources your builds might need. The deployment returns the job and environment IDs, and logs start flowing to Log Analytics automatically.</p>
<p>To keep long‑lived credentials out of source control, the GitHub Actions workflow reads <code>GH_APP_PRIVATE_KEY</code> from an environment secret (see https://docs.github.com/actions/security-guides/using-secrets-in-github-actions). The Bicep deployment writes that value into Azure Key Vault and assigns <code>Key Vault Secrets User</code> to the job identity, so at runtime the Container Apps secret can reference the vault URI and both the runner entrypoint and KEDA scaler can mint JWTs and installation tokens following https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/about-authentication-with-a-github-app.</p>
<p>The KEDA scaler only needs a handful of GitHub details to do its work: who owns the workloads (<code>owner</code>), how they are scoped (<code>runnerScope</code> as <code>repo</code>, <code>org</code>, or <code>ent</code>), which repositories to watch, which labels to honor, and how aggressively to scale (<code>targetWorkflowQueueLength</code> and <code>pollingInterval</code>). The job module receives these via parameters such as <code>runnerScope</code>, <code>githubRepositories</code>, and <code>scaleRunnerLabels</code>, so you can tune behavior just by changing parameter values instead of touching the module code.</p>
<h3 id="heading-3-runner-bootstrap-and-registration-with-github">3) Runner bootstrap and registration with GitHub</h3>
<p>When a job execution starts, ACA launches the runner container with environment variables and secrets provided by Bicep. The runner process performs the following steps:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763368811568/8d8f5cd8-3dfe-48fe-8e2c-2b1912eda35e.png" alt class="image--center mx-auto" /></p>
<ol>
<li>Exchanges the GitHub App installation token or PAT for a short‑lived registration token via GitHub REST: <code>POST /repos/{owner}/{repo}/actions/runners/registration-token</code> (or org/ent variant). The job template includes <code>REGISTRATION_TOKEN_API_URL</code> and <code>GH_URL</code>/repo for clarity.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763369009492/9c12ef61-ef23-47d0-ad57-5fd83650406c.png" alt class="image--center mx-auto" /></p>
<ol start="2">
<li><p>Configures the runner with your labels and repository/organization context.</p>
</li>
<li><p>Registers the runner; it shows up as an online self‑hosted runner in GitHub.</p>
</li>
<li><p>Waits for a workflow job assignment; when received, executes all steps.</p>
</li>
<li><p>On completion, the container exits. Ephemeral patterns typically remove the runner registration automatically on shutdown.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763368734753/8f88b2ff-d8fb-4e81-bf21-2fccbd453bd4.png" alt class="image--center mx-auto" /></p>
<p>This lifecycle ensures no idle, permanently registered VMs—each execution is purpose‑built and disposed.</p>
<h3 id="heading-4-using-the-runners-in-github-actions">4) Using the runners in GitHub Actions</h3>
<p>In your workflow YAML, you point jobs at these runners simply by targeting the right labels:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">runs-on:</span> [<span class="hljs-string">self-hosted</span>, <span class="hljs-string">azure-container-apps</span>]
</code></pre>
<p>You can add repo, org, or language-specific labels—<code>dotnet</code>, <code>arm64</code>, <code>build-tools</code>, and so on—to steer jobs toward particular images or toolchains. From GitHub’s perspective, jobs just queue until a matching self‑hosted runner comes online; behind the scenes, the KEDA scaler is the component that responds to that queue depth and triggers ACA job executions. Concurrency and parallelism end up being a combination of your workflow’s matrix/strategy choices in GitHub, the KEDA scaling parameters (min/max executions per polling interval), and the container resources (CPU/memory) and job settings like <code>parallelism</code> and <code>replicaCompletionCount</code> if you ever decide to run multiple containers per execution. Most of the time, the simple 1:1 mapping between a job execution and a single runner container works well.</p>
<h3 id="heading-5-keda-scaling-behavior">5) KEDA scaling behavior</h3>
<p>The <code>github-runner</code> scaler periodically queries the GitHub API to estimate queued work for the scope and labels you configured. Whenever the queue length meets or exceeds <code>targetWorkflowQueueLength</code>, KEDA triggers job executions up to <code>maxExecutions</code> for that polling interval. There are a couple of practical tuning knobs here. A lower <code>pollingInterval</code> reduces start latency at the cost of more API calls, so it is worth enabling ETags (<code>enableEtags=true</code>) to ease rate‑limit pressure. Narrowing the scope with explicit <code>repos</code> lists and well‑chosen labels both improves responsiveness and avoids over‑scaling generic runners for jobs that really need specialized images. Finally, remember that PATs have lower rate limits than GitHub Apps, so if you expect high throughput or many repositories, GitHub App auth is usually the better choice. For full details, see the KEDA scaler docs at https://keda.sh/docs/latest/scalers/github-runner/.</p>
<h2 id="heading-permissions-identity-and-secrets">Permissions, identity, and secrets</h2>
<p>This design leans hard on managed identities and GitHub Apps so you do not have to scatter long‑lived secrets across scripts and parameter files. For registry access, the job pulls from ACR using a managed identity rather than a username and password: you assign <code>AcrPull</code> to the job’s user‑assigned identity and configure the registry <code>identity</code> as either the UAMI resource ID or <code>system</code> for system‑assigned identity (see https://learn.microsoft.com/azure/container-apps/containers?tabs=bicep#managed-identity-with-azure-container-registry).</p>
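<p>If you ever need to reproduce that grant outside Bicep, it is a single role assignment; a sketch with placeholder IDs:</p>
<pre><code class="lang-powershell"># Grant the job's managed identity pull access on the registry.
az role assignment create `
  --assignee-object-id $jobPrincipalId `
  --assignee-principal-type ServicePrincipal `
  --role AcrPull `
  --scope $acrResourceId
</code></pre>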
<p>On the GitHub side, the recommended path is to use a GitHub App, not a PAT. The environment’s <code>GH_APP_ID</code> and <code>GH_APP_INSTALLATION_ID</code> variables and the <code>GH_APP_PRIVATE_KEY</code> secret supply the app metadata and PEM key to the bootstrap workflow (per https://docs.github.com/actions/security-guides/using-secrets-in-github-actions). The workflow passes these values into the Bicep deployment, which persists the key in Azure Key Vault and grants the job identity <code>Key Vault Secrets User</code>. At runtime the runner and scaler can mint JWTs and installation tokens on demand (see https://learn.microsoft.com/en-us/azure/container-apps/manage-secrets?tabs=arm-bicep and https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/about-authentication-with-a-github-app). A minimal‑scope PAT remains supported as a fallback if you need it, but the intent is that most deployments never have to rely on it.</p>
<p>Scope selection matters as well. When you use a personal GitHub user (for example <code>broberts23</code>) as <code>githubOwner</code>, you should set <code>githubRunnerScope</code> to <code>repo</code> so the scaler and runner use <code>/repos/{owner}/{repo}</code> endpoints and the GitHub App only needs <strong>Repository → Administration (Read &amp; write)</strong> on that repo. Reserve <code>githubRunnerScope: "org"</code> for situations where <code>githubOwner</code> is a real GitHub Organization, <code>/orgs/{org}</code> and <code>/orgs/{org}/repos</code> resolve successfully, and the app is installed on that organization with <strong>Organization → Self-hosted runners (Read &amp; write)</strong>.</p>
<p>GitHub’s own REST reference is very explicit about what permissions are required for the various registration endpoints, which is summarized in the table below.</p>
<p>GitHub App permission matrix:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Runner scope</td><td>Registration endpoint</td><td>Required permission</td></tr>
</thead>
<tbody>
<tr>
<td><code>repo</code></td><td><code>POST /repos/{owner}/{repo}/actions/runners/registration-token</code></td><td>Repository → <strong>Administration (Read &amp; write)</strong></td></tr>
<tr>
<td><code>org</code></td><td><code>POST /orgs/{org}/actions/runners/registration-token</code></td><td>Organization → <strong>Self-hosted runners (Read &amp; write)</strong></td></tr>
<tr>
<td><code>ent</code></td><td><code>POST /enterprises/{enterprise}/actions/runners/registration-token</code></td><td>Enterprise-level <strong>Self-hosted runners</strong> permission</td></tr>
</tbody>
</table>
</div><p>These requirements come directly from the GitHub REST reference (see https://docs.github.com/en/rest/actions/self-hosted-runners?apiVersion=2022-11-28#create-a-registration-token-for-a-repository and https://docs.github.com/en/rest/actions/self-hosted-runners?apiVersion=2022-11-28#create-a-registration-token-for-an-organization). If the installation token lacks the relevant permission or repository access, the API returns <code>403 Resource not accessible by integration</code> per https://docs.github.com/en/rest/using-the-rest-api/troubleshooting#resource-not-accessible. Whenever you change app permissions or repository selections, it is worth re‑authorizing the installation so GitHub issues tokens with the expanded scope.</p>
<p>Beyond these specifics, the security posture is intentionally conservative: there are no long‑lived runner nodes (job executions are ephemeral, which reduces standing privilege and image drift), images are versioned and tags pinned, and secrets never live directly in parameter files or scripts. Instead, you retrieve sensitive values locally with SecretManagement or GitHub environments, let the deployment write them into Key Vault, and rely on secret references at runtime while being careful not to log secret material.</p>
<h2 id="heading-observability-and-operations">Observability and operations</h2>
<p>Once everything is deployed, most of your day‑to‑day operational work happens in Log Analytics and through the Container Apps APIs. Container stdout/stderr and scale events flow into the workspace automatically, so you can use Kusto queries to investigate runner bootstrap behavior and job logs. For a quick health check, <code>az containerapp job execution list</code> (or the Azure portal) shows you recent executions and statuses, and you can drill into specific runs when something fails.</p>
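<p>Those checks script nicely too; for example (the job and resource group names are placeholders):</p>
<pre><code class="lang-powershell"># List recent executions with their status and start time.
az containerapp job execution list `
  --name 'github-runner-job' `
  --resource-group 'rg-github-runner-dev' `
  --query '[].{name:name, status:properties.status, started:properties.startTime}' `
  --output table
</code></pre>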
<p>Because the pattern uses event‑driven jobs, cost tends to track actual work closely—you pay while containers run and effectively nothing when there is no queued work. It is still worth keeping an eye on image hygiene: keep runner images minimal, refresh them regularly, and, if your builds need large toolchains, consider splitting into multiple label‑targeted images so not every job has to carry every dependency.</p>
<h2 id="heading-troubleshooting-github-app-403-responses">Troubleshooting GitHub App 403 responses</h2>
<p>If you ever run into <code>403 Resource not accessible by integration</code> responses from GitHub, the logs from <code>github-actions-runner/entrypoint.sh</code> are your first stop. The script logs both the HTTP status code and response body whenever GitHub rejects a request, so instead of a silent <code>curl</code> failure you will see something like <code>GitHub API POST ...registration-token failed (403): {"message":"Resource not accessible by integration"...}</code> along with whatever additional detail GitHub provides.</p>
<p>From there, work through a short checklist. First, confirm that the GitHub App’s permissions align with the <code>githubRunnerScope</code> you chose, using the permission matrix above as a guide and the REST docs for the exact endpoint you are calling (for example https://docs.github.com/en/rest/actions/self-hosted-runners?apiVersion=2022-11-28#create-a-registration-token-for-a-repository). Next, look at the <code>X-Accepted-GitHub-Permissions</code> response header; GitHub includes it when a permission is missing, and it tells you exactly what the API expected (see https://docs.github.com/en/rest/using-the-rest-api/troubleshooting#resource-not-accessible). Finally, open the GitHub App installation page and make sure every repository listed in <code>githubRunnerRepositories</code> is actually selected (or that you have chosen “All repositories”). After changing either permissions or repository selection, re‑authorize the installation so new tokens pick up the broadened scope.</p>
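<p>You can also hit the endpoint directly and read that header yourself. A quick PowerShell 7 sketch, with the token and repo path as placeholders:</p>
<pre><code class="lang-powershell"># Call the registration-token endpoint and surface what GitHub expected.
$resp = Invoke-WebRequest -Method Post `
  -Uri 'https://api.github.com/repos/OWNER/REPO/actions/runners/registration-token' `
  -Headers @{ Authorization = "Bearer $installationToken"; Accept = 'application/vnd.github+json' } `
  -SkipHttpErrorCheck

$resp.StatusCode
$resp.Headers['X-Accepted-GitHub-Permissions']
</code></pre>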
<p>Finally, verify that the <code>GH_APP_PRIVATE_KEY</code> secret contains the full PEM content. When looking at the secret in Azure Key Vault, it should be in the form:</p>
<p><code>-----BEGIN RSA PRIVATE KEY-----\nMIIEpQIBAAKCAQEA...\n-----END RSA PRIVATE KEY-----\n</code></p>
<p>Another tip is to pull the secret locally (here with the Azure CLI; <code>Get-AzKeyVaultSecret</code> works too) and confirm it has the expected format:</p>
<pre><code class="lang-powershell">az keyvault secret show \
  -<span class="hljs-literal">-vault</span><span class="hljs-literal">-name</span> &lt;kv<span class="hljs-literal">-name</span>&gt; \
  -<span class="hljs-literal">-name</span> github<span class="hljs-literal">-app</span><span class="hljs-literal">-key</span> \
  -<span class="hljs-literal">-query</span> value \
  -<span class="hljs-literal">-output</span> tsv &gt; /tmp/app<span class="hljs-literal">-key</span>.txt

<span class="hljs-built_in">cat</span> /tmp/app<span class="hljs-literal">-key</span>.txt
</code></pre>
<p>Then use openssl to validate the PEM:</p>
<pre><code class="lang-bash">openssl rsa -<span class="hljs-keyword">in</span> ./github-app-private-key.pem -check
</code></pre>
<p>When the scaler has successfully authenticated with GitHub, you will see the <code>ScaledJob is ready for scaling</code> event in the Log Analytics logs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763369171065/d1fff1d9-5a04-4beb-8493-dfe8bc57d9cf.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-implementation-details-in-this-repo">Implementation details in this repo</h2>
<p>The composed <code>infra/main.bicep</code> surface area is intentionally small but flexible. You tell it which image to run and how much CPU/memory it should have, provide GitHub details like <code>githubOwner</code>, <code>githubRepo</code>, and <code>runnerLabels</code>, and then choose your scaling behavior with parameters such as <code>minExecutions</code>, <code>maxExecutions</code>, <code>pollingInterval</code>, and <code>targetWorkflowQueueLength</code>. Additional knobs control how the scaler talks to GitHub (<code>githubApiUrl</code>, <code>githubRunnerScope</code>, <code>githubRunnerRepositories</code>, <code>disableDefaultRunnerLabels</code>, <code>matchUnlabeledRunnerJobs</code>, <code>enableGithubEtags</code>, and <code>additionalScaleRuleAuth</code>) and how the job authenticates to Azure resources (<code>acrName</code>, <code>userAssignedIdentityId</code>).</p>
<p>The <code>containerapps/githubRunnerJob.bicep</code> module takes those inputs and renders the actual job resource, including the <code>rules: [ { type: 'github-runner', metadata, auth } ]</code> block for KEDA. That split keeps the job specification clean and auditable while still giving you the levers you need at deployment time.</p>
<h2 id="heading-github-actions-infrastructure-bootstrap-workflow">GitHub Actions infrastructure bootstrap workflow</h2>
<p>The repository includes <code>.github/workflows/bootstrap-infra.yml</code>, a GitHub Actions pipeline that builds the runner image and deploys the Bicep template end to end.</p>
<p>High-level behavior:</p>
<ul>
<li><p><strong>Trigger and inputs</strong>: The workflow runs on <code>workflow_dispatch</code> with inputs for <code>environment</code> (<code>dev</code> or <code>prod</code>), an optional <code>imageTagSuffix</code>, and an optional <code>parametersFile</code> override.</p>
</li>
<li><p><strong>Environment resolution</strong>: The <code>Resolve deployment configuration</code> step maps the logical environment to concrete values:</p>
<ul>
<li><p><code>dev</code> → <code>rg-github-runner-dev</code>, <code>eastus</code>, <code>ghrunnerdevacr</code>, parameters file <code>project-github-runner/infra/parameters.dev.json</code>.</p>
</li>
<li><p><code>prod</code> → <code>rg-github-runner-prod</code>, <code>eastus</code>, <code>ghrunnerprodacr</code>, parameters file <code>project-github-runner/infra/parameters.prod.json</code>.</p>
</li>
<li><p>If <code>parametersFile</code> is provided, it overrides the default; the step fails fast if the file does not exist.</p>
</li>
</ul>
</li>
<li><p><strong>Image tagging</strong>: The same step computes a container image tag of the form <code>&lt;acrName&gt;.azurecr.io/github-actions-runner:&lt;suffix&gt;</code>, where <code>&lt;suffix&gt;</code> defaults to <code>v${{ github.run_number }}</code> unless overridden by <code>imageTagSuffix</code>.</p>
</li>
</ul>
<p>Deployment sequence in the job:</p>
<ol>
<li><p><strong>Checkout</strong>: <code>actions/checkout@v4</code> pulls the repo so Docker and Bicep can see <code>project-github-runner</code>.</p>
</li>
<li><p><strong>Azure login (OIDC)</strong>: <a target="_blank" href="https://github.com/Azure/login"><code>azure/login@v2</code></a> authenticates to Azure using federated credentials, driven by <code>AZURE_CLIENT_ID</code>, <code>AZURE_TENANT_ID</code>, and <code>AZURE_SUBSCRIPTION_ID</code> in GitHub secrets. This follows the guidance in <a target="_blank" href="https://learn.microsoft.com/azure/azure-resource-manager/bicep/deploy-github-actions">Deploy Bicep with GitHub Actions</a>.</p>
</li>
<li><p><strong>Resource group creation</strong>: <a target="_blank" href="https://github.com/Azure/cli"><code>azure/cli@v2</code></a> runs <code>az group create</code> to ensure the target resource group exists (see <a target="_blank" href="https://learn.microsoft.com/cli/azure/group?view=azure-cli-latest#az-group-create"><code>az group create</code></a>).</p>
</li>
<li><p><strong>Prepare infrastructure deployment</strong>: An initial <code>azure/bicep-deploy@v2</code> step (<code>Deploy infrastructure (Bicep - prepare)</code>) runs against <code>infra/main.bicep</code> with the resolved parameters file and overrides for:</p>
<ul>
<li><p><code>containerImage</code>: the computed ACR image tag.</p>
</li>
<li><p><code>acrName</code>: the resolved registry name.</p>
</li>
<li><p><code>githubAppApplicationId</code>, <code>githubAppInstallationId</code>, <code>githubAppPrivateKey</code>: sourced from environment variables <code>GH_APP_ID</code>, <code>GH_APP_INSTALLATION_ID</code>, and secret <code>GH_APP_PRIVATE_KEY</code> respectively. This step is <code>continue-on-error: true</code> so the image build can proceed even if the first deployment attempt fails (for example, on a cold start).</p>
</li>
</ul>
</li>
<li><p><strong>ACR login</strong>: A plain <code>az acr login</code> signs the Docker client into the target ACR so the subsequent build can push the image.</p>
</li>
<li><p><strong>Build and push image</strong>: <a target="_blank" href="https://github.com/docker/setup-buildx-action"><code>docker/setup-buildx-action@v3</code></a> prepares Buildx, and <a target="_blank" href="https://github.com/docker/build-push-action#usage"><code>docker/build-push-action@v6</code></a> builds <code>project-github-runner/Dockerfile.github</code> with context <code>project-github-runner</code> and pushes it to the tag computed earlier.</p>
</li>
<li><p><strong>Finalize infrastructure deployment</strong>: A second <code>azure/bicep-deploy@v2</code> step reruns the deployment of <code>infra/main.bicep</code> with the same parameters file and overrides, guaranteeing that the Container Apps job uses the freshly pushed image tag.</p>
</li>
<li><p><strong>Summary</strong>: The final step appends a summary to the GitHub Actions job log containing the environment, resource group, image tag, and the Container Apps job and environment IDs returned by the Bicep deployment.</p>
</li>
</ol>
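<p>When debugging the pipeline, it can help to approximate steps 3 through 7 from a local shell. This is a hedged sketch rather than the workflow itself: the resource names come from the dev mapping above, the image tag mirrors the earlier sketch, and the GitHub App values are placeholders for your own app:</p>
<pre><code class="lang-powershell"># Ensure the resource group exists, then build/push and deploy with the same overrides the workflow passes.
$imageTag = 'ghrunnerdevacr.azurecr.io/github-actions-runner:v42'   # from the tag sketch above

az group create --name rg-github-runner-dev --location eastus

az acr login --name ghrunnerdevacr
docker build -t $imageTag -f project-github-runner/Dockerfile.github project-github-runner
docker push $imageTag

az deployment group create `
  --resource-group rg-github-runner-dev `
  --template-file project-github-runner/infra/main.bicep `
  --parameters '@project-github-runner/infra/parameters.dev.json' `
  --parameters containerImage=$imageTag acrName=ghrunnerdevacr `
      githubAppApplicationId=$env:GH_APP_ID `
      githubAppInstallationId=$env:GH_APP_INSTALLATION_ID `
      githubAppPrivateKey=$env:GH_APP_PRIVATE_KEY
</code></pre>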
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763368629176/589ee2f3-b797-417f-a89d-b5a4cc3cc48d.png" alt class="image--center mx-auto" /></p>
<p>Required GitHub configuration for the workflow:</p>
<ul>
<li><p><strong>Azure OIDC</strong>: Secrets <code>AZURE_CLIENT_ID</code>, <code>AZURE_TENANT_ID</code>, <code>AZURE_SUBSCRIPTION_ID</code> for federated authentication.</p>
</li>
<li><p><strong>GitHub App metadata</strong>: Environment variables <code>GH_APP_ID</code> and <code>GH_APP_INSTALLATION_ID</code> (environment-level <code>vars</code> in GitHub) supplying the GitHub App ID and installation ID; see <a target="_blank" href="https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/authenticating-as-a-github-app-installation#generating-an-installation-access-token">Authenticating as a GitHub App installation</a> for ways to discover the installation ID.</p>
</li>
<li><p><strong>GitHub App private key</strong>: Environment secret <code>GH_APP_PRIVATE_KEY</code>, generated from the GitHub App settings per <a target="_blank" href="https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/managing-private-keys-for-github-apps">Managing private keys for GitHub Apps</a>. The Bicep deployment writes this value into Azure Key Vault so the job and KEDA scaler can authenticate without embedding secrets in templates.</p>
</li>
<li><p><strong>Optional PAT</strong>: A repository secret holding a PAT (for example <code>GH_PAT_RUNNER</code>, since GitHub rejects secret names that start with the reserved <code>GITHUB_</code> prefix) remains supported for PAT-based deployments when you do not want to (or cannot) use a GitHub App.</p>
</li>
</ul>
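<p>If you prefer to seed this configuration with the gh CLI instead of the web UI, it looks roughly like the following (the environment name <code>dev</code>, the IDs, and the key path are placeholders):</p>
<pre><code class="lang-powershell"># Repository-level secrets for Azure OIDC login.
gh secret set AZURE_CLIENT_ID --body '00000000-0000-0000-0000-000000000000'
gh secret set AZURE_TENANT_ID --body '00000000-0000-0000-0000-000000000000'
gh secret set AZURE_SUBSCRIPTION_ID --body '00000000-0000-0000-0000-000000000000'

# Environment-level variables and secret for the GitHub App.
gh variable set GH_APP_ID --env dev --body '123456'
gh variable set GH_APP_INSTALLATION_ID --env dev --body '7654321'
Get-Content ./app-private-key.pem -Raw | gh secret set GH_APP_PRIVATE_KEY --env dev
</code></pre>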
<p>When dispatching manually you can override the parameters file (for example <code>project-github-runner/infra/parameters.prod.json</code> for a production deployment) or supply a custom image tag suffix; the <code>Resolve deployment configuration</code> step ensures the final image tag and parameter file are consistent with the chosen environment.</p>
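<p>From the CLI, a manual dispatch looks like this (the suffix value is illustrative):</p>
<pre><code class="lang-powershell"># Dispatch a prod run with a custom image tag suffix.
gh workflow run bootstrap-infra.yml `
  -f environment=prod `
  -f imageTagSuffix=hotfix-1
</code></pre>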
<h2 id="heading-example-flow-putting-it-all-together">Example flow (putting it all together)</h2>
<p>In a typical run, you deploy the Bicep template to your resource group with the parameters that describe your environment, GitHub owner/repository, and scaling preferences. Once the deployment completes, you confirm that the Container Apps environment and job exist and that the job template has the expected environment variables and secret wiring. You then point a GitHub workflow at <code>runs-on: [self-hosted, azure-container-apps, &lt;your-labels&gt;]</code>, matching <code>&lt;your-labels&gt;</code> to what you configured on the runner.</p>
<p>From there, the experience is pleasantly boring: you queue a workflow, KEDA notices the queued job within roughly <code>pollingInterval</code> seconds, and ACA starts the necessary job executions. Runners register, process their assigned work, and exit. When there is no more work in the queue, subsequent polls result in zero executions and the job scales down to zero again.</p>
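<p>You can watch that scale-out and scale-in happen with the Azure CLI while a workflow is queued. A quick probe, assuming the dev resource group above (the job name is a placeholder for your deployment's output):</p>
<pre><code class="lang-powershell"># List recent executions of the runner job and their states.
az containerapp job execution list `
  --name '&lt;job-name&gt;' `
  --resource-group rg-github-runner-dev `
  --query "[].{name:name, status:properties.status}" -o table
</code></pre>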
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763368666383/07208993-f199-45a1-a280-1f4d0c8e7783.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-troubleshooting-tips">Troubleshooting tips</h2>
<ul>
<li><p>No job executions: Verify scaler metadata (owner/scope/repos), confirm the GitHub App secret name in <code>scaleRuleAuth</code> aligns with the Key Vault-backed secretRef, and ensure the job identity retains <code>Key Vault Secrets User</code>. Labels in your workflow must match the runner configuration.</p>
</li>
<li><p>Image pull errors: Check ACR <code>AcrPull</code> assignment and that the <code>registries.identity</code> matches the enabled identity. Confirm ACR “authentication as ARM” status when using MI pulls.</p>
</li>
<li><p>Rate limiting: Reduce API calls via more selective <code>repos</code>, <code>enableEtags</code>, or switch to GitHub App auth.</p>
</li>
<li><p>Runner not registering: Confirm the container env includes <code>REGISTRATION_TOKEN_API_URL</code>, repository URL, <code>APP_ID</code>, and <code>APP_INSTALLATION_ID</code>, and that the Key Vault reference can resolve the private key (review logs for secret retrieval errors). If you revert to PAT fallback, regenerate the PAT and verify network egress to <code>https://api.github.com</code>.</p>
</li>
</ul>
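<p>For working through the checks above, a few hedged CLI probes cover most of these failure modes (every name below is a placeholder for your deployment's outputs):</p>
<pre><code class="lang-powershell"># 1. Does the job identity actually hold AcrPull / Key Vault Secrets User?
az role assignment list --assignee '&lt;job-identity-principal-id&gt;' -o table

# 2. Is the GitHub App private key present and readable (control plane)?
az keyvault secret show --vault-name '&lt;vault-name&gt;' --name '&lt;private-key-secret&gt;' --query id

# 3. Pull recent runner console logs from Log Analytics to spot registration errors.
az monitor log-analytics query `
  --workspace '&lt;log-analytics-workspace-guid&gt;' `
  --analytics-query "ContainerAppConsoleLogs_CL | sort by TimeGenerated desc | take 50"
</code></pre>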
<h2 id="heading-conclusion">Conclusion</h2>
<p>This project now represents a complete, opinionated pattern for running GitHub Actions workloads on Azure Container Apps jobs:</p>
<ul>
<li><p><strong>Infrastructure as code</strong>: A single composed Bicep template (<code>infra/main.bicep</code>) stands up the Container Apps environment, job, virtual network, Log Analytics, ACR, managed identities, and KEDA <code>github-runner</code> scaler with environment-specific parameter files.</p>
</li>
<li><p><strong>Hardened runner image</strong>: <code>Dockerfile.github</code> builds on the official GitHub Actions runner image and layers in PowerShell 7.4, Azure CLI, Bicep, and Az/Microsoft.Graph modules so typical cloud build/test/deploy pipelines work out of the box.</p>
</li>
<li><p><strong>GitHub App–first authentication</strong>: The default path uses a GitHub App with narrowly scoped permissions, Key Vault–backed private key storage, and a KEDA scaler that authenticates via the same app; PAT-based auth remains available as a fallback.</p>
</li>
<li><p><strong>Automated bootstrap workflow</strong>: <code>.github/workflows/bootstrap-infra.yml</code> ties everything together—building and pushing the runner image, deploying the Bicep template with the correct tag and GitHub App metadata, and emitting IDs you can plug into downstream automation.</p>
</li>
<li><p><strong>Ephemeral, scalable execution</strong>: Runners are created per job execution, register just long enough to process a workflow, and then exit. KEDA scales executions up and down based on queue depth so you only pay for active work while avoiding long-lived build agents.</p>
</li>
</ul>
<p>Taken together, the templates, scripts, and workflows in this repository give you a reusable starting point for production-ready self‑hosted runners: auditable, parameterized, and easy to adapt to additional environments, images, or GitHub organizations as your CI footprint grows. 🚀</p>
<h2 id="heading-references">References</h2>
<ul>
<li><p>My GitHub Repo: <a target="_blank" href="https://github.com/broberts23/vsCode/tree/main/project-github-runner">broberts23/vsCode/project-github-runner</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/broberts23/vsCode/tree/main/project-github-runner">Jobs in Azure Container Apps: https://learn.micros</a>oft.com/azure/container-apps/jobs?tabs=azure-cli</p>
</li>
<li><p>Containers in Azure Container Apps (registries, MI): https://learn.microsoft.com/azure/container-apps/containers?tabs=bicep</p>
</li>
<li><p>Azure Container Apps environments (logs): https://learn.microsoft.com/azure/container-apps/environment?tabs=bicep#logs</p>
</li>
<li><p>Monitor logs in Azure Container Apps with Log Analytics: https://learn.microsoft.com/azure/container-apps/log-monitoring?tabs=bash</p>
</li>
<li><p>View log streams in Azure Container Apps: https://learn.microsoft.com/azure/container-apps/log-streaming?tabs=bash</p>
</li>
<li><p>Tutorial: GitHub Actions runners on ACA jobs: https://learn.microsoft.com/azure/container-apps/tutorial-ci-cd-runners-jobs?tabs=bicep&amp;pivots=container-apps-jobs-self-hosted-ci-cd-github-actions</p>
</li>
<li><p>KEDA GitHub runner scaler: https://keda.sh/docs/latest/scalers/github-runner/</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Ephemeral Pull Request (PR) Environment with Microsoft Graph Bicep]]></title><description><![CDATA[Introduction
This project demonstrates an end-to-end ephemeral pull request (PR) environment pattern using:

Microsoft Graph Bicep to provision an application, service principal, security group, test accounts, custom OAuth2 permission scopes, and an ...]]></description><link>https://benroberts.io/ephemeral-pull-request-pr-environment-with-microsoft-graph-bicep</link><guid isPermaLink="true">https://benroberts.io/ephemeral-pull-request-pr-environment-with-microsoft-graph-bicep</guid><category><![CDATA[Azure]]></category><category><![CDATA[Entra ID]]></category><category><![CDATA[Powershell]]></category><category><![CDATA[github-actions]]></category><category><![CDATA[identity-management]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Sun, 09 Nov 2025 22:20:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762722198334/d9d77659-e710-47ec-becd-53fca2c416a5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>This project demonstrates an end-to-end ephemeral pull request (PR) environment pattern using:</p>
<ul>
<li><p>Microsoft Graph Bicep to provision an application, service principal, security group, test accounts, custom OAuth2 permission scopes, and an application role.</p>
</li>
<li><p>Azure resource deployment (Key Vault with RBAC, Storage, App Service Plan, Web App) as disposable per‑PR infrastructure.</p>
</li>
<li><p>.NET 8 Minimal API with standardized v2 JwtBearer authentication accepting both identifier URI and clientId audiences.</p>
</li>
<li><p>GitHub Actions OIDC workload identity federation (no client secrets) to deploy and test.</p>
</li>
<li><p>PowerShell automation for post‑deploy Graph operations (federated credential, app role assignment, ephemeral test users lifecycle).</p>
</li>
<li><p>Smoke tests validating Web App (service) readiness, role claim propagation, authenticated access, and data-plane RBAC for Key Vault and Storage.</p>
</li>
</ul>
<p>The design focus here is <em>NOT</em> deep functional testing of business endpoints, although the smoke tests could easily be extended to include more comprehensive API validation. Instead, it highlights how identity artifacts (scopes, roles, group membership, users) can be created and wired programmatically as part of an ephemeral environment. The only required application tests are:</p>
<ul>
<li><p>A role-gated <code>/healthz</code> endpoint (requires <code>Swagger.Admin</code> application role) proving role claim emission and authorization works.</p>
</li>
<li><p>An authenticated <code>/health</code> endpoint (accepts any valid v2 token for the application) proving basic bearer auth wiring works.</p>
</li>
</ul>
<p>Swagger/UI endpoints and scope-gated API surface are intentionally optional and can be explored manually with the generated test identities while a PR is open.</p>
<h2 id="heading-project-summary">Project summary</h2>
<p>This repository delivers a per‑PR ephemeral environment pattern that automates identity artifacts (application, service principal, scopes, roles, and a tester group) alongside disposable Azure resources (Key Vault, Storage, Web App). It standardizes on v2 Graph tokens for the Minimal API using a single JwtBearer scheme (valid audiences include the identifier URI and clientId). CI provisions the environment, deploys the API, runs smoke tests against the <code>/healthz</code> (role-gated) and <code>/health</code> (any authenticated token), and preserves results as artifacts. Resources remain until a PR is labeled <code>destroy</code>, at which point a gated tear-down removes the Graph objects and the Azure resource group.</p>
<h2 id="heading-repository-structure">Repository structure</h2>
<pre><code class="lang-plaintext">project-bicep-graph/
  bicepconfig.json
  blog.md
  README.md
  infra/
    main.bicep
    modules/
      appInfra.bicep
      identity.bicep
  scripts/
    Assign-AppRoleToGroup.ps1
    Cleanup-GraphEphemeral.ps1
    Create-TestUsers.ps1
    Delete-TestUsers.ps1
    GraphFederation.ps1
    SmokeTests.ps1
  src/
    WebApi/
      appsettings.json
      Program.cs
      WebApi.csproj
  tests/
    SmokeTests.Tests.ps1
</code></pre>
<h2 id="heading-scenarios-use-cases">Scenarios / use cases</h2>
<p>Below are practical, real-world scenarios where this pattern adds value:</p>
<ul>
<li><p>Feature-branch validation environments</p>
<ul>
<li>Spin up a complete stack per PR, run smoke tests, and keep the environment available for iterative commits until explicitly destroyed.</li>
</ul>
</li>
<li><p>Identity wiring and RBAC regression checks</p>
<ul>
<li>Prove role claim propagation (<code>Swagger.Admin</code> → <code>/healthz</code>) and audience/issuer correctness (<code>/health</code>) before merge; catch config drift early.</li>
</ul>
</li>
<li><p>Contract and SDK change verification</p>
<ul>
<li>Validate changes to Minimal API endpoints or OpenAPI contracts with generated clients; exercise <code>Swagger.Read</code> policy gating (optionally) without impacting shared envs.</li>
</ul>
</li>
<li><p>Dependency upgrade confidence</p>
<ul>
<li>Test .NET minor/patch upgrades, App Service runtime updates, Az CLI/PowerShell module bumps, and Graph API changes in isolation per PR.</li>
</ul>
</li>
<li><p>Cross-service integration tests</p>
<ul>
<li>Verify Key Vault and Storage data‑plane RBAC using the workload identity; ensure least‑privilege roles still allow the required operations.</li>
</ul>
</li>
<li><p>Secrets rotation rehearsal</p>
<ul>
<li>Rehearse secret or certificate rotation patterns (paired with a JIT RBAC activation workflow) and verify the app consumes new versions without downtime.</li>
</ul>
</li>
<li><p>Conditional Access and role policy previews</p>
<ul>
<li>Trial tenant policy changes that may affect service‑to‑service flows; confirm protected endpoints still authorize correctly with v2 tokens.</li>
</ul>
</li>
<li><p>Multi-tenant app hardening</p>
<ul>
<li>Exercise deterministic identifier URIs and dual accepted audiences (audience + clientId) to ensure consistent v2 auth in multi‑tenant setups.</li>
</ul>
</li>
<li><p>Performance smoke and cold‑start checks</p>
<ul>
<li>Measure first‑hit latency and basic throughput after deploy; compare over time as dependencies change.</li>
</ul>
</li>
<li><p>Chaos/resiliency drills (lightweight)</p>
<ul>
<li>Intentionally deny KV or Storage access (temporary RBAC change) to confirm the app and pipeline report actionable errors.</li>
</ul>
</li>
<li><p>PR demos and review sandboxes</p>
<ul>
<li>Provide reviewers with test users and a safe, isolated environment for manual exploration during the review window.</li>
</ul>
</li>
<li><p>Bug reproduction and fix validation</p>
<ul>
<li>Reproduce production issues in a throwaway env with the same identity wiring and app settings; validate fixes without risking shared dev/test.</li>
</ul>
</li>
<li><p>Bicep module canary testing</p>
<ul>
<li>Validate changes to shared modules (identity/infra) behind a PR; confirm outputs, RBAC assignments, and app settings are correct end‑to‑end.</li>
</ul>
</li>
<li><p>Workflow and OIDC trust changes</p>
<ul>
<li>Safely evolve GitHub Actions workflow steps (OIDC, artifact handling, smoke steps) and verify behavior in isolation before rolling to other repos.</li>
</ul>
</li>
<li><p>Testing team regression suites</p>
<ul>
<li>Provide QA teams with ephemeral test accounts and a disposable environment to run regression test suites; each PR gets fresh test users with known credentials and role assignments, ensuring repeatable and isolated test runs.</li>
</ul>
</li>
</ul>
<h2 id="heading-architecture-overview">Architecture overview</h2>
<p>Components</p>
<ul>
<li><p>Identity (Entra/Microsoft Graph Bicep beta): application, service principal, OAuth2 scopes (Swagger.Read/Write), and an app role (Swagger.Admin); tester security group.</p>
</li>
<li><p>Azure resources: Key Vault (RBAC), Storage (V2), App Service Plan (B1), Web App with app settings for <code>AzureAd__TenantId</code>, <code>AzureAd__Audience</code>, and <code>AzureAd__ClientId</code>.</p>
</li>
<li><p>Application: .NET 8 Minimal API; single v2 JwtBearer scheme; policies for <code>SwaggerAdmin</code>, <code>SwaggerRead</code>, and <code>AnyAuthenticated</code>.</p>
</li>
<li><p>Automation: PowerShell scripts for role assignment and test user lifecycle; GitHub Actions workflow for provision → smoke-tests → destroy.</p>
</li>
</ul>
<p>Flow (high level)</p>
<ol>
<li><p>PR opened/reopened → OIDC login → Bicep deploys identity + infra → Web API published via zip deploy → role assigned to tester group; optional test users created.</p>
</li>
<li><p>Smoke tests acquire a v2 token via <code>&lt;identifierUri&gt;/.default</code> (fallback to <code>&lt;clientId&gt;/.default</code>), call <code>/healthz</code> with the admin token and <code>/health</code> with an authenticated token, and validate KV/Storage access (a sketch of this token flow follows the list).</p>
</li>
<li><p>Artifacts (<code>env-outputs.json</code>, <code>test-users.json</code>, <code>smoke-results.json</code>) are uploaded; tokens are not decoded or persisted in results.</p>
</li>
<li><p>When the PR is labeled <code>destroy</code>, CI deletes test users, Graph objects (assignments, group, SP, app), and the resource group.</p>
</li>
</ol>
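<p>Step 2's token acquisition is easy to reproduce by hand. A minimal sketch, assuming the <code>appAudience</code> output from the Bicep deployment (the URI and hostname are placeholders):</p>
<pre><code class="lang-powershell"># Acquire a v2 app token for the ephemeral app's audience, then probe /healthz.
$audience = 'api://pr-123-abc123'
$token = az account get-access-token --scope "$audience/.default" --query accessToken -o tsv

Invoke-RestMethod -Uri 'https://&lt;webapp&gt;.azurewebsites.net/healthz' `
  -Headers @{ Authorization = "Bearer $token" }
</code></pre>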
<h2 id="heading-goals-vs-nongoals">Goals vs Non‑Goals</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Aspect</td><td>Goal</td><td>Non‑Goal</td></tr>
</thead>
<tbody>
<tr>
<td>Infrastructure</td><td>Rapid, reproducible PR environment spin-up &amp; tear down</td><td>Long-lived shared dev environment</td></tr>
<tr>
<td>Identity Automation</td><td>Programmatically create app, service principal, scopes, role, group, test users</td><td>End‑user production auth flows</td></tr>
<tr>
<td>Testing Scope</td><td>Validate auth wiring via <code>/healthz</code> (admin role) &amp; <code>/health</code> (any authenticated token)</td><td>Automated Swagger / scope matrix tests</td></tr>
<tr>
<td>App Roles</td><td>Demonstrate creation + assignment</td><td>Enforcing complex RBAC logic in code</td></tr>
<tr>
<td>Scopes</td><td>Show deterministic GUID-based scopes</td><td>Full delegated consent workflow automation</td></tr>
<tr>
<td>Users</td><td>Ephemeral test accounts for optional manual exploration</td><td>Persistent test directory population</td></tr>
</tbody>
</table>
</div><h2 id="heading-implementation">Implementation</h2>
<h3 id="heading-identity-layer-bicep-graph-beta">Identity Layer (Bicep Graph Beta)</h3>
<ul>
<li><p>Application + Service Principal created via <code>Microsoft.Graph/*@beta</code> resources.</p>
</li>
<li><p>Application identifier URI (audience) declared as <code>api://pr-&lt;prNumber&gt;-&lt;uniqueSuffix&gt;</code> deterministically; output as <code>appAudience</code> for token acquisition and API configuration.</p>
</li>
<li><p>Two OAuth2 permission scopes defined: <code>Swagger.Read</code>, <code>Swagger.Write</code> (deterministic IDs via <code>guid()</code> seeding).</p>
</li>
<li><p>One application role <code>Swagger.Admin</code> (allowed for User &amp; Application principals) with deterministic ID.</p>
</li>
<li><p>Security group <code>grp-&lt;pr&gt;-&lt;suffix&gt;-testers</code> created unconditionally for ephemeral test accounts; app is configured with <code>groupMembershipClaims: SecurityGroup</code> so role/group claims flow into user tokens (if requested interactively later).</p>
</li>
<li><p>Outputs: <code>appId</code> (clientId), <code>appObjectId</code>, <code>servicePrincipalObjectId</code>, <code>appAudience</code> (identifier URI), scope IDs, role ID, group display name and objectId.</p>
</li>
</ul>
<h3 id="heading-infrastructure-layer">Infrastructure Layer</h3>
<ul>
<li><p>Storage Account (Standard_LRS, StorageV2 kind) and Key Vault (RBAC permission model, soft delete enabled) deployed.</p>
</li>
<li><p>Web App (Minimal API .NET 8) + Basic App Service Plan (B1 tier).</p>
</li>
<li><p>RBAC role assignments:</p>
<ul>
<li><p>Service principal (API) granted "Key Vault Secrets User" on Key Vault.</p>
</li>
<li><p>GitHub runner service principal (OIDC workload identity) optionally granted "Key Vault Secrets User" for smoke test access.</p>
</li>
</ul>
</li>
<li><p>App settings: <code>AzureAd__TenantId</code> (subscription tenant), <code>AzureAd__Audience</code> (app audience URI from Bicep output), <code>AzureAd__ClientId</code> (application appId for secondary accepted audience).</p>
</li>
</ul>
<h3 id="heading-application-minimal-api-standardized-v2-authentication">Application (Minimal API) — Standardized v2 Authentication</h3>
<ul>
<li><p>Endpoints:</p>
<ul>
<li><p><code>/healthz</code> — role-protected heartbeat; requires <code>Swagger.Admin</code> role to authorize (verifies role assignment propagation in CI).</p>
</li>
<li><p><code>/health</code> — requires any authenticated bearer token issued by the tenant for this application (no scope enforcement in CI).</p>
</li>
<li><p><code>/api/mock</code> — requires <code>Swagger.Read</code> scope (demonstrates scope-based policy; not exercised automatically).</p>
</li>
<li><p><code>/swagger</code> redirect — protected by <code>Swagger.Admin</code> role (manual exploration only).</p>
</li>
</ul>
</li>
<li><p>Authentication pipeline (single JWT Bearer scheme) configured via environment variables: <code>AzureAd__TenantId</code>, <code>AzureAd__Audience</code>, <code>AzureAd__ClientId</code>.</p>
<ul>
<li><p>Uses v2 authority: <code>https://login.microsoftonline.com/&lt;tenantId&gt;/v2.0</code>.</p>
</li>
<li><p>Accepts either the identifier URI (<code>AzureAd__Audience</code>) or the <code>clientId</code> as valid audience (<code>ValidAudiences = [ audience, clientId ]</code>).</p>
</li>
<li><p>Explicit issuer validation (<code>https://login.microsoftonline.com/&lt;tenantId&gt;/v2.0</code>) prevents cross‑version token mismatches.</p>
</li>
<li><p>Policies:</p>
<ul>
<li><p><code>SwaggerAdmin</code> — role claim (<code>roles</code> or <code>http://schemas.microsoft.com/ws/2008/06/identity/claims/role</code>) contains <code>Swagger.Admin</code>.</p>
</li>
<li><p><code>SwaggerRead</code> — scope claim (<code>scp</code>) contains <code>Swagger.Read</code>.</p>
</li>
<li><p><code>AnyAuthenticated</code> — generic authenticated user (used for <code>/health</code>).</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p>Rationale for v2-only standardization: avoids issuer/audience ambiguity between v1 (<code>https://sts.windows.net/...</code>) and v2 (<code>https://login.microsoftonline.com/...</code>), simplifies smoke testing token acquisition, and ensures consistent claim shape (e.g., consolidated <code>scp</code> multi‑space scope string).</p>
</li>
</ul>
<h3 id="heading-automation-scripts-scripts">Automation Scripts (<code>scripts/</code>)</h3>
<ul>
<li><p><code>Assign-AppRoleToGroup.ps1</code> — Assigns the <code>Swagger.Admin</code> app role to the tester group using Graph REST API. Supports group lookup by display name or direct objectId (preferred). Safely handles pagination and property existence checks to avoid Graph API filter limitations. Also assigns application permission to the runner SP if provided.</p>
</li>
<li><p><code>Create-TestUsers.ps1</code> — Generates ephemeral test users with aliasing pattern <code>pr&lt;PR_NUMBER&gt;tester&lt;index&gt;&lt;6hex&gt;</code>, sets SecureString passwords, adds each user to the tester group via Graph <code>/members/$ref</code>, outputs JSON with plaintext passwords for artifact (demo only; production should use Key Vault or avoid password-based auth). A per-user sketch of these Graph calls follows this list.</p>
</li>
<li><p><code>Delete-TestUsers.ps1</code> — Cleans up users matching the PR prefix heuristic (mailNickname, displayName). Deletes users from directory; group membership implicitly removed.</p>
</li>
<li><p><code>Cleanup-GraphEphemeral.ps1</code> — Removes app role assignments (group + runner SP principals), deletes security group, service principal, and application. Uses client-side filtering to work around Graph filter limitations on relationship endpoints. Outputs JSON summary of deletion operations (type, id, status, error).</p>
</li>
<li><p><code>SmokeTests.ps1</code> — Dot-sourceable PowerShell module; <code>Invoke-EphemeralSmokeTests</code> function validates environment context, Key Vault access (Get-AzKeyVaultSecret), Storage access (Get-AzStorageAccountKey), and API endpoints (<code>/healthz</code> with admin role token, <code>/health</code> with any app token). Returns structured object; captures HTTP status codes for diagnostics and gracefully handles missing properties (stores null/error objects) under StrictMode.</p>
</li>
</ul>
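<p>To make the user lifecycle concrete, here is a hedged per-user sketch of what <code>Create-TestUsers.ps1</code> does, using the Microsoft Graph PowerShell SDK. The tenant domain, group id, and password value are placeholders; the real script generates SecureString passwords and emits the JSON artifact described above:</p>
<pre><code class="lang-powershell">Connect-MgGraph -Scopes 'User.ReadWrite.All','GroupMember.ReadWrite.All'

# pr&lt;PR_NUMBER&gt;tester&lt;index&gt;&lt;6hex&gt; naming pattern from the script.
$hex   = -join (1..6 | ForEach-Object { '{0:x}' -f (Get-Random -Maximum 16) })
$alias = "pr123tester1$hex"

$body = @{
    accountEnabled    = $true
    displayName       = $alias
    mailNickname      = $alias
    userPrincipalName = "$alias@contoso.onmicrosoft.com"   # placeholder domain
    passwordProfile   = @{ forceChangePasswordNextSignIn = $false; password = 'P@ss-placeholder-123!' }
}
$user = Invoke-MgGraphRequest -Method POST -Uri 'https://graph.microsoft.com/v1.0/users' -Body ($body | ConvertTo-Json)

# Add the new user to the tester group via the /members/$ref relationship.
$ref = @{ '@odata.id' = "https://graph.microsoft.com/v1.0/directoryObjects/$($user.id)" }
Invoke-MgGraphRequest -Method POST -Uri "https://graph.microsoft.com/v1.0/groups/&lt;group-id&gt;/members/`$ref" -Body ($ref | ConvertTo-Json)
</code></pre>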
<h3 id="heading-github-actions-workflow-githubworkflowsephemeral-envyml">GitHub Actions Workflow (<code>.github/workflows/ephemeral-env.yml</code>)</h3>
<ul>
<li><p>Jobs: <code>provision</code> → <code>smoke-tests</code> → <code>destroy</code>.</p>
</li>
<li><p><strong>Provision</strong>: Checks out repo; logs in via OIDC; resolves runner service principal objectId; creates resource group; deploys Bicep (identity + infrastructure); builds and publishes .NET 8 Minimal API; zips and deploys to App Service; assigns <code>Swagger.Admin</code> app role to tester group; creates ephemeral test users; uploads artifacts.</p>
</li>
<li><p><strong>Smoke Tests</strong>: Downloads infra outputs; acquires a v2 access token using <code>&lt;identifierUri&gt;/.default</code> (fallback to <code>&lt;clientId&gt;/.default</code> if needed) ensuring correct issuer/audience; sources <code>SmokeTests.ps1</code> and runs <code>Invoke-EphemeralSmokeTests</code> to validate <code>/healthz</code> (role-gated via admin token), <code>/health</code> (any authenticated token), Key Vault access (RBAC), and Storage access (RBAC); generates a concise summary (auth/config and API/resource checks); outputs structured JSON and preserves artifacts even on failure. Tokens are not printed or decoded in logs/artifacts.</p>
</li>
<li><p><strong>Destroy</strong>: Triggered only when <code>destroy</code> label is applied to the PR; downloads artifacts from <code>provision</code> job; logs in via OIDC; deletes test users; cleans up Graph objects (app role assignments, group, service principal, application); deletes resource group asynchronously.</p>
</li>
<li><p><strong>Artifacts</strong>: <code>env-outputs.json</code> (Bicep outputs), <code>test-users.json</code> (ephemeral account credentials), <code>smoke-results.json</code> (test results structure).</p>
</li>
<li><p><strong>Triggers &amp; Conditions</strong>:</p>
<ul>
<li><p>Runs on PR <code>opened</code>, <code>reopened</code>, and <code>labeled</code> events.</p>
</li>
<li><p><code>provision</code> and <code>smoke-tests</code> run only on PR open/reopen (<code>github.event.action != 'labeled'</code>).</p>
</li>
<li><p><code>destroy</code> runs only when "destroy" label is added (<code>github.event.action == 'labeled' &amp;&amp; contains(...labels...*.name, 'destroy')</code>).</p>
</li>
</ul>
</li>
<li><p><strong>PR Merge Protection</strong>: Configure branch protection rules in GitHub repository settings to require <code>provision</code> and <code>smoke-tests</code> status checks to pass before merging. This prevents merging until the pipeline completes successfully, ensuring resources are validated before integration. The pipeline can run independently of merge; resources persist until the <code>destroy</code> label is applied.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762723318307/e0cfef6c-5e6d-4730-8952-dadbaaabb54c.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-clarification-app-roles-amp-swagger-scopes-are-demonstrative">Clarification: App Roles &amp; Swagger Scopes Are Demonstrative</h2>
<p>The project intentionally provisions OAuth2 scopes and an app role to show how they can be:</p>
<ol>
<li><p>Declared with deterministic GUIDs in Bicep (ensuring stability across redeployment so consent/assignments remain valid), and</p>
</li>
<li><p>Assigned programmatically (role to group) post-deployment.</p>
</li>
</ol>
<p>However:</p>
<ul>
<li><p>CI pipelines do NOT depend on or validate <code>Swagger.Read</code> / <code>Swagger.Write</code> scopes today (scope policy is illustrative).</p>
</li>
<li><p>The <code>/api/mock</code> and <code>/swagger</code> endpoints are not part of the automated testing and are merely placeholders to demonstrate how the testing framework could be used.</p>
</li>
<li><p>Test users exist so a human reviewer (during the PR window) could optionally sign in and verify roles/scopes manually. They could also be used for user-based delegated claim auth validation.</p>
</li>
</ul>
<p>Automated testing focuses solely on:</p>
<ul>
<li><p>Role claim propagation for <code>Swagger.Admin</code> (<code>/healthz</code> access).</p>
</li>
<li><p>Basic bearer auth wiring (<code>/health</code> access) with v2 tokens.</p>
</li>
</ul>
<h2 id="heading-ephemeral-identity-lifecycle">Ephemeral Identity Lifecycle</h2>
<ul>
<li><p>Groups are always created (no conditional logic), simplifying downstream steps.</p>
</li>
<li><p>Test users are generated with prefix <code>pr&lt;PR_NUMBER&gt;tester</code> and appended random hex for uniqueness.</p>
</li>
<li><p>Users added to tester group enabling potential role claim emission if interactive delegated tokens are later acquired manually.</p>
</li>
<li><p>Tear-down removes the test users and the tester group.</p>
</li>
</ul>
<h2 id="heading-security-and-governance-considerations">Security and governance considerations</h2>
<ul>
<li><p>Authentication: Use GitHub Actions OIDC for CI login (no secrets). Azure OIDC guidance: https://learn.microsoft.com/azure/developer/github/connect-from-azure-openid-connect</p>
</li>
<li><p>Token hygiene: Tokens are used only in-memory to call protected endpoints; token contents are not written to <code>smoke-results.json</code> or job summaries.</p>
</li>
<li><p>Least privilege: Prefer Key Vault Secrets User over broader roles; scope assignments to only what smoke tests require.</p>
</li>
<li><p>RBAC model: Key Vault uses the RBAC permission model (<code>enableRbacAuthorization=true</code>) to align with identity-first, secretless automation.</p>
</li>
<li><p>Governance: Protect main branches by requiring <code>provision</code> and <code>smoke-tests</code> checks. Use Destroy label to gate cleanup and preserve artifacts for audit.</p>
</li>
<li><p>Secrets: Test user passwords are demo-only; in production, store in Key Vault or avoid password-based flows entirely.</p>
</li>
<li><p>Preview note: Microsoft Graph Bicep beta types are subject to change; pin tooling versions and validate in a test tenant first.</p>
</li>
</ul>
<h2 id="heading-demo-walk-through-end-to-end-pr-flow">Demo / walk-through — end-to-end PR flow</h2>
<ol>
<li><p>Open or reopen a PR.</p>
<ul>
<li><p>CI logs into Azure via OIDC and deploys Bicep (identity + infra).</p>
</li>
<li><p>Builds and zips the Minimal API, deploys to the Web App.</p>
</li>
<li><p>Assigns <code>Swagger.Admin</code> app role to the tester group; optionally creates ephemeral test users and uploads artifacts.</p>
</li>
</ul>
</li>
<li><p>Smoke tests run.</p>
<ul>
<li><p>Acquire a v2 token for <code>&lt;identifierUri&gt;/.default</code> (fallback to <code>&lt;clientId&gt;/.default</code>).</p>
</li>
<li><p>Call <code>/healthz</code> with the admin token; call <code>/health</code> with an authenticated token.</p>
</li>
<li><p>Validate Key Vault and Storage access via Azure AD; upload <code>smoke-results.json</code> (no token contents).</p>
</li>
</ul>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762724928796/287af813-ba4c-48b9-8a7d-c2c0f7cb86a2.png" alt class="image--center mx-auto" /></p>
<ol start="3">
<li>Review results in the PR summary and artifacts. Merge when checks pass.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762726712780/954f3160-15f2-4bc9-ad14-a81b85a42f33.png" alt class="image--center mx-auto" /></p>
<ol start="3">
<li><p>Apply the Destroy label when you’re done.</p>
<ul>
<li>CI deletes test users, revokes app role assignments, deletes the tester group, service principal, application, and the resource group.</li>
</ul>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762724881322/b980ea1c-c3c4-49d4-a39d-73a942cb0efc.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-potential-future-enhancements">Potential Future Enhancements</h2>
<ul>
<li><p>Introduce a TTL scan job to clean up orphaned PR resource groups/users if a workflow run is interrupted or the "Destroy" label is never applied (one possible shape is sketched after this list).</p>
</li>
<li><p>Encrypt or secret-manage test user credentials (Key Vault + GitHub OIDC retrieval) for improved hygiene; avoid emitting plaintext to artifacts.</p>
</li>
<li><p>Pester unit tests for PowerShell scripts (Mock Graph calls, test error paths).</p>
</li>
<li><p>Conditional destroy (not just label-driven) to clean up on PR merge/close as fallback.</p>
</li>
</ul>
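<p>The TTL scan could be as simple as a scheduled workflow that sweeps by naming convention. One possible shape, assuming PR resource groups follow an <code>rg-pr-&lt;number&gt;</code> pattern and carry a <code>createdAt</code> tag (both are assumptions; neither is enforced by the current templates):</p>
<pre><code class="lang-powershell"># Delete PR resource groups older than three days (pattern and tag are assumed).
$cutoff = (Get-Date).AddDays(-3)
az group list --query "[?starts_with(name, 'rg-pr-')].{name:name, created:tags.createdAt}" -o json |
    ConvertFrom-Json |
    Where-Object { $_.created -and [datetime]$_.created -lt $cutoff } |
    ForEach-Object { az group delete --name $_.name --yes --no-wait }
</code></pre>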
<h2 id="heading-conclusion">Conclusion</h2>
<p>This project emphasizes identity resource automation (app, SP, scopes, roles, group, users) and ephemeral infrastructure rather than broad functional API test coverage. The minimal smoke test (<code>/healthz</code> + authenticated <code>/health</code>) gives just enough validation that identity wiring and deployment succeeded. Everything else—the Swagger-related scopes, role policies, and test users—serves as demonstrative scaffolding that reviewers can manually exercise during a PR’s lifetime.</p>
<h2 id="heading-key-references">Key References</h2>
<ul>
<li><p>Graph app/service principal Bicep (beta): https://learn.microsoft.com/graph/templates/bicep/reference/applications?view=graph-bicep-beta</p>
</li>
<li><p>Azure OIDC federation (GitHub): https://learn.microsoft.com/azure/developer/github/connect-from-azure-openid-connect</p>
</li>
<li><p>Access tokens &amp; claims: https://learn.microsoft.com/azure/active-directory/develop/access-tokens</p>
</li>
<li><p>Key Vault RBAC: https://learn.microsoft.com/azure/key-vault/general/rbac-guide</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Enhance Workload Identity Security with Just-In-Time RBAC in GitHub Actions and PowerShell]]></title><description><![CDATA[Introduction
This post was born from the idea of, "gee I wonder if you could create a Just-In-Time (JIT) privileged access workflow using GitHub Actions?" After some experimentation and much debugging, the answer is yes 😆 — and the resulting pattern...]]></description><link>https://benroberts.io/enhance-workload-identity-security-with-just-in-time-rbac-in-github-actions-and-powershell</link><guid isPermaLink="true">https://benroberts.io/enhance-workload-identity-security-with-just-in-time-rbac-in-github-actions-and-powershell</guid><category><![CDATA[Azure]]></category><category><![CDATA[Powershell]]></category><category><![CDATA[Entra ID]]></category><category><![CDATA[github-actions]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Fri, 07 Nov 2025 06:38:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762469113715/a2f48477-6f65-4e12-a10d-c076a5e24dc6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>This post was born from the idea of, "gee I wonder if you could create a Just-In-Time (JIT) privileged access workflow using GitHub Actions?" After some experimentation and much debugging, the answer is yes 😆 — and the resulting pattern is a powerful way to give automation identities just‑enough, just‑in‑time access to perform high‑value, sensitive tasks while preserving auditability and minimizing standing privilege.</p>
<p>This post demonstrates a practical, secure pattern for integrating PIM-like (Privileged Identity Management) functionality into a CI/CD pipeline so that automation — for example, a GitHub Actions runner — can request, use, and then release elevated privileges in a repeatable, auditable way.</p>
<p>Traditional Entra ID Privileged Identity Management (PIM) activations are useful for ad-hoc human tasks, but they aren't supported for workload identities (at the time of writing). A programmatic PIM activation flow using Microsoft Graph and identity-first automation brings several tangible benefits:</p>
<ul>
<li><p>Repeatability and reproducibility: the activation → perform → revoke sequence is defined as code and can be replayed across tenants and environments.</p>
</li>
<li><p>CI/CD integration: pipelines can request just‑in‑time privileges for a specific deployment run and attach PR/build metadata for clear traceability.</p>
</li>
<li><p>Principle of least privilege: request the smallest role and shortest duration necessary for the job instead of maintaining standing privileges.</p>
</li>
<li><p>Policy-as-code and testability: activation flows can be reviewed in pull requests, linted, and run through CI tests before applying changes to production.</p>
</li>
<li><p>Audit and compliance: workflows produce machine-readable activation records that can be stored with build artifacts or forwarded to SIEM for longer retention.</p>
</li>
<li><p>Faster recovery and consistent rollback: automation can detect activation failures and run deterministic rollback or remediation flows.</p>
</li>
</ul>
<p>This repository contains a working scaffold that illustrates the pattern: a PowerShell module that uses Graph/Azure authentication patterns, a small wrapper script to request activations from a runner, and a GitHub Actions workflow that demonstrates an approval-gated activation.</p>
<h2 id="heading-project-summary">Project summary</h2>
<p>The implementation in this repository is geared toward CI-driven automation (GitHub Actions) and contains a few pragmatic decisions you'll see reflected throughout the module and workflow:</p>
<ul>
<li>Automation-first JIT: for non-human actors (managed identities or OIDC workload identities) the flow creates a temporary, scoped Azure RBAC assignment, performs the privileged operation, then removes the assignment immediately—capturing structured metadata (vault, rotated secrets, timestamps) for audit and traceability.</li>
</ul>
<p>I’ve covered using OIDC to authenticate GitHub to Azure in depth in previous blogs. You can find more information on how to set up OIDC authentication in parts 4 and 5 of the <a target="_blank" href="https://benroberts.io/azure-mlops-challenge-blog-index">Azure MLOps Challenge Blog</a>.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://benroberts.io/azure-mlops-challenge-blog-index">https://benroberts.io/azure-mlops-challenge-blog-index</a></div>
<p> </p>
<ul>
<li><p>Vault‑wide secret rotation: secret rotation is implemented at the vault level. <code>Set-PimKeyVaultSecret</code> enumerates all secrets in a vault with <code>Get-AzKeyVaultSecret -VaultName</code>, rotates each secret with <code>Set-AzKeyVaultSecret</code> and returns an array of rotation result objects (vault, secret, rotatedAt, secretVersion) so CI can act on and report every change.</p>
</li>
<li><p>Clear CI reporting: the module includes <code>Write-PimSecretSummary</code>, which appends a compact Markdown table of rotated secrets to the file indicated by <code>GITHUB_STEP_SUMMARY</code> so runs that perform rotations display a readable summary in the Actions UI (a minimal sketch of the pattern follows this list).</p>
</li>
<li><p>Robust RBAC cleanup: removal code was hardened to use supported <code>Remove-AzRoleAssignment</code> parameter sets (preferring <code>-InputObject</code> and falling back to the objectId+roleDefinitionId+scope set) and to defensively handle differences in PSRoleAssignment object shapes across Az versions.</p>
</li>
<li><p>Quiet, predictable CI logs: module imports of <code>Az.*</code> are performed with verbose output suppressed (temporary <code>$VerbosePreference</code> change plus <code>-Verbose:$false</code>) so the Actions log focuses on the steps and results rather than import chatter.</p>
</li>
<li><p>Caller ergonomics: lifecycle functions and the run script return machine‑friendly objects and were updated to accept the multi‑secret rotation return shape so downstream jobs and artifact writers can consume rotation metadata programmatically.</p>
</li>
</ul>
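<p>The step-summary helper is worth imitating in other pipelines. A minimal sketch of the pattern <code>Write-PimSecretSummary</code> follows, assuming the rotation-result shape described above (vault, secret, rotatedAt):</p>
<pre><code class="lang-powershell"># Append a Markdown table of rotation results to the Actions step summary.
# $rotationResults is assumed to be the array returned by Set-PimKeyVaultSecret.
if ($env:GITHUB_STEP_SUMMARY) {
    $lines  = @('### Rotated secrets', '', '| Vault | Secret | Rotated at |', '| --- | --- | --- |')
    $lines += $rotationResults | ForEach-Object { "| $($_.vault) | $($_.secret) | $($_.rotatedAt) |" }
    ($lines -join "`n") | Add-Content -Path $env:GITHUB_STEP_SUMMARY
}
</code></pre>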
<p>Read on for how the pieces fit together and how to validate the flow in a test subscription.</p>
<h2 id="heading-repository-structure">Repository structure</h2>
<p>Key folders and files and how they fit together (paths relative to the repository root):</p>
<ul>
<li><p><code>project-jit-pim/scripts/</code></p>
<ul>
<li><p><code>project-jit-pim/scripts/PimAutomation.psm1</code> — PowerShell 7.4 module that encapsulates Graph and RBAC logic. Key functions:</p>
<ul>
<li><p><code>Get-GraphAccessToken</code> — prefers Azure CLI token when available; supports interactive Graph login in dev.</p>
</li>
<li><p><code>Connect-PimGraph</code> — normalizes Graph connection with the token.</p>
</li>
<li><p><code>Invoke-PimGraphRequest</code> — Graph v1.0/beta compatibility wrapper for GET/POST/PATCH/DELETE.</p>
</li>
<li><p><code>Set-PimAzContext</code> — establishes Az PowerShell context using OIDC federated token or managed identity; required for RBAC and Key Vault operations. Imports of <code>Az.*</code> modules are intentionally performed with verbose output suppressed to keep CI logs clean.</p>
</li>
<li><p><code>Resolve-PimRoleResourcePairs</code> — robust pairing of roleIds/resourceIds (zip, one-to-many, or Cartesian product).</p>
</li>
<li><p><code>Set-PimKeyVaultSecret</code> — enumerates all secrets in a vault using <code>Get-AzKeyVaultSecret -VaultName</code> and rotates each secret using <code>Set-AzKeyVaultSecret</code>. The function includes Forbidden-aware retry/backoff to tolerate short RBAC propagation delays and returns an array of rotation result objects for reporting.</p>
</li>
<li><p><code>Write-PimSecretSummary</code> — new helper that appends a Markdown table of rotated secrets to the <code>GITHUB_STEP_SUMMARY</code> file (when the environment variable is present), making a concise summary visible in GitHub Actions UI.</p>
</li>
<li><p><code>New-TemporaryKeyVaultRoleAssignment</code> / <code>Remove-TemporaryKeyVaultRoleAssignment</code> — creates/removes a scoped RBAC assignment on the Key Vault. <code>Remove-TemporaryKeyVaultRoleAssignment</code> was hardened to use supported <code>Remove-AzRoleAssignment</code> parameter sets and defensively handle different property shapes returned by <code>Get-AzRoleAssignment</code> across Az versions.</p>
</li>
<li><p><code>Invoke-TempKeyVaultRotationLifecycle</code> — orchestrates create → rotate secret → delete, and validates removal; used by CI.</p>
</li>
</ul>
</li>
<li><p><code>project-jit-pim/scripts/run-activation.ps1</code> — Entry point for the workflow. Imports the module, determines whether the target is a Key Vault, and runs <code>Invoke-TempKeyVaultRotationLifecycle</code> when appropriate. Emits structured JSON for the pipeline and relies on <code>ASSIGNEE_OBJECT_ID</code> from the workflow environment.</p>
</li>
</ul>
</li>
<li><p><code>.github/workflows/</code></p>
<ul>
<li><code>.github/workflows/pim-elevate.yml</code> — Reusable workflow that collects role/resource inputs, builds an approval table (via <code>project-jit-pim/scripts/build-approval.ps1</code>), blocks on a GitHub Environment gate, then runs the privileged step via <code>project-jit-pim/scripts/run-activation.ps1</code>.</li>
</ul>
</li>
<li><p><code>project-jit-pim/infra/</code></p>
<ul>
<li><code>project-jit-pim/infra/main.bicep</code> — Demo infrastructure to support the JIT scenario: provisions a user-assigned managed identity (UAMI), an RBAC-enabled Key Vault, a Microsoft Entra application and service principal using the Microsoft Graph Bicep extension, and a resource group–scoped role assignment that grants the service principal User Access Administrator at the RG scope. Emits outputs (vault name/id, identity clientId/principalId/id, Graph appId/SP id, storage account id/name) for wiring into CI.</li>
</ul>
</li>
</ul>
<h2 id="heading-scenarios-and-use-cases">Scenarios and use cases</h2>
<p>Below are practical scenarios where a CI runner (or other automation) would request a PIM-like activation via Microsoft Graph. Each example explains why a JIT activation is preferable to a permanent role assignment.</p>
<ol>
<li><p>Privileged infrastructure deployments</p>
<ul>
<li>Pipelines that need to modify RBAC, create or update management groups or subscriptions, or perform owner-level deployments. A JIT activation grants only the needed privileges for the deployment window and records the build/PR that requested it.</li>
</ul>
</li>
<li><p>Emergency or hotfix changes in production</p>
<ul>
<li>Automated runbooks that apply urgent network or configuration changes during incidents. JIT allows automation to act quickly while keeping the elevated window short and auditable.</li>
</ul>
</li>
<li><p>Secrets, certificate, or key rotation tasks</p>
<ul>
<li>Workflows that rotate Key Vault secrets, service principal certificates or subscription keys. Because these operations are high-sensitivity, they should run with time-limited elevation rather than a long-lived owner/service principal.</li>
</ul>
</li>
<li><p>High-impact configuration changes</p>
<ul>
<li>Actions like scaling a managed database, changing VM scale set properties, or upgrading infrastructure control-plane settings. These are low-frequency but risky operations — a JIT window reduces blast radius.</li>
</ul>
</li>
<li><p>Onboard/offboard flows that require temporary promotion</p>
<ul>
<li>Lifecycle workflows that temporarily promote a user or test identity to verify on‑boarding or off‑boarding steps (for example: run verification tests as an eligible admin then remove privileges automatically).</li>
</ul>
</li>
<li><p>Incident-response diagnostics and remediation</p>
<ul>
<li>Automated IR playbooks that need to query sensitive logs, alter firewall rules, or apply containment steps. JIT activations enable automation to remediate rapidly while ensuring short-lived privileges and full audit trails.</li>
</ul>
</li>
<li><p>Governance and testing of access‑review flows</p>
<ul>
<li>CI jobs that exercise access reviews, entitlement-management or PIM workflows in a test tenant by temporarily elevating a test identity. Useful for continuous verification of governance automation.</li>
</ul>
</li>
<li><p>Multi-tenant or partner delegated operations</p>
<ul>
<li>Managed services that perform privileged operations across customer tenants or external environments can request scoped JIT activations per-customer rather than holding standing privileges across all tenants.</li>
</ul>
</li>
<li><p>Identity configuration changes</p>
<ul>
<li>Deployments that create or update app registrations, add federated credentials, or change conditional access policies should run under a temporary, audited elevation rather than a permanent global admin assignment.</li>
</ul>
</li>
<li><p>Scheduled maintenance and controlled windows</p>
<ul>
<li>Scheduled jobs that perform approved maintenance during a defined window can programmatically request elevation (including ticket/maintenance ID justification) so the run is traceable and constrained to the maintenance period.</li>
</ul>
</li>
</ol>
<p>Each scenario benefits from: scoped, time-limited access; machine-readable justification and metadata (PR/build IDs); and an auditable activation lifecycle that can be retained in CI artifacts or forwarded to SIEM and compliance tooling.</p>
<h2 id="heading-benefits-vs-permanent-assignments">Benefits vs Permanent assignments</h2>
<ul>
<li>Minimized attack surface (short TTL)</li>
<li>Tighter auditability (attach PR/build/ticket metadata)</li>
<li>Easier compliance evidence (machine readable)</li>
<li>Predictable rollback &amp; revoke patterns</li>
</ul>
<h2 id="heading-background-and-motivation">Background and motivation</h2>
<p>Traditional automation patterns often rely on a broadly privileged service principal with a long-lived secret. That’s convenient but risky: secrets leak, and standing privileges widen blast radius. PIM shifts that posture to “trust, but timebound”: elevation requires a specific reason, duration, and an approver. For non-human actors (CI/CD), the right approximation is to programmatically create a temporary role assignment guarded by an approval gate, then remove it. This keeps privileges short-lived and auditable while still enabling fully automated flows (no portal clicks).</p>
<p>Compared with static service principals:</p>
<ul>
<li><p>No long-lived secret material is required when you use OIDC or managed identity.</p>
</li>
<li><p>Privileges are granted just-in-time and automatically revoked.</p>
</li>
<li><p>Every activation is traceable to a PR or run — easier audits and incident reconstruction.</p>
</li>
</ul>
<h2 id="heading-architecture-overview">Architecture overview</h2>
<p>Actors</p>
<ul>
<li><p>CI runner (GitHub Actions): requests elevation, renders an approval table, waits for approval (GitHub Environment), and executes privileged operations once approved.</p>
</li>
<li><p>Automation identity: either a user-assigned managed identity (UAMI) deployed by Bicep for demos, or an OIDC-federated workload identity for production pipelines.</p>
</li>
<li><p>Azure Resource Manager (ARM): enforces role assignments and scopes; Key Vault is configured with RBAC data-plane permissions.</p>
</li>
<li><p>Approver(s): GitHub Environment approvers; optionally, an automated approver (Function/Logic App) or notification to external systems like Teams/Slack can be integrated later.</p>
</li>
</ul>
<p>Flow (high level)</p>
<ol>
<li><p>CI job starts, logs into Azure via OIDC, and gathers requested role/resource pairs.</p>
</li>
<li><p>CI builds a Markdown approval table (role names, resource names) and posts it to the run; job waits for GitHub Environment approval.</p>
</li>
<li><p>On approval, CI executes a lifecycle function (sketched after this list) that:</p>
<ul>
<li><p>Creates a temporary Key Vault–scoped role assignment for the automation identity (for example, Key Vault Secrets Officer),</p>
</li>
<li><p>Performs the privileged action (e.g., rotate a secret),</p>
</li>
<li><p>Removes the temporary role assignment and validates removal.</p>
</li>
</ul>
</li>
<li><p>CI records structured output (vault, rotated secrets, timestamps, secret version) as artifacts.</p>
</li>
</ol>
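<p>The heart of step 3 is a create, act, remove bracket around the privileged call. A simplified sketch under those assumptions (the scope and vault names are placeholders; the real module adds Forbidden-aware retries and removal validation):</p>
<pre><code class="lang-powershell"># Temporary, scoped grant for the automation identity (ASSIGNEE_OBJECT_ID comes from the workflow environment).
$scope = '/subscriptions/&lt;sub-id&gt;/resourceGroups/&lt;rg&gt;/providers/Microsoft.KeyVault/vaults/&lt;vault&gt;'
$assignment = New-AzRoleAssignment -ObjectId $env:ASSIGNEE_OBJECT_ID `
    -RoleDefinitionName 'Key Vault Secrets Officer' -Scope $scope
try {
    # Privileged action: rotate one secret (the module loops over the whole vault).
    $newValue = [guid]::NewGuid().Guid | ConvertTo-SecureString -AsPlainText -Force
    Set-AzKeyVaultSecret -VaultName '&lt;vault&gt;' -Name 'demo-secret' -SecretValue $newValue | Out-Null
}
finally {
    # Revoke immediately, even if the rotation failed.
    Remove-AzRoleAssignment -InputObject $assignment
}
</code></pre>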
<h2 id="heading-implementation">Implementation</h2>
<ul>
<li><p>Infrastructure: A Bicep template provisions a demo Key Vault with RBAC, a user-assigned managed identity, and emits helpful outputs (clientId, principalId, resource ids). See <code>infra/main.bicep</code>.</p>
</li>
<li><p>Authentication: OIDC in GitHub Actions for the workflow; for local tests, you can log in interactively. Avoid long-lived secrets.</p>
</li>
<li><p>PowerShell automation: module <code>scripts/PimAutomation.psm1</code> implements a Graph wrapper (v1.0/beta compatibility) and the lifecycle for temporary RBAC assignments and secret rotation.</p>
</li>
<li><p>CI/CD workflow: reusable workflow <code>.github/workflows/pim-elevate.yml</code> accepts roleIds/resourceIds, builds an approval table, blocks on environment approval, then runs <code>scripts/run-activation.ps1</code> to execute the lifecycle.</p>
</li>
<li><p>Approval automation (optional): future enhancement — an Azure Function approver for certain low-risk policies.</p>
</li>
</ul>
<h2 id="heading-demo-infrastructure-project-jit-piminframainbicep">Demo infrastructure: <code>project-jit-pim/infra/main.bicep</code></h2>
<p>What it deploys</p>
<ul>
<li><p>Storage account (for a simple demo resource)</p>
</li>
<li><p>User-assigned managed identity (UAMI) for automation</p>
</li>
<li><p>Azure Key Vault configured with the RBAC permission model (<code>enableRbacAuthorization: true</code>)</p>
</li>
<li><p>Microsoft Entra application and service principal via the Microsoft Graph Bicep extension</p>
</li>
<li><p>A role assignment at the resource group scope that grants the service principal the built-in User Access Administrator role (roleDefinitionId GUID <code>18d7d88d-d35e-4fb5-a5c3-7773c20a72d9</code>) to create/remove role assignments within the RG</p>
</li>
</ul>
<p>Key details reflected in the template</p>
<ul>
<li><p>Graph extension imports at top of file:</p>
<ul>
<li><p><code>extension graphV1</code></p>
</li>
<li><p><code>extension graphBeta</code></p>
</li>
</ul>
</li>
<li><p>Graph resources declared using <code>Microsoft.Graph/*@beta</code> types for <code>applications</code> and <code>servicePrincipals</code>.</p>
</li>
<li><p>Role assignment uses deterministic name via <code>guid(...)</code>, <code>principalId: ghSp.id</code>, and subscription-scoped roleDefinitionId built with <code>subscriptionResourceId('Microsoft.Authorization/roleDefinitions', userAccessAdminRoleId)</code>.</p>
</li>
<li><p>Key Vault enables RBAC model for data-plane operations to align with this JIT pattern.</p>
</li>
</ul>
<p>Outputs (used by CI/workflows)</p>
<ul>
<li><p>Key Vault: <code>keyVaultName</code>, <code>keyVaultId</code></p>
</li>
<li><p>UAMI: <code>userIdentityClientId</code>, <code>userIdentityPrincipalId</code>, <code>userIdentityResourceId</code></p>
</li>
<li><p>Graph: <code>githubAppId</code>, <code>githubServicePrincipalId</code>, <code>githubServicePrincipalResourceId</code></p>
</li>
<li><p>Storage: <code>storageAccountId</code>, <code>storageAccountName</code></p>
</li>
</ul>
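<p>To wire these outputs into CI, read them back from the deployment object. For example (the deployment name defaults to the template file name, so <code>main</code> here assumes <code>main.bicep</code>):</p>
<pre><code class="lang-powershell"># Pull a single output value for use in later pipeline steps.
az deployment group show `
  --resource-group MyDemoRG `
  --name main `
  --query properties.outputs.keyVaultName.value -o tsv
</code></pre>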
<p>Quick deploy and validate</p>
<ul>
<li>Ensure your Bicep CLI is up to date and the Graph Bicep extensions are enabled via <code>bicepconfig.json</code> (see next section).</li>
</ul>
<pre><code class="lang-bash">az bicep upgrade

<span class="hljs-comment"># Create a demo resource group (choose a location)</span>
az group create --name MyDemoRG --location eastus

<span class="hljs-comment"># Validate the template</span>
az deployment group validate \
    --resource-group MyDemoRG \
    --template-file project-jit-pim/infra/main.bicep \
    --parameters location=eastus

<span class="hljs-comment"># Deploy</span>
az deployment group create \
    --resource-group MyDemoRG \
    --template-file project-jit-pim/infra/main.bicep \
    --parameters location=eastus
</code></pre>
<p>Note: The role used in the demo (<code>User Access Administrator</code>) is convenient for a JIT RBAC demo because it can create/delete role assignments. In production, choose the least-privileged role and scope that fits your policy. Built-in roles reference: https://learn.microsoft.com/azure/role-based-access-control/built-in-roles</p>
<h2 id="heading-using-microsoft-graph-resources-in-bicep-beta">Using Microsoft Graph resources in Bicep (beta)</h2>
<p>You can author Entra resources (applications / service principals) directly from Bicep using the Microsoft Graph Bicep templates. This is a preview extensibility feature in Bicep and requires two things:</p>
<ul>
<li><p>enabling Bicep extensibility in your repo-level <code>bicepconfig.json</code>, and</p>
</li>
<li><p>importing the Microsoft Graph extension in any Bicep file that declares <code>Microsoft.Graph/*</code> resources.</p>
</li>
</ul>
<p>What you need</p>
<ul>
<li><p>A recent Bicep CLI (or Azure CLI with the Bicep command) and the VS Code Bicep extension.</p>
</li>
<li><p>A repo-level <code>bicepconfig.json</code> that enables the extensibility feature and maps extension aliases to the registry artifacts. Ensure <code>extensibility</code> is enabled. Example:</p>
</li>
</ul>
<pre><code class="lang-json">{
    <span class="hljs-attr">"experimentalFeaturesEnabled"</span>: {},
    <span class="hljs-attr">"extensions"</span>: {
        <span class="hljs-attr">"graphV1"</span>: <span class="hljs-string">"br:mcr.microsoft.com/bicep/extensions/microsoftgraph/v1.0:1.0.0"</span>,
        <span class="hljs-attr">"graphBeta"</span>: <span class="hljs-string">"br:mcr.microsoft.com/bicep/extensions/microsoftgraph/beta:1.0.0"</span>
    }
}
</code></pre>
<p>See the Bicep configuration docs for details: https://learn.microsoft.com/azure/azure-resource-manager/bicep/bicep-config and the experimental features overview: https://github.com/Azure/bicep/blob/main/docs/experimental-features.md</p>
<p>Importing the Graph extension</p>
<p>Once <code>bicepconfig.json</code> is present and extensibility is enabled, import the Graph extensions at the top of your Bicep file using the aliases defined in <code>bicepconfig.json</code>:</p>
<pre><code class="lang-plaintext">// imports by alias (preferred when you've mapped the extension in bicepconfig.json)
extension graphV1
extension graphBeta

After the extension import, you can declare Microsoft Graph types in Bicep. Example (beta) — consult the Graph Bicep reference for available properties and schema:

```bicep
resource ghApp 'Microsoft.Graph/applications@beta' = {
  uniqueName: toLower('pim-github-app-${uniqueString(resourceGroup().id)}')
  displayName: 'pim-github-oidc-app-${uniqueString(resourceGroup().id)}'
  signInAudience: 'AzureADMyOrg'
}

resource ghSp 'Microsoft.Graph/servicePrincipals@beta' = {
  appId: ghApp.appId
  displayName: ghApp.displayName
}
</code></pre>
<p>Reference: Microsoft Graph Bicep templates — https://learn.microsoft.com/graph/templates/bicep/reference/serviceprincipals?view=graph-bicep-beta</p>
<p>Validate and deploy</p>
<p>See the quick commands in the previous section for validation and deployment to a resource group.</p>
<p>Troubleshooting and fallback strategy</p>
<ul>
<li><p>If you get "resource type is not valid" or similar validation errors, ensure:</p>
<ul>
<li><p>Bicep CLI and the VS Code Bicep extension are up to date.</p>
</li>
<li><p><code>bicepconfig.json</code> is in the repository root (or a parent folder) and contains <code>"experimentalFeaturesEnabled": { "extensibility": true }</code>.</p>
</li>
<li><p>The <code>extension</code> import line appears before any <code>Microsoft.Graph/*</code> resource declarations.</p>
</li>
</ul>
</li>
<li><p>Some developer or CI environments may still lack the provider metadata required to author Graph types (the extensibility path is still evolving). If your environment cannot use the Graph Bicep extension, you can adapt the template to accept <code>githubAppId</code> and <code>githubServicePrincipalId</code> as parameters and only create the role assignment when the service principal id is supplied (see the Bicep sketch after this list). This lets teams either:</p>
<ol>
<li><p>Create the app/service principal ahead of time (via CLI or Graph APIs) and pass the returned <code>appId</code>/<code>principalId</code> into the Bicep deployment, or</p>
</li>
<li><p>Enable Bicep extensibility in their dev/CI images and let the template create the app/SP directly.</p>
</li>
</ol>
</li>
</ul>
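<p>A minimal sketch of that fallback, assuming the same <code>userAccessAdminRoleId</code> variable as earlier; the role assignment is simply skipped when no service principal id is supplied:</p>
<pre><code class="lang-bicep">@description('Pre-created service principal object id; leave empty to skip the role assignment')
param githubServicePrincipalId string = ''

// Conditional deployment: only created when the caller supplies a principal id
resource jitRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(githubServicePrincipalId)) {
  name: guid(resourceGroup().id, githubServicePrincipalId, userAccessAdminRoleId)
  properties: {
    principalId: githubServicePrincipalId
    principalType: 'ServicePrincipal'
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', userAccessAdminRoleId)
  }
}
</code></pre>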
<p>Related discussion and issues: https://github.com/Azure/bicep/issues/16447</p>
<h2 id="heading-github-actions-workflow-githubworkflowspim-elevateyml">GitHub Actions workflow: <code>.github/workflows/pim-elevate.yml</code></h2>
<p>Purpose: provide a generic, approval-gated JIT elevation workflow reusable across repos and branches.</p>
<p>Inputs (from the caller):</p>
<ul>
<li><p><code>roleIds</code> — JSON array of role definition GUIDs to request (e.g., Key Vault Secrets Officer).</p>
</li>
<li><p><code>resourceIds</code> — JSON array of Azure resource IDs to scope the request/assignment (e.g., Key Vault resource ID).</p>
</li>
<li><p><code>vaultName</code>, <code>secretName</code> — Optional; if provided, the workflow will perform a secret rotation using the lifecycle function after approval.</p>
</li>
</ul>
<p>Jobs and flow:</p>
<ol>
<li><p>request-elevation</p>
<ul>
<li><p>Auth: Logs into Azure using OIDC (federated credentials) so az lookups can run without secrets.</p>
</li>
<li><p>Pairing logic: Builds pairs of roleId/resourceId with the following rules (a PowerShell sketch follows this flow):</p>
<ul>
<li><p>If arrays are equal length → zip items by index.</p>
</li>
<li><p>If one array has length 1 and the other &gt;1 → pair the single item across all items of the other array.</p>
</li>
<li><p>Otherwise → produce a Cartesian product (all combinations).</p>
</li>
</ul>
</li>
<li><p>Reverse lookups + approval table: Implemented by <code>project-jit-pim/scripts/build-approval.ps1</code> — resolves human-readable names/types and renders a Markdown table (IDs and names wrapped in inline code; pipes/backticks/newlines escaped). The table is posted as a comment (for PRs) or added to the job summary.</p>
</li>
<li><p>Gate: The job emits metadata and ends; the next job is gated by a GitHub Environment (for example, <code>pim-rotation-approval</code>).</p>
</li>
</ul>
</li>
<li><p>approve-and-rotate</p>
<ul>
<li><p>Gate: Requires the GitHub Environment approval. Approvers can review the comment table before proceeding.</p>
<ul>
<li>Another option is to implement approval in the pull request itself (for example, require a specific label or review) and skip the Environment gate. This is left as an exercise for the reader.</li>
</ul>
</li>
<li><p>Auth: Logs into Azure using OIDC for RBAC operations.</p>
</li>
<li><p>Action: Resolves role/resource pairs via the module, selects the first pair for the demo, and runs <code>project-jit-pim/scripts/run-activation.ps1</code> which orchestrates <code>Invoke-TempKeyVaultRotationLifecycle</code> to create the temporary Key Vault assignment → rotate the secret → remove the assignment → validate removal.</p>
</li>
<li><p>Outputs: Emits structured JSON and can publish artifacts with rotation details.</p>
</li>
</ul>
</li>
</ol>
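<p>The pairing rules from step 1 are easy to picture in PowerShell. This is a simplified sketch, not the module's actual implementation; note that the one-to-many rule is just a special case of the Cartesian product:</p>
<pre><code class="lang-powershell">function Resolve-RoleResourcePairs {
    param([string[]] $RoleIds, [string[]] $ResourceIds)

    if ($RoleIds.Count -eq $ResourceIds.Count) {
        # Equal length: zip items by index
        for ($i = 0; $i -lt $RoleIds.Count; $i++) {
            [pscustomobject]@{ RoleId = $RoleIds[$i]; ResourceId = $ResourceIds[$i] }
        }
    }
    else {
        # One side has a single item, or lengths are unequal: emit every combination
        foreach ($role in $RoleIds) {
            foreach ($resource in $ResourceIds) {
                [pscustomobject]@{ RoleId = $role; ResourceId = $resource }
            }
        }
    }
}
</code></pre>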
<p>Requirements:</p>
<ul>
<li><p>A GitHub Environment configured with approvers (e.g., <code>pim-rotation-approval</code>).</p>
</li>
<li><p>Federated identity (OIDC) configured in Azure with minimal RBAC to create/delete role assignments at the scopes you target.</p>
</li>
<li><p>The target Key Vault must be RBAC-enabled (<code>enableRbacAuthorization: true</code>).</p>
</li>
</ul>
<p>Notes:</p>
<ul>
<li><p>The workflow is designed to be reusable; use the demo caller as a reference for input wiring.</p>
</li>
<li><p>For managed identities, PIM eligibility is not applicable; the workflow relies on temporary RBAC during the approved window and records rotation metadata for traceability.</p>
</li>
</ul>
<h2 id="heading-powershell-module-project-jit-pimscriptspimautomationpsm1">PowerShell module: <code>project-jit-pim/scripts/PimAutomation.psm1</code></h2>
<p>Design goals:</p>
<ul>
<li><p>PowerShell 7.4, cross-platform, non-interactive by default in CI.</p>
</li>
<li><p>Testability: Supports an environment flag to skip live Graph calls during Pester runs.</p>
</li>
<li><p>Clear separation between Graph (request/trace) and Azure RBAC (enforcement at resource scope).</p>
</li>
</ul>
<p>Key functions and behavior:</p>
<ul>
<li><p><code>Get-GraphAccessToken</code> and <code>Connect-PimGraph</code>: establish Graph access using an Azure CLI token when present; fall back to <code>Connect-MgGraph</code> in dev (see the sketch after this list).</p>
</li>
<li><p><code>Invoke-PimGraphRequest</code>: sends requests to v1.0 or beta, with a predictable return shape and error handling for future Graph integrations.</p>
</li>
<li><p><code>Set-PimAzContext</code>: establishes Az PowerShell context using OIDC federated token or managed identity for non-interactive CI runs.</p>
</li>
<li><p><code>Resolve-PimRoleResourcePairs</code>: flexible pairing logic used by the workflow to map roleIds to resourceIds.</p>
</li>
<li><p><code>New-TemporaryKeyVaultRoleAssignment</code> and <code>Remove-TemporaryKeyVaultRoleAssignment</code>: create/delete RBAC assignments at the Key Vault scope for a specific principal (the automation identity). Output uses <code>RoleAssignmentId</code>/<code>RoleAssignmentName</code> (per Az.Resources).</p>
</li>
<li><p><code>Set-PimKeyVaultSecret</code>: sets or rotates a secret using Az.KeyVault under the short-lived RBAC assignment, with Forbidden-aware retry to handle eventual consistency of new role assignments.</p>
</li>
<li><p><code>Invoke-TempKeyVaultRotationLifecycle</code>: orchestrates the full sequence, validates removal, and returns a structured object suitable for CI logs and artifacts.</p>
</li>
</ul>
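<p>The CLI-token-first pattern behind <code>Get-GraphAccessToken</code>/<code>Connect-PimGraph</code> looks roughly like this (a sketch with scopes and error handling trimmed; the module's internals may differ):</p>
<pre><code class="lang-powershell"># Prefer the Azure CLI token already present on a CI runner; fall back to interactive dev login
$cliToken = az account get-access-token --resource https://graph.microsoft.com --query accessToken -o tsv 2&gt;$null
if ($cliToken) {
    # The Microsoft.Graph SDK v2+ expects the access token as a SecureString
    Connect-MgGraph -AccessToken (ConvertTo-SecureString $cliToken -AsPlainText -Force) | Out-Null
}
else {
    # Dev fallback; scope shown is illustrative
    Connect-MgGraph -Scopes 'RoleManagement.ReadWrite.Directory' | Out-Null
}
</code></pre>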
<p>Contract (high level):</p>
<ul>
<li><p>Inputs: role definition ID, target resource ID (Key Vault), principal object ID, vault and secret names, and optional justification/metadata.</p>
</li>
<li><p>Error modes: Graph unavailability (handled via stub/test mode), RBAC propagation delays (retried with backoff; see the sketch after this list), and Azure API transient failures (retryable).</p>
</li>
<li><p>Outputs: rotation metadata (vault, secret name, secret version, timestamps), RBAC assignment details (id, scope), and any contextual lifecycle information surfaced by the helper.</p>
</li>
</ul>
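<p>The retry-with-backoff behavior deserves a concrete shape. Here is a generic sketch of the pattern (the module's actual helper may differ):</p>
<pre><code class="lang-powershell">function Invoke-WithRbacRetry {
    param(
        [scriptblock] $Action,
        [int] $MaxAttempts = 6,
        [int] $InitialDelaySeconds = 10
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return &amp; $Action
        }
        catch {
            # New role assignments take time to propagate; only 403 Forbidden is worth retrying
            if ($_.Exception.Message -notmatch 'Forbidden|403' -or $attempt -eq $MaxAttempts) { throw }
            $delay = [int]($InitialDelaySeconds * [math]::Pow(2, $attempt - 1))
            Write-Verbose "403 Forbidden on attempt $attempt; retrying in $delay seconds"
            Start-Sleep -Seconds $delay
        }
    }
}

# Usage (names illustrative):
# Invoke-WithRbacRetry -Action { Set-AzKeyVaultSecret -VaultName $vault -Name $secret -SecretValue $value }
</code></pre>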
<h2 id="heading-security-and-governance-considerations">Security and governance considerations</h2>
<ul>
<li><p>Permissions: Use least-privilege scopes and roles; prefer RBAC scopes at the minimal resource level required.</p>
</li>
<li><p>Authentication: Prefer OIDC for CI runners and managed identities for Azure-hosted automation; avoid long-lived client secrets.</p>
</li>
<li><p>Approval: Require human approval (GitHub Environment) for production; optionally add policy-based automated approvals for low-risk cases.</p>
</li>
<li><p>TTLs: Keep elevation windows as short as practical. Validate that RBAC propagation is complete before proceeding.</p>
</li>
<li><p>Audit: Persist machine-readable activation artifacts for retention and SIEM ingestion.</p>
</li>
<li><p><strong>Workload identity hardening</strong>: This just-in-time + just-enough access pattern pairs well with <a target="_blank" href="https://learn.microsoft.com/en-us/entra/workload-id/workload-identity-federation-create-trust">Microsoft Entra Workload Identity Protection</a> to further restrict token acquisition and usage. By combining time-limited RBAC assignments with workload identity restrictions (e.g., OIDC token validation, IP allow listing, and token binding), you create a defense-in-depth posture that makes lateral movement and token replay attacks significantly harder.</p>
</li>
</ul>
<h2 id="heading-testing-and-validation">Testing and validation</h2>
<ul>
<li><p>Unit tests: When you add tests, prefer Pester with Graph calls disabled via an environment flag; mock Az/Graph cmdlets to validate logic paths.</p>
</li>
<li><p>Integration tests: Use a test subscription and a disposable Key Vault. Ensure cleanup of temporary role assignments.</p>
</li>
<li><p>Failure modes: Validate behavior for denied approvals, RBAC propagation delays (Forbidden), and transient Azure API errors (retries with back off).</p>
</li>
</ul>
<h2 id="heading-demo-walk-through-secret-key-rotation">Demo / walk-through — Secret / Key Rotation</h2>
<p>This walk-through focuses on a concrete, high-value scenario: rotating a Key Vault secret (for example, a service principal client secret or an application key) under a just-in-time privileged activation requested by the CI runner. Rotating secrets is a common operational task that requires elevated privileges and benefits from short-lived, auditable elevation.</p>
<p>High-level steps for the demo</p>
<ol>
<li><p>Prerequisites</p>
<ul>
<li><p>An Azure Key Vault configured to use the Azure RBAC permission model (<code>enableRbacAuthorization: true</code>). This allows data-plane permissions via RBAC.</p>
</li>
<li><p>A CI runner identity: GitHub OIDC for the workflow, or a user-assigned managed identity for Azure-hosted runs.</p>
</li>
</ul>
</li>
<li><p>CI job starts and validates context</p>
<ul>
<li>The GitHub Action runner verifies branch/ticket/PR metadata and ensures the job is allowed to request elevation (demo mode may allow auto-approvals).</li>
</ul>
</li>
<li><p>Prepare activation context programmatically</p>
<ul>
<li><p>The workflow calls <code>Resolve-PimRoleResourcePairs</code> to expand role/resource combinations into discrete activation targets.</p>
</li>
<li><p>For automation identities (managed identities), the flow focuses on short-lived RBAC assignments rather than PIM request objects, keeping the footprint minimal for CI.</p>
</li>
</ul>
</li>
<li><p>Approval and activation</p>
<ul>
<li><p>The reusable workflow posts an approval table and pauses at a GitHub Environment gate. Approvers can review the requested role/resource pairs before proceeding.</p>
</li>
<li><p>A future enhancement may add an optional automated approver (Function/Logic App) for specific policies or send notifications to Teams/Slack.</p>
</li>
</ul>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762464703184/c16debc5-1724-4307-a830-72639ae35de9.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762464855983/43b52927-8743-41cd-a6d9-674a4395f27c.png" alt class="image--center mx-auto" /></p>
<ol start="5">
<li><p>Rotate the secret (performed under the JIT activation)</p>
<ul>
<li><p>The automation generates a new secret value and calls <code>Set-AzKeyVaultSecret</code> under a temporary, Key Vault–scoped RBAC assignment. The flow implemented here is:</p>
<ol>
<li><p>Wait for approval at the GitHub Environment gate.</p>
</li>
<li><p>Create a temporary Key Vault data-plane RBAC assignment (e.g., Key Vault Secrets Officer) for the automation identity. New assignments may take a short time to propagate; the module retries on Forbidden.</p>
</li>
<li><p>Perform the secret write while the assignment is active.</p>
</li>
<li><p>Remove the temporary assignment and validate removal.</p>
</li>
<li><p>Return rotation metadata (vault name, secret name, version, timestamps) to the caller.</p>
</li>
</ol>
</li>
<li><p>The repository includes <code>scripts/PimAutomation.psm1</code> with helper functions and a lifecycle function <code>Invoke-TempKeyVaultRotationLifecycle</code> that orchestrates the create → rotate → delete sequence.</p>
</li>
<li><p>Optionally rotate dependent credentials and notify consumers in a follow-up step (future enhancement with a safe consumer-rotation harness).</p>
</li>
</ul>
</li>
<li><p>Revoke/expire and audit</p>
<ul>
<li><p>After the rotation completes, the activation ends (either automatically by PIM TTL or by an explicit revocation step). The pipeline records a machine-readable artifact containing activation timing, rotated secret versions, and build/PR metadata.</p>
</li>
<li><p>Upload that artifact to the run's artifacts and forward structured events to SIEM or a compliance store.</p>
</li>
</ul>
</li>
<li><p>Post-rotation validation</p>
<ul>
<li><p>Run integration smoke tests that verify the rotated secret works for consumers (use a test consumer identity and avoid printing secrets to logs).</p>
</li>
<li><p>If validation fails, run an automated rollback plan (store previous secret version reference and use it to restore if necessary).</p>
</li>
</ul>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762465558860/a5046f16-02e2-4c7f-a3bb-0493282d575a.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>JIT elevation for automation brings the control and auditability of PIM into your delivery pipelines. By replacing standing privileges with an approval-gated, short-lived role assignment, you reduce risk without sacrificing speed. The patterns here give you a pragmatic starting point you can adapt — start with the reusable workflow and lifecycle helper, then layer in Graph-backed requests and richer approvals as you mature. 🚀</p>
<h2 id="heading-references">References</h2>
<ul>
<li><p>GitHub Repo: https://github.com/broberts23/vsCode/tree/main/project-jit-pim</p>
</li>
<li><p>Microsoft Entra PIM docs: https://learn.microsoft.com/en-us/entra/id-governance/privileged-identity-management/pim-configure</p>
</li>
<li><p>Microsoft Graph PIM APIs: https://learn.microsoft.com/en-us/graph/api/resources/privilegedidentitymanagementv3-overview?view=graph-rest-1.0</p>
</li>
<li><p>Azure RBAC for Key Vault: https://learn.microsoft.com/en-us/azure/key-vault/general/rbac-guide</p>
</li>
<li><p>Az PowerShell: Connect-AzAccount — https://learn.microsoft.com/powershell/module/az.accounts/connect-azaccount?view=azps-latest</p>
</li>
<li><p>Az PowerShell: Set-AzKeyVaultSecret — https://learn.microsoft.com/powershell/module/az.keyvault/set-azkeyvaultsecret?view=azps-latest</p>
</li>
<li><p>Az PowerShell: New-AzRoleAssignment — https://learn.microsoft.com/powershell/module/az.resources/new-azroleassignment?view=azps-latest</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Automating Blob Cleanup with Azure Storage Lifecycle Management Policies]]></title><description><![CDATA[Sometimes the solution you're building doesn't need another Function App, another timer trigger, or another piece of custom code to maintain. Azure Storage has a built-in lifecycle management engine that can handle age-based cleanup policies entirely...]]></description><link>https://benroberts.io/automating-blob-cleanup-with-azure-storage-lifecycle-management-policies</link><guid isPermaLink="true">https://benroberts.io/automating-blob-cleanup-with-azure-storage-lifecycle-management-policies</guid><category><![CDATA[Azure]]></category><category><![CDATA[Bicep]]></category><category><![CDATA[Powershell]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Sun, 31 Aug 2025 14:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763873869773/72805983-77d9-4329-85fb-cfd6065cbbaf.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Sometimes the solution you're building doesn't need another Function App, another timer trigger, or another piece of custom code to maintain. Azure Storage has a built-in lifecycle management engine that can handle age-based cleanup policies entirely within the service—no compute, no secrets, no runtime to monitor. This post walks through how lifecycle policies work, what you can do with them, and how to deploy them cleanly using Bicep.</p>
<p>Reference: <a target="_blank" href="https://learn.microsoft.com/azure/storage/blobs/lifecycle-management-overview">Azure Blob Storage Lifecycle Management</a></p>
<h2 id="heading-what-are-lifecycle-management-policies">What Are Lifecycle Management Policies?</h2>
<p>A lifecycle management policy is a JSON ruleset attached to a Storage Account that tells Azure, "Here's what I want you to do with blobs that match certain criteria." The Azure Storage service evaluates these rules periodically (timing is internal and service-managed) and applies actions like deleting old blobs, moving them to cooler storage tiers, or cleaning up versions and snapshots.</p>
<p>You define:</p>
<ul>
<li><p><strong>Filters</strong>: Which blobs to target (by type, container prefix, blob name prefix)</p>
</li>
<li><p><strong>Actions</strong>: What to do with them (delete, tier to Cool/Archive, etc.)</p>
</li>
<li><p><strong>Thresholds</strong>: Time-based conditions (days since creation, modification, or last access)</p>
</li>
</ul>
<p>The policy runs automatically, transparently, and scales with your data—no servers, no invocation logs to parse, no scaling concerns.</p>
<h2 id="heading-anatomy-of-a-lifecycle-rule">Anatomy of a Lifecycle Rule</h2>
<p>Here's the simplest possible rule: delete block blobs older than 7 days.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"rules"</span>: [
    {
      <span class="hljs-attr">"name"</span>: <span class="hljs-string">"DeleteOldBlobs"</span>,
      <span class="hljs-attr">"enabled"</span>: <span class="hljs-literal">true</span>,
      <span class="hljs-attr">"type"</span>: <span class="hljs-string">"Lifecycle"</span>,
      <span class="hljs-attr">"definition"</span>: {
        <span class="hljs-attr">"filters"</span>: {
          <span class="hljs-attr">"blobTypes"</span>: [<span class="hljs-string">"blockBlob"</span>]
        },
        <span class="hljs-attr">"actions"</span>: {
          <span class="hljs-attr">"baseBlob"</span>: {
            <span class="hljs-attr">"delete"</span>: {
              <span class="hljs-attr">"daysAfterModificationGreaterThan"</span>: <span class="hljs-number">7</span>
            }
          }
        }
      }
    }
  ]
}
</code></pre>
<p>This rule:</p>
<ul>
<li><p>Targets all block blobs (<code>blobTypes</code>)</p>
</li>
<li><p>Deletes base blobs (<code>baseBlob.delete</code>) if <code>LastModified</code> is more than 7 days ago</p>
</li>
<li><p>Runs periodically (service-managed schedule)</p>
</li>
</ul>
<h2 id="heading-filtering-by-container">Filtering by Container</h2>
<p>In production, you rarely want to apply a policy to every blob in the account. The <code>prefixMatch</code> filter lets you target specific containers or blob prefixes.</p>
<pre><code class="lang-json"><span class="hljs-string">"filters"</span>: {
  <span class="hljs-attr">"blobTypes"</span>: [<span class="hljs-string">"blockBlob"</span>],
  <span class="hljs-attr">"prefixMatch"</span>: [<span class="hljs-string">"signin"</span>, <span class="hljs-string">"audit"</span>, <span class="hljs-string">"logs"</span>]
}
</code></pre>
<p>This matches:</p>
<ul>
<li><p><code>signin/anything</code></p>
</li>
<li><p><code>audit/anything</code></p>
</li>
<li><p><code>logs/anything</code></p>
</li>
</ul>
<p>Blob paths are virtual (Azure Storage is flat), so <code>signin/entra/20251123.json</code> matches the <code>signin</code> prefix. You can be more granular: <code>"prefixMatch": ["signin/entra"]</code> would only target that subtree.</p>
<p>Azure allows up to 100 rules per Storage Account. By consolidating multiple containers into a single rule with an array of prefixes, you stay well under quota and keep the policy maintainable.</p>
<h2 id="heading-handling-snapshots">Handling Snapshots</h2>
<p>Blobs can have snapshots (point-in-time immutable copies). If you delete a base blob but leave snapshots orphaned, they continue consuming storage and cost. Lifecycle policies have a dedicated <code>snapshot</code> action:</p>
<pre><code class="lang-json"><span class="hljs-string">"actions"</span>: {
  <span class="hljs-attr">"baseBlob"</span>: {
    <span class="hljs-attr">"delete"</span>: {
      <span class="hljs-attr">"daysAfterModificationGreaterThan"</span>: <span class="hljs-number">7</span>
    }
  },
  <span class="hljs-attr">"snapshot"</span>: {
    <span class="hljs-attr">"delete"</span>: {
      <span class="hljs-attr">"daysAfterCreationGreaterThan"</span>: <span class="hljs-number">7</span>
    }
  }
}
</code></pre>
<p>Note the difference:</p>
<ul>
<li><p><strong>Base blobs</strong> use <code>daysAfterModificationGreaterThan</code> (when the blob was last written)</p>
</li>
<li><p><strong>Snapshots</strong> use <code>daysAfterCreationGreaterThan</code> (when the snapshot was created, which is immutable)</p>
</li>
</ul>
<p>This ensures snapshots older than 7 days are purged alongside their parent, keeping storage tidy.</p>
<h2 id="heading-tiering-before-deletion-cost-optimization">Tiering Before Deletion (Cost Optimization)</h2>
<p>If your access patterns allow, you can tier blobs to cooler storage (Cool or Archive) before deleting them entirely. This reduces storage costs while retaining data for a grace period.</p>
<pre><code class="lang-json"><span class="hljs-string">"actions"</span>: {
  <span class="hljs-attr">"baseBlob"</span>: {
    <span class="hljs-attr">"tierToCool"</span>: {
      <span class="hljs-attr">"daysAfterModificationGreaterThan"</span>: <span class="hljs-number">7</span>
    },
    <span class="hljs-attr">"tierToArchive"</span>: {
      <span class="hljs-attr">"daysAfterModificationGreaterThan"</span>: <span class="hljs-number">30</span>
    },
    <span class="hljs-attr">"delete"</span>: {
      <span class="hljs-attr">"daysAfterModificationGreaterThan"</span>: <span class="hljs-number">90</span>
    }
  }
}
</code></pre>
<p>Lifecycle:</p>
<ol>
<li><p><strong>Day 7</strong>: Blob moves to Cool tier (lower storage cost, higher access cost)</p>
</li>
<li><p><strong>Day 30</strong>: Blob moves to Archive tier (lowest storage cost, high rehydration cost)</p>
</li>
<li><p><strong>Day 90</strong>: Blob deleted permanently</p>
</li>
</ol>
<p>This staged approach is common in compliance scenarios where you need to retain data for auditing but can tolerate slower access as it ages.</p>
<p>Reference: <a target="_blank" href="https://learn.microsoft.com/azure/storage/blobs/access-tiers-overview">Access tiers for blob data</a></p>
<h2 id="heading-blob-versioning-and-lifecycle-policies">Blob Versioning and Lifecycle Policies</h2>
<p>If versioning is enabled on your Storage Account, every overwrite creates a new version. Old versions can accumulate quickly. Lifecycle policies support version-specific actions:</p>
<pre><code class="lang-json"><span class="hljs-string">"actions"</span>: {
  <span class="hljs-attr">"version"</span>: {
    <span class="hljs-attr">"delete"</span>: {
      <span class="hljs-attr">"daysAfterCreationGreaterThan"</span>: <span class="hljs-number">30</span>
    }
  }
}
</code></pre>
<p>This deletes non-current versions older than 30 days, keeping only the latest version and recent history.</p>
<p>Reference: <a target="_blank" href="https://learn.microsoft.com/azure/storage/blobs/versioning-overview">Blob versioning</a></p>
<h2 id="heading-last-access-time-tracking">Last Access Time Tracking</h2>
<p>Azure Storage can optionally track when each blob was last read (requires enabling access time tracking on the account). Policies can then delete blobs that haven't been accessed recently, even if they're modified frequently:</p>
<pre><code class="lang-json"><span class="hljs-string">"actions"</span>: {
  <span class="hljs-attr">"baseBlob"</span>: {
    <span class="hljs-attr">"tierToCool"</span>: {
      <span class="hljs-attr">"daysAfterLastAccessTimeGreaterThan"</span>: <span class="hljs-number">30</span>
    },
    <span class="hljs-attr">"enableAutoTierToHotFromCool"</span>: <span class="hljs-literal">true</span>,
    <span class="hljs-attr">"delete"</span>: {
      <span class="hljs-attr">"daysAfterLastAccessTimeGreaterThan"</span>: <span class="hljs-number">90</span>
    }
  }
}
</code></pre>
<p>This is powerful for log archives or cold data lakes where "staleness" means "nobody's reading this anymore" rather than "nobody's writing to this anymore."</p>
<p>Reference: <a target="_blank" href="https://learn.microsoft.com/azure/storage/blobs/lifecycle-management-policy-configure">Optimize costs by automatically managing the data lifecycle</a></p>
<h2 id="heading-deploying-with-bicep">Deploying with Bicep</h2>
<p>Bicep lets you version, test, and deploy lifecycle policies as infrastructure-as-code. Here's a minimal module:</p>
<pre><code class="lang-powershell">// lifecyclePolicy.bicep
<span class="hljs-keyword">param</span> storageAccountName string
<span class="hljs-keyword">param</span> containerPrefixes array
<span class="hljs-keyword">param</span> retentionDays int = <span class="hljs-number">7</span>

resource storageAccount <span class="hljs-string">'Microsoft.Storage/storageAccounts@2025-06-01'</span> existing = {
  name: storageAccountName
}

resource managementPolicy <span class="hljs-string">'Microsoft.Storage/storageAccounts/managementPolicies@2025-06-01'</span> = {
  name: <span class="hljs-string">'default'</span>
  parent: storageAccount
  properties: {
    policy: {
      rules: [
        {
          <span class="hljs-type">name</span>: <span class="hljs-string">'DeleteOldBlobs'</span>
          <span class="hljs-type">enabled</span>: <span class="hljs-type">true</span>
          <span class="hljs-type">type</span>: <span class="hljs-string">'Lifecycle'</span>
          <span class="hljs-type">definition</span>: {
            <span class="hljs-type">filters</span>: {
              <span class="hljs-type">blobTypes</span>: [<span class="hljs-string">'blockBlob'</span>]
              <span class="hljs-type">prefixMatch</span>: <span class="hljs-type">containerPrefixes</span>
            }
            <span class="hljs-type">actions</span>: {
              <span class="hljs-type">baseBlob</span>: {
                <span class="hljs-type">delete</span>: {
                  <span class="hljs-type">daysAfterModificationGreaterThan</span>: <span class="hljs-type">retentionDays</span>
                }
              }
              <span class="hljs-type">snapshot</span>: {
                <span class="hljs-type">delete</span>: {
                  <span class="hljs-type">daysAfterCreationGreaterThan</span>: <span class="hljs-type">retentionDays</span>
                }
              }
            }
          }
        }
      ]
    }
  }
}
</code></pre>
<p>Reference: <a target="_blank" href="https://learn.microsoft.com/azure/templates/microsoft.storage/storageaccounts/managementpolicies">Microsoft.Storage/storageAccounts/managementPolicies</a></p>
<p>Deploy with environment-specific parameters:</p>
<pre><code class="lang-powershell"><span class="hljs-built_in">New-AzResourceGroupDeployment</span> `
  <span class="hljs-literal">-ResourceGroupName</span> <span class="hljs-string">"rg-prod"</span> `
  <span class="hljs-literal">-TemplateFile</span> ./infra/main.bicep `
  <span class="hljs-literal">-TemplateParameterFile</span> ./infra/parameters.prod.json
</code></pre>
<p>Parameter files let you vary container lists and retention periods across dev/test/prod without duplicating Bicep code.</p>
<p>Reference: <a target="_blank" href="https://learn.microsoft.com/powershell/module/az.resources/new-azresourcegroupdeployment?view=azps-latest">New-AzResourceGroupDeployment</a></p>
<h2 id="heading-multi-environment-strategy">Multi-Environment Strategy</h2>
<p>Structure parameter files like this:</p>
<p><strong>parameters.dev.json</strong> (aggressive cleanup for fast iteration):</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"storageAccountName"</span>: { <span class="hljs-attr">"value"</span>: <span class="hljs-string">"stdevblobcleanup001"</span> },
  <span class="hljs-attr">"containerPrefixes"</span>: { <span class="hljs-attr">"value"</span>: [<span class="hljs-string">"testA"</span>, <span class="hljs-string">"testB"</span>] },
  <span class="hljs-attr">"retentionDays"</span>: { <span class="hljs-attr">"value"</span>: <span class="hljs-number">2</span> }
}
</code></pre>
<p><strong>parameters.prod.json</strong> (conservative retention):</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"storageAccountName"</span>: { <span class="hljs-attr">"value"</span>: <span class="hljs-string">"stprodblobcleanup001"</span> },
  <span class="hljs-attr">"containerPrefixes"</span>: { <span class="hljs-attr">"value"</span>: [<span class="hljs-string">"signin"</span>, <span class="hljs-string">"audit"</span>, <span class="hljs-string">"logs"</span>] },
  <span class="hljs-attr">"retentionDays"</span>: { <span class="hljs-attr">"value"</span>: <span class="hljs-number">90</span> }
}
</code></pre>
<p>Same Bicep template, different behavior per environment. Version control tracks changes; CI/CD pipelines enforce review gates before production deployments.</p>
<h2 id="heading-when-to-use-lifecycle-policies-vs-custom-code">When to Use Lifecycle Policies vs Custom Code</h2>
<p><strong>Use lifecycle policies when:</strong></p>
<ul>
<li><p>Retention logic is purely time-based (days since modification/creation/access)</p>
</li>
<li><p>You want zero operational overhead (no Functions, no logs to monitor)</p>
</li>
<li><p>Filters are simple (blob type, container prefix, blob name prefix)</p>
</li>
<li><p>Tiering and deletion are sufficient actions</p>
</li>
</ul>
<p><strong>Use custom code (e.g., Azure Functions) when:</strong></p>
<ul>
<li><p>You need filename parsing or complex business logic (e.g., "delete if filename matches pattern X")</p>
</li>
<li><p>Per-run reporting is required (deletion counts per container logged to App Insights)</p>
</li>
<li><p>You need conditional behavior (demo mode vs real mode, dry-run logic)</p>
</li>
<li><p>Integration with external systems (send notifications, update databases, emit custom metrics)</p>
</li>
</ul>
<p>Lifecycle policies are elegant for straightforward retention; custom code is the escape hatch for everything else.</p>
<h2 id="heading-testing-your-policy">Testing Your Policy</h2>
<p>Before enabling a policy in production, seed test data with the provided <code>Seed-StorageContainers.ps1</code> script:</p>
<pre><code class="lang-powershell">./scripts/Seed<span class="hljs-literal">-StorageContainers</span>.ps1 `
  <span class="hljs-literal">-StorageAccountName</span> <span class="hljs-string">"stdevblobcleanup001"</span> `
  <span class="hljs-literal">-ResourceGroup</span> <span class="hljs-string">"rg-dev"</span> `
  <span class="hljs-literal">-PastDays</span> <span class="hljs-number">10</span> `
  <span class="hljs-literal">-FutureDays</span> <span class="hljs-number">2</span>
</code></pre>
<p>This creates blobs with filenames encoding timestamps (e.g., <code>signin/entra/20251113000000.json</code>). Deploy your policy with a short retention window (<code>retentionDays: 2</code>) and check after the next daily evaluation cycle (typically within 24 hours) whether old blobs were deleted.</p>
<p><strong>Important caveat</strong>: Lifecycle policies evaluate <code>LastModified</code> or creation time, not filename content. The seeding script creates blobs with timestamps in their filenames, but all blobs will have <code>LastModified</code> set to the upload time. To test retention policies effectively, you need to wait for blobs to age naturally—there is no supported way to backdate blob timestamps. For rapid testing, use very short retention periods (1-2 days) and verify policy execution after 24-48 hours.</p>
<h2 id="heading-validating-the-policy-in-the-azure-portal">Validating the Policy in the Azure Portal</h2>
<p>After deployment (Bicep or ARM), you can confirm the lifecycle policy configuration and observe its effects directly in the Azure Portal.</p>
<h3 id="heading-verify-rule-definition">Verify Rule Definition</h3>
<ol>
<li><p>Navigate to the Storage Account.</p>
</li>
<li><p>In the left menu, under <strong>Data management</strong>, select <strong>Lifecycle management</strong>.</p>
</li>
<li><p>Use the <strong>List view</strong> tab to confirm your rule appears (e.g., <code>DeleteOldBlobs</code>) and its status is <strong>Enabled</strong>.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763881974069/89a6efd4-9463-4764-8f82-819969affd8e.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763881990635/7a365b18-c498-415d-8104-36eeaff8bbba.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763882100114/3c019a5f-9c53-45f3-a53a-c5387dfee375.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763882109887/36ae30a8-5362-47de-9a09-e3c6696930fa.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763882121935/e4e4fb64-b533-4028-a29f-0123146d3c3f.png" alt class="image--center mx-auto" /></p>
<ol start="4">
<li>Switch to <strong>Code view</strong> to inspect the JSON that the portal stored. It should reflect the Bicep deployment (prefixes, <code>daysAfterModificationGreaterThan</code>, snapshot settings).</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763882140566/481ed705-7454-418b-9f40-5e1cc0e51442.png" alt class="image--center mx-auto" /></p>
<p>If the rule was just deployed or modified, allow up to 24 hours for the first evaluation cycle (per Microsoft guidance). The presence of the rule in the portal does not mean deletions have already occurred.</p>
<p>Reference: <a target="_blank" href="https://learn.microsoft.com/azure/storage/blobs/lifecycle-management-policy-configure#create-or-manage-a-policy">Configure a lifecycle management policy (Azure Portal)</a></p>
<h2 id="heading-monitoring-and-observability">Monitoring and Observability</h2>
<p>Lifecycle policy executions don't emit logs to Application Insights or Azure Functions invocation history. To track deletions:</p>
<ol>
<li><p><strong>Storage Analytics Logs</strong>: Enable logging on the Storage Account; deletion operations appear in <code>$logs</code> container</p>
</li>
<li><p><strong>Azure Monitor Metrics</strong>: Track container-level metrics (blob count, capacity)</p>
</li>
<li><p><strong>Log Analytics Integration</strong>: Route diagnostic logs to a Log Analytics workspace and query <code>StorageBlobLogs</code></p>
</li>
</ol>
<p>Reference: <a target="_blank" href="https://learn.microsoft.com/azure/storage/blobs/monitor-blob-storage">Monitor Azure Blob Storage</a></p>
<p>Example Kusto query for deletion tracking:</p>
<pre><code class="lang-powershell">StorageBlobLogs
| <span class="hljs-built_in">where</span> OperationName == <span class="hljs-string">"DeleteBlob"</span>
| <span class="hljs-built_in">where</span> TimeGenerated &gt; ago(<span class="hljs-number">7</span>d)
| summarize DeletionCount = count() by ContainerName = split(Uri, <span class="hljs-string">"/"</span>)[<span class="hljs-number">3</span>]
| order by DeletionCount desc
</code></pre>
<h2 id="heading-extending-the-bicep-module">Extending the Bicep Module</h2>
<p>The lifecycle policy module can grow with your needs. Add parameters for:</p>
<ul>
<li><p><strong>Tiering actions</strong>: Expose <code>tierToCool</code>, <code>tierToArchive</code> with separate thresholds</p>
</li>
<li><p><strong>Version cleanup</strong>: Add <code>version.delete</code> action if versioning enabled</p>
</li>
<li><p><strong>Multiple rules</strong>: Loop over an array of rule definitions for complex scenarios (see the sketch after this list)</p>
</li>
<li><p><strong>Conditional deployment</strong>: Use Bicep conditionals to deploy policies only if certain features are enabled (e.g., versioning, soft delete)</p>
</li>
</ul>
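<p>The "multiple rules" option, for example, maps naturally to a Bicep <code>for</code> expression. A sketch, assuming each array element carries a <code>name</code> and a <code>definition</code>:</p>
<pre><code class="lang-bicep">param ruleDefinitions array

// storageAccount is assumed to be an existing resource symbol, as in the module above
resource managementPolicy 'Microsoft.Storage/storageAccounts/managementPolicies@2025-06-01' = {
  name: 'default'
  parent: storageAccount
  properties: {
    policy: {
      rules: [for rule in ruleDefinitions: {
        name: rule.name
        enabled: true
        type: 'Lifecycle'
        definition: rule.definition
      }]
    }
  }
}
</code></pre>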
<p>Bicep's modular design keeps the core template simple while allowing opt-in complexity.</p>
<p>Reference: <a target="_blank" href="https://learn.microsoft.com/azure/azure-resource-manager/bicep/best-practices">Bicep Best Practices</a></p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>Azure Storage Lifecycle Management Policies are the right tool when retention is time-based and you value operational simplicity over granular control. Deploy them with Bicep, test with realistic data, and let the service handle the rest. Your infrastructure stays declarative, your storage costs stay predictable, and you avoid the operational overhead of maintaining yet another background job.</p>
<p>For scenarios demanding filename parsing, per-run reporting, or conditional logic, custom code remains the escape hatch—but for most cleanup workloads, the built-in lifecycle engine is enough. 🚀</p>
<h2 id="heading-references">References</h2>
<ul>
<li><a target="_blank" href="https://github.com/broberts23/vsCode/tree/main/project-blob-cleanup">My GitHub Repo</a></li>
<li><a target="_blank" href="https://learn.microsoft.com/azure/storage/blobs/lifecycle-management-overview">Azure Blob Storage Lifecycle Management</a></li>
<li><a target="_blank" href="https://learn.microsoft.com/azure/azure-resource-manager/bicep/best-practices">Bicep Best Practices</a></li>
<li><a target="_blank" href="https://learn.microsoft.com/azure/storage/blobs/monitor-blob-storage">Monitor Azure Blob Storage</a></li>
<li><a target="_blank" href="https://learn.microsoft.com/azure/storage/blobs/lifecycle-management-policy-configure#enable-access-time-tracking">Enable access time tracking</a></li>
<li><a target="_blank" href="https://learn.microsoft.com/azure/storage/blobs/lifecycle-management-policy-monitor">Lifecycle management policy monitoring</a></li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Monitoring Azure Storage Queues with PowerShell]]></title><description><![CDATA[Keeping an eye on Azure Storage Queue backlogs is essential for reliable systems and scale decisions. Most teams want per-queue visibility (not just account-level metrics), simple dashboards/alerts, and a repeatable deployment story. This blog docume...]]></description><link>https://benroberts.io/monitoring-azure-storage-queues-with-powershell</link><guid isPermaLink="true">https://benroberts.io/monitoring-azure-storage-queues-with-powershell</guid><category><![CDATA[Azure]]></category><category><![CDATA[Powershell]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Sat, 23 Aug 2025 05:38:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755927385888/06ea75f5-5a08-4522-a5eb-d3a42a1b38a7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Keeping an eye on Azure Storage Queue backlogs is essential for reliable systems and scale decisions. Most teams want per-queue visibility (not just account-level metrics), simple dashboards/alerts, and a repeatable deployment story. This blog documents a practical approach I use in this GitHub <a target="_blank" href="https://github.com/broberts23/storage-queue-metric">repo</a>. A PowerShell Azure Function that emits per-queue message counts as custom metrics to Application Insights, plus Bicep and GitHub Actions to deploy the environment and seed test data.</p>
<p>We’ll cover the problem, the design, some tricky implementation details (Managed Identity auth, CloudQueue vs QueueClient data paths), and how to visualize the results. Everything shown lives in this repository so you can clone and run it end-to-end.</p>
<h2 id="heading-the-problem-and-the-constraints">The problem and the constraints</h2>
<ul>
<li><p>Azure Monitor’s built-in QueueMessageCount for storage accounts is hourly and not split per queue. You can’t retrieve per-queue counts via Azure Monitor metrics.</p>
</li>
<li><p>Teams need per-queue counts at a 5 minute cadence to alert on spikes and monitor backlog trends.</p>
</li>
<li><p>We want secure auth (Managed Identity), no keys in code, and a minimal footprint.</p>
</li>
</ul>
<h2 id="heading-solution-overview">Solution overview</h2>
<p>We poll each queue in a storage account with Az.Storage and emit a custom metric per queue to Application Insights using the v2 ingestion endpoint. Workbooks and Metrics Explorer can then visualize and alert on these metrics by the QueueName dimension.</p>
<p>High level flow:</p>
<ol>
<li><p>Timer-trigger function runs every 5 minutes.</p>
</li>
<li><p>Uses Managed Identity to authenticate to the storage account data plane.</p>
</li>
<li><p>Enumerates queues and queries an approximate visible message count per queue.</p>
</li>
<li><p>Sends a custom metric item per queue to Application Insights with dimensions StorageAccount and QueueName.</p>
</li>
</ol>
<h2 id="heading-prerequisites-and-configuration">Prerequisites and configuration</h2>
<h3 id="heading-rbac-and-managed-identity">RBAC and Managed Identity</h3>
<p>The Function App uses a system-assigned managed identity. Assign one of these data-plane roles at the storage account:</p>
<ul>
<li><p>Storage Queue Data Reader (read-only)</p>
</li>
<li><p>Storage Queue Data Contributor (read/write)</p>
</li>
</ul>
<p>Then authenticate with <code>New-AzStorageContext -UseConnectedAccount</code>. This avoids keys and works well in Functions.</p>
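<p>Granting the role with Az PowerShell looks like this (a sketch; resource group and resource names are illustrative):</p>
<pre><code class="lang-powershell"># Look up the Function App's system-assigned identity, then grant queue data-plane access
$funcApp = Get-AzWebApp -ResourceGroupName 'rg-queue-monitor' -Name 'func-queue-monitor'
New-AzRoleAssignment `
  -ObjectId $funcApp.Identity.PrincipalId `
  -RoleDefinitionName 'Storage Queue Data Reader' `
  -Scope (Get-AzStorageAccount -ResourceGroupName 'rg-queue-monitor' -Name 'stqueuedemo001').Id
</code></pre>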
<h3 id="heading-app-settings-environment-variables">App settings (environment variables)</h3>
<p>The function expects these settings (set as Function App application settings or local environment variables):</p>
<ul>
<li><p><code>APPLICATIONINSIGHTS_CONNECTION_STRING</code> (or <code>APPINSIGHTS_CONNECTION_STRING</code>)</p>
</li>
<li><p><code>AZURE_SUBSCRIPTION_ID</code></p>
</li>
<li><p><code>STORAGE_RESOURCE_GROUP</code></p>
</li>
<li><p><code>STORAGE_ACCOUNT_NAME</code></p>
</li>
</ul>
<h2 id="heading-function-implementation-powershell">Function implementation (PowerShell)</h2>
<p>The function is <code>functionApp/QueueMessageCount/run.ps1</code>. It uses Managed Identity via <code>New-AzStorageContext -UseConnectedAccount</code> and supports both the legacy WindowsAzure.Storage path and the modern Azure.Storage.Queues path.</p>
<p>Key setup and auth:</p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Build OAuth data-plane context for queues using the Function App's managed identity</span>
<span class="hljs-variable">$ctx</span> = <span class="hljs-built_in">New-AzStorageContext</span> <span class="hljs-literal">-StorageAccountName</span> <span class="hljs-variable">$StorageAccountName</span> <span class="hljs-literal">-UseConnectedAccount</span> <span class="hljs-literal">-ErrorAction</span> Stop

<span class="hljs-comment"># List queues with AAD context</span>
<span class="hljs-variable">$queues</span> = <span class="hljs-built_in">Get-AzStorageQueue</span> <span class="hljs-literal">-Context</span> <span class="hljs-variable">$ctx</span> <span class="hljs-literal">-ErrorAction</span> Stop
</code></pre>
<h3 id="heading-why-useconnectedaccount-matters">Why -UseConnectedAccount matters</h3>
<p><code>-UseConnectedAccount</code> tells Az.Storage to use the Azure AD token from your current Az context (in Functions, the system-assigned managed identity from <code>Connect-AzAccount -Identity</code>) to authenticate to the Storage data plane. That has a few important implications:</p>
<ul>
<li><p>With <code>-UseConnectedAccount</code>:</p>
<ul>
<li><p>Data-plane calls are authorized by RBAC. Grant the identity a data-plane role like Storage Queue Data Reader/Contributor and you’re good—no keys in code.</p>
</li>
<li><p>The returned queue objects are typically backed by the modern Azure.Storage.Queues client, so you’ll find <code>QueueClient</code> and should read counts via <code>QueueClient.GetProperties().Value.ApproximateMessagesCount</code>.</p>
</li>
</ul>
</li>
<li><p>Without <code>-UseConnectedAccount</code>:</p>
<ul>
<li><p>New-AzStorageContext will fall back to shared key or connection-string auth. If you didn’t provide keys/connection string, the context can’t authorize queue data-plane calls and you’ll see 403s or empty/null properties—leading to zeros sent to App Insights.</p>
</li>
<li><p>If you try to reuse <code>$storage.Context</code> from <code>Get-AzStorageAccount</code>, it may not contain keys under a managed-identity scenario (listing keys requires management-plane permissions). You’ll end up with an unusable data-plane context.</p>
</li>
</ul>
</li>
</ul>
<p>In short: Managed Identity + no keys means use <code>-UseConnectedAccount</code>.</p>
<ul>
<li>The legacy <code>CloudQueue</code> path often isn’t available under AAD; prefer the <code>QueueClient</code> path when using MI.</li>
</ul>
<p>Reading the data path:</p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Modern (Azure.Storage.Queues) path</span>
<span class="hljs-variable">$props</span> = <span class="hljs-variable">$qref</span>.QueueClient.GetProperties()
<span class="hljs-variable">$approx</span> = <span class="hljs-variable">$props</span>.Value.ApproximateMessagesCount
<span class="hljs-keyword">if</span> (<span class="hljs-variable">$null</span> <span class="hljs-operator">-ne</span> <span class="hljs-variable">$approx</span>) { <span class="hljs-variable">$value</span> = [<span class="hljs-built_in">int</span>]<span class="hljs-variable">$approx</span> }
</code></pre>
<p>Sending a metric to Application Insights (v2 track endpoint):</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$endpoint</span> = <span class="hljs-variable">$IngestionEndpoint</span>.TrimEnd(<span class="hljs-string">'/'</span>) + <span class="hljs-string">'/v2/track'</span>
<span class="hljs-variable">$env</span> = <span class="hljs-selector-tag">@</span>{ 
  name = <span class="hljs-string">'Microsoft.ApplicationInsights.Metric'</span>
  time = (<span class="hljs-built_in">Get-Date</span>).ToString(<span class="hljs-string">'o'</span>)
  iKey = <span class="hljs-variable">$ikey</span>
  <span class="hljs-keyword">data</span> = <span class="hljs-selector-tag">@</span>{ 
    baseType = <span class="hljs-string">'MetricData'</span>
    baseData = <span class="hljs-selector-tag">@</span>{ 
      ver = <span class="hljs-number">2</span>
      metrics = <span class="hljs-selector-tag">@</span>( <span class="hljs-selector-tag">@</span>{ name = <span class="hljs-string">'QueueMessageCount'</span>; value = [<span class="hljs-built_in">double</span>]<span class="hljs-variable">$value</span> } )
      properties = <span class="hljs-selector-tag">@</span>{ StorageAccount = <span class="hljs-variable">$StorageAccountName</span>; QueueName = <span class="hljs-variable">$queueName</span> }
    }
  }
}
<span class="hljs-built_in">Invoke-RestMethod</span> <span class="hljs-literal">-Method</span> Post <span class="hljs-literal">-Uri</span> <span class="hljs-variable">$endpoint</span> <span class="hljs-literal">-ContentType</span> <span class="hljs-string">'application/json'</span> <span class="hljs-literal">-Body</span> (<span class="hljs-variable">$env</span> | <span class="hljs-built_in">ConvertTo-Json</span> <span class="hljs-literal">-Depth</span> <span class="hljs-number">10</span>)
</code></pre>
<p>How the custom metric works:</p>
<ul>
<li><p>Endpoint and identity:</p>
<ul>
<li><p>We post to the ingestion endpoint from your connection string (<code>APPLICATIONINSIGHTS_CONNECTION_STRING</code>) at <code>.../v2/track</code>.</p>
</li>
<li><p>The envelope includes <code>iKey</code> (Instrumentation Key); the service uses it to attribute telemetry to your App Insights resource. If it’s missing/invalid, ingestion fails.</p>
</li>
</ul>
</li>
<li><p>Envelope shape (metrics v2):</p>
<ul>
<li><p><code>name = 'Microsoft.ApplicationInsights.Metric'</code> with <code>data.baseType = 'MetricData'</code> and <code>baseData.ver = 2</code>.</p>
</li>
<li><p><code>baseData.metrics</code> is an array of one or more metrics. Each item has <code>name</code> and a numeric <code>value</code> (double). We send one metric: <code>QueueMessageCount</code>.</p>
</li>
<li><p><code>baseData.properties</code> carries dimensions (key/value strings). We set <code>StorageAccount</code> and <code>QueueName</code>. In Metrics Explorer these become metric dimensions; in Logs they appear under <code>customDimensions</code>.</p>
</li>
</ul>
</li>
<li><p>Aggregation and visualization:</p>
<ul>
<li><p>App Insights treats each posted item as a metric sample. In Metrics Explorer, you can aggregate (Avg, Sum, Min, Max) over time and split by <code>QueueName</code> to chart per-queue trends.</p>
</li>
<li><p>In KQL (Logs &gt; customMetrics), numeric samples are available as <code>value</code>, and dimensions in <code>customDimensions</code>. Example:</p>
<pre><code class="lang-plaintext">  customMetrics
  | where name == "QueueMessageCount"
  | summarize avg(value) by tostring(customDimensions.QueueName), bin(timestamp, 5m)
</code></pre>
</li>
</ul>
</li>
<li><p>Cardinality and cost:</p>
<ul>
<li><p>Keep dimension cardinality reasonable (queue names are fine). Very high-cardinality dimensions increase metric series and cost.</p>
</li>
<li><p>Batch by sending an array of envelopes if needed (a sketch follows this list); my function sends one envelope per queue per run, which is typically acceptable.</p>
</li>
</ul>
</li>
</ul>
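<p>Batching simply means posting an array of envelopes in one request. A sketch reusing the variables from the earlier snippet, where <code>$queueCounts</code> is an assumed hashtable of queue name to count:</p>
<pre><code class="lang-powershell">$envelopes = foreach ($queueName in $queueCounts.Keys) {
    @{
        name = 'Microsoft.ApplicationInsights.Metric'
        time = (Get-Date).ToString('o')
        iKey = $ikey
        data = @{
            baseType = 'MetricData'
            baseData = @{
                ver        = 2
                metrics    = @(@{ name = 'QueueMessageCount'; value = [double]$queueCounts[$queueName] })
                properties = @{ StorageAccount = $StorageAccountName; QueueName = $queueName }
            }
        }
    }
}
# Pass via -InputObject (positional) so a single-item array is not unrolled
Invoke-RestMethod -Method Post -Uri $endpoint -ContentType 'application/json' -Body (ConvertTo-Json @($envelopes) -Depth 10)
</code></pre>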
<p>Connection string notes and validating ingestion:</p>
<ul>
<li><p>Connection string vs iKey parsing:</p>
<ul>
<li><p>The modern <code>APPLICATIONINSIGHTS_CONNECTION_STRING</code> contains both the ingestion endpoint and the Instrumentation Key. Our function parses the iKey and sets it on each envelope because the raw <code>/v2/track</code> endpoint expects an <code>iKey</code> on every item.</p>
</li>
<li><p>If you’d prefer not to parse the iKey yourself, consider using the official Application Insights/ Azure Monitor SDKs which read the connection string and sign telemetry automatically. For direct HTTP calls to <code>/v2/track</code>, keep including <code>iKey</code> in the envelope.</p>
</li>
</ul>
</li>
<li><p>Validate ingestion quickly:</p>
<ul>
<li><p>Check the HTTP response body from <code>/v2/track</code> — it returns counts. Example:</p>
<pre><code class="lang-powershell">  <span class="hljs-variable">$resp</span> = <span class="hljs-built_in">Invoke-RestMethod</span> <span class="hljs-literal">-Method</span> Post <span class="hljs-literal">-Uri</span> <span class="hljs-variable">$endpoint</span> <span class="hljs-literal">-ContentType</span> <span class="hljs-string">'application/json'</span> <span class="hljs-literal">-Body</span> <span class="hljs-variable">$bodyJson</span> <span class="hljs-literal">-ErrorAction</span> Stop
  <span class="hljs-keyword">if</span> (<span class="hljs-variable">$resp</span>.itemsAccepted <span class="hljs-operator">-lt</span> <span class="hljs-variable">$resp</span>.itemsReceived) {
    <span class="hljs-built_in">Write-Warning</span> (<span class="hljs-string">"AI ingestion partial success: Accepted={0} Received={1} Errors={2}"</span> <span class="hljs-operator">-f</span> <span class="hljs-variable">$resp</span>.itemsAccepted, <span class="hljs-variable">$resp</span>.itemsReceived, (<span class="hljs-variable">$resp</span>.errors | <span class="hljs-built_in">ConvertTo-Json</span> <span class="hljs-literal">-Depth</span> <span class="hljs-number">5</span>))
  }
</code></pre>
</li>
<li><p>Live Metrics: open Live Metrics in your App Insights resource to confirm the instance is receiving telemetry (requests/dependencies/traces). Custom metrics typically appear in Metrics Explorer within ~1–2 minutes even if they don’t show in Live Metrics streams directly.</p>
</li>
</ul>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755926478251/0bf328f6-8529-4a00-9847-a1d6cd052e8d.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-per-queue-via-sdk-and-not-azure-monitor-metrics">Why per-queue via SDK and not Azure Monitor metrics?</h2>
<p>Azure’s built-in metric <code>QueueMessageCount</code> for <code>Microsoft.Storage/storageAccounts/queueServices</code> is sampled hourly and has no per-queue dimension. That’s great for account-level trends, but not for operational backlogs per queue. Reading the approximate visible count with the SDK provides timely, per-queue values suitable for dashboards and alerts.</p>
<h2 id="heading-infrastructure-as-code-bicep">Infrastructure as Code (Bicep)</h2>
<p>The Bicep file <code>infra/main.bicep</code> provisions:</p>
<ul>
<li><p>Storage account (Standard_LRS)</p>
</li>
<li><p>Queue service and six randomly named queues</p>
</li>
<li><p>App Insights instance</p>
</li>
<li><p>Consumption Function App (PowerShell) with a system-assigned managed identity</p>
</li>
<li><p>A file share for function content settings</p>
</li>
</ul>
<p>Notable settings injected into the Function App:</p>
<pre><code class="lang-plaintext">siteConfig: {
  appSettings: [
    { name: 'APPLICATIONINSIGHTS_CONNECTION_STRING', value: appInsights.properties.ConnectionString }
    { name: 'STORAGE_ACCOUNT_NAME', value: st.name }
    { name: 'STORAGE_RESOURCE_GROUP', value: resourceGroup().name }
    { name: 'AZURE_SUBSCRIPTION_ID', value: subscription().subscriptionId }
  ]
}
</code></pre>
<p>Outputs include the queue names and the storage/account info, which our scripts consume.</p>
<h2 id="heading-cicd-with-github-actions">CI/CD with GitHub Actions</h2>
<p>Two workflows are included:</p>
<ul>
<li><p><code>.github/workflows/deploy.yml</code> — end-to-end infra + function:</p>
<ul>
<li><p>Logs into Azure via OIDC (don’t forget to store the service principal details as repository secrets. I’ve covered using OIDC to authenticate Github to Azure in depth in previous blogs. You can find more information on how to set up OIDC authentication in parts 4 and 5 of <a target="_blank" href="https://benroberts.io/azure-mlops-challenge-blog-index">Azure MLOps Challenge Blog</a>)</p>
</li>
<li><p><a target="_blank" href="https://benroberts.io/azure-mlops-challenge-blog-index">Deploys <code>infra/main.bicep</code></a></p>
</li>
<li><p>Zips <code>functionApp/</code> and deploys the Function App</p>
</li>
<li><p>Runs <code>scripts/populate-queues.ps1</code> to add messages</p>
</li>
</ul>
</li>
<li><p><code>.github/workflows/deploy-function.yml</code> — function-only redeploy.</p>
</li>
</ul>
<p>These workflows take optional inputs for names, otherwise they auto-generate compliant names.</p>
<h2 id="heading-seeding-data-and-repeatable-tests-scripts">Seeding data and repeatable tests (scripts)</h2>
<p>Two helper scripts in <code>scripts/</code> create sample queues and populate messages using Azure CLI:</p>
<ul>
<li><code>create-queues.ps1</code> reads deployment outputs and creates queues:</li>
</ul>
<pre><code class="lang-powershell"><span class="hljs-variable">$out</span> = az deployment <span class="hljs-built_in">group</span> show -<span class="hljs-literal">-resource</span><span class="hljs-literal">-group</span> <span class="hljs-variable">$ResourceGroup</span> -<span class="hljs-literal">-name</span> <span class="hljs-variable">$DeploymentName</span> -<span class="hljs-literal">-query</span> properties.outputs <span class="hljs-literal">-o</span> json | <span class="hljs-built_in">ConvertFrom-Json</span>
<span class="hljs-variable">$storageName</span> = <span class="hljs-variable">$out</span>.storageAccount.value
<span class="hljs-variable">$queues</span> = <span class="hljs-variable">$out</span>.queueNames.value
<span class="hljs-variable">$conn</span> = az storage account <span class="hljs-built_in">show-connection</span><span class="hljs-literal">-string</span> -<span class="hljs-literal">-resource</span><span class="hljs-literal">-group</span> <span class="hljs-variable">$ResourceGroup</span> -<span class="hljs-literal">-name</span> <span class="hljs-variable">$storageName</span> <span class="hljs-literal">-o</span> tsv
<span class="hljs-keyword">foreach</span> (<span class="hljs-variable">$q</span> <span class="hljs-keyword">in</span> <span class="hljs-variable">$queues</span>) { az storage queue create -<span class="hljs-literal">-name</span> <span class="hljs-variable">$q</span> -<span class="hljs-literal">-connection</span><span class="hljs-literal">-string</span> <span class="hljs-variable">$conn</span> }
</code></pre>
<ul>
<li><code>populate-queues.ps1</code> fills each queue with a random number of messages:</li>
</ul>
<pre><code class="lang-powershell"><span class="hljs-variable">$out</span> = az deployment <span class="hljs-built_in">group</span> show -<span class="hljs-literal">-resource</span><span class="hljs-literal">-group</span> <span class="hljs-variable">$ResourceGroup</span> -<span class="hljs-literal">-name</span> <span class="hljs-variable">$DeploymentName</span> -<span class="hljs-literal">-query</span> properties.outputs <span class="hljs-literal">-o</span> json | <span class="hljs-built_in">ConvertFrom-Json</span>
<span class="hljs-variable">$storageName</span> = <span class="hljs-variable">$out</span>.storageAccount.value
<span class="hljs-variable">$queues</span> = <span class="hljs-variable">$out</span>.queueNames.value
<span class="hljs-variable">$conn</span> = az storage account <span class="hljs-built_in">show-connection</span><span class="hljs-literal">-string</span> -<span class="hljs-literal">-resource</span><span class="hljs-literal">-group</span> <span class="hljs-variable">$ResourceGroup</span> -<span class="hljs-literal">-name</span> <span class="hljs-variable">$storageName</span> <span class="hljs-literal">-o</span> tsv
<span class="hljs-keyword">foreach</span> (<span class="hljs-variable">$q</span> <span class="hljs-keyword">in</span> <span class="hljs-variable">$queues</span>) {
  <span class="hljs-variable">$count</span> = <span class="hljs-built_in">Get-Random</span> <span class="hljs-literal">-Minimum</span> <span class="hljs-variable">$MinMessages</span> <span class="hljs-literal">-Maximum</span> (<span class="hljs-variable">$MaxMessages</span> + <span class="hljs-number">1</span>)
  <span class="hljs-keyword">for</span> (<span class="hljs-variable">$i</span> = <span class="hljs-number">0</span>; <span class="hljs-variable">$i</span> <span class="hljs-operator">-lt</span> <span class="hljs-variable">$count</span>; <span class="hljs-variable">$i</span>++) { az storage message put -<span class="hljs-literal">-queue</span><span class="hljs-literal">-name</span> <span class="hljs-variable">$q</span> -<span class="hljs-literal">-content</span> <span class="hljs-string">"msg-<span class="hljs-variable">$</span>([random]::new().Next(100000,999999))"</span> -<span class="hljs-literal">-connection</span><span class="hljs-literal">-string</span> <span class="hljs-variable">$conn</span> }
}
</code></pre>
<p>These are used automatically in the GitHub workflow after deployment to generate a non-zero baseline for monitoring.</p>
<h2 id="heading-visualizing-and-alerting">Visualizing and alerting</h2>
<p>You can use either Metrics Explorer or Workbooks.</p>
<ul>
<li><p>Metrics Explorer (App Insights):</p>
<ul>
<li><p>Metric namespace: Custom</p>
</li>
<li><p>Metric: QueueMessageCount</p>
</li>
<li><p>Split by: QueueName</p>
</li>
<li><p>Filter by: StorageAccount if needed</p>
</li>
</ul>
</li>
<li><p>Workbooks (Logs):</p>
</li>
</ul>
<pre><code class="lang-plaintext">customMetrics
| where name == "QueueMessageCount"
| summarize avg(value) by tostring(customDimensions.QueueName), bin(timestamp, 5m)
| order by timestamp desc
</code></pre>
<p>Create a metric alert on the custom metric (dimension: QueueName) or a Log Analytics alert using a scheduled query. You can use the same query to visualize the data in Workbooks.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1755926501293/3aaa0dff-47eb-4caf-b55c-e20077f14e4a.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-troubleshooting-why-would-counts-be-0">Troubleshooting: why would counts be 0?</h2>
<ul>
<li><p>Missing data-plane role: assign Storage Queue Data Reader/Contributor to the Function’s managed identity (a role-assignment sketch follows this list).</p>
</li>
<li><p>Using account keys but wrong context: prefer <code>-UseConnectedAccount</code> with MI.</p>
</li>
<li><p>Wrong data path: on AAD auth, <code>QueueClient.GetProperties().Value.ApproximateMessagesCount</code> is the reliable path; <code>CloudQueue</code> may not be present.</p>
</li>
<li><p>Immediate staleness: approximate counts lag slightly; verify via a quick <code>Peek</code> if in doubt.</p>
</li>
</ul>
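<p>If the role is missing, the assignment might look like this (the resource names here are placeholders, not the ones from the template; <code>Get-AzWebApp</code> also returns Function Apps, which are App Service sites under the hood):</p>
<pre><code class="lang-powershell"># Grant the Function App's system-assigned identity data-plane read on the storage account.
# 'rg-queues', 'func-queue-monitor', and 'stqueues' are hypothetical names - substitute your own.
$principalId = (Get-AzWebApp -ResourceGroupName 'rg-queues' -Name 'func-queue-monitor').Identity.PrincipalId
$scope = (Get-AzStorageAccount -ResourceGroupName 'rg-queues' -Name 'stqueues').Id
New-AzRoleAssignment -ObjectId $principalId -RoleDefinitionName 'Storage Queue Data Reader' -Scope $scope
</code></pre>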
<h2 id="heading-conclusion">Conclusion</h2>
<p>Per-queue message counts are not available via Azure Monitor metrics, but they’re straightforward to gather with Az.Storage and publish as custom metrics in Application Insights. With Managed Identities, Bicep, and GitHub Actions, you can deploy the whole pipeline, seed data, and put dashboards/alerts in front of your team in an hour.</p>
<p>The code here is ready to use with a small footprint and clear extension points (retry/backoff, filtering queues, sampling). Clone the repo, deploy, and start monitoring. 🚀</p>
<hr />
<p>References</p>
<ul>
<li><p>Azure Storage Queues overview: <a target="_blank" href="https://learn.microsoft.com/azure/storage/queues/">https://learn.microsoft.com/azure/storage/queues/</a></p>
</li>
<li><p>Application Insights custom metrics: <a target="_blank" href="https://learn.microsoft.com/azure/azure-monitor/app/metrics">https://learn.microsoft.com/azure/azure-monitor/app/metrics</a></p>
</li>
<li><p>Az.Storage PowerShell: <a target="_blank" href="https://learn.microsoft.com/powershell/module/az.storage/">https://learn.microsoft.com/powershell/module/az.storage/</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Enterprise Proxy Configuration for Azure Function Apps]]></title><description><![CDATA[Configuring outbound connectivity for Azure Function Apps is a critical consideration in enterprise environments, especially when balancing security, compliance, and operational requirements. This post explores how to configure proxy settings for Azu...]]></description><link>https://benroberts.io/enterprise-proxy-configuration-for-azure-function-apps</link><guid isPermaLink="true">https://benroberts.io/enterprise-proxy-configuration-for-azure-function-apps</guid><category><![CDATA[Azure]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Fri, 11 Jul 2025 10:28:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1752229619180/bf860032-6330-4636-b50f-6d4c24870e7c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Configuring outbound connectivity for Azure Function Apps is a critical consideration in enterprise environments, especially when balancing security, compliance, and operational requirements. This post explores how to configure proxy settings for Azure Function Apps using environment variables, when to use a proxy versus Azure Firewall, and how to automate these settings in your CI/CD pipeline.</p>
<p>When I ran into this problem, I found a lot of old and conflicting information online. This post aims to clarify the current best practices for configuring proxies in Azure Function Apps, particularly in enterprise scenarios.</p>
<h2 id="heading-why-proxy-configuration-matters">Why Proxy Configuration Matters</h2>
<p>In large organizations, outbound traffic from cloud workloads often needs to be controlled for security, auditing, and compliance. Proxies and firewalls are the primary tools for this, but knowing when and how to use each is key to a robust architecture.</p>
<h2 id="heading-azure-function-app-networking-overview">Azure Function App Networking Overview</h2>
<p>Azure Function Apps can be integrated with Virtual Networks (VNets), use private endpoints, and route outbound traffic through Azure Firewall or a corporate proxy. Key networking features include:</p>
<ul>
<li><p><strong>VNet Integration</strong>: Allows your function app to access resources in a private network. When you enable VNet integration, you can use user-defined routes (UDRs) to control how outbound traffic is routed—either directly to the internet, through an Azure Firewall, or to a proxy server. This enables granular control over which traffic is inspected, filtered, or bypassed based on your security requirements.</p>
</li>
<li><p><strong>Azure Firewall</strong>: Controls and logs outbound traffic at the network level, providing centralized security and compliance enforcement for all protocols.</p>
</li>
<li><p><strong>Proxies</strong>: Enforce outbound HTTP(S) traffic policies, authentication, and logging. Proxies are typically used for HTTP/S traffic, while firewalls handle all protocols.</p>
</li>
</ul>
<h2 id="heading-proxy-vs-azure-firewall-when-to-use-each">Proxy vs. Azure Firewall: When to Use Each</h2>
<ul>
<li><p><strong>Azure Firewall</strong> is ideal for controlling all outbound traffic, including non-HTTP protocols, and provides centralized logging and threat protection.</p>
</li>
<li><p><strong>Proxy Servers</strong> are best for HTTP(S) traffic, providing granular control, <em>authentication</em>, and content filtering.</p>
</li>
<li><p><strong>Combined Approach</strong>: In many enterprises, HTTP(S) traffic is routed through a proxy, while other protocols go through Azure Firewall. Use route tables (UDRs) and NSGs to direct traffic appropriately.</p>
</li>
</ul>
<p>The exception is traffic destined for Azure services, or Microsoft-peered services available via ExpressRoute. In these cases, you may need to bypass the proxy for specific domains or IP ranges.</p>
<h2 id="heading-how-azure-functions-use-proxy-environment-variables">How Azure Functions Use Proxy Environment Variables</h2>
<p>Azure Functions (and App Service) support standard proxy environment variables:</p>
<ul>
<li><p><code>ALL_PROXY</code> or <code>HTTPS_PROXY</code>: URL of the proxy server for all outbound HTTP(S) traffic.</p>
</li>
<li><p><code>NO_PROXY</code>: Comma-separated list of hostnames, domains, or IPs to bypass the proxy (e.g., internal APIs, Azure services).</p>
</li>
</ul>
<p><strong>Example values:</strong></p>
<pre><code class="lang-text">ALL_PROXY: http://proxy.corp.local:8080
NO_PROXY: localhost, .azurewebsites.net, .loganalytics.net, .monitor.azure.com
</code></pre>
<p>Including domains like <code>.azurewebsites.net</code>, <code>.loganalytics.net</code>, and <code>.monitor.azure.com</code> in your <code>NO_PROXY</code> configuration is important because these are core Azure service endpoints that your Function App may need to communicate with directly. For example, <code>.azurewebsites.net</code> is used for platform management, Kudu (deployment), and internal service calls; <code>.loganalytics.net</code> is required for sending telemetry and diagnostic logs to Azure Monitor; and <code>.monitor.azure.com</code> is used for health checks and monitoring integrations. If these domains are routed through a corporate proxy, it can introduce latency, authentication issues, or even block critical platform operations. By explicitly bypassing the proxy for these domains, you ensure reliable connectivity to essential Azure services and avoid disruptions in monitoring, diagnostics, and platform management features. Remembering that NSG and firewall rules should also allow traffic to these domains is crucial.</p>
<blockquote>
<p><strong>Note:</strong> These variables are picked up by most language runtimes and libraries (e.g., .NET, Node.js, Python) automatically. No code changes are required—set them in the app configuration.</p>
</blockquote>
<h2 id="heading-default-network-behavior-and-environment-proxy-variables">Default Network Behavior and Environment Proxy Variables</h2>
<p>By default, outbound network traffic from Azure Function Apps is routed directly to the internet or through any configured virtual network and firewall, depending on your app’s networking setup. When you set environment variables like <code>ALL_PROXY</code> and <code>NO_PROXY</code>, the Azure Functions host and the underlying platform automatically use these values to determine which outbound HTTP(S) requests should be sent through a proxy and which should bypass it. This system-level approach ensures that all supported language runtimes and libraries (such as .NET, Node.js, and Python) consistently honor your proxy configuration without requiring any changes to your application code. Using environment variables for proxy settings is preferred over hard-coding proxy logic in your function app because it centralizes control, reduces code complexity, and allows you to update proxy settings without redeploying or modifying your application. This method also aligns with enterprise security and compliance practices by ensuring that proxy policies are enforced uniformly across all workloads.</p>
<h2 id="heading-setting-proxy-variables-in-azure-function-app-configuration">Setting Proxy Variables in Azure Function App Configuration</h2>
<ol>
<li><p>Go to your Function App in the Azure Portal.</p>
</li>
<li><p>Navigate to <strong>Configuration</strong> &gt; <strong>Application settings</strong>.</p>
</li>
<li><p>Add <code>ALL_PROXY</code> and <code>NO_PROXY</code> as new application settings.</p>
</li>
<li><p>Save and restart the app.</p>
</li>
</ol>
<p>This ensures all outbound HTTP(S) traffic from your function app uses the proxy, except for destinations listed in <code>NO_PROXY</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752229673629/f42dfdf0-7643-4e90-ac53-b434fe16edd7.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-example-enterprise-scenario">Example: Enterprise Scenario</h2>
<p>Suppose your organization requires all internet-bound HTTP(S) traffic to go through a proxy, but traffic to Azure Storage and internal APIs should bypass the proxy and go through Azure Firewall. You would:</p>
<ul>
<li><p>Set <code>ALL_PROXY</code> to your corporate proxy URL.</p>
</li>
<li><p>Set <code>NO_PROXY</code> to include Azure service domains and internal address ranges.</p>
</li>
<li><p>Use VNet integration and route tables to ensure non-HTTP(S) traffic is inspected by Azure Firewall.</p>
</li>
</ul>
<h2 id="heading-cicd-setting-proxy-variables-in-azure-devops-pipeline">CI/CD: Setting Proxy Variables in Azure DevOps Pipeline</h2>
<p>To automate proxy configuration during deployment (e.g., zipdeploy), add the following to your Azure DevOps YAML pipeline:</p>
<pre><code class="lang-yaml"><span class="hljs-bullet">-</span> <span class="hljs-attr">task:</span> <span class="hljs-string">AzureFunctionApp@1</span>
  <span class="hljs-attr">displayName:</span> <span class="hljs-string">'Deploy Function App'</span>
  <span class="hljs-attr">inputs:</span>
    <span class="hljs-attr">azureSubscription:</span> <span class="hljs-string">'$(azureSubscription)'</span>
    <span class="hljs-attr">appType:</span> <span class="hljs-string">functionApp</span>
    <span class="hljs-attr">appName:</span> <span class="hljs-string">'$(functionAppName)'</span>
    <span class="hljs-attr">package:</span> <span class="hljs-string">'$(System.DefaultWorkingDirectory)/drop/yourapp.zip'</span>
    <span class="hljs-attr">appSettings:</span> <span class="hljs-string">-ALL_PROXY</span> <span class="hljs-string">"http://proxy.corp.local:8080"</span> <span class="hljs-string">-NO_PROXY</span> <span class="hljs-string">"localhost, .azurewebsites.net, .loganalytics.net"</span>
</code></pre>
<p>This YAML snippet sets the proxy environment variables as application settings during the deployment process, ensuring they are applied without needing to modify your function app code.</p>
<p>If your proxy requirements change after deployment, you can update these environment variables directly in the Azure Portal or automate the update using Azure CLI or ARM/Bicep templates as part of your release process. This flexibility allows you to adapt to evolving network and security requirements without redeploying your application code.</p>
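<p>For example, with the Azure CLI (the app and resource group names below are placeholders):</p>
<pre><code class="lang-powershell"># Update the proxy settings in place; changing app settings triggers an app restart.
az functionapp config appsettings set `
  --name func-proxy-demo `
  --resource-group rg-proxy-demo `
  --settings "ALL_PROXY=http://proxy.corp.local:8080" "NO_PROXY=localhost,.azurewebsites.net,.loganalytics.net,.monitor.azure.com"
</code></pre>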
<h2 id="heading-troubleshooting-and-best-practices">Troubleshooting and Best Practices</h2>
<ul>
<li><p><strong>Test connectivity</strong>: Use the Kudu console or diagnostic tools to verify outbound connectivity and proxy usage (a quick check follows this list).</p>
</li>
<li><p><strong>Monitor logs</strong>: Ensure your proxy and firewall logs are capturing expected traffic.</p>
</li>
<li><p><strong>Least privilege</strong>: Only allow required destinations in <code>NO_PROXY</code>.</p>
</li>
<li><p><strong>No code changes</strong>: Always prefer environment variable configuration over hardcoding proxy settings in code.</p>
</li>
<li><p><strong>Document and review exceptions</strong>: Maintain documentation of all domains and IPs included in your <code>NO_PROXY</code> configuration, and review them regularly to ensure they are still required and up to date with Microsoft’s published service tags.</p>
</li>
</ul>
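<p>A quick sanity check from the Kudu debug console (PowerShell) might look like this:</p>
<pre><code class="lang-powershell"># Confirm the variables are visible to the worker process
$env:ALL_PROXY
$env:NO_PROXY

# Confirm an outbound call succeeds; hosts not covered by NO_PROXY should traverse the proxy
Invoke-WebRequest -Uri 'https://www.microsoft.com' -UseBasicParsing | Select-Object StatusCode
</code></pre>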
<h2 id="heading-conclusion">Conclusion</h2>
<p>Configuring proxy settings for Azure Function Apps in an enterprise environment is straightforward and robust when using environment variables and infrastructure-as-code. By combining proxy and firewall controls, you can meet security and compliance requirements without sacrificing developer productivity.</p>
<hr />
<p><em>For more tips to manage your function app check out the microsoft documentation at</em> <a target="_blank" href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-how-to-use-azure-function-app-settings?tabs=azure-portal%2Cto-premium"><em>learn.microsoft.com</em></a><em>.</em></p>
]]></content:encoded></item><item><title><![CDATA[Building PowerShell GUI Applications with Windows Presentation Foundation (WPF)]]></title><description><![CDATA[As PowerShell continues to evolve as a versatile automation tool, many administrators and developers are discovering the power of combining it with Windows Presentation Foundation (WPF) to create sophisticated graphical user interfaces. In this post,...]]></description><link>https://benroberts.io/building-powershell-gui-applications-with-windows-presentation-foundation-wpf</link><guid isPermaLink="true">https://benroberts.io/building-powershell-gui-applications-with-windows-presentation-foundation-wpf</guid><category><![CDATA[Azure]]></category><category><![CDATA[Powershell]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Fri, 28 Mar 2025 22:28:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743199995345/3524ddd6-88f3-43bb-8c1d-8c49ab6fef8c.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As PowerShell continues to evolve as a versatile automation tool, many administrators and developers are discovering the power of combining it with Windows Presentation Foundation (WPF) to create sophisticated graphical user interfaces. In this post, I'll walk through a real-world example of building a multi-input validation form with PowerShell and WPF.</p>
<h3 id="heading-why-powershell-wpf">Why PowerShell + WPF?</h3>
<p>While PowerShell is phenomenal for automation and scripting, sometimes a command-line interface isn't the most user-friendly option, particularly for less technical users. WPF provides a modern, flexible UI framework that can be seamlessly integrated with PowerShell scripts, giving you the best of both worlds:</p>
<ul>
<li>Rich, responsive user interfaces</li>
<li>Access to PowerShell's powerful automation capabilities</li>
<li>Professional-looking applications without needing to learn C# or Visual Studio</li>
</ul>
<h3 id="heading-the-sample-application">The Sample Application</h3>
<p>Our demonstration application is a service desk utility that validates and submits user inputs to an Azure Function using a secret provided by the user. The app includes:</p>
<ul>
<li>Multiple input fields with different validation patterns</li>
<li>Authentication</li>
<li>Responsive UI feedback with a progress bar</li>
<li>Error handling and validation messaging</li>
</ul>
<p>Let's break down the key components.</p>
<h3 id="heading-setting-up-the-wpf-environment">Setting Up the WPF Environment</h3>
<p>First, we need to load the necessary <a target="_blank" href="https://learn.microsoft.com/en-us/dotnet/desktop/wpf/overview/?view=netdesktop-9.0">WPF assembly</a>:</p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Load required WPF assembly</span>
<span class="hljs-built_in">Add-Type</span> <span class="hljs-literal">-AssemblyName</span> PresentationFramework
</code></pre>
<h3 id="heading-defining-the-ui-with-xaml">Defining the UI with XAML</h3>
<p>WPF uses XAML (eXtensible Application Markup Language) to define user interfaces. In PowerShell, we can define our XAML as a here-string:</p>
<pre><code class="lang-powershell">[<span class="hljs-built_in">xml</span>]<span class="hljs-variable">$xaml</span> = <span class="hljs-string">@"
&lt;Window xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        Title="Multi-Input - Authentication Demo" Height="500" Width="500"&gt;
    &lt;!-- Window contents defined here --&gt;
&lt;/Window&gt;
"@</span>
</code></pre>
<p>The XAML defines (a minimal sketch follows this list):</p>
<ol>
<li>A window with grid layout</li>
<li>Text input fields for various data entries</li>
<li>A password box for secure input</li>
<li>Buttons for submitting data and quitting the application</li>
<li>A progress bar for visual feedback</li>
<li>A result area to display messages</li>
</ol>
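<p>The full XAML is omitted above for brevity. A minimal, hypothetical sketch of the window contents, using the control names the script retrieves later, might look like this (the real form has four inputs plus labels):</p>
<pre><code class="lang-powershell"># Hypothetical minimal window contents - illustrative only
[xml]$xaml = @"
&lt;Window xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        Title="Multi-Input - Authentication Demo" Height="500" Width="500"&gt;
    &lt;StackPanel Margin="10"&gt;
        &lt;TextBox Name="Input1" Height="25" Margin="0,5,0,0"/&gt;
        &lt;PasswordBox Name="PasswordInput" Height="25" Margin="0,5,0,0"/&gt;
        &lt;Button Name="SubmitButton" Content="Submit" Margin="0,10,0,0"/&gt;
        &lt;Button Name="QuitButton" Content="Quit" Margin="0,5,0,0"/&gt;
        &lt;ProgressBar Name="ProgressBar" Height="18" Margin="0,10,0,0"/&gt;
        &lt;TextBlock Name="ResultTextBlock" TextWrapping="Wrap" Margin="0,10,0,0"/&gt;
    &lt;/StackPanel&gt;
&lt;/Window&gt;
"@
</code></pre>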
<h3 id="heading-loading-the-xaml-and-accessing-ui-elements">Loading the XAML and Accessing UI Elements</h3>
<p>After defining the XAML, we need to transform it into a live WPF window:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$reader</span> = (<span class="hljs-built_in">New-Object</span> System.Xml.XmlNodeReader <span class="hljs-variable">$xaml</span>)
<span class="hljs-variable">$window</span> = [<span class="hljs-type">Windows.Markup.XamlReader</span>]::Load(<span class="hljs-variable">$reader</span>)

<span class="hljs-comment"># Retrieve controls by their names.</span>
<span class="hljs-variable">$Input1</span> = <span class="hljs-variable">$window</span>.FindName(<span class="hljs-string">"Input1"</span>)
<span class="hljs-variable">$Input2</span> = <span class="hljs-variable">$window</span>.FindName(<span class="hljs-string">"Input2"</span>)
<span class="hljs-variable">$Input3</span> = <span class="hljs-variable">$window</span>.FindName(<span class="hljs-string">"Input3"</span>)
<span class="hljs-variable">$Input4</span> = <span class="hljs-variable">$window</span>.FindName(<span class="hljs-string">"Input4"</span>)
<span class="hljs-variable">$PasswordInput</span> = <span class="hljs-variable">$window</span>.FindName(<span class="hljs-string">"PasswordInput"</span>) <span class="hljs-comment"># PasswordBox control</span>
<span class="hljs-variable">$SubmitButton</span> = <span class="hljs-variable">$window</span>.FindName(<span class="hljs-string">"SubmitButton"</span>)
<span class="hljs-variable">$QuitButton</span> = <span class="hljs-variable">$window</span>.FindName(<span class="hljs-string">"QuitButton"</span>)
<span class="hljs-variable">$ProgressBar</span> = <span class="hljs-variable">$window</span>.FindName(<span class="hljs-string">"ProgressBar"</span>)
<span class="hljs-variable">$ResultTextBlock</span> = <span class="hljs-variable">$window</span>.FindName(<span class="hljs-string">"ResultTextBlock"</span>)
</code></pre>
<p>The <code>FindName</code> method allows us to retrieve UI elements by their defined names in the XAML.</p>
<h3 id="heading-input-validation-with-regular-expressions">Input Validation with Regular Expressions</h3>
<p>A highlight of our application is the ability to validate different inputs using specific regex patterns:</p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Define regex patterns for validation</span>
<span class="hljs-variable">$regexPatternCase</span> = <span class="hljs-string">'^(RITM|INC)\d{7}$'</span>  <span class="hljs-comment"># ServiceNow case format</span>
<span class="hljs-variable">$regexPatternEmail</span> = <span class="hljs-string">'^[A-Za-z]+\.[A-Za-z]+@[A-Za-z]+\.com\.au$'</span>  <span class="hljs-comment"># Email format</span>
</code></pre>
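<p>A quick way to sanity-check the patterns in a console session (the sample values are made up):</p>
<pre><code class="lang-powershell"># Expected results: True, False, True
'RITM1234567' -match $regexPatternCase              # valid case number
'INC123'      -match $regexPatternCase              # too few digits
'jane.doe@contoso.com.au' -match $regexPatternEmail # firstname.lastname@domain.com.au
</code></pre>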
<p>The application uses these patterns to validate that:</p>
<ul>
<li>Input conforms to a ServiceNow case ID format (RITM or INC followed by 7 digits)</li>
<li>Inputs follow an email format (firstname.lastname@domain.com.au)</li>
<li>Other inputs match a required format, such as a user ID, sAMAccountName, or department.</li>
</ul>
<h3 id="heading-event-handling-in-powershell-wpf">Event Handling in PowerShell WPF</h3>
<p>WPF applications are event-driven. We add event handlers using PowerShell script blocks:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$SubmitButton</span>.Add_Click({
    <span class="hljs-comment"># Button click handling code</span>
})

<span class="hljs-variable">$QuitButton</span>.Add_Click({
    <span class="hljs-variable">$window</span>.Close()
})
</code></pre>
<p>The Submit button's event handler contains the core logic of our application, while the Quit button simply closes the window.</p>
<h3 id="heading-security-considerations-handling-passwords">Security Considerations: Handling Passwords</h3>
<p>For secure input, we use the WPF PasswordBox control:</p>
<pre><code class="lang-powershell">&lt;PasswordBox Name=<span class="hljs-string">"PasswordInput"</span> Width=<span class="hljs-string">"330"</span> Height=<span class="hljs-string">"25"</span>/&gt;
</code></pre>
<p>This displays input as asterisks and avoids hard-coding the secret.</p>
<p>This helps protect sensitive information in memory, but ideally secrets would be handled using machine identities. In my use case I didn’t have programmatic access to a secret store to retrieve the secrets, but if you’re in an Azure environment or have Arc-enabled machines, a managed identity could be substituted.</p>
<h3 id="heading-advanced-input-validation-with-powershell">Advanced Input Validation with PowerShell</h3>
<p>Our application takes a sophisticated approach to validation by defining a collection of input definitions:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$inputs</span> = <span class="hljs-selector-tag">@</span>(
    <span class="hljs-selector-tag">@</span>{ Control = <span class="hljs-variable">$Input1</span>; Label = <span class="hljs-string">"Input 1 (Case)"</span>; Pattern = <span class="hljs-variable">$regexPatternCase</span>; GetValue = { <span class="hljs-keyword">param</span>(<span class="hljs-variable">$ctrl</span>) <span class="hljs-variable">$ctrl</span>.Text.Trim() } },
    <span class="hljs-selector-tag">@</span>{ Control = <span class="hljs-variable">$Input2</span>; Label = <span class="hljs-string">"Input 2"</span>; Pattern = <span class="hljs-variable">$regexPatternEmail</span>; GetValue = { <span class="hljs-keyword">param</span>(<span class="hljs-variable">$ctrl</span>) <span class="hljs-variable">$ctrl</span>.Text.Trim() } },
    <span class="hljs-selector-tag">@</span>{ Control = <span class="hljs-variable">$Input3</span>; Label = <span class="hljs-string">"Input 3"</span>; Pattern = <span class="hljs-variable">$regexPatternEmail</span>; GetValue = { <span class="hljs-keyword">param</span>(<span class="hljs-variable">$ctrl</span>) <span class="hljs-variable">$ctrl</span>.Text.Trim() } },
    <span class="hljs-selector-tag">@</span>{ Control = <span class="hljs-variable">$Input4</span>; Label = <span class="hljs-string">"Input 4"</span>; Pattern = <span class="hljs-variable">$regexPatternEmail</span>; GetValue = { <span class="hljs-keyword">param</span>(<span class="hljs-variable">$ctrl</span>) <span class="hljs-variable">$ctrl</span>.Text.Trim() } },
    <span class="hljs-selector-tag">@</span>{ Control = <span class="hljs-variable">$window</span>.FindName(<span class="hljs-string">"PasswordInput"</span>); Label = <span class="hljs-string">"Client Secret"</span>; Pattern = <span class="hljs-variable">$null</span>; GetValue = { <span class="hljs-keyword">param</span>(<span class="hljs-variable">$ctrl</span>) <span class="hljs-variable">$ctrl</span>.Password.Trim() } }
)

<span class="hljs-comment"># Initialize a variable to store validation failures.</span>
<span class="hljs-variable">$validationFailures</span> = <span class="hljs-selector-tag">@</span>()

<span class="hljs-comment"># Validate each input in the list.</span>
<span class="hljs-keyword">foreach</span> (<span class="hljs-variable">$i</span> <span class="hljs-keyword">in</span> <span class="hljs-variable">$inputs</span>) {
    <span class="hljs-variable">$value</span> = &amp; <span class="hljs-variable">$i</span>.GetValue <span class="hljs-variable">$i</span>.Control

    <span class="hljs-keyword">if</span> ([<span class="hljs-built_in">string</span>]::IsNullOrWhiteSpace(<span class="hljs-variable">$value</span>)) {
        <span class="hljs-variable">$validationFailures</span> += <span class="hljs-string">"<span class="hljs-variable">$</span>(<span class="hljs-variable">$i</span>.Label) failure: No value provided."</span>
    }
    <span class="hljs-keyword">elseif</span> (<span class="hljs-variable">$i</span>.Pattern <span class="hljs-operator">-and</span> <span class="hljs-operator">-not</span> [<span class="hljs-type">System.Text.RegularExpressions.Regex</span>]::IsMatch(<span class="hljs-variable">$value</span>, <span class="hljs-variable">$i</span>.Pattern)) {
        <span class="hljs-variable">$validationFailures</span> += <span class="hljs-string">"<span class="hljs-variable">$</span>(<span class="hljs-variable">$i</span>.Label) failure: '<span class="hljs-variable">$value</span>' does not match the required pattern."</span>
    }
}
<span class="hljs-comment"># Additionally, validate the password field.</span>
<span class="hljs-variable">$passwordValue</span> = <span class="hljs-variable">$PasswordInput</span>.Password.Trim()
<span class="hljs-keyword">if</span> ([<span class="hljs-built_in">string</span>]::IsNullOrWhiteSpace(<span class="hljs-variable">$passwordValue</span>)) {
    <span class="hljs-variable">$validationFailures</span> += <span class="hljs-string">"Client Secret failure: No value provided."</span>
}

<span class="hljs-comment"># If there are any validation failures, output them and re-enable the button.</span>
<span class="hljs-keyword">if</span> (<span class="hljs-variable">$validationFailures</span>.Count <span class="hljs-operator">-gt</span> <span class="hljs-number">0</span>) {
    <span class="hljs-variable">$ResultTextBlock</span>.Text = (<span class="hljs-variable">$validationFailures</span> <span class="hljs-operator">-join</span> <span class="hljs-string">"`n"</span>)
    <span class="hljs-variable">$ResultTextBlock</span>.Foreground = [<span class="hljs-type">System.Windows.Media.Brushes</span>]::Red
    <span class="hljs-variable">$SubmitButton</span>.IsEnabled = <span class="hljs-variable">$true</span>
    <span class="hljs-keyword">return</span>
}
</code></pre>
<p>This approach allows for:</p>
<ul>
<li>Centralized validation logic</li>
<li>Different validation rules for each input</li>
<li>Specialized value extraction (notice how password values are handled differently)</li>
<li>Captures null values</li>
<li>Clear, descriptive error messaging</li>
</ul>
<h3 id="heading-keeping-the-ui-responsive">Keeping the UI Responsive</h3>
<p>One common challenge in GUI applications is keeping the interface responsive during long-running operations. Our example addresses this using the Dispatcher:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$window</span>.Dispatcher.Invoke([<span class="hljs-type">Action</span>] {}, [<span class="hljs-type">System.Windows.Threading.DispatcherPriority</span>]::Render)
</code></pre>
<p>This forces the UI to refresh after updating progress indicators, ensuring users see the current state of the application.</p>
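<p>In practice, the pattern looks something like this inside the Submit handler (the progress value and message are illustrative):</p>
<pre><code class="lang-powershell"># Update the UI, then force a render pass before the blocking web request runs
$ProgressBar.Value = 40
$ResultTextBlock.Text = "Requesting token..."
$window.Dispatcher.Invoke([Action] {}, [System.Windows.Threading.DispatcherPriority]::Render)
</code></pre>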
<h3 id="heading-making-http-requests-from-powershell-wpf">Making HTTP Requests from PowerShell WPF</h3>
<p>The following code demonstrates making HTTP requests to an API endpoint using Bearer token authentication.</p>
<p>For context, this use case was to allow service desk users to manually create an account through an API hosted in Azure.</p>
<p>We’re using a service principal with App Role authorization to authenticate to the API. If you’d like more information on Entra application authentication and bearer tokens, I cover it <a target="_blank" href="https://benroberts.io/api-authentication-using-azure-managed-identity">here</a>.</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$tenant</span> = <span class="hljs-string">"YOUR_TENANT_ID"</span>
<span class="hljs-variable">$appId</span> = <span class="hljs-string">"YOUR_SERVICE_PRINCIPAL_APP_ID"</span>
<span class="hljs-variable">$secret</span> = <span class="hljs-variable">$passwordValue</span>
<span class="hljs-variable">$scope</span> = <span class="hljs-string">"api://your-api-client-id/.default"</span>

<span class="hljs-comment"># Construct the token endpoint URL</span>
<span class="hljs-variable">$tokenEndpoint</span> = <span class="hljs-string">"https://login.microsoftonline.com/<span class="hljs-variable">$tenant</span>/oauth2/v2.0/token"</span>

<span class="hljs-comment"># Define the body for the token request</span>
<span class="hljs-variable">$body</span> = <span class="hljs-selector-tag">@</span>{
    client_id     = <span class="hljs-variable">$appId</span>
    client_secret = <span class="hljs-variable">$secret</span>
    scope         = <span class="hljs-variable">$scope</span>
    grant_type    = <span class="hljs-string">"client_credentials"</span>
}

<span class="hljs-comment"># Request the token</span>
<span class="hljs-variable">$response</span> = <span class="hljs-built_in">Invoke-RestMethod</span> <span class="hljs-literal">-Method</span> Post <span class="hljs-literal">-Uri</span> <span class="hljs-variable">$tokenEndpoint</span> <span class="hljs-literal">-Body</span> <span class="hljs-variable">$body</span>
<span class="hljs-variable">$accessToken</span> = <span class="hljs-variable">$response</span>.access_token

<span class="hljs-comment"># Use the access token to call an API</span>
<span class="hljs-variable">$headers</span> = <span class="hljs-selector-tag">@</span>{
    Authorization = <span class="hljs-string">"Bearer <span class="hljs-variable">$accessToken</span>"</span>
}

<span class="hljs-comment"># Example: Call your Azure Function App (update the URL accordingly).</span>
<span class="hljs-variable">$Uri</span> = <span class="hljs-string">"https://&lt;your-wep-app-name&gt;.azurewebsites.net/api/CreateAccount"</span>

<span class="hljs-comment"># Build a payload including all inputs.</span>
<span class="hljs-variable">$payload</span> = <span class="hljs-selector-tag">@</span>{
  input1 = <span class="hljs-variable">$Input1</span>.Text.Trim()
  input2 = <span class="hljs-variable">$Input2</span>.Text.Trim()
  input3 = <span class="hljs-variable">$Input3</span>.Text.Trim()
  input4 = <span class="hljs-variable">$Input4</span>.Text.Trim()
} | <span class="hljs-built_in">ConvertTo-Json</span>

<span class="hljs-built_in">Invoke-WebRequest</span> <span class="hljs-literal">-Method</span> POST <span class="hljs-literal">-Uri</span> <span class="hljs-variable">$Uri</span> <span class="hljs-literal">-Headers</span> <span class="hljs-variable">$headers</span> <span class="hljs-literal">-ContentType</span> <span class="hljs-string">'application/json'</span> <span class="hljs-literal">-Body</span> <span class="hljs-variable">$payload</span> <span class="hljs-literal">-UseBasicParsing</span>
</code></pre>
<p>This illustrates how PowerShell GUI applications can interact with external services while maintaining a user-friendly interface.</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>This PowerShell WPF example demonstrates how administrators and developers can create sophisticated, user-friendly interfaces for their automation tools. By combining the flexibility of PowerShell with the rich UI capabilities of WPF, we can build professional applications that:</p>
<ol>
<li>Validate user input with clear error messages</li>
<li>Provide visual feedback on progress</li>
<li>Handle passwords securely</li>
<li>Interact with external services</li>
<li>Present complex functionality through a simple interface</li>
</ol>
<p>The full code example includes additional features like preventing multiple submissions and properly reporting errors to users.</p>
<p>Whether you're building tools for service desk staff, administrators, or end users, PowerShell and WPF together provide a powerful platform for creating maintainable, effective GUI applications without leaving the PowerShell ecosystem.</p>
<p>You can find the code in full on <a target="_blank" href="https://github.com/broberts23/blogs/blob/main/wpfDemoUI.ps1">GitHub</a></p>
<p>Thanks for reading! 🚀</p>
]]></content:encoded></item><item><title><![CDATA[API Authentication using Azure Managed Identity]]></title><description><![CDATA[Authentication to Azure APIs has traditionally relied on service principals with client secrets or certificates. While effective, managing these credentials introduces operational complexity and security risks. Azure Managed Identity offers a compell...]]></description><link>https://benroberts.io/api-authentication-using-azure-managed-identity</link><guid isPermaLink="true">https://benroberts.io/api-authentication-using-azure-managed-identity</guid><category><![CDATA[Azure]]></category><category><![CDATA[automation]]></category><category><![CDATA[authentication]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Mon, 03 Mar 2025 20:53:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1740995314095/4a3e820e-e421-4b88-8c21-4bccca41a610.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Authentication to Azure APIs has traditionally relied on service principals with client secrets or certificates. While effective, managing these credentials introduces operational complexity and security risks. Azure Managed Identity offers a compelling alternative by eliminating the need to manage credentials entirely. I recently had the opportunity to work with Managed Identity tokens and found some nuances in token acquisition and usage that are important to understand for successful implementation.</p>
<h2 id="heading-the-challenge-with-service-principal-authentication">The Challenge with Service Principal Authentication</h2>
<p>When authenticating to Azure APIs using service principals, you typically need to:</p>
<ol>
<li><p>Create an app registration in Entra</p>
</li>
<li><p>Generate and securely store a client secret</p>
</li>
<li><p>Request tokens using that secret</p>
</li>
</ol>
<p>This approach works but has several drawbacks:</p>
<ul>
<li><p>Secrets require secure storage, rotation, and governance</p>
</li>
<li><p>Leaked secrets pose significant security risks</p>
</li>
<li><p>Managing secret lifetimes adds operational overhead</p>
</li>
</ul>
<p>Here's what a typical authentication flow using service principal credentials looks like:</p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Define your tenant, client ID and client secret</span>
<span class="hljs-variable">$tenant</span> = <span class="hljs-string">"YOUR_TENANT_ID"</span>
<span class="hljs-variable">$appId</span> = <span class="hljs-string">"YOUR_SERVICE_PRINCIPAL_APP_ID"</span>
<span class="hljs-variable">$secret</span> = <span class="hljs-string">"YOUR_SERVICE_PRINCIPAL_SECRET"</span>
<span class="hljs-variable">$scope</span> = <span class="hljs-string">"api://your-api-client-id/.default"</span>

<span class="hljs-comment"># Construct the token endpoint URL</span>
<span class="hljs-variable">$tokenEndpoint</span> = <span class="hljs-string">"https://login.microsoftonline.com/<span class="hljs-variable">$tenant</span>/oauth2/v2.0/token"</span>

<span class="hljs-comment"># Define the body for the token request</span>
<span class="hljs-variable">$body</span> = <span class="hljs-selector-tag">@</span>{
    client_id     = <span class="hljs-variable">$appId</span>
    client_secret = <span class="hljs-variable">$secret</span>
    scope         = <span class="hljs-variable">$scope</span>
    grant_type    = <span class="hljs-string">"client_credentials"</span>
}

<span class="hljs-comment"># Request the token</span>
<span class="hljs-variable">$response</span> = <span class="hljs-built_in">Invoke-RestMethod</span> <span class="hljs-literal">-Method</span> Post <span class="hljs-literal">-Uri</span> <span class="hljs-variable">$tokenEndpoint</span> <span class="hljs-literal">-Body</span> <span class="hljs-variable">$body</span>
<span class="hljs-variable">$accessToken</span> = <span class="hljs-variable">$response</span>.access_token

<span class="hljs-comment"># Use the access token to call an API</span>
<span class="hljs-variable">$headers</span> = <span class="hljs-selector-tag">@</span>{
    Authorization = <span class="hljs-string">"Bearer <span class="hljs-variable">$accessToken</span>"</span>
}

<span class="hljs-built_in">Invoke-WebRequest</span> <span class="hljs-literal">-Method</span> POST <span class="hljs-literal">-Uri</span> <span class="hljs-string">"https://your-api-endpoint"</span> <span class="hljs-literal">-Headers</span> <span class="hljs-variable">$headers</span> <span class="hljs-literal">-ContentType</span> <span class="hljs-string">'application/json'</span>
</code></pre>
<h2 id="heading-introducing-azure-managed-identity">Introducing Azure Managed Identity</h2>
<p>Managed Identity provides Azure resources with an automatically managed identity in Entra ID. This eliminates credential management completely, as Azure handles everything behind the scenes.</p>
<h3 id="heading-benefits-of-managed-identity">Benefits of Managed Identity</h3>
<ul>
<li><p><em>No credential management</em>: Azure creates and rotates credentials automatically</p>
</li>
<li><p><em>Enhanced security</em>: No secrets to store in code or configuration</p>
</li>
<li><p><em>Simplified operations</em>: No rotation schedules to maintain</p>
</li>
<li><p><em>Seamless integration</em>: Works natively with Azure services</p>
</li>
</ul>
<h2 id="heading-how-to-use-managed-identity-for-api-authentication">How to Use Managed Identity for API Authentication</h2>
<h3 id="heading-step-1-enable-managed-identity-on-your-azure-resource">Step 1: Enable Managed Identity on Your Azure Resource</h3>
<p>First, enable a system-assigned managed identity on your Azure VM, App Service, or other supported resource through the Azure Portal or via PowerShell/CLI.</p>
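<p>For example, enabling a system-assigned identity on an existing VM with Az PowerShell (the resource names are placeholders):</p>
<pre><code class="lang-powershell"># Enable a system-assigned managed identity on an existing VM
$vm = Get-AzVM -ResourceGroupName 'rg-demo' -Name 'vm-demo'
Update-AzVM -ResourceGroupName 'rg-demo' -VM $vm -IdentityType SystemAssigned
</code></pre>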
<h3 id="heading-step-2-grant-the-managed-identity-access-to-your-application">Step 2: Grant the Managed Identity Access to Your Application</h3>
<p>The managed identity operates as a service principal in Entra ID. You'll need to assign it appropriate permissions after identifying your Service Principals by Display Name or Application ID:</p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Required Modules</span>
<span class="hljs-built_in">Import-Module</span> Az.Accounts
<span class="hljs-built_in">Import-Module</span> Az.Resources

<span class="hljs-comment"># Connect to Azure AD</span>
<span class="hljs-built_in">Connect-AzAccount</span>

<span class="hljs-comment"># Get the managed identity's service principal</span>
<span class="hljs-variable">$managedIdentitySP</span> = <span class="hljs-built_in">Get-AzADServicePrincipal</span> <span class="hljs-literal">-DisplayName</span> <span class="hljs-string">"Your-VM-DisplayName"</span>

<span class="hljs-comment"># Get your API's service principal</span>
<span class="hljs-variable">$apiSP</span> = <span class="hljs-built_in">Get-AzADServicePrincipal</span> <span class="hljs-literal">-ApplicationId</span> <span class="hljs-string">"YOUR_API_APP_ID"</span>

<span class="hljs-comment"># Get your Application Role ID</span>
<span class="hljs-variable">$appRoleId</span> = (<span class="hljs-variable">$apiSP</span>.AppRoles | <span class="hljs-built_in">Where-Object</span> {<span class="hljs-variable">$_</span>.DisplayName <span class="hljs-operator">-eq</span> <span class="hljs-string">"YourRequiredRole"</span>}).Id

<span class="hljs-comment"># Assign the role to the managed identity</span>
<span class="hljs-built_in">New-AzADServicePrincipalAppRoleAssignment</span> `
    <span class="hljs-literal">-ServicePrincipalId</span> <span class="hljs-variable">$managedIdentitySP</span>.Id `
    <span class="hljs-literal">-ResourceId</span> <span class="hljs-variable">$apiSP</span>.Id `
    <span class="hljs-literal">-AppRoleId</span> <span class="hljs-variable">$appRoleId</span>
</code></pre>
<p>The assignment will be visible under Users and groups on the Enterprise Application.</p>
<h3 id="heading-step-31-request-a-token-using-managed-identity-virtual-machine">Step 3.1: Request a Token Using Managed Identity (Virtual Machine)</h3>
<p>From within an Azure resource with managed identity enabled (like a VM), you can request a token using the local IMDS endpoint. The Instance Metadata Service (IMDS) provides a simple HTTP interface for obtaining tokens. </p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Define your target resource/audience</span>
<span class="hljs-variable">$resource</span> = <span class="hljs-string">"api://your-api-client-id"</span>
<span class="hljs-variable">$apiVersion</span> = <span class="hljs-string">"2018-02-01"</span>


<span class="hljs-comment"># Construct the token request URL</span>
<span class="hljs-variable">$tokenUrl</span> = <span class="hljs-string">"http://169.254.169.254/metadata/identity/oauth2/token?api-version=<span class="hljs-variable">$apiVersion</span>&amp;resource=<span class="hljs-variable">$</span>([uri]::EscapeDataString(<span class="hljs-variable">$resource</span>))"</span>

<span class="hljs-comment"># Request the token</span>
<span class="hljs-variable">$response</span> = <span class="hljs-built_in">Invoke-RestMethod</span> <span class="hljs-literal">-Method</span> GET <span class="hljs-literal">-Uri</span> <span class="hljs-variable">$tokenUrl</span> <span class="hljs-literal">-Headers</span> <span class="hljs-selector-tag">@</span>{Metadata=<span class="hljs-string">"true"</span>}
<span class="hljs-variable">$accessToken</span> = <span class="hljs-variable">$response</span>.access_token

<span class="hljs-comment"># Use the token to call your API</span>
<span class="hljs-variable">$headers</span> = <span class="hljs-selector-tag">@</span>{
    Authorization = <span class="hljs-string">"Bearer <span class="hljs-variable">$accessToken</span>"</span>
}

<span class="hljs-built_in">Invoke-WebRequest</span> <span class="hljs-literal">-Method</span> POST <span class="hljs-literal">-Uri</span> <span class="hljs-string">"https://your-service/endpoint"</span> <span class="hljs-literal">-Headers</span> <span class="hljs-variable">$headers</span> <span class="hljs-literal">-ContentType</span> <span class="hljs-string">'application/json'</span>
</code></pre>
<p>You can find example code for different languages, and more information about managed identity access tokens here: <a target="_blank" href="https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/how-to-use-vm-token">https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/how-to-use-vm-token</a></p>
<h3 id="heading-step-32-request-a-token-using-managed-identity-web-and-function-apps">Step 3.2: Request a Token Using Managed Identity (Web and Function Apps)</h3>
<p>For Azure Web Apps and Function Apps, you can use the built-in endpoint to request a token once the managed identity is enabled. This is exposed through the <code>IDENTITY_ENDPOINT</code> and <code>IDENTITY_HEADER</code> environment variables: <code>IDENTITY_ENDPOINT</code> is a local REST endpoint that issues tokens (e.g. <code>http://127.0.0.1:41854/MSI/token/</code>), and <code>IDENTITY_HEADER</code> is the secret used to authenticate requests to it.</p>
<pre><code class="lang-powershell"><span class="hljs-comment"># Define your target resource/audience</span>
<span class="hljs-variable">$resourceURI</span> = <span class="hljs-string">"api://your-api-client-id"</span>
<span class="hljs-variable">$tokenAuthURI</span> = <span class="hljs-variable">$env:IDENTITY_ENDPOINT</span> + <span class="hljs-string">"?resource=<span class="hljs-variable">$resourceURI</span>&amp;api-version=2019-08-01"</span>
<span class="hljs-variable">$tokenResponse</span> = <span class="hljs-built_in">Invoke-RestMethod</span> <span class="hljs-literal">-Method</span> Get <span class="hljs-literal">-Headers</span> <span class="hljs-selector-tag">@</span>{<span class="hljs-string">"X-IDENTITY-HEADER"</span>=<span class="hljs-string">"<span class="hljs-variable">$env:IDENTITY_HEADER</span>"</span>} <span class="hljs-literal">-Uri</span> <span class="hljs-variable">$tokenAuthURI</span>
<span class="hljs-variable">$accessToken</span> = <span class="hljs-variable">$tokenResponse</span>.access_token

<span class="hljs-comment"># Use the token to call your API</span>
<span class="hljs-variable">$headers</span> = <span class="hljs-selector-tag">@</span>{
    Authorization = <span class="hljs-string">"Bearer <span class="hljs-variable">$</span>(<span class="hljs-variable">$accessToken</span>)"</span>
}

<span class="hljs-built_in">Invoke-WebRequest</span> <span class="hljs-literal">-Method</span> POST <span class="hljs-literal">-Uri</span> <span class="hljs-string">"https://your-service/endpoint"</span> <span class="hljs-literal">-Headers</span> <span class="hljs-variable">$headers</span> <span class="hljs-literal">-ContentType</span> <span class="hljs-string">'application/json'</span>
</code></pre>
<p>If you would like more information on how to use managed identity in Azure Functions, or examples for other languages, you can refer to the <a target="_blank" href="https://learn.microsoft.com/en-us/azure/app-service/overview-managed-identity">official documentation</a></p>
<h2 id="heading-troubleshooting-common-issues">Troubleshooting Common Issues</h2>
<h3 id="heading-401-unauthorized-errors">401 Unauthorized Errors</h3>
<p>If you receive a 401 Unauthorized response, check:</p>
<ul>
<li><p>Is the managed identity properly enabled?</p>
</li>
<li><p>Has the identity been granted appropriate permissions (Roles) on your API?</p>
</li>
<li><p>Are you requesting a token for the correct resource/audience? (A token-decoding sketch follows this list.)</p>
</li>
</ul>
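<p>One way to confirm the audience and roles is to decode the token payload locally. A rough sketch (access tokens are base64url-encoded JWTs, so the padding must be restored before decoding):</p>
<pre><code class="lang-powershell"># Decode the JWT payload and show the audience and role claims
$payload = $accessToken.Split('.')[1].Replace('-', '+').Replace('_', '/')
# Restore base64 padding before decoding
$payload = $payload.PadRight([math]::Ceiling($payload.Length / 4) * 4, '=')
[System.Text.Encoding]::UTF8.GetString([Convert]::FromBase64String($payload)) |
    ConvertFrom-Json | Select-Object aud, roles
</code></pre>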
<h3 id="heading-415-unsupported-media-type">415 Unsupported Media Type</h3>
<p>This error indicates the API doesn't support the Content-Type of your request. Ensure you're setting the appropriate Content-Type header (e.g., 'application/json') that your API expects.</p>
<h3 id="heading-detailed-error-information">Detailed Error Information</h3>
<p>For more detailed error information, use the -Verbose flag with your requests:</p>
<pre><code class="lang-powershell"><span class="hljs-variable">$webResponse</span> = <span class="hljs-built_in">Invoke-WebRequest</span> <span class="hljs-literal">-Uri</span> <span class="hljs-variable">$apiUrl</span> <span class="hljs-literal">-Headers</span> <span class="hljs-variable">$headers</span> <span class="hljs-literal">-Method</span> Get <span class="hljs-literal">-Verbose</span>
</code></pre>
<p>Or catch exceptions to see detailed error messages:</p>
<pre><code class="lang-powershell"><span class="hljs-keyword">try</span> {
    <span class="hljs-variable">$response</span> = <span class="hljs-built_in">Invoke-RestMethod</span> <span class="hljs-literal">-Uri</span> <span class="hljs-variable">$apiUrl</span> <span class="hljs-literal">-Headers</span> <span class="hljs-variable">$headers</span> <span class="hljs-literal">-Method</span> Get
} <span class="hljs-keyword">catch</span> {
    <span class="hljs-built_in">Write-Host</span> <span class="hljs-string">"Error: <span class="hljs-variable">$_</span>"</span>
    <span class="hljs-built_in">Write-Host</span> <span class="hljs-string">"Status Code: <span class="hljs-variable">$</span>(<span class="hljs-variable">$_</span>.Exception.Response.StatusCode.value__)"</span>
}
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Azure Managed Identity simplifies authentication flows, enhances security posture, and reduces operational overhead by eliminating credential management. By adopting managed identities for your inter-service authentication needs, you can build more secure and maintainable solutions in Azure.</p>
<p>This modern approach to authentication represents a significant improvement over traditional service principal authentication methods, particularly for services running within the Azure ecosystem.</p>
]]></content:encoded></item><item><title><![CDATA[Enhancing Azure Security with Checkov and GitHub Actions]]></title><description><![CDATA[In the world of infrastructure as code (IaC), ensuring that your templates and scripts adhere to best practices for security, compliance, and reliability is paramount. Azure Bicep, a domain-specific language for deploying Azure resources, is no excep...]]></description><link>https://benroberts.io/enhancing-azure-security-with-checkov-and-github-actions</link><guid isPermaLink="true">https://benroberts.io/enhancing-azure-security-with-checkov-and-github-actions</guid><category><![CDATA[Azure]]></category><category><![CDATA[Bicep]]></category><category><![CDATA[blog]]></category><category><![CDATA[DevSecOps]]></category><category><![CDATA[GitHub]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Tue, 07 May 2024 07:20:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727489367057/7fe4de79-1be2-40b3-b6f3-5a566ad95bd3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of infrastructure as code (IaC), ensuring that your templates and scripts adhere to best practices for security, compliance, and reliability is paramount. Azure Bicep, a domain-specific language for deploying Azure resources, is no exception. With the increasing complexity and scale of cloud deployments, the need for automated tools to scan and secure IaC has never been more critical. One such tool that stands out in this space is Checkov.</p>
<h2 id="heading-what-is-checkov">What is Checkov?</h2>
<p>Checkov is an open-source static code analysis tool designed to help developers prevent cloud misconfigurations during the build phase of infrastructure development. Developed by Bridgecrew (<a target="_blank" href="https://www.paloaltonetworks.com/company/press/2021/palo-alto-networks-completes-acquisition-of-bridgecrew#:~:text=Bridgecrew%20co%2Dfounders%2C%20Idan%20Tendler,equity%20awards%20to%20acquire%20Bridgecrew.">acquired by Palo Alto Networks in 2021</a>), a company specializing in cloud security, Checkov is available under the Apache License 2.0, allowing anyone to freely use, modify, and distribute it. It supports scanning a variety of infrastructure as code (IaC) formats including Terraform, CloudFormation, Kubernetes, ARM Templates, and notably, Azure Bicep. A key advantage of Checkov is its native support for Azure Bicep, meaning it can directly scan Bicep files without needing to compile them into JSON format first. This capability enhances the “shift left” approach of DevSecOps by integrating security checks early into the development workflow, thus enabling developers to detect and rectify potential security issues before they become embedded in the deployment process. By using Checkov, teams can significantly reduce vulnerabilities and ensure robust compliance and security practices from the outset of their projects.</p>
<h2 id="heading-benefits-of-using-checkov-with-azure-bicep">Benefits of Using Checkov with Azure Bicep</h2>
<p>Checkov provides several advantages for teams working with Azure Bicep files, enhancing security and compliance throughout the development lifecycle:</p>
<ul>
<li><p><strong>Prevention of Misconfigurations</strong>: By scanning Azure Bicep files, Checkov identifies potential security issues and misconfigurations before they are deployed, helping to prevent costly and risky errors in live environments.</p>
</li>
<li><p><strong>Comprehensive Policy Library</strong>: Checkov includes over 1000 built-in policies that cover a wide range of security best practices and compliance standards across multiple cloud providers, including Azure. This extensive library ensures that your infrastructure meets rigorous security and compliance benchmarks.</p>
</li>
<li><p><strong>Custom Policy Creation</strong>: Users have the flexibility to create custom policies in Checkov. This feature allows organizations to tailor the tool to meet specific security needs and organizational standards, providing a highly adaptable security solution.</p>
</li>
<li><p><strong>Seamless Integration</strong>: Checkov can be easily integrated into CI/CD pipelines, providing continuous security and compliance assessments without disrupting development workflows.</p>
</li>
<li><p><strong>Scanning for CVEs</strong>: Checkov performs Software Composition Analysis (SCA) to scan open source packages and container images for Common Vulnerabilities and Exposures (CVEs), adding an additional layer of security by identifying known vulnerabilities in dependencies.</p>
</li>
</ul>
<p>These features make Checkov an invaluable tool for teams aiming to enhance the security and integrity of their Azure Bicep deployments.</p>
<h2 id="heading-aligning-checkov-with-azure-cloud-security-benchmarks">Aligning Checkov with Azure Cloud Security Benchmarks</h2>
<p>While Checkov provides extensive coverage of security best practices and compliance rules, its direct alignment with the Azure Cloud Security Benchmark might not be <em>explicitly</em> detailed for every rule. However, many of Checkov’s rules are broadly applicable to Azure and adhere to general security best practices that overlap significantly with Azure’s benchmarks. For organizations aiming to specifically meet Azure’s security standards, Checkov allows for the customization and extension of rules (defined in YAML), enabling teams to tailor security checks to the unique requirements of the Azure Cloud Security Benchmark or custom Azure Policy definitions. This combination ensures a robust compliance and security strategy that is well-aligned with Microsoft’s cloud security benchmark (MCSB).</p>
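<p>As a rough illustration of what such a custom rule can look like, here’s a sketch of a YAML-based policy and how you’d point Checkov at it. The policy ID, attribute path, and folder names are hypothetical, so verify the schema and Bicep framework support against the Checkov custom-policy documentation before relying on it:</p>
<pre><code class="lang-powershell"># Write a hypothetical custom YAML policy to a local folder.
@'
metadata:
  id: "CKV2_CUSTOM_1"          # custom IDs are chosen by you
  name: "Storage accounts must disable public blob access"
  category: "NETWORKING"
definition:
  cond_type: "attribute"
  resource_types:
    - "Microsoft.Storage/storageAccounts"
  attribute: "properties.allowBlobPublicAccess"
  operator: "equals"
  value: "false"
'@ | Set-Content -Path .\custom-policies\storage_public_access.yaml

# Include the folder of custom checks alongside the built-in ones.
checkov -d templates/ --framework bicep --external-checks-dir .\custom-policies
</code></pre>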
<h2 id="heading-integrating-checkov-with-github-actions-for-azure-bicep-scanning">Integrating Checkov with GitHub Actions for Azure Bicep Scanning</h2>
<p>Incorporating <a target="_blank" href="https://github.com/marketplace/actions/checkov-github-action">Checkov into your GitHub Actions</a> workflow can significantly enhance the security posture of your Azure Bicep deployments by scanning your infrastructure as code (IaC) for misconfigurations on every change. Here’s how to set up Checkov with GitHub Actions for Azure Bicep files.</p>
<p>In this example, the Bicep file creates a storage account with public network access enabled and allows public access to blobs. Running Checkov on this file will flag these configurations as security issues, helping you identify and rectify potential vulnerabilities before deployment.</p>
<h2 id="heading-workflow-setup">Workflow Setup</h2>
<p>Start by defining the workflow in your repository’s <code>.github/workflows</code> folder. Here’s an effective configuration example:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">Azure</span> <span class="hljs-string">Bicep</span> <span class="hljs-string">Template</span> <span class="hljs-string">Analysis</span>

<span class="hljs-attr">on:</span>
  <span class="hljs-attr">pull_request:</span>
    <span class="hljs-attr">branches:</span> [<span class="hljs-string">main</span>]
  <span class="hljs-attr">push:</span>
    <span class="hljs-attr">branches:</span> [<span class="hljs-string">/release/*</span>]

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">checkov-scan:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">Checkov</span> <span class="hljs-string">Bicep</span> <span class="hljs-string">Analysis</span>
    <span class="hljs-attr">permissions:</span>
      <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span> <span class="hljs-comment"># Allows actions/checkout to fetch the code</span>
      <span class="hljs-attr">security-events:</span> <span class="hljs-string">write</span> <span class="hljs-comment"># Allows github/codeql-action/upload-sarif to upload SARIF results</span>

    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span> <span class="hljs-string">Repository</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Run</span> <span class="hljs-string">Checkov</span> <span class="hljs-string">Action</span>
        <span class="hljs-attr">id:</span> <span class="hljs-string">checkov</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">bridgecrewio/checkov-action@v12</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">directory:</span> <span class="hljs-string">templates/</span>
          <span class="hljs-attr">quiet:</span> <span class="hljs-literal">true</span> <span class="hljs-comment"># Display only failed checks</span>
          <span class="hljs-attr">soft_fail:</span> <span class="hljs-literal">true</span> <span class="hljs-comment"># Do not return an error code if there are failed checks</span>
          <span class="hljs-attr">framework:</span> <span class="hljs-string">bicep</span> <span class="hljs-comment"># Specify to run checks only on Azure Bicep templates</span>
          <span class="hljs-attr">output_format:</span> <span class="hljs-string">sarif</span> <span class="hljs-comment"># Set output format to SARIF for integration with GitHub security tab</span>
          <span class="hljs-attr">output_file_path:</span> <span class="hljs-string">results.sarif</span> <span class="hljs-comment"># Specify the file path for the SARIF output</span>
          <span class="hljs-attr">download_external_modules:</span> <span class="hljs-literal">true</span> <span class="hljs-comment"># Enable downloading external modules</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Upload</span> <span class="hljs-string">SARIF</span> <span class="hljs-string">Results</span> <span class="hljs-string">to</span> <span class="hljs-string">GitHub</span>
        <span class="hljs-attr">if:</span> <span class="hljs-string">always()</span> <span class="hljs-comment"># Ensure this step runs regardless of previous steps' success or failure</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">github/codeql-action/upload-sarif@v3</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">sarif_file:</span> <span class="hljs-string">results.sarif</span>
</code></pre>
<h3 id="heading-benefits-of-using-sarif-output">Benefits of Using SARIF Output</h3>
<p><strong>SARIF (Static Analysis Results Interchange Format)</strong> is a standard output format that enhances interoperability between different security tools and services. By configuring Checkov to generate SARIF output, you can leverage GitHub’s native capabilities to display and manage security findings effectively.</p>
<p><strong>Centralized Security Findings</strong>: The SARIF file generated by Checkov is uploaded to GitHub, where it integrates seamlessly within the pull request. This centralization makes it easier for teams to review, discuss, and remediate any detected issues directly within their development environment.</p>
<p>Next, define the Bicep files you want to scan in the <code>templates/</code> directory. You can adjust the <code>directory</code> parameter in the workflow configuration to match your repository’s structure. Once the workflow is in place, every pull request targeting the <code>main</code> branch and every push to the <code>release/*</code> branches will trigger the Checkov scan for Azure Bicep templates.</p>
<pre><code class="lang-go">param location <span class="hljs-keyword">string</span> = resourceGroup().location

resource storageAccount <span class="hljs-string">'Microsoft.Storage/storageAccounts@2021-06-01'</span> = {
  name: <span class="hljs-string">'examplestorageaccount'</span>
  location: location
  sku: {
    name: <span class="hljs-string">'Standard_LRS'</span>
  }
  kind: <span class="hljs-string">'StorageV2'</span>
  properties: {
    publicNetworkAccess: <span class="hljs-string">'Enabled'</span>
    allowBlobPublicAccess: <span class="hljs-literal">true</span>
  }
}
</code></pre>
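<p>Before the workflow ever runs, you can reproduce its findings locally by scanning this file directly (the file path is illustrative):</p>
<pre><code class="lang-powershell"># Scan just the sample storage account template; --compact limits the output
# to failed checks, which here should flag the public network and blob access settings.
checkov -f templates/storage.bicep --framework bicep --compact
</code></pre>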
<p>After the workflow runs, validate that the Checkov scan results are displayed in the GitHub Security tab. This integration provides a clear overview of the security findings, allowing you to address any issues promptly.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485608499/9edbbd4e-dfdd-4353-87d4-562a49c3b9e9.png" alt /></p>
<p>The Checkov scan results also populate in the pull request and highlight the offending resource:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485610203/4b0185dc-85fe-4b07-8e83-547a9a189378.png" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485611821/e84a6fd1-e09c-48f8-a8d3-59db6244ce10.png" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485613797/610e9aa1-e38d-4363-b9b1-835dceb0f630.png" alt /></p>
<h2 id="heading-securing-infrastructure-as-code-with-advanced-compliance-and-privacy-controls">Securing Infrastructure as Code with Advanced Compliance and Privacy Controls</h2>
<p>Enterprises, particularly those operating within highly regulated industries or handling sensitive data, must prioritize robust security measures. The use of Checkov for scanning Azure Bicep files can be a cornerstone of such an approach, ensuring that infrastructure deployments are secure by design. However, for these organizations, using public GitHub Actions might not align with their stringent security protocols due to concerns about data privacy and unauthorized access.</p>
<p>A viable solution for these enterprises is to integrate Checkov within a privately hosted GitHub Actions runner. This setup combines the flexibility and power of GitHub Actions with the security controls that enterprises require. For instance, by using private runners, enterprises can ensure that their codebase and critical infrastructure configurations do not leave their controlled environments. Moreover, using paid solutions like Prisma Cloud Enterprise or Snyk IaC+ can further enhance this setup. Prisma Cloud, a comprehensive cloud security product, incorporates Checkov to provide advanced security scanning, compliance monitoring, and threat detection capabilities tailored to enterprise needs.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Using Checkov with GitHub Actions to scan Azure Bicep files introduces a robust layer of security to your infrastructure deployment practices. By automating the security scanning process, this integration ensures that your cloud resources are deployed not only efficiently but also with a high standard of security. This proactive approach significantly reduces the risk of deploying flawed or vulnerable infrastructure, thereby safeguarding your systems against potential threats.</p>
<p>As cloud environments grow in complexity, the integration of sophisticated tools like Checkov into CI/CD pipelines becomes indispensable. Such tools enable organizations to adhere to best practices effortlessly, ensuring continuous compliance and security. In summary, leveraging Checkov with GitHub Actions empowers organizations to maintain rigorous security standards while fully embracing the benefits of infrastructure as code. This is crucial for building resilient systems that can thrive in today’s dynamic and challenging digital landscape.</p>
]]></content:encoded></item><item><title><![CDATA[Enhancing Workload Resiliency with Traffic Manager and Azure Chaos Studio]]></title><description><![CDATA[Introduction
In this blog , we’ll explore how to improve the resiliency of your workloads using Azure Traffic Manager and Azure Chaos Studio. We’ll delve into the concepts of chaos engineering, Traffic Manager, and Chaos Studio, outlining their funct...]]></description><link>https://benroberts.io/enhancing-workload-resiliency-with-traffic-manager-and-azure-chaos-studio</link><guid isPermaLink="true">https://benroberts.io/enhancing-workload-resiliency-with-traffic-manager-and-azure-chaos-studio</guid><category><![CDATA[Azure]]></category><category><![CDATA[Bicep]]></category><category><![CDATA[blog]]></category><category><![CDATA[GitHub]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Sun, 05 May 2024 06:24:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727490029248/403582d6-f83f-4e38-9456-a5b299c07ea2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>In this blog, we’ll explore how to improve the resiliency of your workloads using Azure Traffic Manager and Azure Chaos Studio. We’ll delve into the concepts of chaos engineering, Traffic Manager, and Chaos Studio, outlining their functionalities and how they work together to ensure application robustness. Additionally, we’ll provide a step-by-step guide for deploying Traffic Manager and Chaos Studio to test your workload’s ability to handle failures.</p>
<h1 id="heading-chaos-engineering-embracing-failure-for-success">Chaos Engineering: Embracing Failure for Success</h1>
<p>Chaos engineering is a proactive approach to building reliable systems. It involves intentionally introducing controlled disruptions (faults) to identify weaknesses and potential failure points in your application. By simulating real-world failure scenarios, chaos engineering helps you proactively strengthen your system’s ability to handle unexpected events.</p>
<h1 id="heading-azure-chaos-studio-a-playground-for-controlled-chaos">Azure Chaos Studio: A Playground for Controlled Chaos</h1>
<p>Azure Chaos Studio is a managed service specifically designed for chaos engineering within Azure. It provides a platform to define and execute experiments that introduce faults into your applications and infrastructure. Chaos Studio helps you:</p>
<ul>
<li><p><strong>Identify vulnerabilities</strong>: By simulating failures, you can uncover hidden weaknesses in your system before they cause outages in production.</p>
</li>
<li><p><strong>Validate recovery mechanisms</strong>: Chaos Studio allows you to test your system’s ability to recover from failures, ensuring your redundancy and failover mechanisms function as expected.</p>
</li>
<li><p><strong>Build confidence in your deployments</strong>: By successfully navigating simulated failures, you gain confidence in your application’s ability to withstand real-world disruptions.</p>
</li>
</ul>
<h1 id="heading-azure-traffic-manager-a-balancing-act-for-optimal-performance">Azure Traffic Manager: A Balancing Act for Optimal Performance</h1>
<p>Azure Traffic Manager is a DNS-based traffic routing service that distributes incoming traffic across your applications and services in a geographically optimal manner. It offers various routing methods, allowing you to tailor traffic flow based on your specific needs. Here’s how Traffic Manager contributes to workload resiliency:</p>
<ul>
<li><p><strong>High Availability</strong>: Traffic Manager ensures traffic reaches a healthy endpoint even if one of your application instances becomes unavailable. It automatically routes requests to the remaining healthy instances, minimizing downtime and maintaining service continuity.</p>
</li>
<li><p><strong>Performance Optimization</strong>: Traffic Manager can route users to the closest available endpoint, reducing latency and improving user experience.</p>
</li>
</ul>
<h2 id="heading-the-synergy-of-chaos-studio-and-traffic-manager">The Synergy of Chaos Studio and Traffic Manager</h2>
<p>Chaos Studio and Traffic Manager work hand-in-hand to bolster your workload’s resilience. Chaos Studio helps you identify potential failure points in your application, while Traffic Manager provides a safety net by ensuring traffic continues to flow even during disruptions.</p>
<p>Here’s a scenario: Imagine you have an e-commerce application deployed across two regions. Using Chaos Studio, you can simulate a failure in one region. If you have Traffic Manager configured, it will automatically route traffic to the healthy region, minimizing the impact on your customers. By combining these tools, you can proactively build resilience into your system and ensure it can withstand real-world disruptions.</p>
<h2 id="heading-the-importance-of-site-reliability-engineering-sre-and-observability">The Importance of Site Reliability Engineering (SRE) and Observability</h2>
<p>Site Reliability Engineering (SRE) is a practice focused on building and maintaining highly reliable and scalable systems. Chaos engineering and Traffic Manager are valuable tools within the SRE toolbox. Observability, the ability to monitor and understand system behavior, plays a crucial role in chaos engineering.</p>
<p>By monitoring system behavior during chaos experiments, you can gain valuable insights into how your application handles faults. This information can be used to identify areas for improvement and enhance your system’s overall resilience.</p>
<h1 id="heading-deploying-traffic-manager-and-chaos-studio-a-step-by-step-guide">Deploying Traffic Manager and Chaos Studio: A Step-by-Step Guide</h1>
<p>Let’s walk through the process of deploying Traffic Manager and Chaos Studio to test your workload’s resiliency. As always, we’ll only be using the Portal as a visual aid to validate our progress and changes. All steps will be completed using Azure CLI and Bicep templates.</p>
<h3 id="heading-step-1-check-azure-resource-providers">Step 1: Check Azure Resource Providers</h3>
<p>To check the registration status of Chaos Studio on your subscription, run the following command:</p>
<pre><code class="lang-bash">az provider list --query <span class="hljs-string">"[?namespace=='Microsoft.Chaos'].registrationState"</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485617915/690892c4-392a-44a4-982f-8ae577c6c827.png" alt /></p>
<p>If you need to register the provider, run the following command:</p>
<pre><code class="lang-bash">az provider register --namespace Microsoft.Chaos
</code></pre>
<p>Registration can take two to five minutes to complete. Rerun the <code>az provider list</code> query to confirm the registration has completed.</p>
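<p>If you’re scripting the setup end to end, a small polling loop saves you from rerunning the query by hand. A sketch:</p>
<pre><code class="lang-powershell"># Register the provider, then poll every 15 seconds until it reports Registered.
az provider register --namespace Microsoft.Chaos
while ((az provider show --namespace Microsoft.Chaos --query registrationState -o tsv) -ne 'Registered') {
    Start-Sleep -Seconds 15
}
Write-Host "Microsoft.Chaos is registered."
</code></pre>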
<h3 id="heading-step-2-deploy-azure-web-apps">Step 2: Deploy Azure Web Apps</h3>
<p>We’ll be using Azure Web Apps as our workload for this scenario. Deploy two web apps in different regions to simulate a multi-region deployment. Note: Chaos Studio endpoints aren’t available in all regions, so ensure your web apps are deployed in supported regions <a target="_blank" href="https://azure.microsoft.com/en-au/explore/global-infrastructure/products-by-region/?products=chaos-studio">here</a>.</p>
<p>We’ll be using the following Bicep template to deploy the web apps:</p>
<pre><code class="lang-bash">resource asp <span class="hljs-string">'Microsoft.Web/serverfarms@2023-01-01'</span> = {
  name: aspName
  location: location
  properties: {
    reserved: <span class="hljs-literal">true</span>
  }
  sku: {
    name: <span class="hljs-string">'S1'</span>
    tier: <span class="hljs-string">'Standard'</span>
    size: <span class="hljs-string">'S1'</span>
    family: <span class="hljs-string">'S'</span>
    capacity: 1
  }
  kind: <span class="hljs-string">'linux'</span>
}

resource webapp <span class="hljs-string">'Microsoft.Web/sites@2023-01-01'</span> = {
  name: webAppName
  location: location
  kind: <span class="hljs-string">'linux'</span>
  identity: {
    <span class="hljs-built_in">type</span>: <span class="hljs-string">'UserAssigned'</span>
    userAssignedIdentities: {
      <span class="hljs-string">'${umi.id}'</span>: {}
    }
  }
  properties: {
    enabled: <span class="hljs-literal">true</span>
    // required <span class="hljs-keyword">for</span> the webapp to REALLY be linux. If this is <span class="hljs-built_in">set</span> to <span class="hljs-literal">false</span>
    // the webapp will be Windows, even though the Kind is <span class="hljs-built_in">set</span> to linux.
    reserved: <span class="hljs-literal">true</span>
    serverFarmId: asp.id
    publicNetworkAccess: <span class="hljs-string">'Enabled'</span>
    siteConfig: {
      acrUseManagedIdentityCreds: <span class="hljs-literal">true</span>
      acrUserManagedIdentityID: umi.properties.clientId // not the resource id!
      numberOfWorkers: 1
      linuxFxVersion: <span class="hljs-string">'DOCKER|${acrName}.azurecr.io/${appName}:${appVersion}'</span>
      healthCheckPath: <span class="hljs-string">'/'</span>
      alwaysOn: <span class="hljs-literal">true</span>
    }
  }
}
</code></pre>
<p>The container image is stored in an Azure Container Registry (ACR). Ensure you have the necessary configurations in place to pull the image during deployment. I covered this in a previous blog post, <a target="_blank" href="https://benroberts.io/azure-verified-modules-in-action">Deploying Azure Web Apps with Azure Container Registry</a>.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://benroberts.io/azure-verified-modules-in-action">https://benroberts.io/azure-verified-modules-in-action</a></div>
<p> </p>
<h3 id="heading-step-3-create-a-chaos-studio-endpoints">Step 3: Create a Chaos Studio Endpoints</h3>
<p>Next, we’ll create Chaos Studio endpoints for our web apps. This will allow Chaos Studio to interact with our web apps during experiments. We’ll use the following Bicep template to create the endpoints:</p>
<pre><code class="lang-abap">resource target_appservice <span class="hljs-string">'Microsoft.Chaos/targets@2024-01-01'</span> = {
  scope: webapp
  name: <span class="hljs-string">'microsoft-appservice'</span>
  location: location
  properties: {}
  dependsOn: []
}

resource microsoft_appservice_Stop_1_0 <span class="hljs-string">'Microsoft.Chaos/targets/capabilities@2024-01-01'</span> = {
  parent: target_appservice
  name: <span class="hljs-string">'Stop-1.0'</span>
}
</code></pre>
<p><code>Stop-1.0</code> is a capability that allows Chaos Studio to stop the web app. You can define additional capabilities based on your requirements. See the full list of capabilities <a target="_blank" href="https://learn.microsoft.com/en-us/azure/chaos-studio/chaos-studio-fault-library">here</a> for more information. Some of the capabilities include simulating high CPU usage, network latency, and disk I/O, just to name a few; most of these, however, are limited to VM/VMSS (AKS) workloads.</p>
<p>The <code>scope</code> property in the <code>target_appservice</code> resource should be set to the resource ID of the web app you want to target. This will allow Chaos Studio to interact with the web app during experiments.</p>
<p>Maintaining the Chaos Studio endpoint within the same module as the web app resources simplifies management and enhances code readability. However, you have the option to organize the endpoint in a separate module, should that better suit your architectural preferences.</p>
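<p>To confirm which capabilities are actually enabled on a target after deployment, you can query the target’s capabilities with a management-plane call. A sketch, assuming <code>$rg</code> and <code>$webAppName</code> match your deployment and using the same API version as the templates above:</p>
<pre><code class="lang-powershell"># List the capabilities enabled on the web app's Chaos Studio target.
$webAppId = az webapp show --resource-group $rg --name $webAppName --query id -o tsv
az rest --method get `
  --url "https://management.azure.com$webAppId/providers/Microsoft.Chaos/targets/microsoft-appservice/capabilities?api-version=2024-01-01"
</code></pre>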
<h3 id="heading-step-4-role-based-access-control-rbac">Step 4: Role Based Access Control (RBAC)</h3>
<p>To allow Chaos Studio to interact with your web apps, you’ll need to grant its identity the permissions required to stop and start the web app. This requires the <code>Microsoft.Web/sites/restart/Action</code>, <code>Microsoft.Web/sites/start/Action</code> and <code>Microsoft.Web/sites/Read</code> permissions. Using the principle of least privilege, you can create a custom role with these permissions and assign it to the Chaos Studio service principal. However, for the sake of simplicity, we’ll use the built-in <code>Website Contributor</code> role for this example.</p>
<p>Here’s the role assignment granting the experiment’s identity the <code>Website Contributor</code> role on the web app:</p>
<pre><code class="lang-abap">resource roleAssignment <span class="hljs-string">'Microsoft.Authorization/roleAssignments@2022-04-01'</span> = {
  name: guid(resourceGroup().id, umi.name, <span class="hljs-string">'Website Contributor'</span>)
  scope: webapp
  properties: {
    principalType: <span class="hljs-string">'ServicePrincipal'</span>
    principalId: umi.properties.principalId
    roleDefinitionId: resourceId(<span class="hljs-string">'Microsoft.Authorization/roleDefinitions'</span>, <span class="hljs-string">'de139f84-1756-47ae-9be6-808fbbe84772'</span>)
  }
}
</code></pre>
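<p>For reference, the least-privilege route mentioned above might look like the following sketch. The role name is hypothetical, and the action list simply mirrors the permissions listed earlier, so validate it against what the App Service stop fault actually requires in your tenant:</p>
<pre><code class="lang-powershell"># Create a custom role definition scoped to the current subscription.
$sub = az account show --query id -o tsv
@"
{
  "Name": "Chaos Studio Web App Operator",
  "IsCustom": true,
  "Description": "Read, start, and restart web apps for Chaos Studio experiments.",
  "Actions": [
    "Microsoft.Web/sites/restart/Action",
    "Microsoft.Web/sites/start/Action",
    "Microsoft.Web/sites/Read"
  ],
  "AssignableScopes": [ "/subscriptions/$sub" ]
}
"@ | Set-Content -Path .\chaos-webapp-role.json
az role definition create --role-definition '@chaos-webapp-role.json'
</code></pre>
<p>Now that we’ve laid the groundwork for our experiment, let’s move on to the Traffic Manager configuration.</p>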
<h3 id="heading-step-5-deploy-traffic-manager">Step 5: Deploy Traffic Manager</h3>
<p>Deploy Traffic Manager to route traffic to the web apps in different regions. We’ll use the following Bicep template to create the Traffic Manager profile:</p>
<pre><code class="lang-bash">resource webappAUE <span class="hljs-string">'Microsoft.Web/sites@2023-01-01'</span> existing = {
  scope: resourceGroup(aue_rg)
  name: webAppNameAUE
}

resource webappSEA <span class="hljs-string">'Microsoft.Web/sites@2023-01-01'</span> existing = {
  scope: resourceGroup(sea_rg)
  name: webAppNameSEA
}

resource trafficManagerProfiles <span class="hljs-string">'Microsoft.Network/trafficmanagerprofiles@2022-04-01'</span> = {
  name: profilesName
  location: <span class="hljs-string">'global'</span>
  properties: {
    profileStatus: <span class="hljs-string">'Enabled'</span>
    trafficRoutingMethod: <span class="hljs-string">'Priority'</span>
    dnsConfig: {
      relativeName: profilesName
      ttl: 5
    }
    monitorConfig: {
      protocol: <span class="hljs-string">'HTTPS'</span>
      port: 443
      path: <span class="hljs-string">'/'</span>
      intervalInSeconds: 30
      toleratedNumberOfFailures: 3
      timeoutInSeconds: 10
    }
    endpoints: [
      {
        name: <span class="hljs-string">'${webAppNameAUE}-endpoint'</span>
        <span class="hljs-built_in">type</span>: <span class="hljs-string">'Microsoft.Network/trafficManagerProfiles/azureEndpoints'</span>
        properties: {
          endpointStatus: <span class="hljs-string">'Enabled'</span>
          endpointMonitorStatus: <span class="hljs-string">'Online'</span>
          targetResourceId: webappAUE.id
          weight: 1
          priority: 1
          endpointLocation: <span class="hljs-string">'Australia East'</span>
          alwaysServe: <span class="hljs-string">'Disabled'</span>
        }
      }
      {
        name: <span class="hljs-string">'${webAppNameSEA}-endpoint'</span>
        <span class="hljs-built_in">type</span>: <span class="hljs-string">'Microsoft.Network/trafficManagerProfiles/azureEndpoints'</span>
        properties: {
          endpointStatus: <span class="hljs-string">'Enabled'</span>
          endpointMonitorStatus: <span class="hljs-string">'Online'</span>
          targetResourceId: webappSEA.id
          weight: 1
          priority: 2
          endpointLocation: <span class="hljs-string">'Southeast Asia'</span> // Chaos Studio not available <span class="hljs-keyword">in</span> <span class="hljs-string">'Australia SouthEast'</span>
          alwaysServe: <span class="hljs-string">'Disabled'</span>
        }
      }
    ]
    trafficViewEnrollmentStatus: <span class="hljs-string">'Disabled'</span>
  }
}
</code></pre>
<p>In this template, we’re creating a Traffic Manager profile with two endpoints: one for each web app in different regions. The <code>trafficRoutingMethod</code> is set to <code>Priority</code>, meaning traffic will be routed to the endpoint with the highest priority (lowest number) that is healthy. The <code>monitorConfig</code> section defines the health check settings for the endpoints.</p>
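<p>Before introducing any chaos, it’s worth checking what Traffic Manager thinks of each endpoint. A quick sketch using the Azure CLI (assuming <code>$rg</code> and <code>$profileName</code> match your deployment; output field names can vary slightly by CLI version):</p>
<pre><code class="lang-powershell"># Show each endpoint's priority and monitor status (Online, Degraded, etc.).
az network traffic-manager endpoint list `
  --resource-group $rg `
  --profile-name $profileName `
  --query '[].{endpoint:name, priority:priority, status:endpointMonitorStatus}' -o table

# Resolve the profile's DNS name to see which endpoint is being served right now.
nslookup "$profileName.trafficmanager.net"
</code></pre>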
<h3 id="heading-step-6-deploy-chaos-studio-experiments">Step 6: Deploy Chaos Studio Experiments</h3>
<p>Now that we have our web apps, Chaos Studio endpoints, and Traffic Manager in place, we can create a Chaos Studio experiment to test the resiliency of our workload. We’ll use the following Bicep template to define the experiment:</p>
<pre><code class="lang-bash">resource umi <span class="hljs-string">'Microsoft.ManagedIdentity/userAssignedIdentities@2023-07-31-preview'</span> existing = {
  name: umiName
}

resource chaosStudio <span class="hljs-string">'Microsoft.Chaos/experiments@2024-01-01'</span> = {
  name: <span class="hljs-string">'dev-aue-cs'</span>
  location: location
  identity: {
    <span class="hljs-built_in">type</span>: <span class="hljs-string">'UserAssigned'</span>
    userAssignedIdentities: {
      <span class="hljs-string">'${umi.id}'</span>: {}
    }
  }
  properties: {
    selectors: [
      {
        <span class="hljs-built_in">type</span>: <span class="hljs-string">'List'</span>
        targets: [
          {
            id: extensionResourceId(resourceId(aue_rg,<span class="hljs-string">'Microsoft.Web/sites'</span>,webAppNameAUE),<span class="hljs-string">'Microsoft.Chaos/targets'</span>,<span class="hljs-string">'microsoft-appservice'</span>)
            <span class="hljs-built_in">type</span>: <span class="hljs-string">'ChaosTarget'</span>
          }
          {
            id: extensionResourceId(resourceId(sea_rg,<span class="hljs-string">'Microsoft.Web/sites'</span>,webAppNameSEA),<span class="hljs-string">'Microsoft.Chaos/targets'</span>,<span class="hljs-string">'microsoft-appservice'</span>)
            <span class="hljs-built_in">type</span>: <span class="hljs-string">'ChaosTarget'</span>
          }
        ]
        id: <span class="hljs-string">'ReferenceThisSelector'</span>
      }
    ]
    steps: [
      {
        name: <span class="hljs-string">'Step 1: Failover an App Service web app'</span>
        branches: [
          {
            name: <span class="hljs-string">'Branch 1: Emulate an App Service failure'</span>
            actions: [
              {
                <span class="hljs-built_in">type</span>: <span class="hljs-string">'continuous'</span>
                selectorId: <span class="hljs-string">'ReferenceThisSelector'</span>
                duration: <span class="hljs-string">'PT10M'</span>
                parameters: []
                name: <span class="hljs-string">'urn:csci:microsoft:appService:stop/1.0'</span>
              }
            ]
          }
        ]
      }
    ]
  }
}
</code></pre>
<p>In the experiment, we’re targeting the Chaos Studio endpoints we created earlier. The experiment consists of a single step that emulates an App Service failure by stopping the web app. The experiment runs for 10 minutes, allowing us to observe how Traffic Manager responds to the failure and routes traffic to the healthy endpoint.</p>
<p>We’re also assigning the user-assigned identity (UMI) to the Chaos Studio experiment we created earlier.</p>
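<p>You can run the experiment from the portal, or start it with a management-plane call. A minimal sketch, reusing the experiment name from the template above and the same API version:</p>
<pre><code class="lang-powershell"># Start the Chaos Studio experiment via the ARM start action.
$sub = az account show --query id -o tsv
az rest --method post `
  --url "https://management.azure.com/subscriptions/$sub/resourceGroups/$rg/providers/Microsoft.Chaos/experiments/dev-aue-cs/start?api-version=2024-01-01"
</code></pre>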
<p>For reference you can find the pipelines and code in my repo:</p>
<p><a target="_blank" href="https://github.com/broberts23/workload-resiliency">https://github.com/broberts23/workload-resil</a><a target="_blank" href="https://github.com/broberts23/workload-resiliency/actions">iency</a></p>
<h3 id="heading-validation">Validation</h3>
<p>We’re finally at the point where we can validate our setup! To do this, we’ll run the Chaos Studio experiment and observe the behavior of Traffic Manager. You can monitor the experiment’s progress and view the results in the Chaos Studio portal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485619240/c93eda94-ddd2-4384-b058-134fd6dd9b46.png" alt /></p>
<p>After starting the experiment, we can verify that the web app in the affected region has stopped and that Traffic Manager has successfully routed traffic to the healthy endpoint. This demonstrates the resiliency of our workload and the effectiveness of our Traffic Manager configuration.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485620357/e315eeb0-3bf1-4470-842e-1238513ba149.png" alt /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485621324/bebbd2e3-36c1-4f98-801b-46f599b5bc56.png" alt /></p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>In today’s dynamic and interconnected digital landscape, ensuring the resiliency of cloud workloads is crucial for maintaining business continuity and delivering exceptional user experiences. Technologies like Azure Chaos Studio and Azure Traffic Manager provide organizations with powerful tools to validate system resilience, optimize application availability, and mitigate the impact of failures. By embracing chaos engineering principles, leveraging traffic management solutions, and adopting SRE practices, organizations can build more resilient, scalable, and reliable cloud-based applications.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485621995/40dd0bf5-950c-4685-ac40-ef58d6fd1bfa.png" alt="🚀" /></p>
]]></content:encoded></item><item><title><![CDATA[Bicep Custom Script Extension: Maximizing Deployment Efficiency]]></title><description><![CDATA[In the fast-paced world of cloud infrastructure, automation stands as the cornerstone of operational efficiency. Microsoft Azure, with its rich suite of tools and extensions, significantly enhances the deployment processes. Among these tools, the Cus...]]></description><link>https://benroberts.io/bicep-custom-script-extension-maximizing-deployment-efficiency</link><guid isPermaLink="true">https://benroberts.io/bicep-custom-script-extension-maximizing-deployment-efficiency</guid><category><![CDATA[automation]]></category><category><![CDATA[Azure]]></category><category><![CDATA[Bicep]]></category><category><![CDATA[blog]]></category><category><![CDATA[GitHub]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Mon, 29 Apr 2024 08:17:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727491515828/895a7cfd-60e6-459b-8c9a-f70b1240c7cb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the fast-paced world of cloud infrastructure, automation stands as the cornerstone of operational efficiency. Microsoft Azure, with its rich suite of tools and extensions, significantly enhances the deployment processes. Among these tools, the Custom Script Extension for Windows (and Linux) is particularly valuable. It allows users to execute custom scripts on virtual machines (VMs) during deployment, facilitating tailored configurations and setups without the need for manual intervention. In this blog post, we will explore how to harness the power of the Custom Script Extension to automate Windows VM deployments in Azure, with a special focus on leveraging Bicep templates.</p>
<h2 id="heading-overview-of-custom-script-extension">Overview of Custom Script Extension</h2>
<p>The Azure Custom Script Extension is a powerful tool designed to streamline the execution of custom scripts on Azure virtual machines. This facilitates a variety of tasks such as post-deployment configurations, software installations, and other system customization. It supports PowerShell scripts on Windows and shell scripts on Linux, and can be deployed through infrastructure-as-code tooling such as Ansible, Bicep, and Terraform. Scripts can be sourced from Azure Storage, GitHub, or any accessible URL, ensuring flexibility across different deployment scenarios.</p>
<h2 id="heading-key-features">Key Features</h2>
<ul>
<li><p><strong>Script Execution</strong>: The extension can execute scripts stored in locations like Azure Storage or GitHub, or directly from a specified URL.</p>
</li>
<li><p><strong>Secure Execution</strong>: Script files can be downloaded from Azure Storage using a managed identity, enhancing security by eliminating the need for explicit credentials.</p>
</li>
<li><p><strong>Script Output Logging</strong>: Detailed logs are provided for script outputs, aiding in troubleshooting and audit processes.</p>
</li>
<li><p><strong>Versioning and Updating</strong>: Scripts can be version-controlled and updated seamlessly, ensuring consistency across deployments.</p>
</li>
</ul>
<p>For more Features and Tips for Custom Script Extension, refer to the <a target="_blank" href="https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/custom-script-windows#tips">Azure documentation</a>.</p>
<p>It’s important to keep in mind that the Custom Script Extension executes scripts directly on the VM, inheriting the VM’s permissions and environment. Authentication mechanisms and network connectivity should be carefully managed to access internal and external resources securely.</p>
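<p>For a feel of how the extension is attached outside of a template, here’s a sketch using the Azure CLI. The script URL is purely illustrative, and <code>$rg</code>/<code>$vmName</code> are assumed to already exist:</p>
<pre><code class="lang-powershell"># Attach the Custom Script Extension to an existing Windows VM.
az vm extension set `
  --resource-group $rg `
  --vm-name $vmName `
  --name CustomScriptExtension `
  --publisher Microsoft.Compute `
  --settings '{"fileUris":["https://example.com/ConfigureWebServer.ps1"],"commandToExecute":"powershell -ExecutionPolicy Bypass -File ConfigureWebServer.ps1"}'
</code></pre>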
<h2 id="heading-custom-script-extension-use-cases">Custom Script Extension Use Cases</h2>
<p>Custom Script Extensions (CSE) offer a versatile way to enhance and automate the configuration and management of Azure virtual machines (VMs) and virtual machine scale sets (VMSS). Here are several practical use cases where Custom Script Extensions can be particularly beneficial:</p>
<h3 id="heading-enhancing-a-base-or-golden-image"><strong>Enhancing a Base or Golden Image</strong></h3>
<ul>
<li><p><strong>Scenario</strong>: When deploying VMs from a standardized base image (often referred to as a “golden image”), it might lack specific software or configuration tweaks needed for particular applications or environments.</p>
</li>
<li><p><strong>Use Case</strong>: Use a Custom Script Extension to automatically install additional software packages, apply security patches, or configure system settings after the VM is instantiated from the golden image. This ensures that each deployed VM is immediately tailored to meet specific operational requirements without manual intervention.</p>
</li>
</ul>
<h3 id="heading-automated-software-deployment"><strong>Automated Software Deployment</strong></h3>
<ul>
<li><p><strong>Scenario</strong>: Continuous Integration/Continuous Deployment (CI/CD) environments require frequent updates and consistent configurations across multiple VMs.</p>
</li>
<li><p><strong>Use Case</strong>: Integrate Custom Script Extensions to deploy the latest version of your application automatically or perform necessary pre-deployment and post-deployment tasks such as configuring software or updating dependencies.</p>
</li>
</ul>
<h3 id="heading-environment-configuration"><strong>Environment Configuration</strong></h3>
<ul>
<li><p><strong>Scenario</strong>: Different environments (development, testing, production) often require different configurations and settings to function correctly.</p>
</li>
<li><p><strong>Use Case</strong>: Use Custom Script Extensions to modify configuration files, adjust environment variables, or execute scripts that tailor the VM’s environment to fit its intended role. This can include setting up database connections, configuring logging levels, or modifying network settings.</p>
</li>
</ul>
<h3 id="heading-security-hardening-and-compliance"><strong>Security Hardening and Compliance</strong></h3>
<ul>
<li><p><strong>Scenario</strong>: Ensuring that VMs comply with organizational or regulatory security standards is crucial but can be time-consuming if performed manually.</p>
</li>
<li><p><strong>Use Case</strong>: Automate the process of hardening VMs by using Custom Script Extensions to apply security configurations, install security tools, enforce password policies, and disable unnecessary services. Scripts can be maintained in a central repository and updated as standards evolve.</p>
</li>
</ul>
<h3 id="heading-performance-tuning"><strong>Performance Tuning</strong></h3>
<ul>
<li><p><strong>Scenario</strong>: Optimal performance tuning often requires adjustments that are specific to the workload or the software that the VM is hosting.</p>
</li>
<li><p><strong>Use Case:</strong> Deploy scripts via Custom Script Extensions to tweak system parameters such as memory management settings, networking stack adjustments, or disk I/O performance settings. These enhancements can be crucial for performance-sensitive applications like databases or large-scale transaction systems.</p>
</li>
</ul>
<h2 id="heading-custom-script-extension-vs-powershell-desired-state-configuration">Custom Script Extension vs. PowerShell Desired State Configuration</h2>
<p>When it comes to automating configurations and managing Azure virtual machines, both Custom Script Extensions (CSE) and PowerShell Desired State Configuration (DSC) are powerful tools. While CSE focuses on executing scripts to configure VMs during deployment or post-deployment, DSC takes a declarative approach to define and maintain a desired state for a VM’s configuration. DSC uses a declarative syntax to specify how a system should be configured, and it continuously ensures that the system remains in that state. CSE, on the other hand, is imperative, executing scripts at specific points in time.</p>
<p>DSC is particularly useful for ongoing configuration management and drift prevention, while CSE excels at one-time setup tasks and ad-hoc customization. That said, CSE can also be used for ongoing management by triggering script execution on a schedule or in response to events. Ultimately, the choice between CSE and DSC depends on your specific requirements, with CSE offering flexibility and simplicity, and DSC providing a robust framework for maintaining consistent configurations over time. While powerful, PowerShell DSC can introduce complexity because configurations must be packaged and deployed through Azure Machine Configuration (Guest Configuration).</p>
<h2 id="heading-deployment-process">Deployment Process</h2>
<ul>
<li><p><strong>Script Preparation</strong>: Write a PowerShell script tailored to meet the specific deployment needs, such as software installation or system configuration.</p>
</li>
<li><p><strong>Integration with ARM Templates</strong>: Embed the Custom Script Extension in an Azure Resource Manager (ARM) template, specifying the script to be executed during the VM setup.</p>
</li>
<li><p><strong>Deployment via Azure Portal and Tools</strong>: Deploy the VM through the Azure Portal, Azure CLI, or Azure PowerShell. The Custom Script Extension will automatically execute the PowerShell script post-deployment.</p>
</li>
<li><p><strong>Monitoring and Troubleshooting</strong>: Use tools like Azure Portal or Azure CLI to monitor the deployment and troubleshoot any potential issues.</p>
</li>
</ul>
<h2 id="heading-best-practices-and-considerations">Best Practices and Considerations</h2>
<ul>
<li><p><strong>Script Idempotency</strong>: Design scripts so they can be safely rerun multiple times, which is essential for handling retries and resuming interrupted deployments; see the sketch after this list.</p>
</li>
<li><p><strong>Error Handling</strong>: Scripts should include comprehensive error handling to manage failures and, if necessary, revert changes.</p>
</li>
<li><p><strong>Security Measures</strong>: Employ Azure Key Vault for managing sensitive data, use Managed Identity for script execution, and ensure scripts are accessible only to authorized users.</p>
</li>
<li><p><strong>Version Control</strong>: Maintain scripts under version control to track modifications and facilitate rollbacks when needed.</p>
</li>
<li><p><strong>Testing and Validation</strong>: Thoroughly test scripts in a non-production environment to confirm their functionality and identify any issues before widespread deployment.</p>
</li>
</ul>
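<p>To make the first point concrete, here’s a minimal idempotency sketch for the IIS scenario that follows; each step checks the current state before acting, so the script can be rerun safely:</p>
<pre><code class="lang-powershell"># Only install IIS if it isn't already installed.
if ((Get-WindowsFeature -Name Web-Server).InstallState -ne 'Installed') {
    Install-WindowsFeature -Name Web-Server -IncludeManagementTools
}

# Only create the firewall rule if it doesn't already exist.
$ruleName = 'Allow Inbound Port 80'
if (-not (Get-NetFirewallRule -DisplayName $ruleName -ErrorAction SilentlyContinue)) {
    New-NetFirewallRule -DisplayName $ruleName -Direction Inbound -LocalPort 80 -Protocol TCP -Action Allow
}
</code></pre>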
<h2 id="heading-example-scenario-automating-iis-deployment-with-custom-script-extension">Example Scenario: Automating IIS Deployment with Custom Script Extension</h2>
<p>Consider a scenario where we need to automate a website/app deployment on an Azure VM. Using Custom Script Extension with PowerShell, here’s how you might achieve this:</p>
<pre><code class="lang-powershell"><span class="hljs-comment"># PowerShell script for automating IIS installation</span>
<span class="hljs-built_in">Set-ExecutionPolicy</span> RemoteSigned <span class="hljs-literal">-Scope</span> CurrentUser <span class="hljs-literal">-Force</span>

<span class="hljs-comment"># Define configuration settings</span>
<span class="hljs-variable">$contentPath</span> = <span class="hljs-string">"<span class="hljs-variable">$Env:systemdrive</span>\inetpub\wwwroot"</span>
<span class="hljs-variable">$gitHubRepoUri</span> = <span class="hljs-string">"https://raw.githubusercontent.com/cloudacademy/static-website-example/master/index.html"</span>

<span class="hljs-comment"># Install IIS Role and Features</span>
<span class="hljs-comment">&lt;#PSScriptInfo
<span class="hljs-doctag">.Synopsis</span>
   Powershell script for Azure Bicep VM Extension to configure a Windows Web Server
<span class="hljs-doctag">.INPUTS</span>
   No inputs are required for this script
<span class="hljs-doctag">.OUTPUTS</span>
   No outputs are generated by this script
<span class="hljs-doctag">.NOTES</span>
  Version:        0.1
  Author:         Ben Roberts
  Creation Date:  17/04/2024
  Purpose/Change: Initial script development
#&gt;</span>

<span class="hljs-comment"># Update Script Execution Policy (temporarily allow script execution)</span>
<span class="hljs-built_in">Set-ExecutionPolicy</span> RemoteSigned <span class="hljs-literal">-Scope</span> CurrentUser <span class="hljs-literal">-Force</span>

<span class="hljs-comment"># Define Configuration Settings (replace with your values)</span>
<span class="hljs-variable">$contentPath</span> = <span class="hljs-string">"<span class="hljs-variable">$Env:systemdrive</span>\inetpub\wwwroot"</span>
<span class="hljs-variable">$gitHubRepoUri</span> = <span class="hljs-string">"https://raw.githubusercontent.com/cloudacademy/static-website-example/master/index.html"</span>

<span class="hljs-comment"># Install IIS Role and Features</span>
<span class="hljs-keyword">Try</span> {
  <span class="hljs-built_in">Install-WindowsFeature</span> <span class="hljs-literal">-Name</span> Web<span class="hljs-literal">-Server</span> <span class="hljs-literal">-IncludeManagementTools</span>
  <span class="hljs-built_in">Write-Host</span> <span class="hljs-string">"IIS Role and Features installed successfully"</span>
}
<span class="hljs-keyword">Catch</span> {
  <span class="hljs-built_in">Write-Host</span> <span class="hljs-variable">$Error</span>[<span class="hljs-number">0</span>].Exception.Message
}

<span class="hljs-keyword">try</span> {
  <span class="hljs-comment"># Download static website content</span>
  <span class="hljs-built_in">Invoke-WebRequest</span> <span class="hljs-literal">-Uri</span> <span class="hljs-variable">$gitHubRepoUri</span> <span class="hljs-literal">-OutFile</span> <span class="hljs-string">"<span class="hljs-variable">$contentPath</span>\Default.htm"</span>
  <span class="hljs-built_in">Write-Host</span> <span class="hljs-string">"Website content cloned successfully"</span>
}
<span class="hljs-keyword">catch</span> {
  <span class="hljs-built_in">Write-Host</span> <span class="hljs-variable">$Error</span>[<span class="hljs-number">0</span>].Exception.Message
}

<span class="hljs-keyword">try</span> {
  <span class="hljs-comment"># Disable Unused Services</span>
  <span class="hljs-variable">$unusedServices</span> = <span class="hljs-selector-tag">@</span>(<span class="hljs-string">"tapisrv"</span>, <span class="hljs-string">"WMPNetworkSvc"</span>, <span class="hljs-string">"ssh-agent"</span>)
  <span class="hljs-built_in">Stop-Service</span> <span class="hljs-variable">$unusedServices</span> <span class="hljs-literal">-ErrorAction</span> SilentlyContinue
  <span class="hljs-built_in">Set-Service</span> <span class="hljs-variable">$unusedServices</span> <span class="hljs-literal">-StartupType</span> Disabled
}
<span class="hljs-keyword">catch</span> {
  <span class="hljs-built_in">Write-Host</span> <span class="hljs-variable">$Error</span>[<span class="hljs-number">0</span>].Exception.Message
}

<span class="hljs-keyword">try</span> {
  <span class="hljs-comment"># Configure Windows Firewall</span>
  <span class="hljs-built_in">New-NetFirewallRule</span> <span class="hljs-literal">-DisplayName</span> <span class="hljs-string">"Allow Inbound Port 80"</span> <span class="hljs-literal">-Direction</span> Inbound <span class="hljs-literal">-LocalPort</span> <span class="hljs-number">80</span> <span class="hljs-literal">-RemotePort</span> <span class="hljs-number">80</span> <span class="hljs-literal">-Protocol</span> TCP <span class="hljs-literal">-Action</span> Allow
  <span class="hljs-built_in">Write-Host</span> <span class="hljs-string">"Firewall rule configured successfully"</span>
}
<span class="hljs-keyword">catch</span> {
  <span class="hljs-built_in">Write-Host</span> <span class="hljs-variable">$Error</span>[<span class="hljs-number">0</span>].Exception.Message
}
<span class="hljs-comment"># Update Script Execution Policy (revert to previous policy)</span>
<span class="hljs-built_in">Set-ExecutionPolicy</span> Bypass <span class="hljs-literal">-Scope</span> CurrentUser <span class="hljs-literal">-Force</span>

<span class="hljs-comment"># Restart IIS service to apply configuration changes</span>
<span class="hljs-built_in">Restart-Service</span> W3SVC
</code></pre>
<p>For deployment, this script is integrated into a Bicep template that specifies the Custom Script Extension to execute the script on the Azure VM:</p>
<pre><code class="lang-bash">@description(<span class="hljs-string">'VM Extension Properties.'</span>)
param extensionsProperties object = {
  extensionName: <span class="hljs-string">'IIS'</span>
  publisher: <span class="hljs-string">'Microsoft.Compute'</span>
  <span class="hljs-built_in">type</span>: <span class="hljs-string">'CustomScriptExtension'</span>
  typeHandlerVersion: <span class="hljs-string">'1.10'</span>
}

@description(<span class="hljs-string">'Command to execute on the Virtual Machine.'</span>)
param commandToExecute string = <span class="hljs-string">'powershell -File ConfigureWebServer_base.ps1'</span>

resource vmExtension <span class="hljs-string">'Microsoft.Compute/virtualMachines/extensions@2023-09-01'</span> = {
  parent: vm
  name: extensionsProperties.extensionName
  location: location
  properties: {
    publisher: extensionsProperties.publisher
    <span class="hljs-built_in">type</span>: extensionsProperties.type
    typeHandlerVersion: extensionsProperties.typeHandlerVersion
    autoUpgradeMinorVersion: <span class="hljs-literal">true</span>
    settings: {
      fileUris: [
        // Uri derived from the output of the deployment script: See uploadBlob.ps1
        deploymentScript.outputs.result
      ]
      commandToExecute: commandToExecute
    }
  }
}
</code></pre>
<p>For brevity I’ve omitted all the supporting configuration including the VM, VNET, Nic, Managed Identity, etc.</p>
<p>If you want the full context you can view the code here: <a target="_blank" href="https://github.com/broberts23/azure-automation-dsc">https://github.com/broberts23/azure-automation-dsc</a></p>
<p>If you’re wondering what the heck <code>deploymentScript.outputs.result</code> is, <a target="_blank" href="https://benroberts.io/2024/04/20/bicep-deployment-scripts-extending-azure-resource-deployment/">check out my blog</a> on Bicep Deployment Scripts, where we uploaded the PowerShell script we’re consuming in the <code>fileUris</code> property:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://benroberts.io/bicep-deployment-scripts-extending-azure-resource-deployment">https://benroberts.io/bicep-deployment-scripts-extending-azure-resource-deployment</a></div>
<p> </p>
<h2 id="heading-deploying-the-bicep-template">Deploying the Bicep Template</h2>
<p>To deploy the Bicep template with the Custom Script Extension, we’ll be using a GitHub Actions workflow. The workflow will validate the Bicep file, deploy the template to Azure, and then, for a little something extra, verify the website is up.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">Bicep</span> <span class="hljs-string">Deployment</span> <span class="hljs-string">and</span> <span class="hljs-string">Pester</span> <span class="hljs-string">Test</span>

<span class="hljs-attr">on:</span>
  <span class="hljs-attr">workflow_dispatch:</span>

<span class="hljs-comment"># OICD Auth</span>
<span class="hljs-attr">permissions:</span>
  <span class="hljs-attr">id-token:</span> <span class="hljs-string">write</span>
  <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span>

<span class="hljs-attr">env:</span>
  <span class="hljs-attr">resource-group:</span> <span class="hljs-string">RG1</span> <span class="hljs-comment"># name of the Azure resource group</span>
  <span class="hljs-attr">rollout-name:</span> <span class="hljs-string">rollout01</span> <span class="hljs-comment"># name of the deployment</span>
  <span class="hljs-attr">environment:</span> <span class="hljs-string">dev</span>

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">validate:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">Validate</span> <span class="hljs-string">Bicep</span> <span class="hljs-string">template</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">environment:</span> <span class="hljs-string">dev</span>
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span> <span class="hljs-string">code</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Login</span> <span class="hljs-string">via</span> <span class="hljs-string">Azure</span> <span class="hljs-string">CLI</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/login@v2</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">client-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_CLIENT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">tenant-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_TENANT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">subscription-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_SUBSCRIPTION_ID</span> <span class="hljs-string">}}</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Bicep</span> <span class="hljs-string">Validate</span>
        <span class="hljs-attr">id:</span> <span class="hljs-string">validate</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/CLI@v2</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">azcliversion:</span> <span class="hljs-string">latest</span>
          <span class="hljs-attr">inlineScript:</span> <span class="hljs-string">|
            az deployment group validate --resource-group ${{ env.resource-group }} --name  ${{ env.rollout-name }} --template-file vm_dsc.bicep --parameters adminPassword=${{ secrets.ADMINPASSWORD }}
</span>

  <span class="hljs-attr">deploy_and_test:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">Deploy</span> <span class="hljs-string">and</span> <span class="hljs-string">Test</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">environment:</span> <span class="hljs-string">dev</span>
    <span class="hljs-attr">needs:</span> [<span class="hljs-string">validate</span>]
    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span> <span class="hljs-string">code</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Login</span> <span class="hljs-string">via</span> <span class="hljs-string">Azure</span> <span class="hljs-string">CLI</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/login@v2</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">client-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_CLIENT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">tenant-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_TENANT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">subscription-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_SUBSCRIPTION_ID</span> <span class="hljs-string">}}</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Deploy</span> <span class="hljs-string">Bicep</span> <span class="hljs-string">template</span>
        <span class="hljs-attr">id:</span> <span class="hljs-string">deploy</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/CLI@v2</span>      
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">azcliversion:</span> <span class="hljs-string">latest</span>
          <span class="hljs-attr">inlineScript:</span> <span class="hljs-string">|
            az deployment group create --resource-group ${{ env.resource-group }} --name  ${{ env.rollout-name }} --template-file vm_dsc.bicep --parameters adminPassword=${{ secrets.ADMINPASSWORD }}
            # Get the public IP address of the deployed VM from the bicep output
            publicIP=$(az deployment group show --resource-group ${{ env.resource-group }} --name  ${{ env.rollout-name }} --query 'properties.outputs.publicIP.value' -o tsv)
            echo "publicIP=$publicIP" &gt;&gt; $GITHUB_OUTPUT
</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Testing</span> <span class="hljs-string">Website</span> <span class="hljs-string">Accessibility</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/powershell@v2</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">azPSVersion:</span> <span class="hljs-string">"latest"</span>
          <span class="hljs-attr">inlineScript:</span> <span class="hljs-string">|
            Import-Module Pester
            $url = "http://${{ steps.deploy.outputs.publicIP }}"
            Describe "Website Accessibility Test" {
                It "Website should be accessible" {
                    $response = Invoke-WebRequest -Uri $url -UseBasicParsing -ErrorAction SilentlyContinue
                    $response.StatusCode | Should -Be 200
                }
            }</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485625663/193c0e8f-d6c5-4332-9093-54132181b567.png" alt="Bicep Custom Script Extension" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The Custom Script Extension, when combined with Bicep, provides a powerful solution for automating and streamlining cloud deployments in Azure. By leveraging this approach, IT administrators and DevOps teams can ensure more consistent, secure, and efficient deployment processes. The example provided here demonstrates just one of many possibilities; I encourage you to explore this toolset for diverse automation tasks in Azure environments.</p>
<p>I hope you enjoyed this blog post and found it valuable. Stay tuned for more insights and tutorials on cloud computing, automation, and infrastructure management. Happy automating!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485626848/4280730c-799b-4ab5-9054-f4c68c17e102.png" alt="🚀" /></p>
]]></content:encoded></item><item><title><![CDATA[Bicep Deployment Scripts: Extending Azure Resource Deployment]]></title><description><![CDATA[Hey there tech enthusiasts! Today, let’s dive into the world of Azure deployment scripts using Bicep, Microsoft’s DSL for Azure Resource Manager templates. Bicep Deployment Scripts are a powerful tool that enhances the deployment of Azure resources b...]]></description><link>https://benroberts.io/bicep-deployment-scripts-extending-azure-resource-deployment</link><guid isPermaLink="true">https://benroberts.io/bicep-deployment-scripts-extending-azure-resource-deployment</guid><category><![CDATA[Azure]]></category><category><![CDATA[Bicep]]></category><category><![CDATA[blog]]></category><category><![CDATA[Powershell]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Fri, 19 Apr 2024 23:27:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727491605419/02d11ab3-3d50-417e-aaa0-e6e7c7bc9193.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey there tech enthusiasts! Today, let’s dive into the world of Azure deployment scripts using Bicep, Microsoft’s DSL for Azure Resource Manager templates. Bicep Deployment Scripts are a powerful tool that enhances the deployment of Azure resources by combining the flexibility of Bicep with the power of PowerShell or Azure CLI.</p>
<h2 id="heading-what-are-bicep-deployment-scripts">What are Bicep Deployment Scripts?</h2>
<p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/deployment-script-bicep?tabs=CLI">Bicep Deployment Scripts</a> are essentially PowerShell or Azure CLI scripts that streamline, enhance or augment the deployment process of Azure resources using Bicep <code>resources</code>. They offer a cleaner and more concise way to execute command in-stream of the deployment process, making it easier to define and deploy Azure resources. Deployment scripts can be used to automate the provisioning of Azure resources, configure settings, and execute custom actions during deployment that can then be referenced in the Bicep template by other resources and/or modules.</p>
<p>Deployment scripts execute in an Azure Container Instance, with a Storage Account used to stage the script files and outputs; both are created as part of the deployment process. Note that there are costs associated with the Container Instance and Storage Account, so be mindful of which property values you use to delete or keep the resources after deployment.</p>
<h2 id="heading-why-bicep">Why Bicep?</h2>
<p>First off, if you’re not familiar with Bicep, imagine ARM json templates but with a much friendlier syntax. Bicep simplifies the process of defining Azure resources by providing a more readable and maintainable format. If you’ve ever felt overwhelmed by the complexity of ARM templates, Bicep is your ticket to a smoother deployment experience.</p>
<h2 id="heading-getting-started-with-bicep-deployment-scripts">Getting Started with Bicep Deployment Scripts</h2>
<p>To kick things off, make sure you have PowerShell and Bicep installed on your machine or developer environment. Once you’re all set up, writing your first Bicep Deployment Script is a breeze. In this example, we’ll create a Bicep module that executes a simple PowerShell script that uploads a file we’ll use in an upcoming blog. Below is a snippet of the Bicep module; I’ve omitted the supporting resources (User Assigned Identity, VNETs, Storage Account, etc.) from the snippet for brevity:</p>
<pre><code class="lang-yaml"><span class="hljs-string">resource</span> <span class="hljs-string">deploymentScript</span> <span class="hljs-string">'Microsoft.Resources/deploymentScripts@2023-08-01'</span> <span class="hljs-string">=</span> {
  <span class="hljs-attr">name:</span> <span class="hljs-string">'inlinePS'</span>
  <span class="hljs-attr">location:</span> <span class="hljs-string">location</span>
  <span class="hljs-attr">kind:</span> <span class="hljs-string">'AzurePowerShell'</span>
  <span class="hljs-attr">identity:</span> {
    <span class="hljs-attr">type:</span> <span class="hljs-string">'UserAssigned'</span> <span class="hljs-string">//</span> <span class="hljs-string">User</span> <span class="hljs-string">Assigned</span> <span class="hljs-string">Identity</span> <span class="hljs-string">for</span> <span class="hljs-string">the</span> <span class="hljs-string">Deployment</span> <span class="hljs-string">Script</span> <span class="hljs-string">to</span> <span class="hljs-string">access</span> <span class="hljs-string">the</span> <span class="hljs-string">Storage</span> <span class="hljs-string">Account</span>
    <span class="hljs-attr">userAssignedIdentities:</span> {
      <span class="hljs-string">'${umi.id}'</span><span class="hljs-string">:</span> {}
    }
  }
  <span class="hljs-attr">properties:</span> {
    <span class="hljs-attr">azPowerShellVersion:</span> <span class="hljs-string">'11.4'</span>
    <span class="hljs-attr">retentionInterval:</span> <span class="hljs-string">'PT1H'</span>
    <span class="hljs-attr">cleanupPreference:</span> <span class="hljs-string">'OnSuccess'</span>
    <span class="hljs-attr">environmentVariables:</span> [
      {
        <span class="hljs-attr">name:</span> <span class="hljs-string">'AZURE_STORAGE_ACCOUNT'</span>
        <span class="hljs-attr">value:</span> <span class="hljs-string">storageAccount.name</span>
      }
      {
        <span class="hljs-attr">name:</span> <span class="hljs-string">'CONTENT'</span> <span class="hljs-string">//</span> <span class="hljs-string">Imports</span> <span class="hljs-string">the</span> <span class="hljs-string">content</span> <span class="hljs-string">of</span> <span class="hljs-string">a</span> <span class="hljs-string">file</span> <span class="hljs-bullet">-</span> <span class="hljs-string">in</span> <span class="hljs-string">this</span> <span class="hljs-string">case</span>, <span class="hljs-string">another</span>
        <span class="hljs-string">//</span> <span class="hljs-string">PowerShell</span> <span class="hljs-string">script</span> <span class="hljs-bullet">-</span> <span class="hljs-string">that</span> <span class="hljs-string">will</span> <span class="hljs-string">be</span> <span class="hljs-string">uploaded</span> <span class="hljs-string">to</span> <span class="hljs-string">the</span> <span class="hljs-string">Storage</span> <span class="hljs-string">Account</span>
        <span class="hljs-string">//</span> <span class="hljs-string">This</span> <span class="hljs-string">is</span> <span class="hljs-string">stored</span> <span class="hljs-string">in</span> <span class="hljs-string">an</span> <span class="hljs-string">Environment</span> <span class="hljs-string">Variable</span> <span class="hljs-string">as</span> <span class="hljs-string">a</span> <span class="hljs-string">string</span>
        <span class="hljs-attr">value:</span> <span class="hljs-string">loadTextContent('ConfigureWebServer_base.ps1')</span>
      }
      {
        <span class="hljs-attr">name:</span> <span class="hljs-string">'AZURE_RESOURCE_GROUP'</span>
        <span class="hljs-attr">value:</span> <span class="hljs-string">resourceGroup().name</span>
      }
    ]
    <span class="hljs-string">//</span> <span class="hljs-attr">arguments:</span> <span class="hljs-string">saName</span>
    <span class="hljs-attr">scriptContent:</span> <span class="hljs-string">loadTextContent('uploadBlob.ps1')</span>
  }
  <span class="hljs-attr">dependsOn:</span> [
    <span class="hljs-string">//</span> <span class="hljs-string">This</span> <span class="hljs-string">ensures</span> <span class="hljs-string">that</span> <span class="hljs-string">the</span> <span class="hljs-string">Storage</span> <span class="hljs-string">Account</span> <span class="hljs-string">and</span> <span class="hljs-string">role</span> <span class="hljs-string">assignment</span> <span class="hljs-string">are</span> <span class="hljs-string">created</span> <span class="hljs-string">before</span> <span class="hljs-string">the</span> <span class="hljs-string">deployment</span> <span class="hljs-string">script</span>
    <span class="hljs-string">roleAssignment</span>
  ]
}

<span class="hljs-string">output</span> <span class="hljs-string">result</span> <span class="hljs-string">string</span> <span class="hljs-string">=</span> <span class="hljs-string">deploymentScript.properties.outputs.text.ICloudBlob.Uri</span>
</code></pre>
<p>In this module, we define a <code>deploymentScript</code> resource that executes an Azure PowerShell script (uploadBlob.ps1) via the <code>scriptContent</code> property. The environment variables are used to pass parameters to the script, such as the Storage Account name and the content of the file. The <code>loadTextContent</code> function is used to import the content of both PowerShell scripts (plain text) from local files: one becomes the script to execute, and the other is stored in an environment variable.</p>
<p>Note the <code>cleanupPreference</code> property is set to ‘OnSuccess’, which means the Container Instance and Storage Account created to execute the script will only be deleted after a successful execution. This is handy for troubleshooting a failing script, as you won’t have to wait for the container instance to be created on subsequent runs. However, this might not be ideal at scale, as developers could leave failed deployments undeleted, leading to wasted spend.</p>
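<p>To keep on top of that, you can list the deploymentScripts resources in a resource group and clear out any failed runs you’ve finished troubleshooting. Here’s a hedged sketch using Azure PowerShell; the resource group name is a placeholder.</p>
<pre><code class="lang-powershell"># Sketch: find deployment scripts left behind by failed runs and remove them
# ('myResourceGroup' is a placeholder)
Get-AzDeploymentScript -ResourceGroupName 'myResourceGroup' |
    Where-Object { $_.ProvisioningState -eq 'Failed' } |
    ForEach-Object {
        Remove-AzDeploymentScript -ResourceGroupName 'myResourceGroup' -Name $_.Name
    }
</code></pre>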
<p>Let’s have a quick look at the PowerShell script called by <code>scriptContent</code>:</p>
<pre><code class="lang-powershell"><span class="hljs-comment">&lt;#PSScriptInfo
<span class="hljs-doctag">.Synopsis</span> Powershell script to upload a file to a blob storage account
<span class="hljs-doctag">.INPUTS</span> The input to this script are the environment variables set by the DeploymentScript Bicep resource
<span class="hljs-doctag">.OUTPUTS</span> The output of this script is returned as a DeploymentScript output ($DeploymentScriptOutputs) as the 'text' key. This is later interpreted by the DeploymentScript to get the Uri of the uploaded file.
<span class="hljs-doctag">.NOTES</span>
Version: 0.1
Author: Ben Roberts
Creation Date: 17/04/2024
Purpose/Change: Initial script development
#&gt;</span>

<span class="hljs-comment"># Connect to Azure using Managed Identity defined in the Deployment Script resource</span>
<span class="hljs-keyword">try</span> {
    <span class="hljs-built_in">Connect-AzAccount</span> <span class="hljs-literal">-Identity</span>
} <span class="hljs-keyword">catch</span> {
    <span class="hljs-built_in">Write-Host</span> <span class="hljs-string">"Failed to connect to Azure"</span>
    <span class="hljs-keyword">exit</span> <span class="hljs-number">1</span>
}

<span class="hljs-comment"># Get the storage account context</span>
<span class="hljs-variable">$saContext</span> = <span class="hljs-built_in">Get-AzStorageAccount</span> <span class="hljs-literal">-Name</span> <span class="hljs-string">"<span class="hljs-variable">$</span>{env:AZURE_STORAGE_ACCOUNT}"</span> <span class="hljs-literal">-ResourceGroupName</span> <span class="hljs-string">"<span class="hljs-variable">$</span>{env:AZURE_RESOURCE_GROUP}"</span>
<span class="hljs-variable">$workingContext</span> = <span class="hljs-variable">$saContext</span>.Context

<span class="hljs-comment"># Output the script from the environment variable to a file</span>
<span class="hljs-built_in">Write-Output</span> <span class="hljs-string">"<span class="hljs-variable">$</span>{env:CONTENT}"</span> &gt; ConfigureWebServer_base.ps1

<span class="hljs-comment"># Upload the file to the blob storage account</span>
<span class="hljs-keyword">try</span> {
    <span class="hljs-variable">$output</span> = <span class="hljs-built_in">Set-AzStorageBlobContent</span> <span class="hljs-operator">-File</span> ConfigureWebServer_base.ps1 <span class="hljs-literal">-Container</span> dsc <span class="hljs-literal">-Blob</span> ConfigureWebServer_base.ps1 <span class="hljs-literal">-Context</span> <span class="hljs-variable">$workingContext</span> <span class="hljs-literal">-Force</span>
} <span class="hljs-keyword">catch</span> {
    <span class="hljs-built_in">Write-Host</span> <span class="hljs-variable">$Error</span>[<span class="hljs-number">0</span>].Exception.Message
}

<span class="hljs-comment"># Output the results to the built-in DeploymentScript variable using the 'text' key</span>
<span class="hljs-variable">$DeploymentScriptOutputs</span> = <span class="hljs-selector-tag">@</span>{}
<span class="hljs-variable">$DeploymentScriptOutputs</span>[<span class="hljs-string">'text'</span>] = <span class="hljs-variable">$output</span> <span class="hljs-comment"># See results.json for an example of $output</span>
</code></pre>
<p>This script authenticates to Azure using the Managed Identity defined in the Deployment Script resource, retrieves the Storage Account context, and uploads the content of the second PowerShell script to a container in the Storage Account. The output of the script is stored in the <code>$DeploymentScriptOutputs</code> variable, which is a common variable that can be accessed by the DeploymentScript resource and surfaced by Azure Resource Manager. This can be referenced in the Azure portal, or by using the Azure PowerShell cmdlet <code>Get-AzDeploymentScript</code>.</p>
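<p>As a quick sketch, here’s what reading those outputs from the command line might look like; the resource group name is a placeholder, while <code>inlinePS</code> is the resource name from the module above.</p>
<pre><code class="lang-powershell"># Sketch: read the deployment script's outputs after a run
# ('myResourceGroup' is a placeholder)
$script = Get-AzDeploymentScript -ResourceGroupName 'myResourceGroup' -Name 'inlinePS'
$script.Outputs['text']   # the object uploadBlob.ps1 stored under the 'text' key
</code></pre>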
<h2 id="heading-deployment-scripts-outputs">Deployment Scripts Outputs</h2>
<p>The output is stored in the <code>text</code> key of the <code>$DeploymentScriptOutputs</code> variable, which can be output to a variable in the Bicep template. This allows you to reference the output of the script in other resources or modules.</p>
<p><code>output result string = deploymentScript.properties.outputs.text.ICloudBlob.Uri</code></p>
<p><code>deploymentScript.properties.outputs.text</code> is a JSON object. We can access the <code>ICloudBlob.Uri</code> property to get the URI of the uploaded file. This output can then be used in other resources or modules to reference the uploaded file, for example via the module’s <code>outputs.result</code>. We’ll make use of this in an upcoming blog.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485630781/d18fd3e6-f3a0-4fce-9f2d-a071993a8c9e.png" alt="Bicep Deployment Script" /></p>
<h2 id="heading-multi-line-scripts">Multi-line Scripts</h2>
<p>As an alternative to passing the deployment script as a <code>ps1</code> file, you can also pass the script as a multi-line string in the Bicep file by using three single quotes (<code>'''</code>). This can be useful for small scripts that don’t require a separate file. Here’s an example of how you can define a multi-line script in a Bicep file:</p>
<pre><code class="lang-bash">param storageAccountName = <span class="hljs-string">'myStorageAccount'</span>
param rgName = resourceGroup().name

resource deploymentScript <span class="hljs-string">'Microsoft.Resources/deploymentScripts@2023-08-01'</span> = {
  name: <span class="hljs-string">'inlinePS'</span>
  location: location
  kind: <span class="hljs-string">'AzurePowerShell'</span>
  properties: {
    azCliVersion: <span class="hljs-string">'11.4'</span>
    arguments: <span class="hljs-string">'-name ${storageAccountName} -rgName ${rgName}'</span>
    scriptContent: <span class="hljs-string">''</span><span class="hljs-string">'
      param (
        [string] $name,
        [string] $rgName
      )
      $output = az storage account create -n $name -g $rgName --sku Standard_LRS
      $DeploymentScriptOutputs = @{}
      $DeploymentScriptOutputs['</span>text<span class="hljs-string">'] = $output
    '</span><span class="hljs-string">''</span>
  }
}

output result object = deploymentScript.properties.outputs.text
</code></pre>
<h2 id="heading-idempotent-scripts">Idempotent Scripts</h2>
<p>Deployment script execution is designed to be idempotent, meaning that if there are no changes to any of the deploymentScripts resource properties, including the inline script, the script will not run upon redeployment of the Bicep file. The deployment script service compares resource names in the Bicep file with existing resources in the same resource group.</p>
<p>To run the same deployment script multiple times, you have two options. The first is to change the <code>name</code> of your deploymentScripts resource (“inlinePS” in my example), which creates a new deploymentScripts resource; the <code>utcNow</code> function is handy here, either as the resource name or as part of it (note that <code>utcNow</code> can only be used in the default value of a parameter). The second is to specify a different value in the <code>forceUpdateTag</code> property, such as <code>utcNow</code>.</p>
<p>Writing deployment scripts with idempotence in mind ensures that accidental reruns won’t lead to unintended system alterations. For instance, let’s say you’re deploying an Azure virtual machine using a deployment script. Before creating the VM, the script should check if a VM with the same name already exists. If it does, the script can either skip VM creation or delete the existing VM and recreate it with updated configurations. This approach guarantees that running the script multiple times won’t result in duplicate VMs or unexpected changes to existing ones, maintaining system integrity and minimizing errors in the deployment process.</p>
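<p>A hedged sketch of that check in PowerShell might look like the following; the VM, resource group, location and image names are all illustrative.</p>
<pre><code class="lang-powershell"># Sketch: only create the VM if one with the same name doesn't already exist
$rgName = 'myResourceGroup'   # placeholder
$vmName = 'web01'             # placeholder
$existing = Get-AzVM -ResourceGroupName $rgName -Name $vmName -ErrorAction SilentlyContinue
if ($null -eq $existing) {
    $cred = Get-Credential   # local admin credentials for the new VM
    New-AzVM -ResourceGroupName $rgName -Name $vmName -Location 'australiaeast' `
        -Image 'Win2019Datacenter' -Credential $cred
} else {
    Write-Output "VM '$vmName' already exists; skipping creation."
}
</code></pre>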
<h2 id="heading-conclusion">Conclusion</h2>
<p>Bicep Deployment Scripts are a game-changer for Azure resource deployment. By combining the simplicity of Bicep with the power of PowerShell, you can streamline your deployment process and automate complex workflows with ease. Whether you’re a seasoned Azure pro or just getting started, Bicep Deployment Scripts are a must-have tool in your arsenal.</p>
<p>Stay tuned for the next blog post where we’ll explore how to use Custom Script Extension to deploy a custom web server configuration to an Azure VM, using the PowerShell script we just uploaded to the Storage Account. Until then, happy coding!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485631757/36553836-3576-402c-9454-22b4ef5a60bf.png" alt="🚀" /></p>
]]></content:encoded></item><item><title><![CDATA[Azure Verified Modules: In Action]]></title><description><![CDATA[In this blog, I will be demonstrating how to use Azure Verified Modules (AVM) from the Azure Public Githib Repository. Azure Verified Modules are a set of pre-built, pre-tested, and pre-configured modules that allow you to quickly deploy and configur...]]></description><link>https://benroberts.io/azure-verified-modules-in-action</link><guid isPermaLink="true">https://benroberts.io/azure-verified-modules-in-action</guid><category><![CDATA[Azure]]></category><category><![CDATA[blog]]></category><category><![CDATA[containers]]></category><category><![CDATA[GitHub]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Sat, 06 Apr 2024 22:41:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727491779195/097f3942-fbe6-4cd6-912f-68ef61c10d05.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this blog, I will be demonstrating how to use <a target="_blank" href="https://azure.github.io/Azure-Verified-Modules/">Azure Verified Modules</a> (AVM) from the Azure Public Githib Repository. Azure Verified Modules are a set of pre-built, pre-tested, and pre-configured modules that allow you to quickly deploy and configure Azure services. These modules are built by Microsoft and are verified by the Azure team to ensure they meet specific quality standards such as the Azure Cloud Adoption Framework (CAF).</p>
<p>We will explore AVM and how you can use them to simplify your Azure deployments.</p>
<h2 id="heading-what-are-azure-verified-modules">What are Azure Verified Modules?</h2>
<p>Azure Verified Modules are pre-built Terraform and Bicep modules that are designed to automate the deployment and configuration of Azure services. These modules are built using best practices and are verified by the Azure team to ensure they are secure, reliable, and efficient.</p>
<p>Azure Verified Modules are available in the <a target="_blank" href="https://github.com/Azure/bicep-registry-modules">Bicep Registry</a> and can be easily imported into your Bicep configuration files. These modules can be used to deploy a wide range of Azure services, including virtual machines, storage accounts, databases, and many more.</p>
<h2 id="heading-what-are-we-building">What are we building?</h2>
<p>I’m working on a series of blogs where I need working web applications. I will be using container hosting services like Azure Container Instances, Azure Container Apps and Azure Kubernetes Service to host these applications. In this blog, I will be using AVM to deploy an Azure Container Registry, a Managed Identity and a VNET, then use GitHub Actions to deploy the infrastructure resources and build and push a container image to the Azure Container Registry as a piece of reusable code I can run on-demand.</p>
<h2 id="heading-building-the-azure-container-registry-using-verified-modules">Building the Azure Container Registry using Verified Modules</h2>
<p>To get started, we need to deploy an Azure Container Registry using AVM. The Azure team and community have created excellent documentation on how to use AVM, and I will be following the steps outlined in the <a target="_blank" href="https://github.com/Azure/bicep-registry-modules/tree/main/avm/res/container-registry/registry#Usage-examples">Azure Container Registry module documentation</a>.</p>
<p>Here is the Bicep code to deploy the Azure Container Registry and supporting resources using AVM:</p>
<pre><code class="lang-bash">param location string
param environment string

var acrName = <span class="hljs-string">'${environment}acr${uniqueString(resourceGroup().id)}'</span>
var vnetName = <span class="hljs-string">'${environment}-vnet-${uniqueString(resourceGroup().id)}'</span>
var umiName = <span class="hljs-string">'${environment}-umi-${uniqueString(resourceGroup().id)}'</span>

module userAssignedIdentity <span class="hljs-string">'br/public:avm/res/managed-identity/user-assigned-identity:0.2.1'</span> = {
  name: <span class="hljs-string">'userAssignedIdentityDeployment'</span>
  params: {
    name: umiName
    location: location
  }
}

module virtualNetwork <span class="hljs-string">'br/public:avm/res/network/virtual-network:0.1.5'</span> = {
  name: <span class="hljs-string">'virtualNetworkDeployment'</span>
  params: {
    addressPrefixes: [
      <span class="hljs-string">'10.0.0.0/16'</span>
    ]
    name: vnetName
    location: location
    subnets: [
      {
        addressPrefix: <span class="hljs-string">'10.0.0.0/24'</span>
        name: <span class="hljs-string">'az-subnet-001'</span>
        privateEndpointNetworkPolicies: <span class="hljs-string">'Disabled'</span>
        privateLinkServiceNetworkPolicies: <span class="hljs-string">'Enabled'</span>
      }
      {
        addressPrefix: <span class="hljs-string">'10.0.1.0/24'</span>
        name: <span class="hljs-string">'az-subnet-002'</span>
      }
      {
        addressPrefix: <span class="hljs-string">'10.0.3.0/24'</span>
        name: <span class="hljs-string">'az-subnet-003'</span>
      }
    ]
  }
}

module registry <span class="hljs-string">'br/public:avm/res/container-registry/registry:0.1.1'</span> = {
  name: <span class="hljs-string">'registryDeployment'</span>
  params: {
    managedIdentities: {
      systemAssigned: <span class="hljs-literal">false</span>
      userAssignedResourceIds: [
        userAssignedIdentity.outputs.resourceId
      ]
    }
    name: acrName
    acrAdminUserEnabled: <span class="hljs-literal">false</span>
    acrSku: <span class="hljs-string">'Premium'</span>
    publicNetworkAccess: <span class="hljs-string">'Enabled'</span>
    softDeletePolicyStatus: <span class="hljs-string">'disabled'</span>
    exportPolicyStatus: <span class="hljs-string">'enabled'</span>
    location: location
    tags: {
      Environment: environment
      <span class="hljs-string">'hidden-title'</span>: <span class="hljs-string">'Container Registry'</span>
      Role: <span class="hljs-string">'DeploymentValidation'</span>
    }
  }
}
</code></pre>
<p>In this code snippet, we are deploying an Azure Container Registry with a Premium SKU, enabling the export policy, and disabling the soft delete policy. We are also deploying a virtual network with three subnets and a user-assigned managed identity. The managed identity is used to authenticate the Azure Container Registry to future resources.</p>
<p>I’ve reduced the parameters and variables to a minimum to keep the code as readable as possible.</p>
<p><code>br/public</code> is the namespace for AVM, and the <code>avm/res/container-registry/registry</code> module is used to deploy the Azure Container Registry. The module takes several parameters, including the name of the Azure Container Registry, the location, the SKU, and the network access settings.</p>
<p>A major benefit of using AVM is tab completion and/or IntelliSense in your IDE. This makes it easier to discover the available module versions and their parameters, as well as the values that can be passed to the parameters.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485635796/ad994103-574a-46df-be3d-fb0fe4b00a31.png" alt="Verified Modules" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485637058/bac9b3de-53d8-4fd8-a4d4-c0e18d44ecdb.png" alt="Verified Modules" /></p>
<h2 id="heading-github-actions-for-infra-deployment">GitHub Actions for Infra Deployment</h2>
<p>Now that we have the Bicep code to deploy the Azure Container Registry, we need to create a GitHub Actions workflow to deploy the infrastructure resources. Here is the GitHub Actions workflow that deploys the Azure Container Registry using AVM:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">Deploy</span> <span class="hljs-string">Azure</span> <span class="hljs-string">Infra</span>

<span class="hljs-attr">on: workflow_dispatch:</span>

<span class="hljs-attr">permissions:</span>
  <span class="hljs-attr">id-token:</span> <span class="hljs-string">write</span>
  <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span>

<span class="hljs-attr">env:</span>
  <span class="hljs-attr">resource-group:</span> <span class="hljs-string">RG1</span> <span class="hljs-comment"># name of the Azure resource group</span>

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">bicep-deploy:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">"Bicep Deploy"</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">environment:</span> <span class="hljs-string">dev</span>

    <span class="hljs-attr">steps:</span>
      <span class="hljs-comment"># Checkout the repository to the GitHub Actions runner</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v4</span>

      <span class="hljs-comment"># Authenticate to Az CLI using OIDC</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">"Az CLI login"</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/login@v2</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">client-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_CLIENT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">tenant-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_TENANT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">subscription-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_SUBSCRIPTION_ID</span> <span class="hljs-string">}}</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Azure</span> <span class="hljs-string">CLI</span> <span class="hljs-string">script</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/cli@v2</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">azcliversion:</span> <span class="hljs-number">2.59</span><span class="hljs-number">.0</span>
          <span class="hljs-attr">inlineScript:</span> <span class="hljs-string">|</span>
            <span class="hljs-string">az</span> <span class="hljs-string">deployment</span> <span class="hljs-string">group</span> <span class="hljs-string">create</span> <span class="hljs-string">--resource-group</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.resource-group</span> <span class="hljs-string">}}</span> <span class="hljs-string">--name</span> <span class="hljs-string">rollout01</span> <span class="hljs-string">--template-file</span> <span class="hljs-string">main.bicep</span> <span class="hljs-string">--parameters</span> <span class="hljs-string">main.bicepparam</span>
</code></pre>
<p>In this GitHub Actions workflow, we are using the <code>azure/login</code> and <code>azure/cli</code> actions to authenticate to the Azure CLI and deploy the Bicep code to the Azure resource group. The workflow is triggered manually using the <code>workflow_dispatch</code> event.</p>
<p>I’ve covered using OIDC to authenticate GitHub to Azure in depth in previous blogs. You can find more information on how to set up OIDC authentication in parts 4 and 5 of the <a target="_blank" href="https://benroberts.io/azure-mlops-challenge-blog-index">Azure MLOps Challenge Blog</a>.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://benroberts.io/azure-mlops-challenge-blog-index">https://benroberts.io/azure-mlops-challenge-blog-index</a></div>
<p> </p>
<h2 id="heading-github-actions-for-container-deployment">GitHub Actions for Container Deployment</h2>
<p>Now that we have deployed the Azure Container Registry using Azure Verified Modules, we need to create a GitHub Actions workflow to build and push a container image to the Azure Container Registry. Here is the GitHub Actions workflow that builds and pushes a container image to the Azure Container Registry:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">Build</span> <span class="hljs-string">and</span> <span class="hljs-string">Push</span> <span class="hljs-string">Docker</span> <span class="hljs-string">image</span> <span class="hljs-string">to</span> <span class="hljs-string">ACR</span>

<span class="hljs-attr">on: workflow_dispatch:</span>

<span class="hljs-attr">permissions:</span>
  <span class="hljs-attr">id-token:</span> <span class="hljs-string">write</span>
  <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span>

<span class="hljs-attr">env:</span>
  <span class="hljs-attr">REGISTRY_NAME:</span> <span class="hljs-string">devacrulelz55lpsh</span> <span class="hljs-comment"># Set your registry name here</span>
  <span class="hljs-attr">IMAGE_NAME:</span> <span class="hljs-string">app2003</span> <span class="hljs-comment"># Set your image name here</span>

<span class="hljs-attr">jobs:</span>
  <span class="hljs-attr">build-and-push:</span>
    <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
    <span class="hljs-attr">environment:</span> <span class="hljs-string">dev</span>

    <span class="hljs-attr">steps:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Checkout</span> <span class="hljs-string">code</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@main</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Set</span> <span class="hljs-string">up</span> <span class="hljs-string">Docker</span> <span class="hljs-string">Buildx</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">docker/setup-buildx-action@v3</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">"Az CLI login"</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/login@v2</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">client-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_CLIENT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">tenant-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_TENANT_ID</span> <span class="hljs-string">}}</span>
          <span class="hljs-attr">subscription-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_SUBSCRIPTION_ID</span> <span class="hljs-string">}}</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Login</span> <span class="hljs-string">to</span> <span class="hljs-string">ACR</span>
        <span class="hljs-attr">run:</span> <span class="hljs-string">|
          az acr login --name ${{ env.REGISTRY_NAME }}
</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Build</span> <span class="hljs-string">and</span> <span class="hljs-string">push</span> <span class="hljs-string">Docker</span> <span class="hljs-string">image</span>
        <span class="hljs-attr">uses:</span> <span class="hljs-string">docker/build-push-action@v5</span>
        <span class="hljs-attr">with:</span>
          <span class="hljs-attr">file:</span> <span class="hljs-string">./Dockerfile</span>
          <span class="hljs-attr">push:</span> <span class="hljs-literal">true</span>
          <span class="hljs-attr">tags:</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.REGISTRY_NAME</span> <span class="hljs-string">}}.azurecr.io/${{</span> <span class="hljs-string">env.IMAGE_NAME</span> <span class="hljs-string">}}:${{</span> <span class="hljs-string">github.sha</span> <span class="hljs-string">}}</span>
</code></pre>
<p>In this GitHub Actions workflow, we are using the <code>docker/build-push-action</code> action to build and push a container image to the Azure Container Registry. We’re also using Azure RBAC authentication to the registry, as opposed to the registry access keys of the admin user login. The workflow is triggered manually using the <code>workflow_dispatch</code> event.</p>
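<p>For that RBAC authentication to work, the identity behind <code>AZURE_CLIENT_ID</code> needs a push role on the registry. Here’s a hedged sketch of granting it with Azure PowerShell; the client ID is a placeholder, while the registry and resource group names are the ones used in this post.</p>
<pre><code class="lang-powershell"># Sketch: grant the workflow's identity AcrPush on the registry,
# instead of enabling the registry admin user ('&lt;client-id&gt;' is a placeholder)
$acr = Get-AzContainerRegistry -ResourceGroupName 'RG1' -Name 'devacrulelz55lpsh'
New-AzRoleAssignment -ApplicationId '&lt;client-id&gt;' `
    -RoleDefinitionName 'AcrPush' `
    -Scope $acr.Id
</code></pre>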
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485638211/0e869e89-4208-43f8-aa24-f7e4e2544e48.png" alt="Container Registry" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this blog post, we explored Azure Verified Modules and how you can use them to simplify your Azure deployments. We deployed an Azure Container Registry using Azure Verified Modules and created a GitHub Actions workflows to deploy the infrastructure resources and build and push a container image to an Azure Container Registry.</p>
<p>Azure Verified Modules are a powerful tool that can help you automate the deployment and configuration of Azure services. By using Azure Verified Modules, you can save time and effort and ensure your deployments are secure, reliable, and efficient.</p>
<p>Azure Verified Modules offer significant benefits for businesses, particularly in terms of reducing the need to maintain internal registries, which can be time-consuming and resource-intensive. These modules are pre-built, tested, and verified by Microsoft, which means businesses can leverage them without worrying about their upkeep. By using Azure Verified Modules, businesses can focus more on their core operations and less on the maintenance of infrastructure code. Furthermore, these modules offer best practices and security, providing businesses with the assurance that they are using reliable and secure modules for their Azure deployments. This not only saves time and resources but also enhances the efficiency and reliability of business operations.</p>
<p>I hope you found this blog post helpful. Thank you for reading!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485638947/fdcb4475-7d66-4b1c-b3f7-d0ead6229e08.png" alt="🚀" /></p>
]]></content:encoded></item><item><title><![CDATA[Automated Deployments for Azure Kubernetes Service (Preview)]]></title><description><![CDATA[Deploying applications to an AKS cluster can be a complex and time-consuming process. But with Automated Deployments, you can automate your deployments and save time and effort.
AKS provides a fully managed Kubernetes cluster that makes it easy to de...]]></description><link>https://benroberts.io/automated-deployments-for-azure-kubernetes-service-preview</link><guid isPermaLink="true">https://benroberts.io/automated-deployments-for-azure-kubernetes-service-preview</guid><category><![CDATA[aks]]></category><category><![CDATA[Azure]]></category><category><![CDATA[blog]]></category><category><![CDATA[containers]]></category><category><![CDATA[GitHub]]></category><dc:creator><![CDATA[Ben Roberts]]></dc:creator><pubDate>Fri, 03 Nov 2023 21:16:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727492520847/85176699-7f0d-4ffe-bf91-0a13216bba00.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Deploying applications to an AKS cluster can be a complex and time-consuming process. But with Automated Deployments, you can automate your deployments and save time and effort.</p>
<p>AKS provides a fully managed Kubernetes cluster that makes it easy to deploy, manage, and scale your applications. And with automated deployments, you can streamline your deployment process and reduce the risk of errors.</p>
<p>With automated deployments, you can:</p>
<ul>
<li><p>Reduce the need for Kubernetes expertise: Automated deployments eliminate the need for Kubernetes expertise by automating the build and deployment process.</p>
</li>
<li><p>Reduce the risk of errors: Automated deployments eliminate the risk of human error by automating the build (CI) process.</p>
</li>
<li><p>Save time and effort: Automating your deployments saves time and effort by eliminating the need for manual intervention.</p>
</li>
<li><p>Increase consistency: Automated deployments ensure that your deployments (CD) are consistent and repeatable.</p>
</li>
</ul>
<p>With Automated Deployments you can streamline your integration and deployment process and focus on what really matters: releasing great applications on AKS.</p>
<p>Automated Deployments automates the following four tasks:</p>
<ul>
<li><p>Generates a Dockerfile</p>
</li>
<li><p>Generates a GitHub Actions (yaml) pipeline</p>
</li>
<li><p>Generates two Kubernetes manifests</p>
</li>
<li><p>Authenticates with GitHub Actions, ACR and AKS</p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To complete this tutorial, you need:</p>
<ul>
<li><p>An Azure subscription.</p>
</li>
<li><p>A GitHub account.</p>
</li>
<li><p>An application to deploy. If you don’t have an application to deploy, you can use the sample application in this tutorial.</p>
</li>
</ul>
<h2 id="heading-create-azure-resources">Create Azure Resources</h2>
<p>To create the Azure resources you’ll need for this tutorial, use the following Azure CLI commands:</p>
<pre><code class="lang-powershell">az <span class="hljs-built_in">group</span> create -<span class="hljs-literal">-name</span> myAKSDemo -<span class="hljs-literal">-location</span> australiaeast

az aks create -<span class="hljs-literal">-resource</span><span class="hljs-literal">-group</span> myAKSDemo `
-<span class="hljs-literal">-name</span> myAKSCluster `
-<span class="hljs-literal">-node</span><span class="hljs-literal">-count</span> <span class="hljs-number">1</span> `
-<span class="hljs-literal">-enable</span><span class="hljs-literal">-addons</span> monitoring `
-<span class="hljs-literal">-generate</span><span class="hljs-literal">-ssh</span><span class="hljs-literal">-keys</span> `
-<span class="hljs-literal">-kubernetes</span><span class="hljs-literal">-version</span> <span class="hljs-number">1.28</span>.<span class="hljs-number">0</span>

az acr create -<span class="hljs-literal">-resource</span><span class="hljs-literal">-group</span> myAKSDemo `
-<span class="hljs-literal">-name</span> automateddeployments `
-<span class="hljs-literal">-sku</span> Basic
</code></pre>
<p>Azure Container Registry (ACR) is used to store our containers after they’re built in the GitHub Actions workflow. The containers are then deployed into AKS after the workflow runs the <code>kubectl apply</code> command.</p>
<h2 id="heading-github-repository">GitHub Repository</h2>
<p>For this demo we’ll be using a (very) basic Flask app. Create a folder in the repo called “src” and add two files inside: <code>app.py</code> and <code>requirements.txt</code>.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask

app = Flask(__name__)

<span class="hljs-meta">@app.route("/")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">hello_world</span>():</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">"&lt;h1&gt;Hello World&lt;/h1&gt;"</span>

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    app.run(debug=<span class="hljs-literal">True</span>)
</code></pre>
<pre><code class="lang-bash">Flask==2.3.3
</code></pre>
<p>Once you’ve committed those changes to your main branch, we can move on to automated deployments configuration.</p>
<h2 id="heading-configure-automated-deployments">Configure Automated Deployments</h2>
<p>In the Azure Portal, select your cluster and click on the <strong>Automated Deployments</strong> tab. Click on <strong>Automatically containerize and deploy</strong> to start the setup.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485643545/e114c1e6-ab85-496f-97c0-fdf94536de6a.png" alt /></p>
<p>After you’ve authenticated with GitHub, select the repository you want to use for the deployment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485644903/8a509ebe-c0fe-4a0b-ab50-c0038ef7b174.png" alt /></p>
<p>The wizard will automatically detect the environment runtime. Fill in the port and location for the app.</p>
<p>Don’t forget to select the registry created above at the bottom of the page.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485646820/c3f494ca-1392-4102-b0ad-34872edb075e.png" alt /></p>
<p>Create a new namespace for the project and proceed to the next step.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485648338/33bd5731-803c-4bad-95a8-b408881437a6.png" alt /></p>
<p>and click Deploy.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485650024/42785056-290e-4381-ab63-34057d221a9b.png" alt /></p>
<p>The automated deployment will now generate the credentials and set up permissions between GitHub Actions, the Container Registry and the AKS cluster.</p>
<p>The automated deployment will also create a pull request containing the new Dockerfile, GitHub Actions workflow and Kubernetes (yaml) manifests.</p>
<p>Clicking on “Approve pull request” will open the pull request in GitHub in a new tab.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485651374/b504f357-c4db-4ca3-af40-0e4bd343a107.png" alt /></p>
<p><strong>Note:</strong> The auto-generated entrypoint/cmd won’t work; update the Dockerfile to use the following instead:</p>
<p><code>CMD [ "python3", "-m", "flask", "run", "--host=0.0.0.0" ]</code></p>
<p>Approve the merge and delete the branch, then head to the Actions tab to see the build process.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485652551/115112cc-e5df-4732-882c-c8ffb17baf0a.png" alt /></p>
<p>The pipeline will build the Docker image, push it to ACR and deploy it to AKS.</p>
<p>After a few minutes the deployment will be complete and the job status from the GitHub Actions pipeline will be reflected under the Automated Deployments tab in the Azure Portal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485653585/dfa98453-d238-4e76-924e-b89e85fbee88.png" alt /></p>
<p>You can start to see how you might scale this out by adding more apps in different repos and/or branches.</p>
<p>Head up to “Services and ingress” to get the public IP address of the app.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485655454/1fe7e78c-de44-4df6-9b74-c9dbda548f4a.png" alt /></p>
<p>Clicking on the external IP will open a tab with the app.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485656518/c2f5cf18-6467-4d8b-ae84-e5be522d55a7.png" alt /></p>
<p>Success! 🤜🤛</p>
<p>You’ve deployed an application to AKS without any knowledge of Kubernetes or Docker!</p>
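<p>You can also verify the rollout from the command line. Here’s a hedged sketch assuming the cluster and resource group names from the setup commands earlier, and the <code>flask</code> namespace used by the generated workflow shown below:</p>
<pre><code class="lang-powershell"># Sketch: fetch cluster credentials, then check the app's pods and service
az aks get-credentials --resource-group myAKSDemo --name myAKSCluster
kubectl get pods --namespace flask
kubectl get service --namespace flask   # EXTERNAL-IP shows the app's public IP
</code></pre>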
<h2 id="heading-automated-deployments-demo">Automated Deployments Demo</h2>
<p>Now that we’ve got the basic configuration in place, let’s take a look at how automated deployments work.</p>
<p>This is best explained by modifying the code in the GitHub repo and seeing how the automated deployment process works.</p>
<p>Make a change to <code>app.py</code> and commit the changes to a new branch and create a pull request.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask

app = Flask(__name__)

<span class="hljs-meta">@app.route("/")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">hello_world</span>():</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">"&lt;h1&gt;Hello World 2.0&lt;/h1&gt;"</span>

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    app.run(debug=<span class="hljs-literal">True</span>)
</code></pre>
<p><code>git checkout -b feature/website-update</code></p>
<p><code>git commit -am "added things"</code></p>
<p><code>git push --set-upstream origin feature/website-update</code></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485658717/456e968e-f9e9-405a-ad44-f1e3d78dd8ff.png" alt /></p>
<p>After completing the merge, observe the GitHub Actions pipeline.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485659962/781c7199-9a86-4fcd-8e96-5cd57b836869.png" alt /></p>
<p>and the Azure Portal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485661349/cbe3e8d4-a08f-499a-ae63-5cf0e30b566e.png" alt /></p>
<p>After the build is complete, refresh the tab with the app to see the changes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485662349/d2b11ecb-5bc9-4efe-9bfc-0b897fa16aa7.png" alt="🪄" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485663828/936d5296-331c-46a1-be08-55a1b7b7ae45.png" alt /></p>
<h2 id="heading-behind-the-certains">Behind The Certains</h2>
<p>The real magic is in the GitHub workflow.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">name:</span> <span class="hljs-string">pythonapp</span>
<span class="hljs-attr">"on":</span>
    <span class="hljs-attr">push:</span>
        <span class="hljs-attr">branches:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">main</span>
    <span class="hljs-attr">workflow_dispatch:</span> {}
<span class="hljs-attr">env:</span>
    <span class="hljs-attr">ACR_RESOURCE_GROUP:</span> <span class="hljs-string">myAKSDemo</span>
    <span class="hljs-attr">AZURE_CONTAINER_REGISTRY:</span> <span class="hljs-string">automateddeployments</span>
    <span class="hljs-attr">CLUSTER_NAME:</span> <span class="hljs-string">myDemoCluster</span>
    <span class="hljs-attr">CLUSTER_RESOURCE_GROUP:</span> <span class="hljs-string">myAKSDemo</span>
    <span class="hljs-attr">CONTAINER_NAME:</span> <span class="hljs-string">pythonapp</span>
    <span class="hljs-attr">DEPLOYMENT_MANIFEST_PATH:</span> <span class="hljs-string">|
        manifests/deployment.yaml
        manifests/service.yaml
</span><span class="hljs-attr">jobs:</span>
    <span class="hljs-attr">buildImage:</span>
        <span class="hljs-attr">permissions:</span>
            <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span>
            <span class="hljs-attr">id-token:</span> <span class="hljs-string">write</span>
        <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
        <span class="hljs-attr">steps:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v3</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/login@&lt;ID&gt;</span>
              <span class="hljs-attr">name:</span> <span class="hljs-string">Azure</span> <span class="hljs-string">login</span>
              <span class="hljs-attr">with:</span>
                <span class="hljs-attr">client-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_CLIENT_ID</span> <span class="hljs-string">}}</span>
                <span class="hljs-attr">subscription-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_SUBSCRIPTION_ID</span> <span class="hljs-string">}}</span>
                <span class="hljs-attr">tenant-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_TENANT_ID</span> <span class="hljs-string">}}</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Build</span> <span class="hljs-string">and</span> <span class="hljs-string">push</span> <span class="hljs-string">image</span> <span class="hljs-string">to</span> <span class="hljs-string">ACR</span>
              <span class="hljs-attr">run:</span> <span class="hljs-string">az</span> <span class="hljs-string">acr</span> <span class="hljs-string">build</span> <span class="hljs-string">--image</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.CONTAINER_NAME</span> <span class="hljs-string">}}:${{</span> <span class="hljs-string">github.sha</span> <span class="hljs-string">}}</span> <span class="hljs-string">--registry</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.AZURE_CONTAINER_REGISTRY</span> <span class="hljs-string">}}</span> <span class="hljs-string">-g</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.ACR_RESOURCE_GROUP</span> <span class="hljs-string">}}</span> <span class="hljs-string">-f</span> <span class="hljs-string">Dockerfile</span> <span class="hljs-string">./src</span>
    <span class="hljs-attr">deploy:</span>
        <span class="hljs-attr">permissions:</span>
            <span class="hljs-attr">actions:</span> <span class="hljs-string">read</span>
            <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span>
            <span class="hljs-attr">id-token:</span> <span class="hljs-string">write</span>
        <span class="hljs-attr">runs-on:</span> <span class="hljs-string">ubuntu-latest</span>
        <span class="hljs-attr">needs:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">buildImage</span>
        <span class="hljs-attr">steps:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">actions/checkout@v3</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/login@&lt;ID&gt;</span>
              <span class="hljs-attr">name:</span> <span class="hljs-string">Azure</span> <span class="hljs-string">login</span>
              <span class="hljs-attr">with:</span>
                <span class="hljs-attr">client-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_CLIENT_ID</span> <span class="hljs-string">}}</span>
                <span class="hljs-attr">subscription-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_SUBSCRIPTION_ID</span> <span class="hljs-string">}}</span>
                <span class="hljs-attr">tenant-id:</span> <span class="hljs-string">${{</span> <span class="hljs-string">secrets.AZURE_TENANT_ID</span> <span class="hljs-string">}}</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/use-kubelogin@v1</span>
              <span class="hljs-attr">name:</span> <span class="hljs-string">Set</span> <span class="hljs-string">up</span> <span class="hljs-string">kubelogin</span> <span class="hljs-string">for</span> <span class="hljs-string">non-interactive</span> <span class="hljs-string">login</span>
              <span class="hljs-attr">with:</span>
                <span class="hljs-attr">kubelogin-version:</span> <span class="hljs-string">v0.0.25</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">azure/aks-set-context@v3</span>
              <span class="hljs-attr">name:</span> <span class="hljs-string">Get</span> <span class="hljs-string">K8s</span> <span class="hljs-string">context</span>
              <span class="hljs-attr">with:</span>
                <span class="hljs-attr">admin:</span> <span class="hljs-string">"false"</span>
                <span class="hljs-attr">cluster-name:</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.CLUSTER_NAME</span> <span class="hljs-string">}}</span>
                <span class="hljs-attr">resource-group:</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.CLUSTER_RESOURCE_GROUP</span> <span class="hljs-string">}}</span>
                <span class="hljs-attr">use-kubelogin:</span> <span class="hljs-string">"true"</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">uses:</span> <span class="hljs-string">Azure/k8s-deploy@v4</span>
              <span class="hljs-attr">name:</span> <span class="hljs-string">Deploys</span> <span class="hljs-string">application</span>
              <span class="hljs-attr">with:</span>
                <span class="hljs-attr">action:</span> <span class="hljs-string">deploy</span>
                <span class="hljs-attr">images:</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.AZURE_CONTAINER_REGISTRY</span> <span class="hljs-string">}}.azurecr.io/${{</span> <span class="hljs-string">env.CONTAINER_NAME</span> <span class="hljs-string">}}:${{</span> <span class="hljs-string">github.sha</span> <span class="hljs-string">}}</span>
                <span class="hljs-attr">manifests:</span> <span class="hljs-string">${{</span> <span class="hljs-string">env.DEPLOYMENT_MANIFEST_PATH</span> <span class="hljs-string">}}</span>
                <span class="hljs-attr">namespace:</span> <span class="hljs-string">flask</span>
</code></pre>
<p>Here’s a brief description of what’s happening:</p>
<ul>
<li><p><code>az acr build</code> builds a container image from the Dockerfile using the latest code and pushes it to the ACR.</p>
</li>
<li><p><code>azure/use-kubelogin@v1</code> and <code>azure/aks-set-context@v3</code> set up non-interactive authentication and fetch the kubeconfig for the AKS cluster.</p>
</li>
<li><p><code>Azure/k8s-deploy@v4</code> deploys the new container image to AKS using <code>deployment.yaml</code> (a quick verification sketch follows this list).</p>
</li>
</ul>
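<p>Once the workflow finishes, it’s worth confirming the rollout from your own terminal. The commands below are a minimal sketch: they assume the generated manifest created a Deployment in the <code>flask</code> namespace used above, and the cluster and deployment names are placeholders you’ll need to swap for whatever the wizard generated.</p>
<pre><code># Pull credentials for the cluster (cluster name is a placeholder)
az aks get-credentials --resource-group myAKSDemo --name &lt;your-cluster-name&gt;

# Watch the rollout complete (deployment name is an assumption; list them with: kubectl get deploy -n flask)
kubectl rollout status deployment/&lt;your-deployment&gt; -n flask

# Confirm the pods are running the image tagged with the workflow's git SHA
kubectl get pods -n flask -o wide
</code></pre>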
<h2 id="heading-not-without-its-flaws">Not Without Its Flaws</h2>
<p>While writing this blog, I tried three times to get Automated Deployments to work:</p>
<ul>
<li><p>On the first attempt, I used a Python app that I hadn’t tested as a container;</p>
</li>
<li><p>on the second, a Node app that I had run on AKS before; and</p>
</li>
<li><p>on the third, a Go app that I had used with Docker locally.</p>
</li>
</ul>
<p>Each time I ran into the same issue: the automated deployment process failed to generate a usable Dockerfile, so I had to troubleshoot with Docker locally before I could move on.</p>
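<p>If you hit the same wall, a quick local smoke test catches most Dockerfile problems before the pipeline ever runs. A rough sketch, assuming your app listens on port 8080 (adjust the port and image name for your own app):</p>
<pre><code># Build the image from the same Dockerfile the wizard generated (or your fixed one)
docker build -t myapp:test .

# Run it locally, mapping the container port to localhost
docker run --rm -p 8080:8080 myapp:test

# In another terminal, check the app actually responds
curl http://localhost:8080/
</code></pre>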
<h2 id="heading-conclusion">Conclusion</h2>
<p>Automated Deployments is a fantastic solution for abstracting away the complexities of container builds and Kubernetes deployments. All you need is a basic understanding of Docker and you’ll have apps running on AKS in no time. It’s a great idea, and the automation that generates the various files, together with the pull request integration, works really well, but the overall solution feels a bit… incomplete. For a start, I’d love to see the wizard integrate with an ingress controller so the result is more than just a pod running on a cluster, and to bootstrap blue/green or canary deployments, similar to Azure Web App Slots.</p>
<p>As always, thanks for reading and happy coding!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727485664652/5684459b-99ef-4536-b49a-aaade3482217.png" alt="🤘" /></p>
<p>If you’d like more information about Automated Deployments, you can find it <a target="_blank" href="https://learn.microsoft.com/en-us/azure/aks/automated-deployments">here</a>.</p>
<h2 id="heading-cleanup">Cleanup</h2>
<p>When you’re ready to clean up, the following command removes the resource group, the AKS cluster, and all related resources.</p>
<p><code>az group delete --name myAKSDemo --yes --no-wait</code></p>
]]></content:encoded></item></channel></rss>