<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Telemetry on Matt Goodrich</title><link>https://mattgoodrich.com/tags/telemetry/</link><description>Recent content in Telemetry on Matt Goodrich</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Sat, 30 May 2026 12:00:00 -0700</lastBuildDate><atom:link href="https://mattgoodrich.com/tags/telemetry/index.xml" rel="self" type="application/rss+xml"/><item><title>Your Access Review Is Already Stale: Telemetry-Driven Least Privilege</title><link>https://mattgoodrich.com/posts/telemetry-driven-access-reviews/</link><pubDate>Sat, 30 May 2026 12:00:00 -0700</pubDate><guid>https://mattgoodrich.com/posts/telemetry-driven-access-reviews/</guid><description>&lt;img src="https://mattgoodrich.com/posts/telemetry-driven-access-reviews/header.png" alt="Featured image of post Your Access Review Is Already Stale: Telemetry-Driven Least Privilege" />&lt;p>Every quarter, an email arrives. &amp;ldquo;Please review the access for your team.&amp;rdquo; A manager who hasn&amp;rsquo;t touched a &lt;code>kubectl&lt;/code> command in two years clicks &lt;strong>Approve&lt;/strong> on a list of permissions they don&amp;rsquo;t understand for people whose work they only partially see. Compliance gets its checkbox. Nothing actually changed.&lt;/p>
&lt;p>The system that&amp;rsquo;s supposed to catch over-permissioning produced no useful signal. The data to do it well already exists in the systems that grant and serve those permissions. We just keep asking humans to review what telemetry could decide.&lt;/p>
&lt;p>This isn&amp;rsquo;t an indictment of access reviews. It&amp;rsquo;s an indictment of how we run them.&lt;/p>
&lt;h2 id="the-quarterly-ritual">The Quarterly Ritual
&lt;/h2>&lt;p>Most organizations run access reviews the same way. Manager-driven. Based on a static list of role/group/permission membership. Asking the manager to validate access for direct reports whose day-to-day work the manager only partially sees.&lt;/p>
&lt;p>The &amp;ldquo;LGTM&amp;rdquo; (Looks Good To Me) rate on these reviews approaches 100% in most orgs I&amp;rsquo;ve seen or heard about. There&amp;rsquo;s no real signal in the approvals because there can&amp;rsquo;t be. The manager doesn&amp;rsquo;t have the information to evaluate &amp;ldquo;should this person have this permission.&amp;rdquo; They have the information to evaluate &amp;ldquo;is this person still on my team.&amp;rdquo;&lt;/p>
&lt;p>Compliance teams know this. They&amp;rsquo;re stuck because the controls framework asks for a periodic review, and a periodic review is what gets provided. The framework wants evidence of process; process is what we generate.&lt;/p>
&lt;p>The actual outcome: stale permissions accumulate. Departed teammates&amp;rsquo; service accounts linger. The Jenkins token from the project that ended in 2023 still has prod write. The contractor who left last year is still in the security group. Everyone sort-of-knows the review is theater. Nobody has a clear path to make it not theater.&lt;/p>
&lt;h2 id="the-data-already-exists">The Data Already Exists
&lt;/h2>&lt;p>Here&amp;rsquo;s the thing that should bother you. The data to do this well exists. Most of it has existed for years.&lt;/p>
&lt;p>&lt;strong>Identity provider telemetry.&lt;/strong> SSO/IdP logs from Okta, Entra ID, or whichever vendor you use. Every authentication, every app accessed, every conditional access decision. SCIM/IGA tooling that tracks what&amp;rsquo;s granted, when, by whom.&lt;/p>
&lt;p>&lt;strong>Cloud and infrastructure telemetry.&lt;/strong> CloudTrail, GCP Audit Logs, Azure Activity Logs: every API call with caller identity. Kubernetes audit logs: every API server request, including each &lt;code>kubectl&lt;/code> operation, by user or service account. Database audit logs (where enabled): every query by user. Vault and secret-manager logs: every secret read with caller identity.&lt;/p>
&lt;p>&lt;strong>Application-layer telemetry.&lt;/strong> API gateway access logs with bearer-token identity. SaaS admin actions across Salesforce, GitHub, Snowflake. Internal tool authorization decisions. Even custom apps usually emit &amp;ldquo;user X did Y&amp;rdquo; somewhere.&lt;/p>
&lt;p>The data isn&amp;rsquo;t missing. The data isn&amp;rsquo;t even hard to get. We just don&amp;rsquo;t &lt;em>use&lt;/em> it to inform the review. We treat the review as a separate exercise that happens on a calendar and produces a snapshot, instead of as a continuous output of the systems that already know who is using what.&lt;/p>
&lt;h2 id="what-telemetry-driven-least-privilege-looks-like">What Telemetry-Driven Least Privilege Looks Like
&lt;/h2>&lt;p>The shape of the system: continuous evaluation of every (user, permission) pair against actual usage telemetry, with auto-revocation policies for permissions that go stale.&lt;/p>
&lt;h3 id="the-three-questions-a-telemetry-driven-review-answers">The Three Questions a Telemetry-Driven Review Answers
&lt;/h3>&lt;p>For every (user, permission) pair, three questions:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Has it ever been used?&lt;/strong> Granted but never used is a clear candidate for revocation.&lt;/li>
&lt;li>&lt;strong>Has it been used recently?&lt;/strong> Used 14 months ago and not since is also a candidate, with a grace period.&lt;/li>
&lt;li>&lt;strong>Is the usage pattern consistent with the role?&lt;/strong> A read-only API key suddenly making writes is an anomaly worth investigating, even if the writes are technically authorized.&lt;/li>
&lt;/ol>
&lt;p>A manual quarterly review can&amp;rsquo;t answer any of these reliably. Telemetry can answer all three continuously.&lt;/p>
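&lt;p>As a minimal sketch, assuming usage telemetry has already been reduced to one record per (user, permission) pair, the three questions become three small checks. The &lt;code>GrantUsage&lt;/code> shape, its field names, and the 90-day window are hypothetical, not taken from any particular product.&lt;/p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical record shape; field names are illustrative only.
@dataclass(frozen=True)
class GrantUsage:
    user: str
    permission: str
    expected_actions: frozenset    # actions the role is expected to perform
    last_used: Optional[datetime]  # None means no usage event was ever seen
    observed_actions: frozenset    # actions actually seen in telemetry


def review(grant: GrantUsage, now: datetime, recent_days: int = 90) -> dict:
    """Answer the three questions for one (user, permission) pair."""
    # Question 1: has it ever been used?
    ever_used = grant.last_used is not None
    # Question 2: has it been used within the recent window?
    used_recently = ever_used and (
        timedelta(days=recent_days) >= now - grant.last_used
    )
    # Question 3: which observed actions fall outside the expected pattern?
    unexpected = sorted(grant.observed_actions - grant.expected_actions)
    return {
        "ever_used": ever_used,
        "used_recently": used_recently,
        "unexpected_actions": unexpected,
    }
```

&lt;p>A read-only key that starts writing shows up as a non-empty &lt;code>unexpected_actions&lt;/code> list even though it passes the first two checks, which is the case the third question exists to catch.&lt;/p>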
&lt;h3 id="continuous-not-periodic">Continuous, Not Periodic
&lt;/h3>&lt;p>The system runs every day, not every 90 days. Permissions hit a &amp;ldquo;stale&amp;rdquo; threshold (call it 90 days without use) and the user or owner gets a soft notification: &lt;em>&amp;ldquo;You have access to X that hasn&amp;rsquo;t been used in 90 days. Keep it or drop it.&amp;rdquo;&lt;/em> Permissions hit a &amp;ldquo;dormant&amp;rdquo; threshold (180 days, say) and the system auto-revokes with notification and an easy re-grant flow.&lt;/p>
&lt;p>The user still gets a human-in-the-loop moment. It&amp;rsquo;s just targeted, just-in-time, and based on real data instead of being a quarterly batch reviewed at a few seconds per item.&lt;/p>
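&lt;p>The daily evaluation described above can be sketched as a single function over the idle time of a grant. The 90- and 180-day thresholds come from the text. The function name and the three action labels are assumptions, and a never-used grant is assumed to pass its grant date as &lt;code>last_used&lt;/code>.&lt;/p>

```python
from datetime import date

STALE_DAYS = 90     # soft-notification threshold from the text
DORMANT_DAYS = 180  # auto-revoke threshold from the text


def daily_action(last_used: date, today: date) -> str:
    """Return the action for one grant: 'keep', 'notify', or 'revoke'."""
    idle_days = (today - last_used).days
    if idle_days >= DORMANT_DAYS:
        return "revoke"  # auto-revoke, with notification and a re-grant flow
    if idle_days >= STALE_DAYS:
        return "notify"  # soft "keep it or drop it" prompt to the user or owner
    return "keep"
```

&lt;p>Checking the dormant threshold first keeps the two windows from overlapping: a grant idle for 200 days is revoked rather than notified a second time.&lt;/p>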
&lt;h3 id="make-re-granting-cheaper-than-hoarding">Make Re-Granting Cheaper Than Hoarding
&lt;/h3>&lt;p>This is the make-or-break design decision. If re-granting takes a week, people will hoard permissions out of self-preservation. If re-granting is one command plus a justification plus an automatic approval for low-risk permissions, people will let go of unused permissions because losing them is reversible.&lt;/p>
&lt;p>The principle: make the &lt;em>correct&lt;/em> path easier than the cautious path. If staying over-permissioned is more convenient than re-acquiring access when needed, your system has the wrong incentives by design.&lt;/p>
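&lt;p>One way to picture the cheap re-grant path, as a sketch only: a request carries a justification, and low-risk tiers are approved without waiting on a person. The tier names and return strings here are hypothetical, and a real flow would also log the request as evidence.&lt;/p>

```python
def route_regrant(permission: str, risk_tier: str, justification: str) -> str:
    """Decide how a re-grant request for one permission is handled."""
    if not justification.strip():
        # The justification is the one piece of friction that is kept.
        return "rejected: justification required"
    if risk_tier == "low":
        # Low-risk permissions come back immediately.
        return "auto-approved"
    # Anything else waits for a named human approver.
    return "pending human approval"
```

&lt;p>The design choice is that the default outcome for a justified, low-risk request is yes, so that losing an unused permission costs one command rather than a week.&lt;/p>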
&lt;h3 id="anomaly-driven-investigation-not-calendar-driven-review">Anomaly-Driven Investigation, Not Calendar-Driven Review
&lt;/h3>&lt;p>&amp;ldquo;User X used a permission they hadn&amp;rsquo;t used in six months&amp;rdquo; is more interesting than &amp;ldquo;User X is in the developers group.&amp;rdquo; &amp;ldquo;Service account Y started writing to a table it had only ever read from&amp;rdquo; is more interesting than &amp;ldquo;Service account Y is in the data-pipeline role.&amp;rdquo;&lt;/p>
&lt;p>The review isn&amp;rsquo;t a once-a-quarter ritual. It&amp;rsquo;s a continuous stream of signals, with humans in the loop only when the signal is anomalous. The anomalies are where human judgment actually adds value. The membership lists are not.&lt;/p>
&lt;h2 id="the-precondition-no-long-lived-production-access">The Precondition: No Long-Lived Production Access
&lt;/h2>&lt;p>The whole picture only works if production access is ephemeral.&lt;/p>
&lt;p>Just-in-time elevation via Teleport, AWS IAM Identity Center (formerly AWS SSO) temporary credentials, OIDC workload identity, break-glass with auto-expiry. No standing prod admin grants. Service accounts using short-lived tokens, not static API keys. When the access expires, the question &amp;ldquo;is this permission still needed&amp;rdquo; is asked by &lt;em>the system&lt;/em> every time someone re-requests it.&lt;/p>
&lt;p>Without this precondition, the &amp;ldquo;use telemetry to expire stale grants&amp;rdquo; model degrades, because the dangerous grants are the ones that never expire and are therefore never re-justified. Those are also the ones that periodic review fails on.&lt;/p>
&lt;p>The good news is the precondition is achievable. JIT access tooling has matured significantly. The blocker is usually organizational change: the SRE team that&amp;rsquo;s used to having keys, the legacy app that doesn&amp;rsquo;t support workload identity yet, the on-call rotation that needs break-glass for the 3am call. Those are real constraints, but they&amp;rsquo;re contained, not fundamental.&lt;/p>
&lt;p>Even with the precondition met, the model has a scope limit worth naming. JIT and short-lived tokens cover production infrastructure well: the cloud, the cluster, the database. They cover less of the rest. SaaS apps that aren&amp;rsquo;t yet behind a JIT broker still hand out standing access. IdP group memberships persist across roles. Service accounts in legacy systems hold static credentials because the legacy systems don&amp;rsquo;t speak workload identity. Break-glass accounts exist on purpose and won&amp;rsquo;t be JIT-mediated.&lt;/p>
&lt;p>The realistic outcome is hybrid. JIT shrinks the scope of the periodic review by the size of the JIT-covered estate, which is significant, but a long tail of access the broker doesn&amp;rsquo;t see still needs review. PCI DSS 7.2.4 also hard-codes a six-month review of all user accounts and access privileges regardless of telemetry quality, so for the PCI-scoped portion of the estate that calendar item stays. The honest claim is that telemetry-driven review plus JIT replaces the theater of the quarterly ritual and shrinks the surface area where the ritual still applies, without ever fully eliminating it.&lt;/p>
&lt;h2 id="where-this-lands-grc-engineering-and-iam">Where This Lands: GRC Engineering and IAM
&lt;/h2>&lt;p>This is a discipline that sits at the intersection of &lt;a class="link" href="https://mattgoodrich.com/posts/grc-engineering/" >GRC engineering&lt;/a> and IAM, and it benefits from being thought about as that intersection rather than as a problem belonging to either side alone.&lt;/p>
&lt;h3 id="for-grc-engineering">For GRC Engineering
&lt;/h3>&lt;p>Compliance frameworks ask for evidence that access is appropriate. They don&amp;rsquo;t actually mandate manager-clicks-approve, and in most cases they don&amp;rsquo;t even mandate a cadence. SOC 2 CC6.3 says reviews should happen &amp;ldquo;on a periodic basis&amp;rdquo; without specifying when. ISO 27001 A.5.18 says access rights are reviewed &amp;ldquo;at planned intervals,&amp;rdquo; with the interval set by the organization. NIST 800-53 AC-2(j) literally reads &amp;ldquo;[Assignment: organization-defined frequency].&amp;rdquo; The frameworks treat cadence as something you decide and defend, not as a fixed calendar requirement. The exception is PCI DSS 7.2.4, which hard-codes six months for human accounts. Everything else is control-objective, not calendar.&lt;/p>
&lt;p>A telemetry-driven access review produces &lt;em>better&lt;/em> evidence: not &amp;ldquo;manager approved on date X&amp;rdquo; but &amp;ldquo;permission Y was unused for Z days and was revoked on date W, with these N exceptions explicitly justified.&amp;rdquo; The auditor wants to know your access is appropriate. Telemetry-driven review answers that question with data instead of with attestation.&lt;/p>
&lt;p>This is the GRC-as-code pattern. Stop generating PDFs about controls and start emitting machine-readable assertions about what&amp;rsquo;s actually happening.&lt;/p>
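&lt;p>A machine-readable assertion of that kind might look like the following sketch. The function name and every field name are assumptions for illustration, not a standard evidence schema.&lt;/p>

```python
import json
from datetime import date
from typing import Optional


def access_evidence(user: str, permission: str, last_used: date,
                    decided_on: date, exception: Optional[str] = None) -> str:
    """Emit one evidence record for a stale-permission decision as JSON."""
    record = {
        "user": user,
        "permission": permission,
        "last_used": last_used.isoformat(),
        "unused_days": (decided_on - last_used).days,
        # A grant is kept only when a human supplied an explicit justification.
        "decision": "kept" if exception else "revoked",
        "decided_on": decided_on.isoformat(),
        "exception_justification": exception,
    }
    return json.dumps(record, sort_keys=True)
```

&lt;p>Each record states what was unused, for how long, what happened, and why any exception was kept, which is the evidence an auditor would otherwise reconstruct from attestations.&lt;/p>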
&lt;h3 id="for-iam">For IAM
&lt;/h3>&lt;p>The shift is from role-based static grants to &lt;a class="link" href="https://mattgoodrich.com/posts/agents-need-capabilities-not-roles/" >capability-based continuous evaluation&lt;/a>. Roles still exist as the bundle that gets granted; what changes is that the bundle is evaluated against actual usage and pruned automatically.&lt;/p>
&lt;p>The IAM team&amp;rsquo;s job moves from &amp;ldquo;grant management&amp;rdquo; to &amp;ldquo;policy management&amp;rdquo;: defining what stale means, what auto-revoke windows apply to which classes of permission, what the re-grant friction should be per risk tier. The day-to-day operational toil of access reviews disappears, replaced by the work of designing the policies that make the toil unnecessary.&lt;/p>
&lt;h2 id="what-this-changes-for-the-audit-conversation">What This Changes for the Audit Conversation
&lt;/h2>&lt;p>The audit conversation gets faster and more rigorous at the same time.&lt;/p>
&lt;p>&amp;ldquo;Show me your access review&amp;rdquo; → here&amp;rsquo;s the dashboard. Live. Every revocation event timestamped and attributed.&lt;/p>
&lt;p>&amp;ldquo;Show me a sample of approvals&amp;rdquo; → here&amp;rsquo;s every (user, permission) over the last 12 months with last-used date and decision rationale.&lt;/p>
&lt;p>&amp;ldquo;Show me your exception process&amp;rdquo; → here are the cases where a permission was kept despite being unused, with the human justification on each one.&lt;/p>
&lt;p>The auditor stops asking for evidence of a process and starts asking for evidence of &lt;em>outcomes&lt;/em>, which is what they actually wanted in the first place. Once you can produce outcomes-based evidence, the burden of every audit cycle drops, because you&amp;rsquo;re not assembling evidence packages on an artificial cadence. You&amp;rsquo;re querying a system that already has the answer.&lt;/p>
&lt;h2 id="ai-earns-the-triage-not-the-decision">AI Earns the Triage, Not the Decision
&lt;/h2>&lt;p>The next practical question is whether AI agents can do the access review. The honest answer, today, is yes for some of it and no for the part that matters.&lt;/p>
&lt;p>What auditors will accept is AI doing the triage, the prioritization, the explanation, and the evidence packaging. A model can read 500 (user, permission) tuples, mark the unused ones, write a plain-language justification for each recommended action, and route the actionable ones to a human. Lumos, Veza, Opal, ConductorOne, SailPoint, and Saviynt all ship versions of this today. It produces faster, more consistent evidence than the human-only equivalent.&lt;/p>
&lt;p>What auditors will not accept, today, is the agent making the decision unsupervised. The IIA&amp;rsquo;s AI Auditing Framework and ISACA&amp;rsquo;s guidance on AI in audit land in the same place: a named human is accountable for an AI-assisted decision, and the agent&amp;rsquo;s reasoning becomes part of the evidence the human attests to, not a substitute for the attestation. The EU AI Act&amp;rsquo;s Annex III classification reinforces it for any workforce-access decisioning.&lt;/p>
&lt;p>The shape that works is the same shape as the rest of this argument. Continuous telemetry runs the program. JIT carries the production load. AI takes the volume of routine decisions and presents them to a human as a short, justified list. The human signs the call. The system is faster and more rigorous, and still has a person on the hook the way every framework requires.&lt;/p>
&lt;h2 id="why-humans-will-always-lose-this-race">Why Humans Will Always Lose This Race
&lt;/h2>&lt;p>The honest argument for the data-over-judgment position.&lt;/p>
&lt;p>A manager reviews access for ~10 reports across ~50 systems once a quarter. That&amp;rsquo;s roughly 500 (user, permission) tuples reviewed in maybe 30 minutes. ~3.6 seconds per item. That isn&amp;rsquo;t review. That&amp;rsquo;s a rubber stamp.&lt;/p>
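&lt;p>The per-item figure follows directly from the rough numbers above. The variable names are illustrative.&lt;/p>

```python
reports = 10         # direct reports per manager (approximate, from the text)
systems = 50         # systems per report (approximate, from the text)
review_minutes = 30  # time spent on the whole review

tuples = reports * systems                        # (user, permission) pairs
seconds_per_item = review_minutes * 60 / tuples   # 1800 seconds over 500 items

print(tuples, seconds_per_item)  # prints: 500 3.6
```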
&lt;p>Telemetry evaluates the same 500 tuples in 30 milliseconds and never gets bored. More importantly, it knows which of those 500 the user &lt;em>actually used&lt;/em>. That is information the manager doesn&amp;rsquo;t have and can&amp;rsquo;t reasonably get.&lt;/p>
&lt;p>This isn&amp;rsquo;t an argument against human judgment. It&amp;rsquo;s an argument for putting human judgment where it actually matters: the anomalies, the exceptions, the genuinely ambiguous cases. The 5% of decisions that need a person. Not the 95% that need a query.&lt;/p>
&lt;h2 id="whats-hard-about-this">What&amp;rsquo;s Hard About This
&lt;/h2>&lt;p>Worth being honest about the gaps.&lt;/p>
&lt;p>&lt;strong>Data quality.&lt;/strong> Not every permission system emits clean usage logs. Some permissions are inferred (was this S3 bucket read because of role X or role Y?). Some are missing entirely (legacy apps, on-prem systems that never got modern auditing). Where the data is bad or missing, telemetry-driven review can&amp;rsquo;t help, and you&amp;rsquo;re back to manual review for those slices.&lt;/p>
&lt;p>&lt;strong>Permission granularity.&lt;/strong> &amp;ldquo;User has the developer role&amp;rdquo; is observable. &amp;ldquo;User has the developer role and used the kubectl-exec capability&amp;rdquo; requires deeper telemetry plumbing. Coarse permissions are easier to review against; fine-grained capabilities are where the real over-permissioning hides.&lt;/p>
&lt;p>&lt;strong>Cross-system correlation.&lt;/strong> A user touches Okta, then GitHub, then AWS, then Snowflake. Stitching that into a single &amp;ldquo;what did they actually use&amp;rdquo; view is non-trivial. Most orgs haven&amp;rsquo;t connected these views yet.&lt;/p>
&lt;p>&lt;strong>Edge cases that look like inactivity.&lt;/strong> A permission used once a year for the annual audit looks identical to a forgotten grant. The system needs to support &amp;ldquo;rare but real&amp;rdquo; patterns, typically with explicit annotations on the permission itself.&lt;/p>
&lt;p>&lt;strong>Service accounts.&lt;/strong> &lt;a class="link" href="https://mattgoodrich.com/posts/service-accounts-that-improvise/" >Bot identities&lt;/a> don&amp;rsquo;t have managers and don&amp;rsquo;t read notification emails. The lifecycle has to be owned by the team that owns the workload, with automation enforcing rotation and revocation. Many orgs don&amp;rsquo;t have good ownership data on their service accounts to begin with.&lt;/p>
&lt;p>None of these are blockers to starting. They&amp;rsquo;re the parts you build out as you mature the system.&lt;/p>
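&lt;p>The &amp;ldquo;rare but real&amp;rdquo; case can be sketched as an annotation that widens the dormancy window for a specific permission. The parameter names and the 1.5x slack factor are assumptions, and the 180-day default reuses the example threshold from earlier.&lt;/p>

```python
from typing import Optional

DEFAULT_DORMANT_DAYS = 180  # example auto-revoke threshold used earlier


def dormant_after(expected_use_interval_days: Optional[int] = None,
                  slack: float = 1.5) -> int:
    """Dormancy window for a grant, widened when an annotation declares rare use."""
    if expected_use_interval_days is None:
        # No annotation: the default policy applies.
        return DEFAULT_DORMANT_DAYS
    # Annotated grants get their declared interval plus slack, never less
    # than the default window.
    return max(DEFAULT_DORMANT_DAYS, int(expected_use_interval_days * slack))
```

&lt;p>A permission annotated as used once a year would get a window of about 18 months, so one missed cycle is tolerated before it is treated as forgotten.&lt;/p>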
&lt;h2 id="the-access-review-wasnt-a-bad-idea">The Access Review Wasn&amp;rsquo;t a Bad Idea
&lt;/h2>&lt;p>The access review wasn&amp;rsquo;t a bad idea. It was the right idea built for an era when telemetry didn&amp;rsquo;t exist and humans were the only review mechanism. That era ended.&lt;/p>
&lt;p>The telemetry exists. The dashboards exist. The auto-revocation tooling exists. The blocker is organizational, not technical: replacing a ritual that everyone agrees is theater is harder than it should be, because the ritual is &lt;em>legible&lt;/em> to auditors and the new model isn&amp;rsquo;t yet.&lt;/p>
&lt;p>GRC engineering is the discipline that makes the new model legible. IAM is the discipline that makes it operational. The two have to meet.&lt;/p>
&lt;p>Telemetry-driven least privilege isn&amp;rsquo;t a vision. It&amp;rsquo;s an engineering project that pays back in audit speed, reduced standing privilege, and fewer departing-employee credential leaks. The argument for doing it is the same argument as for any of the things compliance organizations have already accepted in adjacent domains: automated evidence collection, continuous monitoring, GRC-as-code.&lt;/p>
&lt;p>The access review was the last big domain that hadn&amp;rsquo;t moved. It&amp;rsquo;s time it moved.&lt;/p></description></item></channel></rss>