<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Copyright on Matt Goodrich</title><link>https://mattgoodrich.com/tags/copyright/</link><description>Recent content in Copyright on Matt Goodrich</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Sat, 06 Jun 2026 12:00:00 -0700</lastBuildDate><atom:link href="https://mattgoodrich.com/tags/copyright/index.xml" rel="self" type="application/rss+xml"/><item><title>Who Wrote This Code? A Layered Approach to AI Attribution and Provenance</title><link>https://mattgoodrich.com/posts/ai-code-attribution-and-provenance/</link><pubDate>Sat, 06 Jun 2026 12:00:00 -0700</pubDate><guid>https://mattgoodrich.com/posts/ai-code-attribution-and-provenance/</guid><description>&lt;img src="https://mattgoodrich.com/posts/ai-code-attribution-and-provenance/header.png" alt="Featured image of post Who Wrote This Code? A Layered Approach to AI Attribution and Provenance" />&lt;p>A simple question that most engineering organizations can&amp;rsquo;t answer: &lt;strong>which code in our repos was written by AI, which was written by humans, and who is accountable for each?&lt;/strong>&lt;/p>
&lt;p>It sounds like the kind of thing you&amp;rsquo;d just know. In practice, almost no one does. AI tooling is rolling out across teams faster than the governance stack can keep up, and each tool behaves differently: some attribute by default, most don&amp;rsquo;t. Few teams record any of this at the commit level. And the moment a customer, an auditor, or a court asks the question, the answer is a shrug.&lt;/p>
&lt;p>What follows is the architecture I&amp;rsquo;d recommend, working git hooks for the layers you can roll out today, and an honest accounting of what this approach doesn&amp;rsquo;t solve.&lt;/p>
&lt;h2 id="why-attribution-matters-supposedly">Why Attribution Matters (Supposedly)
&lt;/h2>&lt;p>Four domains care about the answer to &amp;ldquo;who wrote this?&amp;rdquo;, and they care for different reasons.&lt;/p>
&lt;p>&lt;strong>Intellectual property.&lt;/strong> The U.S. Copyright Office has held that works created entirely by AI are not copyrightable. Authors must &amp;ldquo;identify and disclaim AI-generated parts&amp;rdquo; to protect what humans authored. Without attribution, you cannot accurately represent ownership of your codebase, which becomes a problem the moment IP changes hands, gets licensed, or ends up in court.&lt;/p>
&lt;p>&lt;strong>Security.&lt;/strong> &lt;a class="link" href="https://www.veracode.com/resources/analyst-reports/2025-genai-code-security-report/" target="_blank" rel="noopener"
>Veracode&amp;rsquo;s 2025 GenAI Code Security Report&lt;/a> found that 45% of AI-generated code samples contained a security flaw, including OWASP Top 10 vulnerabilities, across more than 100 models, and that newer and larger models were no safer. On secrets, &lt;a class="link" href="https://www.gitguardian.com/state-of-secrets-sprawl-report-2025" target="_blank" rel="noopener"
>GitGuardian&amp;rsquo;s 2025 State of Secrets Sprawl&lt;/a> found public repositories with GitHub Copilot active leaked secrets at 6.4%, against a 4.6% baseline. Whether or not those numbers hold for your codebase, the implication is the same: you can&amp;rsquo;t prioritize security review for AI-authored code if you can&amp;rsquo;t find AI-authored code.&lt;/p>
&lt;p>&lt;strong>Regulatory compliance.&lt;/strong> The EU AI Act&amp;rsquo;s transparency obligations apply from August 2026, including a requirement that AI-generated content be marked in a machine-readable format. Whether source code is in scope is ambiguous, but the prudent posture is to treat it as covered, because the compliance cost of being wrong is much higher than the cost of attribution.&lt;/p>
&lt;p>&lt;strong>Audit readiness.&lt;/strong> SOC 2, ISO 27001, and customer due-diligence reviews are starting to ask about AI usage in codebases. You want to answer those questions with data, not estimates. &amp;ldquo;We don&amp;rsquo;t know&amp;rdquo; is not a competitive answer in a vendor security questionnaire.&lt;/p>
&lt;h2 id="the-attribution-spectrum">The Attribution Spectrum
&lt;/h2>&lt;p>The first thing to get past is the binary mental model. &amp;ldquo;AI-written&amp;rdquo; vs &amp;ldquo;human-written&amp;rdquo; is not a coin flip. Code authorship lives on a spectrum:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Level&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Example&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Autonomous&lt;/td>
&lt;td>AI agent writes and commits with no human editing&lt;/td>
&lt;td>A coding agent generates a module from a ticket and pushes&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AI-Primary&lt;/td>
&lt;td>Human prompts, AI generates, human accepts verbatim&lt;/td>
&lt;td>Developer describes a function, accepts the suggestion as-is&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Collaborative&lt;/td>
&lt;td>Human and AI alternate edits within the same unit&lt;/td>
&lt;td>Pair-programming with an inline assistant, each contributing lines&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AI-Assisted&lt;/td>
&lt;td>Human writes, AI provides completions or suggestions inline&lt;/td>
&lt;td>Accepting a 3-line autocomplete&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>AI-Informed&lt;/td>
&lt;td>Human writes independently with AI as a reference&lt;/td>
&lt;td>Asking Claude about an API while writing the code yourself&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Human-Only&lt;/td>
&lt;td>No AI involvement&lt;/td>
&lt;td>Traditional development&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Any attribution architecture has to handle this whole spectrum. A system that only records &amp;ldquo;did AI touch this commit?&amp;rdquo; gives you a single dirty bit. A system that captures &lt;em>degree&lt;/em> of AI involvement gives you something you can actually act on.&lt;/p>
&lt;h2 id="the-tooling-reality">The Tooling Reality
&lt;/h2>&lt;p>Here&amp;rsquo;s where the gap shows up: most AI coding tools do not self-attribute by default.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Tool&lt;/th>
&lt;th>Auto-Attributes?&lt;/th>
&lt;th>Method&lt;/th>
&lt;th>Gap&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Claude Code&lt;/strong>&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>&lt;code>Co-Authored-By&lt;/code> trailer on every commit&lt;/td>
&lt;td>No line-level granularity&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Aider&lt;/strong>&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>&lt;code>(aider)&lt;/code> in author name + model co-author&lt;/td>
&lt;td>Non-standard format&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>OpenAI Codex CLI&lt;/strong>&lt;/td>
&lt;td>Configurable&lt;/td>
&lt;td>Optional &lt;code>Co-Authored-By&lt;/code> via prepare-commit-msg hook&lt;/td>
&lt;td>Off by default; some subcommands emit no telemetry&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>GitHub Copilot&lt;/strong>&lt;/td>
&lt;td>No&lt;/td>
&lt;td>None at the commit level&lt;/td>
&lt;td>Aggregate metrics API only&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Cursor&lt;/strong>&lt;/td>
&lt;td>No&lt;/td>
&lt;td>None at the commit level&lt;/td>
&lt;td>Per-acceptance only at Enterprise tier&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Gemini Code Assist&lt;/strong>&lt;/td>
&lt;td>No&lt;/td>
&lt;td>None at the commit level&lt;/td>
&lt;td>Invisible at commit level&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The takeaway: if you want commit-level attribution, you have to enforce it through git conventions and hooks. You cannot rely on the tooling to do it for you, because most of the tooling won&amp;rsquo;t.&lt;/p>
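&lt;p>Before adding hooks, it can be worth checking what your existing history already reveals. A minimal sketch, assuming the markers in the table above: Claude Code&amp;rsquo;s trailer carries the &lt;code>noreply@anthropic.com&lt;/code> address, and Aider adds &lt;code>(aider)&lt;/code> to the author name. The function name is hypothetical, and the match strings are worth checking against a few real commits from each tool.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash"># Hypothetical helper: count commits that self-attributing tools already marked.
# Usage: ai_tool_baseline [revision-range]   (defaults to HEAD)
ai_tool_baseline() {
  local range="${1:-HEAD}" total claude aider
  total=$(git rev-list --count "$range")
  # Assumed marker: the address in Claude Code's Co-Authored-By trailer.
  claude=$(git rev-list --count -i -F --grep="noreply@anthropic.com" "$range")
  # Assumed marker: the "(aider)" suffix on the author name (-F: literal match).
  aider=$(git rev-list --count -F --author="(aider)" "$range")
  printf "total=%s claude_code=%s aider=%s\n" "$total" "$claude" "$aider"
}
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>A zero here means no marked commits, not no AI use: the Copilot, Cursor, and Gemini rows above are invisible to this query by construction.&lt;/p>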
&lt;h2 id="a-layered-attribution-architecture">A Layered Attribution Architecture
&lt;/h2>&lt;p>The right shape here is layered. Each layer is independently valuable. You don&amp;rsquo;t need a &amp;ldquo;big bang&amp;rdquo; rollout. You start with Layer 1 today and layer the rest in when you&amp;rsquo;re ready.&lt;/p>
&lt;h3 id="layer-1-git-commit-attribution">Layer 1: Git Commit Attribution
&lt;/h3>&lt;p>&lt;strong>Mechanism:&lt;/strong> standardize git commit metadata.&lt;/p>
&lt;p>For autonomous AI commits, set the author to a non-human identity:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">git commit --author&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;ai-agent &amp;lt;ai@example.com&amp;gt;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># The committer remains the human; git preserves both fields&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For AI-assisted human commits, require a &lt;code>Co-Authored-By&lt;/code> trailer at the end of the message. Git only treats the final paragraph as trailers, so a blank line must separate them from the subject and body:&lt;/p>
&lt;pre tabindex="0">&lt;code>Refactor pricing module to use new cache layer.

Co-Authored-By: Claude Code &amp;lt;noreply@anthropic.com&amp;gt;
Signed-off-by: Jane Doe &amp;lt;jane@example.com&amp;gt;
&lt;/code>&lt;/pre>&lt;p>A &lt;code>prepare-commit-msg&lt;/code> hook auto-adds the trailer when an &lt;code>AI_TOOL&lt;/code> environment variable is set:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="cp">#!/usr/bin/env bash
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cp">&lt;/span>&lt;span class="c1"># .githooks/prepare-commit-msg&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">COMMIT_MSG_FILE&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$1&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">if&lt;/span> &lt;span class="o">[[&lt;/span> -n &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">AI_TOOL&lt;/span>&lt;span class="k">:-&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="o">]]&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="k">then&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> ! grep -qF &lt;span class="s2">&amp;#34;Co-Authored-By: &lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">AI_TOOL&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$COMMIT_MSG_FILE&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="k">then&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">printf&lt;/span> &lt;span class="s2">&amp;#34;\nCo-Authored-By: %s\n&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$AI_TOOL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &amp;gt;&amp;gt; &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$COMMIT_MSG_FILE&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">fi&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The immediate payoff is queryability. From the moment this hook is in place:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># All commits where an AI tool was a co-author&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">git log --all --grep&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;Co-Authored-By:&amp;#34;&lt;/span> --pretty&lt;span class="o">=&lt;/span>format:&lt;span class="s2">&amp;#34;%h %an %s&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Commits authored entirely by an autonomous agent&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">git log --all --author&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;ai-agent&amp;#34;&lt;/span> --pretty&lt;span class="o">=&lt;/span>format:&lt;span class="s2">&amp;#34;%h %an %s&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Per-author counts (humans + named AI agents)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">git shortlog -sne --all
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>That&amp;rsquo;s an enormous step up from &amp;ldquo;we don&amp;rsquo;t know.&amp;rdquo; It costs nothing. It runs in any git repo. And it works whether or not your AI tooling cooperates.&lt;/p>
&lt;p>The accountability piece sits next to this: every AI-authored PR has a named human sponsor who accepts responsibility for review and correctness. This is the same pattern most organizations already use for vendor-contributed or contractor code.&lt;/p>
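&lt;p>For autonomous commits made with the &lt;code>--author&lt;/code> override, git&amp;rsquo;s committer field already records whose environment created the commit, which is a reasonable first approximation of the sponsor. A minimal sketch, assuming agents commit as &lt;code>ai-agent&lt;/code> as in the example above (the function name is hypothetical); the committer is only a meaningful sponsor if a human, not a shared CI identity, runs the commit.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash"># Hypothetical helper: who committed each agent-authored change?
# Prints "count committer-name committer-email", busiest sponsor first.
agent_sponsors() {
  local agent="${1:-ai-agent}" range="${2:-HEAD}"
  git log --author="$agent" --format="%cn %ce" "$range" | sort | uniq -c | sort -rn
}
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Run &lt;code>agent_sponsors&lt;/code> for the defaults, or &lt;code>agent_sponsors ai-agent origin/main&lt;/code> to scope it to one branch.&lt;/p>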
&lt;h3 id="layer-2-responsibility-classification-rai-footers">Layer 2: Responsibility Classification (RAI Footers)
&lt;/h3>&lt;p>Layer 1 captures presence/absence. Layer 2 captures &lt;em>degree&lt;/em>.&lt;/p>
&lt;p>The convention I&amp;rsquo;d use is RAI footers: three trailers that grade how much of the commit is AI-generated (a commit carries whichever one fits), each paired with a &lt;code>Signed-off-by&lt;/code> that keeps a human accountable. The annotations on the right are explanatory, not part of the message:&lt;/p>
&lt;pre tabindex="0">&lt;code>Assisted-by: Copilot           # AI helped with suggestions (up to ~33% generated)
Co-authored-by: Claude Code    # Substantial AI contribution (~33–67%)
Generated-by: Claude Code      # Primarily AI-generated (67%+)
Signed-off-by: Jane Doe &amp;lt;jane@example.com&amp;gt;   # Human takes responsibility
&lt;/code>&lt;/pre>&lt;p>A &lt;code>commit-msg&lt;/code> hook enforces the dual-attestation:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="cp">#!/usr/bin/env bash
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cp">&lt;/span>&lt;span class="c1"># .githooks/commit-msg&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">COMMIT_MSG_FILE&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$1&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">if&lt;/span> grep -qE &lt;span class="s2">&amp;#34;^(Assisted-by|Co-authored-by|Generated-by):&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$COMMIT_MSG_FILE&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="k">then&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">if&lt;/span> ! grep -qE &lt;span class="s2">&amp;#34;^Signed-off-by:&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$COMMIT_MSG_FILE&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">;&lt;/span> &lt;span class="k">then&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;ERROR: AI attribution trailer present, but no Signed-off-by.&amp;#34;&lt;/span> &amp;gt;&lt;span class="p">&amp;amp;&lt;/span>&lt;span class="m">2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34; Re-run with: git commit -s&amp;#34;&lt;/span> &amp;gt;&lt;span class="p">&amp;amp;&lt;/span>&lt;span class="m">2&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="nb">exit&lt;/span> &lt;span class="m">1&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="k">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">fi&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A small reporting script summarizes the AI footprint for a repo:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="cp">#!/usr/bin/env bash
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="cp">&lt;/span>&lt;span class="c1"># bin/ai-attribution-stats.sh&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">RANGE&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">1&lt;/span>&lt;span class="k">:-&lt;/span>&lt;span class="nv">HEAD&lt;/span>&lt;span class="p">~1000..HEAD&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">total&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">$(&lt;/span>git log &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$RANGE&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> --oneline &lt;span class="p">|&lt;/span> wc -l &lt;span class="p">|&lt;/span> tr -d &lt;span class="s1">&amp;#39; &amp;#39;&lt;/span>&lt;span class="k">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">assisted&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">$(&lt;/span>git log &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$RANGE&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> --grep&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;^Assisted-by:&amp;#34;&lt;/span> --oneline &lt;span class="p">|&lt;/span> wc -l &lt;span class="p">|&lt;/span> tr -d &lt;span class="s1">&amp;#39; &amp;#39;&lt;/span>&lt;span class="k">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">coauth&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">$(&lt;/span>git log &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$RANGE&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> --grep&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;^Co-authored-by:&amp;#34;&lt;/span> --oneline &lt;span class="p">|&lt;/span> wc -l &lt;span class="p">|&lt;/span> tr -d &lt;span class="s1">&amp;#39; &amp;#39;&lt;/span>&lt;span class="k">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">generated&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">$(&lt;/span>git log &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$RANGE&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> --grep&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;^Generated-by:&amp;#34;&lt;/span> --oneline &lt;span class="p">|&lt;/span> wc -l &lt;span class="p">|&lt;/span> tr -d &lt;span class="s1">&amp;#39; &amp;#39;&lt;/span>&lt;span class="k">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">ai_total&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">$((&lt;/span>assisted &lt;span class="o">+&lt;/span> coauth &lt;span class="o">+&lt;/span> generated&lt;span class="k">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">printf&lt;/span> &lt;span class="s2">&amp;#34;AI Attribution Report\n&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">printf&lt;/span> &lt;span class="s2">&amp;#34; Total commits: %s\n&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$total&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">printf&lt;/span> &lt;span class="s2">&amp;#34; Assisted-by: %s\n&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$assisted&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">printf&lt;/span> &lt;span class="s2">&amp;#34; Co-authored-by: %s\n&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$coauth&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">printf&lt;/span> &lt;span class="s2">&amp;#34; Generated-by: %s\n&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$generated&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">printf&lt;/span> &lt;span class="s2">&amp;#34; AI percentage: %s%%\n&amp;#34;&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="k">$((&lt;/span>ai_total &lt;span class="o">*&lt;/span> &lt;span class="m">100&lt;/span> &lt;span class="o">/&lt;/span> total&lt;span class="k">))&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>What this gives you: degree-of-AI-involvement per commit, plus dual-attestation. Commit-level dashboards become possible. &amp;ldquo;What percentage of commits in this service are &lt;code>Generated-by&lt;/code> vs &lt;code>Assisted-by&lt;/code>?&amp;rdquo; goes from a guess to a query.&lt;/p>
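&lt;p>The &amp;ldquo;in this service&amp;rdquo; part of that question is a pathspec away: git can restrict the count to commits that touched files under one directory. A minimal sketch, assuming each service lives in its own directory; the function name and the &lt;code>services/pricing&lt;/code> path are hypothetical.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash"># Hypothetical helper: RAI breakdown for commits that touched one path.
# Usage: rai_breakdown_for_path services/pricing [revision-range]
rai_breakdown_for_path() {
  local path="$1" range="${2:-HEAD}" key n
  n=$(git rev-list --count "$range" -- "$path")
  printf "%s total: %s\n" "$path" "$n"
  for key in Assisted-by Co-authored-by Generated-by; do
    # -i: trailer keys are case-insensitive; "--" separates revisions from paths.
    n=$(git rev-list --count -i --grep="^$key:" "$range" -- "$path")
    printf "%s %s: %s\n" "$path" "$key" "$n"
  done
}
&lt;/code>&lt;/pre>&lt;/div>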
&lt;p>The honest limitation is self-reporting. The hook can enforce that a human signs off whenever an AI trailer is present. It cannot tell whether a trailer is missing from a commit that needed one, or whether the percentage band is accurate. That&amp;rsquo;s a discipline question, not a tooling question.&lt;/p>
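&lt;p>Local hooks are also easy to skip: &lt;code>git commit --no-verify&lt;/code> bypasses &lt;code>commit-msg&lt;/code>, and hooks only run in clones where someone configured them. Repeating the dual-attestation check in CI covers everything that gets merged. A minimal sketch, assuming the CI job has the base branch fetched and passes a range such as &lt;code>origin/main..HEAD&lt;/code>; the function name is hypothetical.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash"># Hypothetical CI check: fail if any commit in the range has an AI attribution
# trailer but no Signed-off-by. Usage: check_ai_signoff origin/main..HEAD
check_ai_signoff() (
  shopt -s nocasematch            # trailer keys are case-insensitive
  ai_re=$'(^|\n)(Assisted-by|Co-authored-by|Generated-by):'
  sob_re=$'(^|\n)Signed-off-by:'
  status=0
  for sha in $(git rev-list "$1"); do
    msg=$(git log -1 --format=%B "$sha")
    if [[ $msg =~ $ai_re ]]; then
      if ! [[ $msg =~ $sob_re ]]; then
        echo "missing Signed-off-by: $(git log -1 --format="%h %s" "$sha")"
        status=1
      fi
    fi
  done
  exit "$status"                  # the body is a subshell, so this only ends the check
)
&lt;/code>&lt;/pre>&lt;/div>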
&lt;h3 id="layer-3-line-level-provenance">Layer 3: Line-Level Provenance
&lt;/h3>&lt;p>Commit-level is coarse. A commit that&amp;rsquo;s 60% AI-assisted contains specific lines an AI wrote and specific lines a human wrote, and Layers 1 and 2 give you no way to tell them apart.&lt;/p>
&lt;p>For that, look at &lt;a class="link" href="https://github.com/git-ai-project/git-ai" target="_blank" rel="noopener"
>&lt;code>git-ai&lt;/code>&lt;/a>. It uses Git Notes (&lt;code>refs/notes/ai&lt;/code>) to attach metadata to commits without modifying commit history, supports a growing list of agents including Claude Code, Copilot, Cursor, and Windsurf, and is listed on Thoughtworks Technology Radar. The format is an openly published specification.&lt;/p>
&lt;p>What it answers that Layers 1–2 cannot:&lt;/p>
&lt;ul>
&lt;li>Exact percentage of a file or module that was AI-generated&lt;/li>
&lt;li>Security review prioritization based on AI-authored line density&lt;/li>
&lt;li>Function-level IP risk assessment&lt;/li>
&lt;li>Compliance evidence at line granularity&lt;/li>
&lt;/ul>
&lt;p>&lt;a class="link" href="https://github.com/dotsetlabs/whogitit" target="_blank" rel="noopener"
>&lt;code>whogitit&lt;/code>&lt;/a> is a similar tool with automatic redaction for sensitive prompt content, worth evaluating if prompt privacy is a concern.&lt;/p>
&lt;p>The trade-off is operational complexity. Git Notes are powerful, but they require the agents to write them at commit time, and git neither pushes nor fetches notes by default, so every clone and CI job needs the notes namespace configured explicitly. Layer 3 is where the investment starts to look real.&lt;/p>
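&lt;p>On the git side, sharing and preserving notes takes a few lines of configuration. A minimal sketch, assuming the notes live under &lt;code>refs/notes/ai&lt;/code> as described above; the function name is hypothetical, and whether &lt;code>git-ai&lt;/code> already sets some of this up for you is worth checking in its documentation. &lt;code>notes.rewriteRef&lt;/code> makes &lt;code>git commit --amend&lt;/code> and &lt;code>git rebase&lt;/code> copy notes to the rewritten commits instead of leaving them on the old commit IDs.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash"># Hypothetical helper: share the refs/notes/ai namespace with a remote and keep
# notes attached when commits are rewritten.
share_ai_notes() {
  local remote="${1:-origin}" spec="+refs/notes/ai:refs/notes/ai"
  # Notes are not pushed by default: publish them explicitly.
  git push "$remote" refs/notes/ai
  # Nor are they fetched by default: add a fetch refspec once. The leading "+"
  # forces the update, so local notes that were never pushed can be overwritten.
  git config --get-all "remote.$remote.fetch" | grep -qxF -- "$spec" ||
    git config --add "remote.$remote.fetch" "$spec"
  # Copy notes onto rewritten commits during "commit --amend" and "rebase".
  git config notes.rewriteRef refs/notes/ai
}
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Inspecting the note on a single commit is then &lt;code>git notes --ref=ai show HEAD&lt;/code>.&lt;/p>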
&lt;h3 id="layer-4-telemetry-aggregation">Layer 4: Telemetry Aggregation
&lt;/h3>&lt;p>For organizations with enough scale to justify it, aggregate telemetry from multiple AI tools into a unified dashboard:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>GitHub Copilot Metrics API&lt;/strong>: suggestion/acceptance rates by org, team, language, editor&lt;/li>
&lt;li>&lt;strong>Anthropic API consumption&lt;/strong>: token usage and model versions per team&lt;/li>
&lt;li>&lt;strong>Cursor Enterprise API&lt;/strong>: per-acceptance tracking&lt;/li>
&lt;li>&lt;strong>&lt;code>git-ai stats&lt;/code>&lt;/strong>: line-level aggregates from Layer 3&lt;/li>
&lt;li>&lt;strong>Internal commit metadata&lt;/strong>: Layers 1 and 2 data&lt;/li>
&lt;/ul>
&lt;p>No off-the-shelf product unifies these today. It&amp;rsquo;s a custom build. The upside is a single pane of glass for AI code footprint across the org; the downside is that you&amp;rsquo;re integrating against four or five different telemetry shapes that weren&amp;rsquo;t designed to compose.&lt;/p>
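&lt;p>For the &amp;ldquo;internal commit metadata&amp;rdquo; feed, one low-effort option is to export one row per commit and let the dashboard do the joining. A minimal sketch: the function name, the tab-separated layout, and the rule that the highest RAI level on a commit wins are illustrative choices, not any product&amp;rsquo;s schema. It relies on git&amp;rsquo;s &lt;code>%(trailers)&lt;/code> placeholder, which only sees trailers in the final paragraph of a message.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash"># Hypothetical export: one TSV row per commit for a dashboard.
# Columns: commit, author date (ISO 8601), author email, RAI level.
export_rai_tsv() (
  shopt -s nocasematch                       # trailer keys are case-insensitive
  printf "commit\tdate\tauthor\tlevel\n"
  # %x1f (unit separator) joins a commit's trailers onto one line.
  git log --format='%H%x09%aI%x09%ae%x09%(trailers:only,unfold,separator=%x1f)' "${1:-HEAD}" |
    while IFS=$'\t' read -r sha date author trailers; do
      case "$trailers" in                    # the highest level present wins
        *Generated-by:*)   level=generated ;;
        *Co-authored-by:*) level=co-authored ;;
        *Assisted-by:*)    level=assisted ;;
        *)                 level=none ;;
      esac
      printf "%s\t%s\t%s\t%s\n" "$sha" "$date" "$author" "$level"
    done
)
&lt;/code>&lt;/pre>&lt;/div>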
&lt;h2 id="rollout-recommendation">Rollout Recommendation
&lt;/h2>&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Phase&lt;/th>
&lt;th>Layer&lt;/th>
&lt;th>Timeline&lt;/th>
&lt;th>Owner&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Phase 1&lt;/strong>&lt;/td>
&lt;td>Git conventions + hooks&lt;/td>
&lt;td>2–4 weeks&lt;/td>
&lt;td>Engineering standards team&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Phase 2&lt;/strong>&lt;/td>
&lt;td>RAI footers + accountability policy&lt;/td>
&lt;td>4–6 weeks&lt;/td>
&lt;td>Standards + Legal&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Phase 3&lt;/strong>&lt;/td>
&lt;td>&lt;code>git-ai&lt;/code> pilot&lt;/td>
&lt;td>1–2 months&lt;/td>
&lt;td>1–2 pilot teams&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Phase 4&lt;/strong>&lt;/td>
&lt;td>Telemetry dashboard&lt;/td>
&lt;td>3–6 months&lt;/td>
&lt;td>Platform engineering&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>Phase 1 is the critical path.&lt;/strong> Zero tooling investment, immediate queryable data, and it establishes the cultural norm that AI code is attributed code. Everything else builds on this foundation.&lt;/p>
&lt;h2 id="the-standards-landscape">The Standards Landscape
&lt;/h2>&lt;p>There is no settled industry standard for AI code provenance. The closest thing to one, &lt;code>git-ai&lt;/code>&amp;rsquo;s line-level format, is a single-project specification that is gaining traction but has not been ratified by any standards body.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Standard&lt;/th>
&lt;th>Scope&lt;/th>
&lt;th>Code Provenance?&lt;/th>
&lt;th>Status&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>C2PA&lt;/strong> (v2.2)&lt;/td>
&lt;td>Images, video, audio, documents&lt;/td>
&lt;td>No&lt;/td>
&lt;td>Active, no code working group&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>AIBOM&lt;/strong>&lt;/td>
&lt;td>Model provenance (models, datasets, configs)&lt;/td>
&lt;td>No (tracks what model built it, not which lines)&lt;/td>
&lt;td>Emerging&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>SLSA 1.2&lt;/strong>&lt;/td>
&lt;td>Software supply chain artifacts&lt;/td>
&lt;td>Build/source provenance, not AI authorship&lt;/td>
&lt;td>Active (Linux Foundation)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>git-ai v3.0.0&lt;/strong>&lt;/td>
&lt;td>Line-level AI attribution in git&lt;/td>
&lt;td>Yes&lt;/td>
&lt;td>Active, single-project spec&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>EU AI Act&lt;/strong>&lt;/td>
&lt;td>AI-generated content disclosure&lt;/td>
&lt;td>Likely applicable to code (ambiguous)&lt;/td>
&lt;td>Enforcement August 2026&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The space between &amp;ldquo;sign your commits&amp;rdquo; and &amp;ldquo;maintain cryptographic chain-of-custody for every AI-generated patch&amp;rdquo; is where standards will likely emerge over the next 12–24 months. A layered approach lets you adopt early without rework when standards land.&lt;/p>
&lt;h2 id="detection-as-a-backstop-not-a-strategy">Detection as a Backstop (Not a Strategy)
&lt;/h2>&lt;p>When provenance metadata is missing (legacy code, tools that don&amp;rsquo;t self-attribute), statistical signals can sometimes identify AI-generated code. Structural uniformity, comment-to-code ratio, naming conventions, the absence of typos in comments, lower token perplexity. These are real signals, and they&amp;rsquo;re useful for retroactive analysis of legacy codebases.&lt;/p>
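&lt;p>To make the point about how coarse these signals are, here is one of them, comment-to-code ratio, computed for files that use &lt;code>#&lt;/code> comments. A hypothetical helper, not a detector: it counts full-line comments only (a shebang counts as one), ignores docstrings and trailing comments, and a ratio means little without a baseline from code you know humans wrote.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash"># Hypothetical helper: full-line comment ratio for "#"-comment source files.
# Prints "percent file"; blank lines are left out of the denominator.
comment_ratio() {
  local f
  for f in "$@"; do
    awk -v file="$f" '
      /^[[:space:]]*$/ { next }          # skip blank lines
      /^[[:space:]]*#/ { c++ }           # full-line comments
      { n++ }                            # every non-blank line
      END { printf "%d %s\n", (n ? 100 * c / n : 0), file }
    ' "$f"
  done
}
&lt;/code>&lt;/pre>&lt;/div>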
&lt;p>They are not a governance strategy. The signals are statistical and they degrade as models improve. By the time you&amp;rsquo;ve trained a classifier on this generation of models, the next generation produces code that looks more like what humans write. The lesson is to instrument attribution at write-time, not detect it after the fact.&lt;/p>
&lt;h2 id="the-threshold-nobody-can-draw">The Threshold Nobody Can Draw
&lt;/h2>&lt;p>Here is the counterargument to all of this, and I think it&amp;rsquo;s a real one.&lt;/p>
&lt;p>The attribution spectrum assumes you can draw a line between human and machine authorship. At the commit level you roughly can. At the line level, the line dissolves.&lt;/p>
&lt;p>Take a function a developer wrote by hand. An AI assistant changes one character in one line: a &lt;code>&amp;lt;=&lt;/code> becomes a &lt;code>&amp;lt;&lt;/code>. Is that line now AI-authored? Is the whole function? If the answer is yes, a single keystroke from a model relabels human work as machine output. If the answer is no, then what is the threshold: ten characters, a whole expression, half the line? Nobody has a principled answer. The Copyright Office said to &amp;ldquo;identify and disclaim AI-generated parts,&amp;rdquo; but it never defined how large a part has to be before it counts, and no court has either.&lt;/p>
&lt;p>Flip it around and it gets worse. A human edits one character of an AI-generated line. Does that keystroke pull the line back into copyrightable human authorship? If a one-character human edit launders AI output into protected IP, then the cheapest way to own your codebase is to have someone retype one character on every line. That is absurd, and the absurdity is the point. The threshold model breaks in both directions.&lt;/p>
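&lt;p>In both directions the mechanics are the same: &lt;code>git blame&lt;/code> credits each line to the last commit that changed it. A self-contained sketch in a throwaway repository (the function, file, and commit messages are hypothetical): a human commits a small function, an AI-attributed commit flips one comparison operator, and blame now credits that entire line to the AI commit.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash"># Hypothetical demo: after a one-operator AI edit, which commit does blame credit?
# Prints the subject of the commit blamed for line 2 of limit.sh.
blame_after_one_char_edit() (
  set -e
  cd "$(mktemp -d)"
  git init -q
  git config user.name "Jane Doe"
  git config user.email jane@example.com
  printf 'check() {\n  [ "$1" -le "$2" ]\n}\n' &amp;gt; limit.sh
  git add limit.sh
  git commit -q -m "Add limit check"
  # The only change: -le becomes -lt (the shell spelling of &amp;lt;= becoming &amp;lt;).
  sed 's/-le/-lt/' limit.sh &amp;gt; limit.tmp
  mv limit.tmp limit.sh
  git commit -q -am $'Tighten limit check\n\nCo-authored-by: Claude Code'
  git blame -L 2,2 --porcelain limit.sh | sed -n 's/^summary //p'
)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Running it prints &lt;code>Tighten limit check&lt;/code>: lines 1 and 3 still blame the human commit, but line 2 now belongs to the AI-attributed one.&lt;/p>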
&lt;p>So a chunk of the IP rationale rests on a line nobody can draw, for a legal fight that may never come. The case law isn&amp;rsquo;t there yet, and it may never get tested at the granularity that would make line-level provenance the deciding evidence. There is a real chance we are building careful chain-of-custody for a courtroom no one ever walks into.&lt;/p>
&lt;p>I still think you should instrument attribution. The IP rationale is the shaky one; the other three are not. Security prioritization, compliance posture, and audit answerability all run fine on coarse, commit-level signals. A rough, queryable answer to &amp;ldquo;how much of this was AI?&amp;rdquo; is all they need, and that is Layer 1. Layer 1 is cheap. The expensive, line-level, cryptographic chain-of-custody is the part most likely being built for the court case that never comes.&lt;/p>
&lt;h2 id="open-questions-worth-asking-inside-your-org">Open Questions Worth Asking Inside Your Org
&lt;/h2>&lt;p>This is the section to take into a working group meeting:&lt;/p>
&lt;ol>
&lt;li>Which AI tools are officially supported, and what are the attribution behaviors of each?&lt;/li>
&lt;li>Where does accountability land? The &amp;ldquo;bot sponsorship&amp;rdquo; model, where every AI PR has a named human sponsor, is the pattern I&amp;rsquo;d recommend.&lt;/li>
&lt;li>What threshold triggers enhanced review? Should &lt;code>Generated-by&lt;/code> (67%+ AI) commits require additional reviewers or security review?&lt;/li>
&lt;li>How do you handle the Copilot/Cursor invisibility gap? Mandate manual attribution, or accept the blind spot?&lt;/li>
&lt;li>Do you need line-level tracking (Layer 3) for compliance, or is commit-level (Layers 1–2) sufficient given your audit posture?&lt;/li>
&lt;/ol>
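&lt;p>If the answer to question 3 is yes, the routing can start as a query in CI. A minimal sketch, assuming &lt;code>Generated-by&lt;/code> is the footer that triggers extra review and that CI knows the pull request&amp;rsquo;s base (the &lt;code>origin/main..HEAD&lt;/code> default is a placeholder); the function name is hypothetical, and turning the output into a required reviewer depends on your code host.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash"># Hypothetical CI step: list Generated-by commits in a range; return non-zero
# if there are any, so the pipeline can route the change for enhanced review.
needs_enhanced_review() {
  local range="${1:-origin/main..HEAD}" hits
  hits=$(git log -i --grep="^Generated-by:" --format="%h %an: %s" "$range")
  if [[ -n "$hits" ]]; then
    printf "Generated-by commits needing enhanced review:\n%s\n" "$hits"
    return 1
  fi
}
&lt;/code>&lt;/pre>&lt;/div>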
&lt;h2 id="companion-code">Companion Code
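&lt;/h2>&lt;p>The three scripts above (&lt;code>prepare-commit-msg&lt;/code>, &lt;code>commit-msg&lt;/code>, &lt;code>ai-attribution-stats.sh&lt;/code>) are the minimum viable Layer 1 + Layer 2 deployment for any team. Drop the two hooks into &lt;code>.githooks/&lt;/code>, make them executable (&lt;code>chmod +x .githooks/*&lt;/code>), and run &lt;code>git config core.hooksPath .githooks&lt;/code>; the stats script can live anywhere, such as &lt;code>bin/&lt;/code>. Because &lt;code>core.hooksPath&lt;/code> is local configuration that clones do not inherit, each developer (or your onboarding script) runs it once. From the next commit forward, you have queryable AI attribution.&lt;/p>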
&lt;p>Production-ready versions live at &lt;strong>&lt;a class="link" href="https://github.com/mgoodric/ai-attribution-hooks" target="_blank" rel="noopener"
>github.com/mgoodric/ai-attribution-hooks&lt;/a>&lt;/strong>, including a one-shot installer and a slightly enhanced &lt;code>prepare-commit-msg&lt;/code> that also auto-adds &lt;code>Signed-off-by&lt;/code> so commits clear the dual-attestation check.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">git clone https://github.com/mgoodric/ai-attribution-hooks.git
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">cd&lt;/span> /path/to/your/repo
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">/path/to/ai-attribution-hooks/install.sh
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="someone-has-to-set-the-standard">Someone Has to Set the Standard
&lt;/h2>&lt;p>You can start with Layer 1 today. The perfect provenance system can wait.&lt;/p>
&lt;p>The technical work here is the easy part: it&amp;rsquo;s git hooks. The hard part is the organizational discipline to actually require attribution across teams, to actually enforce the dual-attestation, and to actually treat AI authorship as something that deserves the same accountability as human authorship. That&amp;rsquo;s not a tooling problem.&lt;/p>
&lt;p>Attribution is how you keep IP defensible, security review prioritized, and audits answerable with data instead of estimates. The tools don&amp;rsquo;t do this for us. The convention doesn&amp;rsquo;t establish itself. Someone has to set the standard inside the org and stick to it.&lt;/p>
&lt;hr>
&lt;h2 id="references">References
&lt;/h2>&lt;ul>
&lt;li>&lt;a class="link" href="https://github.com/git-ai-project/git-ai" target="_blank" rel="noopener"
>git-ai: Line-level AI attribution&lt;/a> | &lt;a class="link" href="https://usegitai.com" target="_blank" rel="noopener"
>How it works&lt;/a>&lt;/li>
&lt;li>&lt;a class="link" href="https://github.com/dotsetlabs/whogitit" target="_blank" rel="noopener"
>whogitit: AI attribution with Claude Code integration&lt;/a>&lt;/li>
&lt;li>&lt;a class="link" href="https://botcommits.dev/" target="_blank" rel="noopener"
>botcommits.dev: Tracking AI commits on GitHub&lt;/a>&lt;/li>
&lt;li>&lt;a class="link" href="https://dev.to/anchildress1/signing-your-name-on-ai-assisted-commits-with-rai-footers-2b0o" target="_blank" rel="noopener"
>RAI Footers for AI-assisted commits&lt;/a>&lt;/li>
&lt;li>&lt;a class="link" href="https://www.ssw.com.au/rules/attribute-ai-assisted-commits-with-co-authors" target="_blank" rel="noopener"
>SSW Rules: Attribute AI-assisted commits&lt;/a>&lt;/li>
&lt;li>&lt;a class="link" href="https://pullflow.com/blog/the-new-git-blame/" target="_blank" rel="noopener"
>The New Git Blame: Who&amp;rsquo;s Responsible When AI Writes Code&lt;/a>&lt;/li>
&lt;li>&lt;a class="link" href="https://www.copyright.gov/ai/" target="_blank" rel="noopener"
>U.S. Copyright Office on AI-generated works&lt;/a>&lt;/li>
&lt;/ul></description></item></channel></rss>