AI & Automation

AI Token Cost Allocation in Outsourcing: A Proposed Framework for Tracking LLM Spend Across CWOs

By Ginbok · 10 min read

The New Line Item Nobody Budgeted For

Two years ago, AI tooling was an experiment. Today, it's infrastructure. Developers at software outsourcing companies are using Cursor, GitHub Copilot, Claude, and GPT-4 every single day — and that usage has a real dollar cost attached to it.

The problem is that most outsourcing companies are still treating AI token spend the way they treated software licenses in 2010: as a flat overhead cost, invisible to project P&Ls, impossible to attribute to specific clients or contracts.

That's starting to become a serious issue. When a single project's AI usage can run into thousands of dollars per month, "we'll just absorb it into overhead" stops being a defensible answer — to clients, to finance teams, and to anyone trying to understand true project profitability.

This post outlines a framework for solving this problem. It's a proposal, not a case study: we're sharing it as a starting point for discussion across the industry.


Why This Problem Is Uniquely Hard in Outsourcing

In a product company, AI cost attribution is relatively straightforward: the AI serves the product, the product generates revenue, done. But outsourcing is different in ways that matter here.

First, developers work across multiple contracts simultaneously. A senior developer might be 50% allocated to a fintech CWO (Contract Work Order), 30% to an e-commerce project, and 20% on internal tooling — all in the same week. If that developer uses AI tools heavily, which client absorbs the cost?

Second, the billing unit in outsourcing is the CWO, not the project. A project might span three years and four CWOs. Token costs need to land at the CWO level to be invoiceable, not just at the project level.

Third, AI tool usage is not naturally scoped to a project. Unlike compute costs (which can be tagged to environments), an LLM prompt doesn't know or care which client's code it's helping with.


The Core Insight: Timesheets Already Solve This

Here's the key observation: outsourcing companies already have a system that tracks exactly how each developer's time is distributed across CWOs each week. It's called a timesheet.

If we know that Developer A logged 20 hours to CWO-Fintech and 10 hours to CWO-Ecommerce in week 10, and we know Developer A consumed $45 of AI tokens that week — then the most defensible, lowest-friction allocation is:

  • CWO-Fintech: 20/30 × $45 = $30.00
  • CWO-Ecommerce: 10/30 × $45 = $15.00

We call this proportional allocation to actual logged hours. It uses the hours that developers have already submitted and had approved, not estimated or planned hours, as the distribution key.

This matters because actual hours reflect reality. Planned allocations are aspirational; timesheets are what actually happened.
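A minimal sketch of the split itself, assuming costs are tracked in USD and rounded to cents; pushing any rounding remainder onto the largest share so the allocations always sum to the total is our assumption, not something the framework prescribes:

```python
from decimal import Decimal, ROUND_HALF_UP

def allocate_cost(hours_by_cwo: dict[str, float], total_cost_usd: float) -> dict[str, Decimal]:
    """Split one employee-week of token cost across CWOs in proportion to logged hours."""
    total_hours = sum(hours_by_cwo.values())
    if total_hours == 0:
        raise ValueError("no logged hours; caller must fall back to another basis")
    cost = Decimal(str(total_cost_usd))
    shares = {
        cwo: (cost * Decimal(str(h)) / Decimal(str(total_hours))).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP)
        for cwo, h in hours_by_cwo.items()
    }
    # Assign any cent-level rounding remainder to the largest allocation,
    # so the shares always sum exactly to the weekly total.
    remainder = cost - sum(shares.values())
    shares[max(shares, key=shares.get)] += remainder
    return shares

# The example from the text: 20h + 10h logged, $45 of tokens that week.
print(allocate_cost({"CWO-Fintech": 20, "CWO-Ecommerce": 10}, 45.00))
# → {'CWO-Fintech': Decimal('30.00'), 'CWO-Ecommerce': Decimal('15.00')}
```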


The Proposed Data Flow

Here's how we see the full pipeline working:


Developer uses AI tools (Cursor, Copilot, Claude API...)
        │
        ▼
AI Platform Team collects token usage logs
  { user_id, date, total_tokens, cost_usd }
        │
        ▼
Data Lake / Pipeline aggregates weekly
  { employee_id, week_start_date, total_tokens, total_cost_usd }
        │
        ▼
ETM (Timesheet System) queries Data Lake weekly
        │
        ▼
For each employee, fetch approved TimesheetEntries for that week
  → Get all active CWOs via: TimesheetEntry → Task → Allocation → CWORole → CWO
  → Calculate hours per CWO
  → Distribute token cost proportionally
        │
        ▼
Store: TokenCostAllocation { EmployeeId, CWOId, WeekDate, Tokens, CostUSD }
        │
        ▼
Aggregate up to Project level for dashboards
Report at CWO level for invoicing
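The "hours per CWO" step in the middle of this pipeline can be sketched as a walk up the chain named in the diagram (TimesheetEntry → Task → Allocation → CWORole → CWO). The entity names come from the diagram above; the flat-dict lookups are stand-ins for whatever ORM or SQL joins the real ETM system uses, and the `Allocation → CWORole → CWO` hop is shown pre-resolved for brevity:

```python
def hours_per_cwo(entries, task_to_allocation, allocation_to_cwo):
    """Total approved hours per CWO for one employee-week.

    entries: [{'task_id': ..., 'hours': ..., 'approved': bool}, ...]
    """
    totals: dict[str, float] = {}
    for e in entries:
        if not e["approved"]:
            continue  # only approved hours count toward the ACTUAL basis
        allocation = task_to_allocation[e["task_id"]]        # Task → Allocation
        cwo = allocation_to_cwo[allocation]                  # Allocation → CWORole → CWO
        totals[cwo] = totals.get(cwo, 0.0) + e["hours"]
    return totals

entries = [
    {"task_id": "T1", "hours": 12.0, "approved": True},
    {"task_id": "T2", "hours": 8.0,  "approved": True},
    {"task_id": "T3", "hours": 10.0, "approved": False},  # not yet approved; excluded
]
task_to_allocation = {"T1": "A1", "T2": "A1", "T3": "A2"}
allocation_to_cwo  = {"A1": "CWO-Fintech", "A2": "CWO-Ecommerce"}

print(hours_per_cwo(entries, task_to_allocation, allocation_to_cwo))
# → {'CWO-Fintech': 20.0}
```

The unapproved entry is deliberately excluded here; per the delayed-timesheet rule later in the post, those hours would fall back to planned allocations until approval triggers a recalculation.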

Why CWO, Not Project?

When we first discussed this internally, the instinct was to track token cost at the project level — it's simpler, and it's how AI Platform Teams naturally think about codebase ownership.

But for outsourcing, the CWO is the billing unit. A project might have three active CWOs in any given quarter. If a client's contract comes up for renewal, you want to show them the true cost of delivery — including AI tooling — scoped to the contract period, not the entire project history.

The practical compromise: store at CWO level, aggregate to Project level for display. This gives finance the granularity they need for invoicing, while giving PMs and CTOs the project-level view they want for cost management.


Handling the Multi-CWO Developer Problem

The trickiest edge case is a developer who works across multiple CWOs in the same week. The proportional hours approach handles this cleanly, but there are a few important nuances:

Non-billable allocations: Should a developer's time on an internal or non-billable CWO attract token cost? Our suggestion: yes, include it in the denominator for calculating proportions, but flag it separately. Internal AI costs are still real costs — they just don't get passed to clients.

Leave weeks: If a developer was on approved leave for the full week, their token usage (if any) should be treated as overhead, not allocated to any CWO.

Delayed timesheets: If a developer hasn't submitted their timesheet yet when the weekly pipeline runs, fall back to planned allocation hours as a provisional estimate. Recalculate once the timesheet is approved.

Zero-hours edge case: If a developer has an active allocation but logged zero hours to every CWO in a given week, distribute token cost equally across active CWOs rather than failing silently.
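These fallback rules can be expressed as one small decision function per employee-week. The 'ACTUAL'/'ESTIMATED' labels match the AllocationBasis values in the schema later in the post; 'OVERHEAD' is our extra label for leave weeks, and treating the equal-split fallback as 'ESTIMATED' is likewise our choice:

```python
def choose_allocation(logged_hours, planned_hours, on_leave_all_week, active_cwos):
    """Pick the distribution key and basis for one employee-week.

    Returns (weights_by_cwo, basis). Weights are relative; the allocator
    normalizes them, so hours and fractions both work.
    """
    if on_leave_all_week:
        return {}, "OVERHEAD"  # full-week leave: treat usage as overhead, no CWO
    if logged_hours and sum(logged_hours.values()) > 0:
        return logged_hours, "ACTUAL"      # approved timesheet hours: the normal case
    if planned_hours and sum(planned_hours.values()) > 0:
        return planned_hours, "ESTIMATED"  # provisional; recalculate on approval
    # Active allocation but zero hours everywhere: split equally, don't fail silently.
    equal = 1.0 / len(active_cwos)
    return {cwo: equal for cwo in active_cwos}, "ESTIMATED"
```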


The Missing Half: Who Decides How Much Budget a CWO Gets?

The allocation framework above answers the question "where did the tokens go?" But there is an equally important question the industry has not yet addressed: "how much should a CWO be budgeted in the first place?"

Most companies default to one of two approaches. Finance sets a flat percentage of project budget for AI tooling (top-down), or Tech Leads estimate based on gut feel (bottom-up). Both have serious weaknesses. The flat percentage ignores the fact that a complex data engineering CWO consumes far more AI tokens per developer than a maintenance CWO. And gut-feel estimates are only as good as the estimator's experience — which is essentially zero for a cost category that barely existed two years ago.

We propose a third model: data-driven budget suggestion, PM-confirmed.

Once your system has accumulated a few months of actual token usage data — which the allocation pipeline above generates automatically — the timesheet system can compute meaningful benchmarks per role:


Historical averages (computed from TokenCostAllocation):
  Backend Developer:   ~8,000 tokens / week / person
  QA Engineer:         ~3,000 tokens / week / person
  Project Manager:     ~1,500 tokens / week / person

When a PM creates a new CWO, the system reads the planned allocations and surfaces a suggested budget automatically:


New CWO: "Bosch Phase 3"   (12-week duration)

  Suggested weekly budget:
    3 × BE Dev  →  3 × 8,000  =  24,000 tokens
    1 × QA      →  1 × 3,000  =   3,000 tokens
    1 × PM      →  1 × 1,500  =   1,500 tokens
                              ─────────────────
    Weekly total              =  28,500 tokens
    + 5% buffer, rounded up   ≈  30,000 tokens

  Suggested total budget:  30,000 × 12 weeks = 360,000 tokens

  [Confirm]  [Adjust]  [Skip]

The PM doesn't need to understand token pricing or consumption patterns. They see a number grounded in how similar roles have actually behaved on previous CWOs, and they confirm or adjust it in one step. That's the full interaction.

This approach has an important property: it gets more accurate over time. Every completed CWO adds a new data point. The system's suggestions in week one will be rough; by month six, they'll be calibrated to your team's specific tooling habits and project types.

There is an obvious bootstrapping problem — you cannot generate historical benchmarks before you have history. For the first two to three months, teams should focus purely on tracking actual usage without worrying about budgets. The data collected in that period becomes the seed for meaningful suggestions thereafter.
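The suggestion logic above can be sketched in a few lines. The per-role benchmarks and the 5% buffer come from the example; rounding the buffered weekly figure up to the nearest 500 tokens is our assumption to reproduce the example's 30,000:

```python
import math

# Historical per-role weekly benchmarks (tokens/week/person), from the text.
BENCHMARKS = {"BE Dev": 8_000, "QA": 3_000, "PM": 1_500}

def suggest_budget(headcount: dict[str, int], weeks: int,
                   buffer: float = 0.05, round_to: int = 500) -> tuple[int, int]:
    """Suggest (weekly, total) token budgets from planned role headcounts.

    In the real system BENCHMARKS would be recomputed from the
    TokenCostAllocation table as new weeks of data arrive.
    """
    base = sum(BENCHMARKS[role] * n for role, n in headcount.items())
    # Apply the buffer, then round up to a clean number for the PM to confirm.
    weekly = math.ceil(base * (1 + buffer) / round_to) * round_to
    return weekly, weekly * weeks

# The "Bosch Phase 3" example: 3 BE Devs, 1 QA, 1 PM, 12 weeks.
print(suggest_budget({"BE Dev": 3, "QA": 1, "PM": 1}, weeks=12))
# → (30000, 360000)
```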


What This Requires From Each Team

From the AI Platform Team: Token usage must be trackable at the individual user level, not just the team level. If your AI tooling only exposes aggregate usage by department, the entire approach breaks down. User-level attribution is the non-negotiable prerequisite.

From the Data Pipeline team: A weekly aggregate export in a queryable format: { employee_id, week_start_date, total_tokens, cost_usd }. The timesheet system doesn't need raw logs — just the weekly summary per person.
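The daily-to-weekly rollup the pipeline team owns could look like this; the field names come from the two record shapes in the text, and keying weeks by their Monday is our assumption to match the MondayDate column used later:

```python
from collections import defaultdict
from datetime import date, timedelta

def to_weekly(daily_logs):
    """Roll daily {user_id, date, total_tokens, cost_usd} logs up into the
    weekly {employee_id, week_start_date, total_tokens, total_cost_usd}
    records the timesheet system consumes. Weeks are keyed by Monday."""
    weekly = defaultdict(lambda: {"total_tokens": 0, "total_cost_usd": 0.0})
    for log in daily_logs:
        monday = log["date"] - timedelta(days=log["date"].weekday())
        key = (log["user_id"], monday)
        weekly[key]["total_tokens"] += log["total_tokens"]
        weekly[key]["total_cost_usd"] += log["cost_usd"]
    return [
        {"employee_id": emp, "week_start_date": wk, **totals}
        for (emp, wk), totals in weekly.items()
    ]

logs = [
    {"user_id": "E1", "date": date(2025, 3, 4), "total_tokens": 5000, "cost_usd": 20.0},
    {"user_id": "E1", "date": date(2025, 3, 6), "total_tokens": 6000, "cost_usd": 25.0},
]
print(to_weekly(logs))  # both days collapse into the week starting Mon 2025-03-03
```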

From the Timesheet / ETM team: A new table to store allocated token costs, a weekly scheduled job to pull from the Data Lake and run the allocation logic, a recalculation trigger when timesheets are approved or amended, and two new fields on the CWO table (TokenBudgetPerWeek, TokenBudgetTotal) to store the confirmed budget.


The Proposed Schema Addition

Here is a minimal schema addition that supports the full framework — allocation tracking and budget management:


-- Extend CWO table (PM sets once at CWO creation)
CWO
  TokenBudgetPerWeek   BIGINT NULL
  TokenBudgetTotal     BIGINT NULL

-- Weekly allocation result (fully automated)
TokenCostAllocation
─────────────────────────────────────────────
Id                UNIQUEIDENTIFIER  PK
EmployeeId        UNIQUEIDENTIFIER  FK → Employee
CWOId             UNIQUEIDENTIFIER  FK → CWO
MondayDate        DATE              Week identifier
AllocatedTokens   BIGINT
AllocatedCostUSD  DECIMAL(18, 4)
AllocationBasis   NVARCHAR(20)      'ACTUAL' | 'ESTIMATED'
IsRecalculated    BIT

-- Budget vs actual tracking (fully automated)
TokenBudgetTracking
─────────────────────────────────────────────
Id              UNIQUEIDENTIFIER  PK
CWOId           UNIQUEIDENTIFIER  FK → CWO
MondayDate      DATE
BudgetTokens    BIGINT
ActualTokens    BIGINT
UtilizationPct  DECIMAL(5, 2)
CreatedAt       DATETIME2

The AllocationBasis field flags whether the allocation used approved timesheet hours (ACTUAL) or planned allocation hours (ESTIMATED). The TokenBudgetTracking table enables utilization dashboards and automated alerts — notify the PM at 70% utilization, escalate at 90% — without any manual intervention in the pipeline.
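The alerting rule above is simple enough to sketch directly. The 70%/90% thresholds come from the text; the level names ('OK', 'NOTIFY_PM', 'ESCALATE') are our labels, and the percentage is rounded to two decimals to match the DECIMAL(5, 2) column:

```python
def utilization_status(budget_tokens: int, actual_tokens: int,
                       warn_pct: float = 70.0, escalate_pct: float = 90.0):
    """Compute UtilizationPct for a TokenBudgetTracking row and map it
    to an alert level; thresholds are configurable per company."""
    pct = round(actual_tokens / budget_tokens * 100, 2) if budget_tokens else 0.0
    if pct >= escalate_pct:
        level = "ESCALATE"
    elif pct >= warn_pct:
        level = "NOTIFY_PM"
    else:
        level = "OK"
    return pct, level

# 21,300 of a 30,000-token weekly budget consumed: past the 70% notify line.
print(utilization_status(30_000, 21_300))
# → (71.0, 'NOTIFY_PM')
```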


Open Questions for the Industry

  • Should token cost be passed to clients? This is a commercial decision, not a technical one. But having the data makes the conversation possible.
  • How do you handle shared AI infrastructure costs (e.g., a central RAG system serving multiple projects)? Proportional hours won't work here — you'd need usage-level attribution.
  • What's the right refresh cadence? Weekly aligns with timesheet cycles, but some companies might want monthly for invoicing alignment.
  • How do you audit the allocation? Developers should be able to see how their token usage was attributed, especially if it affects project cost reporting.
  • How long before benchmarks become reliable? Our hypothesis is two to three months of data per role type, but this will vary significantly by team size and tooling mix.

Why This Matters Now

The outsourcing industry has spent decades building sophisticated frameworks for tracking human time — timesheets, allocations, billing rates, CWO structures. All of that infrastructure exists because time is the primary input cost.

AI is becoming a second primary input cost. It deserves the same rigor.

The good news is that the hard infrastructure — timesheets, CWO hierarchies, weekly billing cycles — is already there. The framework proposed here doesn't require rebuilding anything from scratch. It requires connecting two data systems that were never designed to talk to each other, using the timesheet as the bridge. And once that connection exists, the same data that tracks where tokens went can also inform how many tokens to budget next time.

If you're building something similar, or have a different approach, we'd genuinely like to hear about it.

#AI Cost  #Token Cost  #Outsourcing  #Timesheet  #CWO  #LLM  #Project Management  #Data Pipeline  #Budget Planning