Reading the data

This site turns a stack of government web pages into numbers and colours: similarity scores, originality figures, a reuse heat-map, and a record of which statement we saw a passage in first. This page explains what each one means, and what it does not claim.

When the record starts

Continuous tracking opens on 11 Nov 2025, the day the Australian Public Service launched its AI Plan. Ben Swift began the tracker from the audience that afternoon, opening a laptop to archive the agencies’ statements as the announcement was still being made. That is why the record starts exactly when it does.

The Policy for the responsible use of AI in government has required these statements since February 2025. Their first nine months, the period when agencies drew their wording from one another, fell before the tracker was watching. Everything that already existed enters the record together on day one, marked “first tracked” rather than “published”. We cannot know what those statements looked like before we first saw them. This is the single biggest limit on everything below.

Shared passages and the reuse heat-map

Agencies reuse one another’s wording heavily. On each statement page, every passage is tinted by how widely the same words appear across all the statements we track:

uncoloured: unique to that statement;
a few: shared with two or three other agencies;
several: shared with four to nine;
many: shared with ten to twenty-four;
most: shared with twenty-five or more.

Wording lifted straight from the Policy is always tinted at least lightly. A passage that also appears in the Digital Transformation Agency’s own statement is marked “in DTA template”, because the DTA publishes the wording most other agencies start from.
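The tint bands can be sketched as a small lookup. This is an illustration under stated assumptions, not the site's code: the function name and the `from_policy` flag are hypothetical, the count is taken to mean other agencies carrying the passage, and a passage shared with exactly one other agency (a case the bands above do not name) is assumed to fall into the lightest tint.

```python
def reuse_tier(other_agencies: int, from_policy: bool = False) -> str:
    """Map the number of other agencies sharing a passage to a tint label.

    Hypothetical helper: thresholds follow the bands listed on this page.
    """
    if other_agencies < 0:
        raise ValueError("other_agencies must be zero or more")

    if other_agencies >= 25:
        tier = "most"
    elif other_agencies >= 10:
        tier = "many"
    elif other_agencies >= 4:
        tier = "several"
    elif other_agencies >= 1:
        # Assumption: one other agency is treated as "a few";
        # the page only names two or three.
        tier = "a few"
    else:
        tier = "uncoloured"

    # Policy wording is always tinted at least lightly.
    if from_policy and tier == "uncoloured":
        tier = "a few"
    return tier


print(reuse_tier(12))                   # many
print(reuse_tier(0, from_policy=True))  # a few
```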

First observed in our corpus

For every shared passage we also record which tracked statement showed it earliest, and the order in which the others picked it up. Each passage falls into one of three tiers, by how much weight that ordering can bear:

added during tracking (81): we watched the first agency add the passage well after the record opened, with others following later. The strongest signal.
present at the start (53): the first agency already had the passage when tracking began, and others adopted it later. We can order the latecomers, but not that first agency’s own source.
tied (41): several agencies share the earliest date, usually because they were all present on day one, so we make no claim about order.
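One way to picture the three tiers is as a rule over the dates on which each agency was first seen with a passage. The sketch below is an assumption about how such a rule could look, not the matching logic the site runs: `ordering_tier` and the agency names are hypothetical, and because the page does not say how long "well after" is, any date later than the opening day counts as added during tracking here.

```python
from datetime import date

# The day continuous tracking opened, as stated on this page.
TRACKING_START = date(2025, 11, 11)


def ordering_tier(first_seen: dict[str, date],
                  start: date = TRACKING_START) -> str:
    """Classify a shared passage by how much its observed ordering can bear.

    first_seen maps an agency name to the date we first saw the passage
    in that agency's statement.
    """
    earliest = min(first_seen.values())
    holders = [a for a, d in first_seen.items() if d == earliest]

    if len(holders) > 1:
        # Several agencies share the earliest date: no claim about order.
        return "tied"
    if earliest <= start:
        # One agency already had the passage when tracking began.
        return "present at the start"
    # Assumption: any first sighting after the opening day qualifies.
    return "added during tracking"


example = {
    "Agency A": date(2025, 12, 1),   # hypothetical agencies and dates
    "Agency B": date(2026, 1, 5),
}
print(ordering_tier(example))  # added during tracking
```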

Read this as “first observed by us”, never as proof of who wrote it first. A passage two agencies both carried on day one cannot be ordered at all, and even a clean ordering only says who we saw with it first. The real source could be an agency we don’t track, an internal draft circulated off the web, or the DTA template itself.

Originality scores

A statement’s originality score is the share of its text, by length, that is neither shared with other agencies nor drawn from the template. A statement at 80% is mostly in its own words; one at 20% is mostly shared or templated language. The leaderboard and the agency grid colour each statement on this scale, from borrowed to bespoke.
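As a worked illustration of the arithmetic, the sketch below computes the share from a list of passages. It rests on assumptions: length is measured in characters, each passage already carries flags saying whether it is shared or templated, and the `originality` function and its tuple layout are hypothetical.

```python
def originality(passages: list[tuple[str, bool, bool]]) -> float:
    """Return the share of text, by length, that is neither shared nor templated.

    Each passage is (text, is_shared, is_templated). Length is counted in
    characters, which is an assumption made for this sketch.
    """
    total = sum(len(text) for text, _, _ in passages)
    if total == 0:
        raise ValueError("statement has no text")

    own = sum(
        len(text)
        for text, is_shared, is_templated in passages
        if not is_shared and not is_templated
    )
    return own / total


statement = [
    ("x" * 80, False, False),  # 80 characters in the agency's own words
    ("y" * 20, True, False),   # 20 characters shared with other agencies
]
print(originality(statement))  # 0.8
```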

The DTA scores low by design: it publishes the template, so almost everything it says is, by definition, shared. We label it the template source rather than reading its low score as a lack of effort.

Similarity

Similarity is a different measurement from reuse. Reuse is literal: the same words, matched exactly. Similarity is about meaning. Each statement is converted into a numeric fingerprint, an embedding, and that fingerprint is compared with every other statement’s. Two statements can then read as similar even when they share no identical passages, simply because they cover the same ground. The similarity map and the “reads most like” lists use this measure. Scores run from 0 to 1, where closer to 1 means more alike.
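To make the idea of comparing fingerprints concrete, the sketch below uses cosine similarity, a common way to compare embeddings. This page does not name the measure or the model, so treat the choice as an assumption; the short vectors are invented stand-ins for real embeddings. Cosine similarity can in general fall below 0, so a 0 to 1 scale implies either that the embeddings do not produce negative values in practice or that scores are rescaled, and this sketch does not guess which.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Invented three-dimensional "embeddings"; real ones have many more dimensions.
statement_a = [0.9, 0.1, 0.3]
statement_b = [0.8, 0.2, 0.4]
statement_c = [0.1, 0.9, 0.0]

# a and b point in nearly the same direction, so they score higher than a and c.
print(cosine_similarity(statement_a, statement_b) >
      cosine_similarity(statement_a, statement_c))  # True
```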

The fine print

The About page covers how all of this is computed: how spurious scrape churn is collapsed out, the exact passage-matching rules, and the embedding model. It also sets out the limits of the scraping itself.