A pragmatic AI governance baseline for ANZ teams
Most AI governance writing aimed at small and mid-market AU and NZ organisations reads like it was produced by someone who has never had to ship a feature on a deadline. This is the version I wish existed when I was on the inside: a baseline of controls that are worth doing because they make your system safer to operate, not because a framework told you to.
The Australian context here is specific. Australia's AI Ethics Principles, published by the Department of Industry, Science and Resources, have been around since 2019, and the Voluntary AI Safety Standard published in 2024 layered ten guardrails on top of them. Neither is binding. Both are quietly the documents a regulator or a procurement team will hand back to you in 2026 and ask how you've addressed them. The honest read is that they describe controls a competent team would want anyway. Treat them as a useful checklist, not a compliance burden.
The controls that actually matter on day one
There is a long list of things you could do. There is a much shorter list of things you must do before you let an LLM make a decision a customer can see. In the order I'd put them in:
1. Data classification you can defend
Before any prompt leaves your network, you need a sentence-long answer to: what class of data is in this prompt, and is the model provider permitted to see it? For most ANZ orgs that means a three-tier scheme (public, internal, sensitive) and a rule that sensitive data does not leave the tenancy without an enterprise agreement and zero-retention configured. If that sounds basic, count how many internal tools in your organisation paste customer records into a public chatbot. The number is never zero. A minimal sketch of that rule as a gate on outbound calls is below; the type and function names are illustrative, not a real library.
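// Minimal sketch, assuming a classification has already been attached upstream.
type DataClass = "public" | "internal" | "sensitive";

interface OutboundPrompt {
  text: string;
  classification: DataClass;
}

// Refuse to dispatch sensitive data unless zero-retention is actually configured.
function assertSendable(prompt: OutboundPrompt, zeroRetentionConfigured: boolean): void {
  if (prompt.classification === "sensitive" && !zeroRetentionConfigured) {
    throw new Error("Sensitive data must not leave the tenancy without zero-retention configured");
  }
}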
2. Prompt and output logging
Log the prompt, the model identifier, the temperature, the system prompt hash, the output, and the user identifier, on every call. Not for compliance, for debugging. The first time a user files a ticket saying "the assistant told me the wrong thing on Tuesday", you will be enormously grateful you can replay the exact context. Keep these logs in a store with the same retention and access controls as your other PII; they are PII.
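A sketch of the record I'd keep per call; the field names are illustrative, and persist() stands in for whatever store already holds your PII.
// Illustrative log record; treat the whole thing as PII.
interface LlmCallLog {
  timestamp: string;        // ISO 8601
  userId: string;
  modelId: string;          // the pinned version, not the alias
  temperature: number;
  systemPromptHash: string; // hash so you can spot prompt drift without duplicating the prompt
  prompt: string;
  output: string;
}

async function logCall(entry: LlmCallLog, persist: (e: LlmCallLog) => Promise<void>): Promise<void> {
  await persist(entry); // same retention and access controls as your other PII
}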
3. PII redaction at the boundary
If you're sending user-generated content to a third-party model, run it through a redaction pass first. The cheap version is a regex-and-NER pipeline that masks names, emails, phone numbers, TFNs, Medicare numbers, and account identifiers. The thoughtful version preserves a reversible token so you can rehydrate the response. The point is not perfection, the point is that the question "did we send a customer's TFN to a US-hosted model" should have a defensible answer.
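The cheap version, sketched below. The TFN and Medicare patterns are deliberately rough (real formats have check digits a regex won't validate), and a real pipeline would add NER on top; the reversible token map is the part that matters.
// Rough regex-only sketch; patterns are loose by design.
const PATTERNS: Record<string, RegExp> = {
  EMAIL: /[\w.+-]+@[\w-]+\.[\w.]+/g,
  PHONE: /(?:\+?61|0)[2-478](?:[ -]?\d){8}/g,   // loose AU phone shape
  TFN: /\b\d{3}[ -]?\d{3}[ -]?\d{3}\b/g,        // loose; real TFNs carry a check digit
  MEDICARE: /\b\d{4}[ -]?\d{5}[ -]?\d\b/g,      // loose 10-digit Medicare shape
};

// Replace matches with reversible tokens so the model's response can be rehydrated.
function redact(text: string): { redacted: string; tokens: Map<string, string> } {
  const tokens = new Map<string, string>();
  let counter = 0;
  let redacted = text;
  for (const [label, pattern] of Object.entries(PATTERNS)) {
    redacted = redacted.replace(pattern, (match) => {
      const token = `[${label}_${counter++}]`;
      tokens.set(token, match);
      return token;
    });
  }
  return { redacted, tokens };
}

function rehydrate(text: string, tokens: Map<string, string>): string {
  let out = text;
  for (const [token, original] of tokens) out = out.split(token).join(original);
  return out;
}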
4. Model version pinning
Pin to specific model versions, never to a moving alias. "gpt-4o" or "claude-sonnet" without a date suffix is a configuration bug waiting to happen. When the provider rolls a silent update you want to know because you decided to upgrade, not because your evals quietly degraded over a long weekend.
// Prefer this
const MODEL = "claude-sonnet-4-5-20250929";
// Not this, alias drift will bite you
const MODEL = "claude-sonnet-latest";5. A rollback plan you've actually tested
If your LLM feature starts producing bad output at 2am on a Saturday, what is the runbook? "Disable the feature flag" is acceptable if the flag exists and someone has used it within the last quarter. "Revert the deployment" is acceptable if your deployments are reversible. "Phone the vendor" is not acceptable. Test your rollback by exercising it on a non-production day; if you can't, you don't have one.
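The shape of the 2am path, assuming a feature-flag service you already run; the flag name and fallback are illustrative.
// Kill-switch sketch; the flag must exist and be exercised at least once a quarter.
interface FlagService {
  isEnabled(flag: string): Promise<boolean>;
}

async function answerQuery(
  query: string,
  flags: FlagService,
  callModel: (q: string) => Promise<string>,
  fallback: (q: string) => Promise<string>, // canned response or non-LLM path
): Promise<string> {
  if (!(await flags.isEnabled("llm-assistant"))) {
    return fallback(query);
  }
  return callModel(query);
}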
6. Human-in-the-loop with calibrated thresholds
For any decision with material consequence, a refund, a denial, a customer-facing claim, the model should not be the final authority. The interesting design question is where to set the confidence threshold for human review. Set it too high and you wear human cost on benign cases; too low and the system silently rubber-stamps risky ones. The right answer is empirical: instrument it, watch the precision-recall curve over real traffic for a fortnight, then move the threshold with intent.
Human-in-the-loop becomes theatre the moment the human can't actually intervene. If your reviewer sees a decision that has already been emailed to the customer, you have logging, not oversight. The intervention point must come before the decision is observable.
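A sketch of the routing decision, assuming the model or a calibration layer you own exposes a usable confidence score (many models don't); the threshold is a config value you move with evidence, not a constant.
// Illustrative routing; below the threshold a human decides before anything reaches the customer.
interface ModelDecision {
  action: "approve" | "deny";
  confidence: number; // 0..1, from the model or your own calibration layer
}

type Route = "auto" | "human_review";

function route(decision: ModelDecision, reviewThreshold: number): Route {
  return decision.confidence < reviewThreshold ? "human_review" : "auto";
}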
What the AU framework asks for, honestly
The AI Ethics Principles cover human, societal and environmental wellbeing, human-centred values, fairness, privacy protection and security, reliability and safety, transparency and explainability, contestability, and accountability, eight in total. The Voluntary AI Safety Standard adds practical guardrails: accountability and governance, risk management, data governance, testing and monitoring, human oversight, end-user transparency, contestability, supply chain transparency, record-keeping, and stakeholder engagement. Read in full, both documents are reasonable. Read in summary, they want you to have an owner, a risk register, a test suite, a way to explain decisions to users, and a way for users to push back.
The piece that small ANZ orgs underweight is the supply chain transparency requirement. If you're using a third-party model, you inherit its training data politics, its content policies, and its update cadence. Document this. "Decisions in this product use Anthropic's claude-sonnet-4-5, hosted in ap-southeast-2 via AWS Bedrock, with data retention disabled" is a sentence your privacy officer should be able to produce on demand.
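One way to keep that sentence next to the code instead of in a wiki is a checked-in provenance record; the fields below are illustrative.
// Illustrative provenance record, committed alongside the config it describes.
const MODEL_PROVENANCE = {
  provider: "Anthropic",
  model: "claude-sonnet-4-5-20250929",
  hosting: "AWS Bedrock, ap-southeast-2",
  dataRetention: "disabled",
  lastReviewed: "YYYY-MM-DD", // update whenever the supplier changes anything
} as const;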
Bottom line
A baseline that lets you ship safely is shorter than the framework documents suggest: classify your data, log everything, redact at the boundary, pin versions, test rollback, set HITL thresholds with evidence. Do these because they make your system better to operate, and because when the regulator does eventually ask, you can answer with what you actually built rather than a policy document. Where I'm uncertain: I think the AU regime tightens in the next eighteen months and the voluntary standard becomes effectively mandatory for procurement-led sectors. If you're in financial services or health, build to that future, not the current floor.