There's an enticing pattern many of us are attracted to when we adopt a new tool. We grab the tool first, and then we try to wrap the business around it. The newer the tool, the louder the pitch, the more we do it.
It rarely works. The tool ends up doing things we didn't really need done, the people end up working around the tool instead of with it, and the process we used to have stops being the process we actually run. Usually nobody decided that should happen. It just happens, slowly, while we're busy being impressed with the tool.
This issue is about how to not do that with AI.
A quick word on framing. The current zeitgeist treats every new tool as an AI tool, and that isn't always true. But where it is true, and increasingly that's where the live adoption decisions are, the way AI fails is different from the way other tools fail, and the discipline you need before you adopt is different too. So I'm going to assume from here on that the tool you're looking at is an AI tool, because that's where the conversation is right now.
Here's a small example of why AI fails differently. I asked Gemini a riddle this morning. The standard Gemini that comes with our paid Google Workspace plan. How much wood could a woodchuck chuck if a woodchuck could chuck wood?

It surmised an answer. Then it drifted into a real wildlife study about groundhogs and how much dirt they can move when digging a burrow. Then it quietly substituted dirt for timber and produced a hypothesis based on the new question rather than the one I'd asked.
The point isn't that AI gets things wrong. It's that it sounds confident regardless. Nine times out of ten, nineteen times out of twenty, it'll produce something that looks fine. Then on the eleventh or the twenty-third, it won't, and you won't know which time was which until afterwards. With a groundhog, that's a curiosity. With a labour variance commentary you're about to send a client, it's a problem.
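To put rough numbers on that, here is a back-of-envelope sketch in Python. It assumes each output is independently fine at a fixed rate, which is a simplification of my own rather than a measured property of any tool, and the function name is made up for illustration.

```python
def chance_of_at_least_one_miss(success_rate: float, uses: int) -> float:
    """Probability that at least one of `uses` outputs is wrong.

    Assumption: every output is independent and fine with the same
    probability `success_rate`. Real tools will not be this tidy.
    """
    return 1.0 - success_rate ** uses


# "Nine times out of ten" and "nineteen times out of twenty" as rates.
for rate in (0.90, 0.95):
    for uses in (10, 20, 50):
        miss = chance_of_at_least_one_miss(rate, uses)
        print(f"{rate:.0%} fine per use, {uses} uses: "
              f"{miss:.0%} chance of at least one miss")
```

Under those assumptions, a tool that is fine nineteen times out of twenty still has roughly a 64% chance of producing at least one bad output across twenty uses. The per-use odds look comfortable. The odds across a month of client work do not.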
A quick vocabulary check
Three words used loosely in the public conversation about AI. Worth being precise.
Generative AI is the class of tool you probably already know by name: ChatGPT, Claude, Gemini, Copilot. Every time you ask one a question, it constructs a new answer from scratch, predicting what should come next based on patterns in the data it was trained on. It isn't looking up an answer. It's generating one.
Agentic AI is generative AI given the ability to take actions on your behalf: write code, modify files, send messages, change data, all without asking permission for each step. Claude Code is agentic. Cursor's agent mode is agentic. Most of the new AI assistants being sold into accounting practices are agentic.
Vibe coding is using AI, usually agentic AI, to write working software by describing what you want in plain English. You don't need to know how the code works. The AI handles the engineering. You ship what it produces.
All three sit on the same probabilistic engine. The engine is excellent at language and indifferent to truth. The groundhog drift is a small, harmless version of that. The labour variance is what it looks like when it matters.
Get clear before you evaluate
I had a conversation today with someone running a service business who's about to bring in a friend to build out an AI-driven operations layer for them. Good friend, smart guy, the right kind of help. My note to him was that anything the friend produces is going to be great as long as he and his friend are both clear, before anything gets built, on exactly what good looks like.
That's the discipline. Before you evaluate any AI tool, get clear on a few things.
What are you actually trying to achieve? Not the tool, the outcome. State it in plain terms. A faster close. Better variance commentary. Fewer client emails on the same recurring questions. If you can't say it cleanly in a sentence, you're not ready to look at tools yet.
What does good look like? If a tool produced exactly the right thing for you, what would that thing look like? What would you be able to do that you can't today? This is the question most easily skipped, because it feels obvious, and it almost never is. Most people, when pressed, find they don't have a clear picture of what good looks like, just a vague sense of less of the current problem.
Where in your workflow is AI actually going to sit? Is it the whole task, or one part of it? Is it producing the answer that goes to a client, or is it producing a first draft you'll edit? Is it touching the numbers, or only the language around them? The same tool, applied to different parts of the same workflow, has different risks.
Do you agree it's the right part? AI fits some parts of a workflow well and others badly. Words: usually fine. Numbers a client will act on: usually not. The vibe coding hot take assumes AI fits everywhere. It doesn't.
What does this change for your customers? Your tool decision rarely stays inside your practice. Reports look different. Portals appear. Data flows through new systems. The customer didn't ask for any of that. The right question is what your customers value, and whether the change makes their experience of you better, the same, or quietly worse. A tool that improves your workflow but makes the customer's process harder is a tool that's costing you something you can't see on the invoice.
If those five questions don't yield clear answers, no tool evaluation is going to save the adoption. You'll wrap the business around the tool because you don't have a sharp enough picture of what the business should be doing in the first place.
And then evaluate the tool
Once you're clear, the tool itself can be evaluated properly. The questions below are what I'd want to know about any AI tool that's going to touch client work, whether I'm thinking of buying one or building one with vibe coding.
The DIY column is harder than the buying column on most rows. When you build your own, you've taken on what the vendor would otherwise have to answer for. Neither road is automatically the right one.
| # | The question | If you're buying | If you're building yourself |
|---|---|---|---|
| 1 | When this tool produces a number, where did the number come from? | Can the vendor demonstrate consistency on the same data? | Can you? Do you understand your own methodology well enough to defend it? |
| 2 | Will the tool remember the client between sessions? | Or are you reloading context every time? | Have you built that memory yourself? Can you check what's in it? |
| 3 | Can the tool see more history than fits in one conversation? | Is there a real limit you should know about? | Have you tested that older history actually returns? |
| 4 | What checks the output before you see it? | Or are you the only safety net? | Is your check still running the way you intended? |
| 5 | Can you show a client how a number was arrived at? | Does the tool surface the calculation logic? | Can you reconstruct any specific output if asked? |
| 6 | When the AI model changes, what changes for you? | Do you control when updates happen, or do they happen to you? | Have you budgeted the time to revalidate every workflow? |
| 7 | Where does the client's data live, and who else has access? | What does the vendor's policy actually say, not what's marketed, but what's in the terms? | Have you authorised those flows with your clients? |
Question seven is the one with the greatest risk. The marketing language matters. "Doesn't train on your data" is not the same as "your data isn't stored on third-party servers". Business tiers generally provide stronger protections than personal tiers, but the terms can change without notice. If your engagement letter doesn't authorise third-party AI tools, you don't have client consent, regardless of which tier you're on.
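If you're on the building side of the table, questions one and five can be made concrete with very little code. What follows is a minimal sketch, not a finished control. The functions `deterministic_variance` and `make_drifting_tool` are hypothetical stand-ins for whatever tool you're testing, and the field names and figures are invented for illustration. The idea is simply to run the same data through the tool several times, record whether the answers agree, and keep a fingerprint of the input so a specific output can be traced back later.

```python
import hashlib
import json
from decimal import Decimal


def fingerprint(payload: dict) -> str:
    """SHA-256 of the input data, so a logged output can be tied to its exact input."""
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def consistency_check(tool, payload: dict, runs: int = 5) -> dict:
    """Run the same input through `tool` several times and record whether the answers agree."""
    outputs = [tool(payload) for _ in range(runs)]
    return {
        "input_sha256": fingerprint(payload),
        "runs": runs,
        "outputs": [str(o) for o in outputs],
        "consistent": len(set(outputs)) == 1,
    }


# Hypothetical stand-in: a plain calculation that always gives the same answer.
def deterministic_variance(payload: dict) -> Decimal:
    return Decimal(payload["actual_labour"]) - Decimal(payload["budget_labour"])


# Hypothetical stand-in: a tool that quietly drifts on every `every`-th call.
def make_drifting_tool(every: int):
    calls = {"n": 0}

    def tool(payload: dict) -> Decimal:
        calls["n"] += 1
        drift = Decimal("12.50") if calls["n"] % every == 0 else Decimal("0")
        return deterministic_variance(payload) + drift

    return tool


# Invented example figures.
data = {"client": "Example Co", "actual_labour": "10450.00", "budget_labour": "9800.00"}
print(consistency_check(deterministic_variance, data))
print(consistency_check(make_drifting_tool(3), data))
```

A check like this won't tell you whether the number is right, only whether the tool gives the same number twice. That's still worth knowing before a client sees it, and the stored record gives you something to point to if you're asked how a figure was produced.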
And then there's the people part
Even if every question above gets a good answer, you can still lose the adoption.
You've probably been part of a business at some point where a new tool was introduced and then quietly didn't get used. The investment was made, the implementation went ahead, the tool was capable. And then people went back to doing things the old way because the new way didn't feel like a benefit to them. Maybe they didn't understand how it would help. Maybe the cost of learning it felt larger than the cost of not bothering. Maybe nobody asked them.
Kurt Lewin had a useful frame for this: unfreeze, change, refreeze. You can't refreeze a new way of working if the people doing the work don't see the benefit. Every change that takes hold benefits both the person doing the work and the end consumer of the result. If either side of that is missing, the change doesn't stick.
So the work isn't just "did we pick the right tool?" It's "does the change benefit the people who'll have to live with it day to day, and the customers who'll receive what comes out the other end?" All of it has to clear the bar.
What this means in practice
Three things, in order, before you adopt:
Get clear on what you're trying to achieve and what good looks like, in your own words, before anyone shows you a tool. Include the customer in that picture, not just your own workflow.
Evaluate the tool against the seven questions, honestly. The point of the questions isn't to produce a verdict. It's to surface things that would otherwise stay invisible.
Check that the change benefits the people who'll actually do the work, and the customers who'll receive what comes out. If either doesn't, you'll be paying for a tool that sits in a drawer, or one that quietly costs you customers you can't see leaving.
People, process, tools. Get the first two clear and the third one stops being the thing you wrap the business around.
Tool I'm Using: Calm
Calm is a focus app for me. Not the sleep stories, not the meditations it's widely known for. I use it for the ambient sound and cafe noise that lets me drop into flow during the working day.
There's a free tier that's enough to try. This link gives you thirty days free on a paid plan. Full disclosure: I get nothing if you sign up. No referral credit, no months free on my end. The link is there because at the time of writing it gives the person clicking it a meaningful run at no cost.
Next Week
Until next time, stay curious!

Know someone navigating the compliance-to-advisory transition? Forward this email, or better yet, send them to baifokal.beehiiv.com to subscribe.

