In the last year or two, the software industry has changed so much. Thanks to AI tools such as Claude Code, I now manually type only a tiny percentage of the code I write compared to a year ago.
My day-to-day work is with React, with tests written in Vitest, Playwright and so on. It's a pretty standard setup that most of you are probably also using. Tests are of course an important part of delivering software.
And AI can definitely write code and add tests for that code.
But I've noticed many patterns that AI constantly introduces which I would consider code smells and generally try to avoid.
I wanted to document some of the testing code smells I keep seeing that AI introduces.
They often help get tests passing quickly, but they can also create maintenance problems that only become apparent months later.
I bet as you read this, you will disagree with some of these points.
Especially if you have already spent the time setting up good AGENTS.md (or CLAUDE.md), with good SKILL.md files covering your conventions.
I'm writing this up to point out the sort of issues that AI can introduce in tests when you don't have any reins to nudge it in the right direction.
The real solution is to refine your AGENTS.md, CLAUDE.md, and SKILL.md files.
I will have an upcoming blog post with the specific skills/instructions I use to nudge Claude Code into writing much better, more maintainable tests.
A lot of these code smells contradict each other (e.g. over-testing and then under-testing!). These common code smells don't all appear at the same time!
A lot of these best practices are similar to what I've covered elsewhere on this site, as they're generally just best practices.
Forgetting to add any tests, or only unit testing tiny portions of its changes
The biggest code smell is that AI will completely forget to write tests, unless you specifically tell it.
Even if you tell AI agents to write tests, I've repeatedly found that unless you are very specific about what to test and how to test it, it will often cover only part of the code.
If you have a change that adds:
- a new main (top level) React component
- some other smaller components that the main component uses
- several custom hooks, which are used in your components
- and a few simple pure functions (such as some business logic, maths, etc) that your hooks and/or components use
Typically in those cases you will get most value out of an integration test, testing the main component - at least for the general happy path and some main unhappy paths. Then smaller unit tests for complex or edge case logic in the smaller units (the pure functions or maybe the custom hooks).
I've found that AI will often forget to test the main component, and just unit test the hell out of the pure functions or maybe the custom hooks.
So you might have 500 lines of implementation and 1000 lines of tests, and at first glance you would assume it is well tested, even though the main component itself is never rendered in a test.
Only testing happy paths
If you ask AI to add tests for a component that doesn't handle edge cases or errors already, then it is very rare to see AI bother to write tests to confirm behaviour of those edge case or error paths.
Often AI, especially on complex frontend UI tests, will just test the 'happy path' (the success state), and avoid edge cases.
If it does handle edge cases and unhappy paths, it will often make the tests pass even when the behaviour is actually buggy.
AI reviews often don't point out lack of tests
There are lots of ways to automate pull request reviews with AI. And honestly they are pretty amazing.
AI pull-request reviews are really great at spotting typos and other mistakes, performance issues, and so on, but they can sometimes lack an understanding of why a feature is being added or modified.
But I have found that unless you give it specific instructions that you expect good test coverage for all changes, it is quite rare for it to pick up and point out that something is not tested.
I've also never seen it pick up and comment on badly written tests (such as the issues listed on this page). It will almost always approve a PR with tons of badly written tests or missing tests.
Over testing
Ask AI to write tests for a feature, and it will sometimes go overboard, attempting to test every possible combination of inputs, props, and states.
I love testing: it documents code and it helps with refactors. But there is a strong case that over testing is a huge problem.
If you are testing every single combination when the extra tests are not adding much value, it is better to clean up the tests and remove the junk that AI writes.
I've often gotten AI to write tests and then asked it to work out where the value actually comes from, and to delete the tests that are not adding much. This can significantly reduce the number of complex tests.
Some things to look for when over testing:
- testing for impossible states (they have no value and should be avoided. Humans writing tests will often know these states are impossible and therefore pointless to test for, but AI doesn't)
- testing trivial things like getters or setters, where no real business logic is involved
- each test asserting one minor thing at a time, instead of grouping related assertions together
Re-testing existing code
Sometimes you will use AI to update existing code, which may be tested (indirectly or directly) elsewhere in the codebase.
I've seen AI fail to notice that the code is already tested. When you ask it to update an existing function and test the change, it will test your new changes and also write tests again (in a new test file) for the existing logic that it believes is not tested.
In my experience, when this does happen the quality of those tests is very low (just 'AI slop') every single time. It doesn't understand what the function does or why it exists, and it is just trying to get 100% coverage.
(This overlaps a bit with a previous point of over testing).
Reluctance to use .each()
Related to the previous point of over testing, when AI decides to test every combination it will copy/paste roughly 90% of a test just to change one input and one expected assertion.
It is much cleaner to get it to use .each() (which Jest and Vitest both support).
It can turn something like multiple very similar tests into just one test, with an array of data to pass into the test.
For example, here are multiple cases that are tested in one function, using each():
test.each`
  userAge | canPurchaseAlcohol | expectedMessage
  ${21}   | ${true}            | ${'Purchase approved'}
  ${18}   | ${false}           | ${'Age verification required'}
  ${65}   | ${true}            | ${'Senior discount applied'}
  ${17}   | ${false}           | ${'Must be 18 or older'}
`(
  'displays "$expectedMessage" when user is $userAge years old',
  ({ userAge, canPurchaseAlcohol, expectedMessage }) => {
    render(<CheckoutForm userAge={userAge} />);
    const submitButton = screen.getByRole('button', { name: 'Complete Purchase' });
    if (canPurchaseAlcohol) {
      expect(submitButton).toBeEnabled();
    } else {
      expect(submitButton).toBeDisabled();
    }
    expect(screen.getByText(expectedMessage)).toBeInTheDocument();
  }
);
Asking AI to write tests after it wrote the implementation
I've touched on this already, but whenever you ask AI to write tests for a feature it has just implemented, it will by default write tests that match the implementation, rather than tests that check the intended behaviour.
This can be quite dangerous: it leaves you thinking your tests give you confidence in a working application, but that confidence is false.
You have to be careful with the tests that AI wrote. If there is a bug in the logic, the test will just assert the buggy behaviour as if it were correctly implemented.
It will simply take the implementation and write tests that pass.
(Like any and all of these issues, this can be avoided with some careful prompting or good usage of SKILL.md.)
Note: It is great at finding typos or incorrect code syntax. But it fails to understand why a feature exists, and it often misses business logic bugs.
Over mocking
One of the biggest annoyances of AI written tests is that it will treat most tests as unit tests, and just mock every dependency (with jest.mock() or vi.mock()).
This really helps AI get tests passing, but it often removes much of their value because they end up just testing mocked implementations instead of real behaviour.
If you have a dependency injection (DI) system set up, then it is the DI system that should be used to pass in mock objects. AI on JavaScript/TypeScript apps will generally ignore this and just go for jest.mock.
The main issue I have with this for long term maintenance is that when you use mock() you are mocking an entire module (file/import). So if you need to fake some data for one exported function, the entire module gets mocked, unless you add a factory that passes the rest through with vi.importActual() or jest.requireActual(). I also find that AI will just mock when it cannot get a test to pass. That often means some part of the system is badly designed, and the mock is an escape hatch that should not be the first thing it tries.
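As a rough illustration of the DI alternative, here is a minimal, framework-free sketch. The names greetUser, UserStore, Logger and Deps are hypothetical and only stand in for whatever your own DI setup provides. The point is that the real logic runs in the test, and only the injected edges are faked.

```typescript
// Hypothetical dependencies; these names are illustrative only.
interface Logger {
  logError(message: string): void;
}
interface UserStore {
  getUser(id: string): { name: string } | undefined;
}
interface Deps {
  store: UserStore;
  logger: Logger;
}

// The code under test receives its dependencies instead of importing them.
function greetUser(id: string, deps: Deps): string {
  const user = deps.store.getUser(id);
  if (!user) {
    deps.logger.logError(`User ${id} not found`);
    return 'Hello, guest';
  }
  return `Hello, ${user.name}`;
}

// In a test, small hand-written fakes go through the same parameter,
// so no module-level mock is needed.
const loggedErrors: string[] = [];
const fakeDeps: Deps = {
  store: { getUser: (id) => (id === '123' ? { name: 'John Doe' } : undefined) },
  logger: { logError: (message) => { loggedErrors.push(message); } },
};

const knownGreeting = greetUser('123', fakeDeps); // happy path
const unknownGreeting = greetUser('999', fakeDeps); // error path, records a log call
```

In a Vitest or Jest test the last two lines would sit inside a test() with expect() calls on the greeting and on loggedErrors.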
It writes tests to pass
Related to the previous point of over mocking, when AI writes tests for existing code (code that it may have just written) it will generally write the tests so that they pass.
As said in the previous section, the shortcut will often be to mock anything that causes issues.
I have often asked AI to write tests for existing code (that had no tests), and I have yet to see it ever find and fix any bugs. I know that when I've written tests for existing code that had no tests, it is very common to start to find bugs (edge case bugs, like negative numbers, empty arrays, etc).
AI will literally just look at the implementation, assume that the code is correct, and write tests that confirm that implementation. Really we need to be asking AI to try to understand what the function is intended to do, and to write tests for that.
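To make the difference concrete, here is a small sketch with a hypothetical averageRating function, assuming its intended behaviour is "return 0 when there are no ratings". It is framework-free, so the two checks are plain booleans rather than expect() calls.

```typescript
// Hypothetical function. Assumed intent: the mean rating, or 0 when there are none.
function averageRating(ratings: number[]): number {
  const total = ratings.reduce((sum, rating) => sum + rating, 0);
  return total / ratings.length; // bug: 0 / 0 is NaN for an empty array
}

// Check derived from the implementation: whatever the code returns today is "right".
const implementationDerivedCheck = Number.isNaN(averageRating([]));

// Check derived from the intent: an empty list should average to 0.
const intentDerivedCheck = averageRating([]) === 0;

// The first check passes and locks the bug in. The second fails and exposes it.
```

Only the second check could have found the bug, and it can only be written by knowing what the function is for.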
Not using spyOn()
Similar to above - if AI decides it has to provide a mock for something, then I find that it will mock an entire module.
The better way is often to use spyOn().
This will give you much more control, and doesn't mock the entire module (entire file). Generally I find that maintaining spyOn is much easier long term than over use of mock.
For example:
// ❌ Bad: Mocking the entire module
vi.mock('../services/logger');
test('logs error when API fails', async () => {
  // ...
});
// ✅ Good: Using spyOn to mock just what you need
import * as logger from '../services/logger';
test('logs error when API fails', async () => {
  const logErrorSpy = vi.spyOn(logger, 'logError').mockImplementation(() => {});
  render(<UserProfile userId="123" />);
  await waitFor(() => {
    expect(logErrorSpy).toHaveBeenCalledWith('Failed to fetch user');
  });
});
Not using fixtures (prepared mock data to pass in as args/props)
Fixtures are a very useful way to clean up your tests. If you are always creating a User object to pass in as a prop, then having a file (maybe fixtures.ts) with a dummy object (or a function that returns that object) keeps your tests much smaller, cleaner and easier to maintain.
Without telling AI about these sorts of things in your code base, I've found it will almost always reimplement these huge objects in your tests, which makes the tests much harder to read.
For example:
// ❌ Bad: Recreating mock data in every test
test('renders user profile', () => {
  const mockUser = {
    id: '123',
    name: 'John Doe',
    email: 'john@example.com',
    address: { street: '123 Main St', city: 'New York' },
    preferences: { theme: 'dark' },
  };
  render(<UserProfile user={mockUser} />);
  expect(screen.getByText('John Doe')).toBeInTheDocument();
});
// ✅ Good: Using a fixtures file
// fixtures.ts
export const createMockUser = (overrides = {}) => ({
  id: '123',
  name: 'John Doe',
  email: 'john@example.com',
  address: { street: '123 Main St', city: 'New York' },
  preferences: { theme: 'dark' },
  ...overrides,
});
// test file
import { createMockUser } from './fixtures';
test('renders user profile', () => {
  render(<UserProfile user={createMockUser()} />);
  expect(screen.getByText('John Doe')).toBeInTheDocument();
});
Not knowing your existing helper functions
This is another similar one to not knowing about fixtures.
If you have helper functions, such as wrappers around render() that set up context providers or other things essential to your app, AI will often reimplement them unless you tell it they exist.
Then your tests end up duplicating a lot of very similar setup code (render helpers and code to set up dummy data normally look very similar), which makes your tests harder to read. Also, when you update things in the future (e.g. the shape of some data), every test with its own copy of a helper has to be updated separately.
You need to make sure that AI knows about fixtures and helper functions to make your tests cleaner.
Making up types, or as any
If you use TypeScript, I've found that as soon as AI needs dummy test data it will not only skip existing fixtures (see previous point), it will also very often type things incorrectly or use very broad type assertions like as any.
I have seen this on all kinds of agents and setups. It doesn't seem to matter if the agent can access typechecking information. It will still often type things very loosely.
This can be acceptable in tests sometimes, as it can make test files much easier to work with.
But it is still a code smell, and without that strict type information refactoring in the future becomes harder.
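One way to keep test data strictly typed without making it painful is a fixture factory that accepts Partial overrides. This is only a sketch: the User interface below is an assumption, inferred from the fixture example earlier on this page, and in a real code base it would be imported from your app code.

```typescript
// Assumed shape, inferred from the fixture example; in a real project this
// type would be imported from the application code instead.
interface User {
  id: string;
  name: string;
  email: string;
  address: { street: string; city: string };
  preferences: { theme: 'dark' | 'light' };
}

// Partial<User> lets a test override any field while the compiler still
// checks field names and value types. No `as any` is required.
const createMockUser = (overrides: Partial<User> = {}): User => ({
  id: '123',
  name: 'John Doe',
  email: 'john@example.com',
  address: { street: '123 Main St', city: 'New York' },
  preferences: { theme: 'dark' },
  ...overrides,
});

const defaultUser = createMockUser();
const renamedUser = createMockUser({ name: 'Jane Doe' });

// A typo such as createMockUser({ nmae: 'Jane Doe' }) fails to compile,
// whereas the same object cast with `as any` would slip through.
```

If a field on User is renamed later, every fixture call that still uses the old name becomes a compile error, which is exactly the refactoring safety that as any throws away.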
AI loves Regex when matching rendered elements
If you have some component that you are testing, and it is rendering something like this:
<button>Add item to cart</button>
In my experience, AI will often query it like this:
screen.getByText(/add item to cart/i);
(A regex, matching 'add item to cart' (case insensitive))
I am not entirely sure why, but AI loves regex matchers.
I would say that in almost 9/10 cases (or more!) it could just be replaced with a literal string search (exact match), or just use {exact: false} for non exact matches.
For that example, really it should be one of these:
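// ideally this:
screen.getByRole('button', { name: 'Add item to cart' });
// if case sensitivity is not important
// (exact: false makes getByText a case-insensitive substring match;
// a string passed as `name` to getByRole is still matched exactly):
screen.getByText('add item to cart', { exact: false });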
Over use of test IDs
This is related to the previous point. React Testing Library recommends choosing query functions in its documented priority order, with role and text queries first and test IDs last.
However, if AI sees a data-testid it defaults to using it over every other kind of query.
This is not the end of the world, but querying the way that RTL (React Testing Library) expects does lead to your tests being more valuable as they test like a real user uses your application.
Wrapping everything in act()
If you use React Testing Library, there are times you have to reach for act() and wrap things in it.
But if you are using normal query functions like await screen.findBy(...), or things like the userEvent.click() functions, you shouldn't need to reach for act().
AI will sometimes go overboard and believe that everything has to be wrapped in act().
Testing for exact DOM structure
In human written tests, I would say it is quite rare to need to test the specific DOM structure in React frontend tests.
Normally we just write tests that check that a form input, an error message and a submit button are on the page.
However I've noticed that AI will sometimes assert on the specific order of these in the page when it isn't really relevant. Apart from very specific (and rare) use cases I don't personally see the value in this sort of assertion.
Even if elements appear in the expected DOM order, without visual regression tests (screenshots, like on Percy) you still don't know how the UI actually looks to users.
Adding extra timeouts (especially to e2e tests)
AI can often very easily convince itself that a test needs additional timeouts to run and pass. I am very confused about why it does this, because I've seen it on tests which have absolutely no timeout issues.
It can add them to the entire test (the final arg to test() or it() can be a timeout: test('something', someFn, 60 * 1000))
Or it adds them to waitFor or findBy calls that don't need them:
await waitFor(
  () => {
    expect(someEl).toHaveTextContent('something');
  },
  { timeout: 10 * 1000 }
);
await screen.findByText('something', {}, { timeout: 10 * 1000 });
In most cases they are not needed, and if they are then it is probably an indication that your test is not as clean as it could be.
My guess is that AI adds these timeouts as a safety net (without checking if they are actually required).
Asserting too much in a test
I find that AI will normally write separate tests for every tiny assertion (so the problem in this section is less common).
But sometimes I have seen it do the opposite: mega tests that assert absolutely everything.
expect(a).toBe(...)
expect(b).toBe(...)
expect(c).toBe(...)
expect(d).toBe(...)
expect(e).toBe(...)
// ... and so on
I do not believe in the "one assertion per test" rule, as that is overkill. But asserting too much in a single test makes it much harder to maintain, and less clear what the test function is even attempting to do.
Testing your test, not your actual implementation
This is related to AI's over use of mocks, but sometimes AI will end up testing its own test code more than the actual implementation.
For example if it is generating an object to pass to a component as props, it will make assertions on the input data too.
A similar issue is kind of the reverse: testing things with the implementation's own methods. By this, I mean it will generate some result = ... and then work out what it expects it to equal by calling some internal function that the implementation is also using.
By doing this there is no proof the test is asserting anything useful, as the underlying functionality that both the test and implementation use could be buggy.
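A small framework-free sketch of that second problem, using hypothetical applyDiscount and cartTotal functions with a deliberate bug in the shared helper:

```typescript
// Hypothetical internal helper with a deliberate bug: it returns the
// discount amount instead of the discounted price.
function applyDiscount(price: number, percent: number): number {
  return (price * percent) / 100;
}

// Hypothetical implementation under test, built on the shared helper.
function cartTotal(prices: number[], discountPercent: number): number {
  const subtotal = prices.reduce((sum, price) => sum + price, 0);
  return applyDiscount(subtotal, discountPercent);
}

const result = cartTotal([100, 200], 10);

// Circular: the expected value is computed by the same helper the
// implementation uses, so the bug appears on both sides and cancels out.
const circularCheckPasses = result === applyDiscount(300, 10);

// Independent: the expected value is worked out by hand
// (300 minus 10% is 270), so the bug is caught.
const independentCheckPasses = result === 270;
```

The circular check passes even though the total is wrong. Only the hand-computed expectation fails, which is what you want from a test here.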
A related thing it loves to test sometimes is enums or other constants in your app code. If you direct it to test some new code and it adds something like that, sometimes it gets the idea of testing this static data structure in your tests.
AI's use of beforeAll() or beforeEach()
Another really common pattern in AI generated tests is adding far too much setup logic inside beforeAll() and beforeEach() blocks.
While using these can be great for setting up your app, I find that AI will default to resetting the entire world in complex ways just to be safe that we have reset everything.
Three main code smells to look for here:
- can the code in beforeEach be safely moved to beforeAll, so it runs only once?
- are your beforeAll/beforeEach resetting things that haven't even changed?
- if you are mutating the data it sets up in your tests, could you just copy the values instead of mutating the main one?
For example (in this heavily simplified contrived example!) here we have some props. We probably don't need to recreate this on beforeEach.
Doing it once (which could easily be done outside of beforeEach) would probably be more suitable. You can always spread them in <SomeComponent {...baseProps} title="Some override"> too.
let baseProps: SomeProps = {};
beforeEach(() => {
  baseProps.title = 'some mock prop';
  baseProps.createdAt = new Date();
});
Note: this is, like I said, over simplified, but I have seen AI quite often overuse beforeEach in ways like this.
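One possible alternative, as a sketch: create the props once and copy them per test. The SomeProps shape below is assumed from the two fields used in the example, and the fixed date is an illustrative choice, not something the original code requires.

```typescript
// Assumed shape, based on the two fields used in the example above.
interface SomeProps {
  title: string;
  createdAt: Date;
}

// Created once at module level. Nothing mutates it, so there is nothing
// for beforeEach to reset. Object.freeze is shallow: it blocks reassigning
// the fields, not mutating the Date object itself.
const baseProps: Readonly<SomeProps> = Object.freeze({
  title: 'some mock prop',
  createdAt: new Date('2024-01-01T00:00:00Z'), // fixed date keeps tests deterministic
});

// Each test copies and overrides instead of mutating the shared object.
const propsForOneTest: SomeProps = { ...baseProps, title: 'Some override' };
```

The copy gets the new title while baseProps stays untouched for every other test.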
How to fix these issues?
I've listed what I've seen - and like I said at the top of the page this is when AI is given no guidance.
The way to fix these, like a lot of issues when using AI, is to make good use of instructions to AI.
I do not believe personally that you should be guiding your AI agents manually in each prompt. Instead you need to be setting up SKILL.md so it knows how you and your team expect tests to be written. I'll cover this more in an upcoming blog post.
Have you noticed other code smells? I'd be really keen to hear. Please reach out on X.