
Last month, I had a revealing conversation with my team about how we use Git. I assumed everyone worked from the terminal – especially since we manage updates across 14 different repositories. But it turned out I was the only one. The other 10 developers relied entirely on their IDE’s Git UI.
That got me thinking: How many other "obvious" practices aren’t so obvious? Testing is one of them. Many developers know they should write tests, but few deeply understand how or why – especially now, with AI reshaping how we write code. In light of this, I wanted to discuss some testing best practices.
AI Coding Tools Are Changing the Game, and Testing is Your Safety Net
AI assistants like GitHub Copilot and ChatGPT can generate code at lightning speed, but they operate without truly understanding your system's requirements or constraints. They'll produce solutions that appear correct but may ignore critical edge cases, fail to handle errors gracefully, or miss important business logic. This creates hidden risks, especially when refactoring, as you lack visibility into whether changes introduce subtle bugs.
Testing becomes our essential safeguard in this new paradigm. Testing forces us to explicitly define what 'correct' behavior looks like, anticipate failure scenarios, and establish a safety net that verifies that our code's integrity remains intact through modifications. The tests we write compensate for AI's blind spots, transforming generated code from potential liability to a reliable asset. Let's look closer at how to test AI-generated code.
Testing is the Difference Between "It Works" and "It’s Correct"
AI can automate the typing, but not the thinking. That's why testing skills are now more valuable than ever. The teams that thrive with AI won't be the ones writing the most code; they'll be the ones validating it best. So let's talk about testing, because in the age of AI, knowing why code works is just as important as writing it. What makes a test good?
To understand what makes a test good, we first need to define what a test actually is. The definition I prefer is:
Software testing is defined as a set of techniques used to verify that the developed system meets the established requirements.
The truth is, we all test our code in some way, even if it's just manual testing. However, manual testing is expensive, slow, error-prone, and hard to replicate. That’s why automated testing is essential.
Automated tests are a tool that allows us to gain development speed and avoid severe maintenance problems, as long as they are appropriate. Writing tests requires an investment of time, and whether we get it back depends on the quality of the tests. Throughout this article, we will see the qualities we look for in tests to maximise their benefits.
Functional vs. Non-Functional Testing
Tests can be broadly classified into two categories:
Functional testing verifies what the system does. This includes:
Unit tests (isolated components)
Integration tests (interactions between components)
End-to-End (E2E) tests (user flow simulations)
Regression tests (ensuring existing functionality remains intact)
Non-functional testing verifies how the system performs, covering aspects like:
Performance
Security
Accessibility
Usability
Maintainability
While developers typically handle functional tests (especially unit and integration tests), non-functional tests often require specialists in each area.
Test Distribution: The Evolving Pyramid
A common guideline is the Test Pyramid, which suggests:
Many unit tests (fast, easy to write)
Fewer integration tests (slower, harder to maintain)
Even fewer E2E tests (slowest, most brittle)
Originally proposed by Mike Cohn in 2009, this model is evolving. Modern tools like Playwright and Cypress (compared to older tools like Selenium) have made integration and E2E tests more practical. Additionally, static analyzers (e.g., linters, TypeScript) reduce the need for defensive tests (e.g., input validation), allowing us to focus more on business logic.
There’s no one-size-fits-all strategy; each project demands its own testing approach. However, the key is balancing speed, coverage, and maintainability while adapting to modern tooling.
What to Test?
One of the hardest parts of testing is knowing what to test. While I won’t dive deep into this topic here, I want to share a critical lesson I learned the hard way, which transformed how I approach testing.
The Pitfall of "Testing Everything"
When I joined a project with near-zero unit tests, the solution seemed obvious: write more tests. We did – mocking aggressively, chasing ~90% coverage, and patting ourselves on the back. But soon, reality hit:
Refactors broke tests constantly, even when behavior hadn’t changed.
High coverage gave false confidence – bugs still slipped through.
Over-mocking made tests brittle, disconnected from how the app actually worked.
The root cause? We were testing implementation details (like internal function calls or state mutations) instead of user-facing behavior.
A Better Approach
Inspired by resources like Kent C. Dodds' articles, I shifted my mindset and distilled these testing tips:
Test like a user – Focus on what the software does, not how. Tools like Testing Library embody this principle (see the sketch after this list).
Forget rigid categories – A test’s value isn’t in being "unit" or "integration", it’s in catching bugs fast.
Prioritize meaningful scenarios – Test the biggest realistic use case you can, not just isolated units.
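Here's a minimal sketch of what "test like a user" looks like with Testing Library. The LoginForm component and its messages are hypothetical, and the toBeInTheDocument matcher assumes jest-dom is set up:

```typescript
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { LoginForm } from './LoginForm'; // hypothetical component

test('shows an error when submitting without an email', async () => {
  render(<LoginForm />);

  // Interact the way a user would: find the button by its accessible name.
  await userEvent.click(screen.getByRole('button', { name: /log in/i }));

  // Assert on what the user sees, not on internal state or function calls.
  expect(screen.getByText(/email is required/i)).toBeInTheDocument();
});
```

Notice that nothing in this test would break if the component's internal state management changed; it only fails if the user-visible behaviour changes.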
Coverage Reports ≠ Quality
This is why I’m wary of coverage metrics: they often become goals unto themselves. The real purpose of testing isn’t to hit a number. It’s to answer the question, "Would I trust this code in production?"
Of course, what to test varies by project. But by focusing on behavior over implementation, you’ll write tests that actually prevent bugs, not just satisfy a report.
If you’ve never used a code coverage report, I recommend reading this article first: Making use of code coverage.
How to Write Good Tests
Writing effective tests is a skill that separates good developers from great ones. Here's my practical guide to writing tests that help rather than hinder your development process. In short, good tests are:
Fast to run and write
Reliable (not flaky)
Easy to read and understand
Use mocks effectively
Test one concept per test
Treated as first-class code
Tests Should Be Fast
We write tests to save time – to catch bugs early and enable safe refactoring. But when tests themselves become slow to write or run, they transform from time-savers into time-wasters.
At run time, slow tests create bottlenecks: developers stop running them locally, and CI pipelines become productivity killers. And if writing a test takes longer than writing the code itself, your testing approach (or, in many cases, your code itself) needs simplification.
When tests are:
Fast to run: Developers use them constantly
Fast to write: Tests actually get written
Fast to understand: Test maintainability improves
The Hidden Tax of Unreliable Tests
Flaky tests impose a hidden tax on the whole team:
Erodes Trust: When tests fail randomly, developers start ignoring failures
Wastes Time: Investigating false positives consumes hours of productivity
Creates Noise: Real issues get buried in false alarms
Consider deleting a flaky test when the fix would make it overly complex, when it tests non-critical functionality, or when the maintenance cost exceeds its value.
Easy to Read and Understand
One of the biggest problems we face when writing automated tests is that we treat them as less important than our production code, forgetting all the principles we apply when coding.
DRY vs. Readability in Tests
Tests are specifications first, code second. While DRY is crucial for production code, tests benefit from controlled repetition when it aids clarity.

Good example: clear and self-contained, with no external helpers.
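For instance, here's a sketch of such a test, assuming a hypothetical calculateLateFee function that charges a flat $5 fee once a book is 30 or more days overdue:

```typescript
import { calculateLateFee } from './lateFee'; // hypothetical module

test('charges a $5 late fee when a book is returned 30 or more days late', () => {
  const dueDate = new Date('2024-01-01');
  const returnedAt = new Date('2024-01-31'); // exactly 30 days later

  const fee = calculateLateFee(dueDate, returnedAt);

  expect(fee).toBe(5);
});
```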
- No external helpers or setup
- Concrete dates/amounts tell the story
- Business rule is obvious

Bad example: it relies on external helpers whose behaviour you can't see directly.
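A sketch of the same behaviour tested badly, with the details hidden in hypothetical helpers:

```typescript
import { createOverdueLoan, expectFeeApplied } from './testHelpers'; // hypothetical helpers

test('applies fee for late return', () => {
  const loan = createOverdueLoan(); // How late is it? You can't tell from here.
  expectFeeApplied(loan);           // What fee? The 30-day rule lives in another file.
});
```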
- Test name is vague (how late? what fee?)
- Critical business logic (30+ days) is buried
- The reader must jump between files
Test Length
Everyone would agree that having a 50-line function is not a good practice, but when it comes to testing, somehow it’s acceptable to have a 50-line test.
A good test should be short enough to fit entirely on your screen (typically 5-15 lines). If it’s longer, it’s likely doing too much.

Good example: only three lines, focused on the specific behaviour under test.
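Something like this sketch, assuming a hypothetical ShoppingCart class:

```typescript
test('updates the total when an item is added', () => {
  const cart = new ShoppingCart();
  cart.addItem({ name: 'Book', price: 12 });
  expect(cart.total).toBe(12);
});
```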

Bad example: it's long and tests multiple things in a single test – adding items, removing items, and applying a discount.
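A sketch of what that looks like, using the same hypothetical ShoppingCart:

```typescript
test('shopping cart works', () => {
  const cart = new ShoppingCart();
  cart.addItem({ name: 'Book', price: 12 });
  cart.addItem({ name: 'Pen', price: 3 });
  expect(cart.total).toBe(15);
  cart.removeItem('Pen');
  expect(cart.total).toBe(12);
  cart.applyDiscount('SUMMER20'); // 20% off
  expect(cart.total).toBe(9.6);
  // ...one test exercising three different behaviours.
});
```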
Test One Concept Per Test
Each test should verify one specific behavior (not multiple scenarios).
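For example, again with a hypothetical ShoppingCart:

```typescript
test('adds an item to the cart and updates the total price', () => {
  const cart = new ShoppingCart();
  cart.addItem({ name: 'Book', price: 12 });
  expect(cart.total).toBe(12);
});

test('removes an item from the cart and updates the total price', () => {
  const cart = new ShoppingCart();
  cart.addItem({ name: 'Book', price: 12 });
  cart.removeItem('Book');
  expect(cart.total).toBe(0);
});
```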

In these examples, each test checks exactly one thing. In the first, we verify that we can add items to the shopping cart and that the total price gets updated.
The AAA Test Structure: Arrange, Act, Assert
A well-structured test follows the AAA pattern (Arrange-Act-Assert):
Arrange: Set up the test context. Prepare all the objects, mocks, and data the test needs. For example, if we test a method of a class, we first have to instantiate that class.
Act: Once the context is prepared, execute the action we want to test. For example, invoke a method with some parameters.
Assert: Verify that the result of the action is as expected. For example, the invocation of the previous method has to return a certain value.
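Here's a sketch of the pattern, using the hypothetical ShoppingCart from earlier:

```typescript
test('applies a 20% discount to the cart total', () => {
  // Arrange: prepare the object and data the test needs
  const cart = new ShoppingCart();
  cart.addItem({ name: 'Book', price: 10 });

  // Act: execute the behaviour we want to test
  cart.applyDiscount('SUMMER20');

  // Assert: verify the result is what we expect
  expect(cart.total).toBe(8);
});
```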
Only include what's necessary for the specific test. The concrete data shown in a test should only be what differentiates it from the rest; data that is irrelevant to the behavior I want to confirm should preferably be hidden. That way, when I see literal strings, numbers, or any other concrete details, I know I should pay attention to them.

Good example of test "Only create a user where it is important the age because we want to check if the isAdult function returns true when the age is 18".

Bad example: we want to check that isAdult returns true when the age is 18 or above, but this test fills the user object with extra properties that are irrelevant to that behaviour.
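And the noisy version of the same test:

```typescript
test('isAdult returns true when the user is 18', () => {
  const user = {
    name: 'Jane Doe',            // irrelevant to the rule under test
    email: 'jane@example.com',   // irrelevant
    address: '123 Main Street',  // irrelevant
    age: 18,                     // the only value that actually matters
  };
  expect(isAdult(user)).toBe(true);
});
```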
Test Description
We invest a lot of time thinking about variable names, but when it comes to tests, we act as if naming matters less. Explaining what you're testing is as important as the test itself. A test can be your code's documentation – living documentation, since you are forced to change it every time you change your business logic. Naming the test well helps us understand both the problem and the solution, so investing time in explaining your test is a good practice.
Test names should be clear statements in business language about the behavior of the system. If a person is able to understand the behavior of the system from the test names alone, it means that they are correctly named. The test content is a concrete example of a scenario, a snapshot of the system behavior at a given point in time with given values. It contains concrete data. Therefore, the test name should not have concrete data, but the general business rule that is being demonstrated in that test.

Bad example of a test name for a shopping cart: 'test cart with 2 items and SUMMER20 coupon should return 20% discount' – it describes the concrete data of one scenario rather than the business rule.

Good example of test "applies 15% discount for premium users".
Use Mocks Effectively
Test doubles are a necessary evil. The less you mock, the more your test will resemble the real environment. Mocking too much reduces the quality of the test, but not mocking enough will make your test slow. Similar to what happens when you try to identify what to test, knowing what to mock is not an easy task. Two things have helped me improve: learning the types of mocks and knowing common mistakes.
Types of Mocks
Keep in mind that knowing the names is not the important part; knowing the differences between them is what makes a huge impact (see the sketch after this list).
Dummy: Refers to a simple, placeholder object or value that is used to satisfy the parameters of a test but does not participate in the actual test logic. You should use a dummy when you need to pass an object to a method under test, but the object’s behavior is irrelevant for that particular test case.
Stub: Provides predefined responses to method calls, and is necessary when the element you are trying to test not only depends on its arguments, but also on another external source of data. It allows you to replace this source.
Mock & Spy: Verifies interactions (e.g., checks if a method was called with specific arguments). The difference between them is that the spy delegates to the real object and retains the real behavior.
Fake: A lightweight, working implementation that replaces a real dependency (e.g., for speed or simplicity). A good example is an in-memory database for testing.
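Here's a minimal sketch of the four kinds of doubles, with hypothetical domain types and Jest-style APIs:

```typescript
type Logger = { log: (message: string) => void };
type User = { email: string };

// Dummy: satisfies a parameter but is never exercised by the test logic.
const dummyLogger: Logger = { log: () => {} };

// Stub: replaces an external data source with predefined responses.
const exchangeRateStub = { getRate: (_currency: string) => 1.1 };

// Mock/Spy: records calls so the test can verify interactions afterwards,
// e.g. expect(sendEmailSpy).toHaveBeenCalledWith('jane@example.com');
const sendEmailSpy = jest.fn();

// Fake: a lightweight working implementation, like an in-memory database.
class InMemoryUserRepo {
  private users: User[] = [];
  save(user: User) { this.users.push(user); }
  findByEmail(email: string) { return this.users.find((u) => u.email === email); }
}
```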
Common Mistakes
Over-mocking. Mocking every dependency will make tests overly coupled to implementation details and break easily on refactoring.
Verifying implementation details using mock & spy. We should usually test outcomes rather than internal steps (see the sketch after this list).
Mocking what you don’t own. Mocking third-party libraries or framework code (e.g., axios, fetch) may cause mocks to drift from the real library’s behavior, hiding integration bugs.
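For example, here's the second mistake in miniature, with a hypothetical UserService and the InMemoryUserRepo fake from above:

```typescript
// Over-specified: asserts an internal step instead of the outcome.
test('registering a user calls repo.save', () => {
  const save = jest.fn();
  const service = new UserService({ save });
  service.register('jane@example.com');
  expect(save).toHaveBeenCalledTimes(1); // breaks if the implementation changes
});

// Better: asserts the observable behaviour.
test('a registered user can be found by email', () => {
  const repo = new InMemoryUserRepo();
  const service = new UserService(repo);
  service.register('jane@example.com');
  expect(repo.findByEmail('jane@example.com')).toBeDefined();
});
```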
Make Testing Your Priority
We should never treat our tests as less important than the code. One easy change that helps my teams achieve this is to never separate tests from implementation: a task is never finished until tests are implemented, green, and refactored. Never allow your team to say, "We don't have enough time for testing now. Just deliver the code, and we will write the tests once we have the time." As we continue through the AI-powered coding era, this kind of oversight is becoming more crucial than ever.
Ariadna Gomez Ruiz
Crafting robust, user-friendly interfaces with clean code and solid tests. Passionate about performance, accessibility, and maintainable solutions. Let’s create something amazing—one component at a time.