When you write automated tests to assert that your application or code is working correctly, you almost always have to set up the world in which to test it, and provide data to the functions (or components) under test.
Faker.js is a package that makes it easy to generate realistic data for your tests, such as names, emails, addresses and phone numbers.
Here is an example:
import { faker } from '@faker-js/faker';
const randomName = faker.person.fullName(); // example data: 'Rowan Nikolaus'
const randomEmail = faker.internet.exampleEmail(); // example: 'Kassandra.Haley@example.com'
There are lots of 'modules', like the two above ('person' and 'internet'), and the generated data is also available in different languages (locales).
In this blog post I am not going to explain how to use Faker.js (as it is so simple to use - just import it, and find the module/function to use).
But I want to explain why I personally use it and why I think it improves the quality of your tests.
It is much clearer what is important to your test
When you have large objects in your tests (such as input data/props), it can be hard for a reader to tell which values actually matter to the test. For example:
const inputData = {
  age: 40,
  name: 'Bart Simpson',
  email: 'bart.simpson@example.com',
  address: {
    street: '742 Evergreen Terrace',
    city: 'Springfield',
    state: 'Oregon',
    zipCode: '97403',
  },
  phoneNumber: '+1-555-0123',
};
When you read a test with this in it, you might have these questions:
- Does the test care that the age is exactly 40?
- Does the exact name of "Bart Simpson" matter?
- Is a US address important for this test?
For some tests, these values might be important. Maybe you are calculating the age at which someone can retire, so you need to know their location & age.
But if a value isn't important, you can swap it out for Faker data, which strongly signals that the specific value does not matter to this test.
Here is the same example, using Faker:
const inputData = {
  age: faker.number.int({ min: 18, max: 100 }),
  name: faker.person.fullName(),
  email: faker.internet.email(),
  address: {
    street: faker.location.streetAddress(),
    city: faker.location.city(),
    state: faker.location.state(),
    zipCode: faker.location.zipCode(),
  },
  phoneNumber: faker.phone.number(),
};
When you read the test now, it is clear that the specific values (like 40 years old, USA address and phone number) are not important for the test.
Helps with testing realistic data
I have written and seen tons of tests with input data like this, with just random and not well-thought-out values:
const inputData = {
  name: 'asdfasdf',
  email: 'asdfasdf',
  address: 'test',
  phoneNumber: '123',
  dateOfBirth: '2000-01-01',
  username: 'user123',
  city: 'xxx',
};
This can be ok... it makes it obvious that the specific values aren't important. But it can also mean your tests aren't catching edge cases or issues that might occur with more realistic data - like longer names, special characters in emails, or varying phone number formats.
If we were using Faker, the values would be random each time and much more realistic - a well-formed email address, a plausible name (first & last), a plausible phone number and so on.
Using realistic data that is closer to what your users will enter means your tests are more realistic - so they are more valuable.
But make sure your tests are not flaky after you start using Faker
By default, the Faker data will be different every time you run a test.
In some cases this can make tests flaky, if your tests only pass with specific data.
Note: If your tests are flaky because they only pass with specific data - this might be a code smell that indicates your test is not well designed.
Of course: if Faker's data is causing flaky tests, it is easy to swap it out for hardcoded values.
Just be sure that it isn't failing due to a badly designed test, or a real bug.
Remember that real users in your app will essentially be inputting and using 'random' data.
If your tests work with any valid input (i.e. with any random data from Faker), then it probably means your tests are of higher quality.
When to NOT use Faker data
I am not advocating for changing every possible value into something that is generated from Faker. There are definitely times to use it, and times to NOT use it.
Here are some typical times you wouldn't want to reach for it:
- testing extreme values (-1, 0, very small/large values), empty arrays, etc.
- when the specific value is important to a test - for example testing a specific email domain, specific ages, etc.
- when testing validation, where you will probably need invalid data (such as malformed phone numbers or email addresses)
- when you need consistent values used between multiple tests for comparison purposes
- for regression tests when working with a bug that happened with specific data
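For cases like these, hand-written values are the point of the test. A small sketch, assuming a hypothetical `isValidAge` validator with made-up limits of 0 to 150:

```javascript
// Hypothetical validator under test; swap in your own.
function isValidAge(age) {
  return Number.isInteger(age) && age >= 0 && age <= 150;
}

// Boundary and invalid values are written out by hand, because the exact
// values are what is being tested.
const cases = [
  [-1, false],
  [0, true],
  [150, true],
  [151, false],
  [1.5, false],
];

for (const [input, expected] of cases) {
  if (isValidAge(input) !== expected) {
    throw new Error(`isValidAge(${input}) should be ${expected}`);
  }
}
```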
Note: You can make Faker data deterministic
You can call faker.seed(someValue) to make the random data that Faker.js returns deterministic.
This can also be great if you want to use snapshots in your tests.
import { faker } from '@faker-js/faker';
faker.seed(123); // This will generate the same "random" data every time
Tip: If you are finding tests only fail on CI/CD but not locally and you suspect Faker's random data might be making this hard to debug, then temporarily making Faker deterministic can be really helpful.