Repository pattern, done right (.NET)

Edward Hieatt and Rob Mee define the repository pattern as follows:

Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.

In essence, it's an abstraction that separates the domain from the persistence mechanism, reducing complexity and letting changes in one layer happen independently of the other. The result is cleaner code that is easier to understand and degrades less as the application ages.

Put differently, your domain objects can be designed to solve the business problem and be properly encapsulated, without compromises made for the persistence layer.

As a bonus, it allows you to write unit tests instead of integration tests for the business layer.

In short: when applied correctly, the code is easier to read and maintain.

Common Misconceptions

Here are some common misconceptions regarding the purpose of the pattern.

Repositories are about being able to switch DAL implementation

Using repositories is not primarily about being able to switch persistence technology (e.g., changing the database engine or replacing the database with a web service).

The repository pattern does allow you to do that, but it's not its main purpose.

A more realistic scenario is that UserRepository.GetUsersGroupOnSomeComplexQuery() uses ADO.NET directly while UserRepository.Create() uses Entity Framework. By doing so, you probably save a lot of time compared to struggling with LINQ to get your complex query running.

The repository pattern lets you choose the technology that fits the current use case without affecting the rest of your code base.
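As a sketch of what that could look like (the repository shape, the AppDbContext, and the query body are all illustrative assumptions, not code from this article's library):

```csharp
// Sketch only: one repository mixing an ORM and raw ADO.NET behind the
// same public surface. AppDbContext is a hypothetical EF Core context.
public class UserRepository
{
    private readonly AppDbContext _context;     // hypothetical EF Core context
    private readonly IDbConnection _connection; // plain ADO.NET connection

    public UserRepository(AppDbContext context, IDbConnection connection)
    {
        _context = context;
        _connection = connection;
    }

    // Simple write: let the ORM handle mapping and change tracking.
    public async Task Create(User user)
    {
        _context.Users.Add(user);
        await _context.SaveChangesAsync();
    }

    // Complex read: hand-written SQL is often easier to get right.
    public Task<IReadOnlyList<User>> GetUsersGroupOnSomeComplexQuery()
    {
        using var cmd = _connection.CreateCommand();
        cmd.CommandText = "/* hand-tuned SQL with grouping goes here */";
        // ... execute the command and map the result set manually ...
        return Task.FromResult<IReadOnlyList<User>>(new List<User>());
    }
}
```

Callers only see the repository interface, so the technology choice per method stays an internal detail.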

Unit Testing

When people talk about the repository pattern and unit tests, they are not saying that the pattern allows you to use unit tests for the data access layer. They mean it allows you to unit test the business layer in isolation.

That's possible because you can create repository stubs (which is a lot easier than faking NHibernate/EF interfaces) and thereby write clean and readable tests for your business logic.

As you’ve separated business from data, you can also write integration tests for your data layer to make sure that the layer works with your current database schema.

If you use ORM/LINQ directly in your business logic, you can never be sure why a test fails. It could be because your LINQ query is incorrect, your business logic is incorrect, the ORM mapping is wrong, or there is an intermittent problem with the database connection.

The repository pattern reduces the complexity of your tests and allows you to specialize your tests for the current layer. The tests are focused and, therefore, more readable.
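To illustrate, here is a minimal sketch of such a business-layer test using a hand-written stub. The UserService, the IsLocked property, and the xUnit [Fact] attribute are my assumptions for the example; the article itself only defines the repository interface:

```csharp
// Sketch: a hand-written stub implementing the article's IUserRepository.
public class UserRepositoryStub : IUserRepository
{
    public User? StoredUser { get; set; }
    public User? UpdatedUser { get; private set; }

    public Task<User> GetById(int key) => Task.FromResult(StoredUser!);
    public Task Create(User entity) => Task.CompletedTask;
    public Task Update(User entity) { UpdatedUser = entity; return Task.CompletedTask; }
    public Task Delete(User entity) => Task.CompletedTask;
}

public class UserServiceTests
{
    [Fact]
    public async Task Locking_an_account_should_be_persisted()
    {
        var stub = new UserRepositoryStub { StoredUser = new User { Id = 1 } };
        var service = new UserService(stub); // hypothetical business service

        await service.LockAccount(1);

        // The test reads as pure business logic; no database involved.
        Assert.True(stub.UpdatedUser!.IsLocked);
    }
}
```

No mocking framework, no in-memory database, and a failure can only mean the business logic is wrong.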

Repositories are for all types of DB access

In an application, you typically have two types of operations.

The first type is to modify the state of the application, i.e., apply changes to the domain and persist them. Such changes are almost always applied to a single type of domain entity. Something like:

var userToLock = await _repository.GetById(userId);
userToLock.LockAccount();
await _repository.Update(userToLock);

The second type is to assemble information and present it to the user. For these situations, you typically merge information from more than one domain entity type (i.e. joins in the DB).

SELECT Users.Id, Users.UserName, Posts.Title
FROM Users
JOIN Posts ON (Posts.UserId = Users.Id)

Repositories are not made for these types of queries, as the queries do not operate on a single entity type. This is important to understand if you want repositories that are easy to maintain and have a single reason to change (the Single Responsibility Principle).

How to Create a Repository

Building a correct repository implementation is very easy. In fact, you only have to follow a single rule:

Do not add anything to the repository class until the very moment that you need it.

At a minimum, the Create, Update, and Delete methods are required.

public interface ICrudOperations<TEntity>
{
    Task Create(TEntity entity);
    Task Update(TEntity entity);
    Task Delete(TEntity entity);
}

Since repositories most frequently operate on a single entity, we can also define an interface for that.

public interface ISingleKeyGet<TEntity, in TKey>
{
    Task<TEntity> GetById(TKey key);
}

If your table has a composite key, you can create a similar interface for that, too.

It's time to compose our specific entity interface.

public interface IUserRepository 
   : ICrudOperations<User>
   , ISingleKeyGet<User, int>
{
}

That's it. If you do this for all your entity types, it's easy to modify each repository interface as requirements change during the application's lifetime. There is no single interface that must be broad enough to accommodate all types of entities.

Implementing the repository interface

I'm using the code from my previous post, Building a simple data mapper, to implement the interfaces. You most likely already know how to do this with your favorite ORM library, but it can be interesting to see the kind of code that your ORM library uses internally (ADO.NET).


public class UserRepository : IUserRepository
{
    private readonly IDbTransaction _transaction;
    private readonly IMapper<User> _mapper;

    public UserRepository(IDbTransaction transaction)
    {
        _transaction = transaction;
        _mapper = MappingRegistry.Instance.GetMapper<User>();
    }

    public async Task<User> GetById(int id)
    {
        await using var cmd = _transaction.CreateCommand();

        cmd.CommandText = "SELECT * FROM Users WHERE Id = @id";
        cmd.AddParameter("id", id);
        return await cmd.GetOne<User>();
    }

    public async Task<List<User>> List()
    {
        await using var cmd = _transaction.CreateCommand();

        cmd.CommandText = "SELECT * FROM Users";
        return await cmd.ToList<User>();
    }

    public async Task Create(User entity)
    {
        await using var cmd = _transaction.CreateCommand();
        await _mapper.Insert(cmd, entity);
    }

    public async Task Update(User entity)
    {
        await using var cmd = _transaction.CreateCommand();
        await _mapper.Update(cmd, entity);
    }

    public async Task Delete(User entity)
    {
        await using var cmd = _transaction.CreateCommand();
        await _mapper.Delete(cmd, entity);
    }
}
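For comparison, here is a minimal sketch of the same IUserRepository implemented with EF Core. The AppDbContext and its Users set are assumptions for the example, not part of this article's code:

```csharp
// Sketch: the same interface backed by a hypothetical EF Core DbContext.
public class EfUserRepository : IUserRepository
{
    private readonly AppDbContext _context; // hypothetical EF Core context

    public EfUserRepository(AppDbContext context) => _context = context;

    public async Task<User> GetById(int key)
        => await _context.Users.SingleAsync(x => x.Id == key);

    public async Task Create(User entity)
    {
        _context.Users.Add(entity);
        await _context.SaveChangesAsync();
    }

    public async Task Update(User entity)
    {
        _context.Users.Update(entity);
        await _context.SaveChangesAsync();
    }

    public async Task Delete(User entity)
    {
        _context.Users.Remove(entity);
        await _context.SaveChangesAsync();
    }
}
```

The business layer cannot tell the two implementations apart, which is exactly the point of the abstraction.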

Committing the transaction

The last thing to do is to commit the transaction. When to commit depends on the type of application. For an ASP.NET Core application, you can create an action filter.

public class TransactionFilter : IAsyncActionFilter
{
    private readonly IDbTransaction _transaction;

    public TransactionFilter(IDbTransaction transaction)
    {
        _transaction = transaction;
    }

    public async Task OnActionExecutionAsync(ActionExecutingContext context, ActionExecutionDelegate next)
    {
        var connection = _transaction.Connection;
        if (connection == null)
        {
            // No active transaction; run the action without transaction handling.
            await next();
            return;
        }

        if (connection.State != ConnectionState.Open)
            throw new NotSupportedException("The provided connection was not open!");

        var executedContext = await next.Invoke();
        if (executedContext.Exception == null)
            _transaction.Commit();
        else
            _transaction.Rollback();
    }
}

It will commit the transaction unless there has been an error in the controller action.

The final step is to add everything to the DI container. Here is an example using Microsoft's DI extensions:

builder.Services.AddScoped(x => ConnectionFactory.Create());
builder.Services.AddScoped(x => x.GetRequiredService<IDbConnection>().BeginTransaction());
builder.Services.AddScoped<TransactionFilter>();
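For the filter to actually run, it must also be registered with MVC. A sketch using ServiceFilterAttribute, which resolves the filter from the DI container for each request:

```csharp
builder.Services.AddControllers(options =>
{
    // Apply TransactionFilter to every controller action.
    options.Filters.Add(new ServiceFilterAttribute(typeof(TransactionFilter)));
});
```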

Dealing with queries

As I mentioned, repositories should only be used to modify business entities, unless you are willing to violate the Single Responsibility Principle. To collect information that should be presented to the user, we can use queries instead.

The easiest way is to use your favorite ORM (or ADO.NET) directly in your API (like an ASP.NET Core Web API) and write integration tests for it. Or you can use a query library. Below, I'm using my own query library, available in the NuGet package Griffin.Data.

Griffin.Data scaffolds query classes based on SQL scripts. Here is a script named ListUsers.query.sql:

--sorting
--paging
declare @nameToFind varchar(40) = 'TestName';

SELECT u.Id, UserName
FROM Users u
JOIN Accounts a ON (u.AccountId = a.Id)
WHERE u.UserName LIKE @nameToFind

Griffin.Data generates the following classes:

public class ListUsers : IQuery<ListUsersResult>
{
    public string NameToFind { get; set; }
}

// The result returned from the query handler.
public class ListUsersResult
{
    public IReadOnlyList<ListUsersResultItem> Items { get; set; }
}

// The result item with details.
public class ListUsersResultItem
{
    public int Id { get; set; }
    public string UserName { get; set; }
}

// The query handler (which executes the query and generates the result).
//
// Since query handlers use ADO.NET directly, they are as fast as it gets.
// (there are a few optimisations left to do, but it should be fast enough as is)
public class ListUsersHandler : ListHandler<ListUsersResultItem>, IQueryHandler<ListUsers, ListUsersResult>
{
    public ListUsersHandler(QuerySession session) : base(session)
    {
    }

    public async Task<ListUsersResult> Execute(ListUsers query)
    {
        await using var command = Session.CreateCommand();
        command.CommandText = @"SELECT u.Id, UserName
                                FROM Users u
                                JOIN Accounts a ON (u.AccountId = a.Id)
                                WHERE u.UserName LIKE @nameToFind";

        command.AddParameter("nameToFind", query.NameToFind);
        return new ListUsersResult { Items = await MapRecords(command) };
    }

    // This map method is as fast as ADO.NET gets.
    protected override void MapRecord(IDataRecord record, ListUsersResultItem item)
    {
        item.Id = record.GetInt32(0);
        item.UserName = record.GetString(1);
    }
}

Which can be used like this:

var query = new ListUsers { NameToFind = "jo%" };
var result = await Session.Query(query);

The upside with code generation is that debugging and fixing errors is dead easy since you can step through the code.

Summary

Time to end this post. I hope you better understand what the repository pattern is and in which scenarios it fits.

But before we go, let's end with another thought.

Why you shouldn't mix data access and business code

Here is a simple example of why it’s hard to spot bugs if you mix LINQ and business logic.

var brokenTrucks = _dbScope.Query()
                           .Where(x => x.State == 1);
                           
foreach (var truck in brokenTrucks)
{
    if (truck.CalculateResponseTime().TotalDays > 30)
    {
        SendEmailToManager(truck);
    }
}

What does that give us? Broken trucks?

Well, no. The statement was copied from another place in the code, and the developer forgot to update the query. Any unit tests would likely check that some trucks are returned and that emails are sent to the manager.

So we basically have two problems here:

1. Most developers will likely check the variable's name, not the query.
2. Any unit tests target the business logic, not the query.

If we create repositories, we have unit tests for the business layer and integration tests for the data layer.

Reduced complexity is one of the most important goals for a developer: code that is easy to read and understand is less prone to bugs, since the person after you won't interpret it incorrectly (or have to guess what it does).

That is important to understand, since many articles about the repository pattern don't address it. They focus on writing less code rather than on creating a real abstraction between the business and data layers.

The end

Is something unclear? Don't you agree? Leave comments for interesting discussions :)