Blog

Fixing Entity Framework

Magnus Toftdal Lund Senior Software Developer, Solita

Published 01 Oct 2024

Reading time 9 min

Object-Relational-Mapping (ORMs) has been around for decades and basically tries to bridge the gap between objects in memory and the tables and columns of the persisted relational databases. It is a story of trade-offs as the paradigms are different and there isn’t one solution that fits perfectly. Entity Framework is Microsoft’s take on the problem – and has pretty much become the Gold Standard in C#.

I have been along for the ride from when ORMs in C# were ports from Java. And I’ve been taught the ins and outs of nhibernate.org by Oren Eini‘s blog who was one of the main contributers way back when. Patterns and practises to write code that adherred to the Holy Grail of good Object-Oriented dogma (Was it Balmer, that yelled “Encapsulation, encapsulation, encapsulation” at one point?). So, my primary motivation for this blog post is from the vantage point of “Old man yelling at skies”. 

ORMs and their ill repute

It is beyond the scope of this blog post to argue the pros and cons of ORMs in general – and even more so which framework does it better. For the latter – it is pretty much a lost battle at this point, anyway, so we might as well embrace Entity Framework and get the best out of it.

Entity Framework is good, but..

Microsoft (the company that brought you Teams) is known for many things, but given that Entity Framework is now in it’s third rewrite: “You can count on the Americans to do the right thing… when they’ve exhausted all other possibilities” (A quote attributed to Churchill during WW2, but likely he never said it). In my view, the framework has had to do a lot of compromises on good structure and design to make it very beginner-friendly and very easy to do TODO apps in. So, let me break down the problem areas:

The missing Unit of Work pattern

The idea with this pattern is that it “maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.” So, in other words, basically a glorified database transaction that spans a business operation. In Entity Framework terms, this -should- be the job of the DbContext.

And don’t get me wrong – it has all the parts of a Unit of Work – you (can) have Dirty-tracking of your entities, you (can) have concurrency checks and you (can) have database transactions. The problem in my view is that none of it is enforced nor encouraged. Let’s look at the evidence – behold the sample.

 public class SchoolContext : DbContext
{
public SchoolContext (DbContextOptions options)
: base(options)
{
}
public DbSet Students { get; set; }
public DbSet Enrollments { get; set; }
public DbSet Courses { get; set; }
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity().ToTable("Course");
modelBuilder.Entity().ToTable("Enrollment");
modelBuilder.Entity().ToTable("Student");
}
}

Two of the model classes:

public class Student
{
public int ID { get; set; }
public string LastName { get; set; }
public string FirstMidName { get; set; }
public DateTime EnrollmentDate { get; set; }
public ICollection Enrollments { get; set; }
}
public class Enrollment
{
public int EnrollmentID { get; set; }
public int CourseID { get; set; }
public int StudentID { get; set; }
[DisplayFormat(NullDisplayText = "No grade")]
public Grade? Grade { get; set; }
public Course Course { get; set; }
public Student Student { get; set; }
}

And then the usage (implying that the DbContext is injected in the controller’s ctor):

public async Task OnGetAsync()
{
Student = await _context.Students.Take(10).ToListAsync();
}

In my view, the DbContext is basically a facade for the Database in this code – not a unit of work. Depending on the injected settings, it can behave in a lot of different ways – from a (heavy-weight) Dapper, basically just mapping columns to properties – all the way to fully-fledged lazy-loading, change-tracking, and transaction-handling ORM. And you basically don’t know when you are in the middle of a controller method which you have. 

Not to mention that you can change it mid-way through… In the example above – would your guess be that the Enrollment collection on the Student class is populated when it is transformed into Json? When is the DbContext disposed? Can the code trigger an update to the Students? The point here is: You have no way of knowing unless you dig into settings, setup, and the controller code. Which brings me to my next point.

The missing encapsulation

As demonstrated above – the DbContext is a leaking abstraction – the client of the DbContext is basically free to do whatever it wants. It can call .SaveChanges() as many times as it wants – exception handling is left up to being implemented every single place the DbContext is used. Because things -will- go wrong when you have a database in play. 

A deadlock, a concurrency-trip-up, a competing transaction, a connection-hickup. Databases are fickle beasts that you need to tickle the right way to get along with. The default implementation is showing all its innards for all to abuse – and that isn’t a good thing. Flexible, yes – spaghetti-incident-prone code: also yes.

Entity Framework has luckily moved away from using partial classes – the whole point of the Object Oriented paradigm is that logic and data should be coupled. Still, there is a tendency to treat the classes covered by a DbContext as an anemic domain model. Since, by convention, Entity Framework considers all public properties as backed by the database – you can quickly end up with columns in the database (since the DbContext can also handle database migrations) that you never intended to be there.

The leaking abstraction

The whole premise of the ORM is to have your domain code a.k.a. the business logic be independent of storage. I heavily dislike attributes in domain code – and having two properties for the same business idea is simply a no-go just to facilitate how the ORM behave (what happens if the .StudentID of an Enrollment points to another student than the one in the .Student property?). 

And if you do scaffolding on an existing database, you will have full two-way collection/property around every single foreign key that exists. This may be convenient and easy, but it leads to another non-performing spaghetti incident.

The missing Id-map

Consider the following code:

public void CreateStudent()
{
var student = new Student{FirstMidName="Carson",LastName="Alexander",EnrollmentDate=DateTime.Parse("2019-09-01")};
var entry = _dbContext.Students.Add(student);
_dbContext.SaveChanges();
Console.WriteLine($"The student .ID: {student.ID}");
Console.WriteLine($"No, really - the Student .ID: {entry.Entity.ID});
}

In my world, the .IDs should be the same. They aren’t (the first is always the default value). And this can lead to subtle bugs, everywhere. The student and the entry.Entity variables are two different objects in memory – and if you alter the wrong one… Well, tough luck?

In conclusion

The DbContext wants to do too much for its own good: querying, updating (both of database and models), mapping, configuration, listeners, triggers and it exposes all the things – and it ends up doing everything 50-80% with the defaults set. 

The enemy here is the complexity from too many options. Just the usage of concrete classes instead of interfaces irks me (There is no IDbSet interface). My feeling is that the Microsoft Engineers might know the element of least surprise, but then got distracted by how you could save four lines of code making a TODO app if you ignored it. And don’t get me started on using your domain classes for database migrations – we would be here all week.

So, what do we do?

What I propose is to apply some good old CQRS for a gentle start. To keep it concise – I will not alter the models, but only the implementation of the DbContext. First, a few intent-driven interfaces and structs to show we mean business:

public interface IUnitOfWork
{
Task<QueryResult<TResult>> TryExecuteQuery<T, TResult>(IExecutableQuery<T, TResult> query) where T : class;
Task<CommandResult> TryExecuteCommand<T>(IExecutableCommand<T> command) where T : class;
Task<CommandResult> TryWrapCommands(IWrappingCommand wrappingCommand);
}
public interface IExecutableCommand<T> where T : class
{
Task<CommandResult> TryExecute(IDbSet<T> session);
}
public interface IWrappingCommand
{
Task<CommandResult> TryExecute(IUnitOfWork context);
}
public interface IExecutableQuery<T, TResult> where T : class
{
Task<QueryResult<TResult>> TryExecute(IQueryable<T> session);
}
public interface IDbSet<T> where T : class
{
Task<T> AddAsync(T entity);
//... Other needed methods from DbSet<T> goes here
}
public struct CommandResult
{
... ctors go here
public bool Success { get; }
public IEnumerable<string> Errors { get; }
}
public struct QueryResult<T>
{
... ctors go here
public bool Success { get; }
public IEnumerable<string> Errors { get; }
public T Result { get; }
}

And now the implementation of the DbContext:

    public class BaseDbContext : DbContext, IUnitOfWork
{
//... All the usual implementations with OnModelCreating and so on and so forth.
public async Task<QueryResult<TResult>> TryExecuteQuery<T, TResult>(IExecutableQuery<T, TResult> query) where T : class
{
try
{
return await query.TryExecute(Set<T>());
}
catch (Exception ex)
{
//Handle all the bad stuff
return new QueryResult<TResult>(ex);
}
}
public async Task<CommandResult> TryExecuteCommand<T>(IExecutableCommand<T> command) where T : class
{
try
{
var result = await command.TryExecute(new DbSetFacade<T>(Set<T>()));
if (result.Success)
{
await SaveChangesAsync();
}
return result;
}
catch (Exception ex)
{
//Handle all the bad stuff
return new CommandResult(ex);
}
}
public async Task<CommandResult> TryWrapCommands(IWrappingCommand wrappingCommand)
{
try
{
using var transaction = Database.BeginTransaction();
var result = await wrappingCommand.TryExecute(this);
if (result.Success)
{
await SaveChangesAsync();
await transaction.CommitAsync();
}
return result;
}
catch (Exception ex)
{
//Handle all the bad stuff
return new CommandResult(ex);
}
}
private class DbSetFacade<T> : IDbSet<T> where T : class
{
private readonly DbSet<T> _dbSet;
public DbSetFacade(DbSet<T> dbSet)
{
_dbSet = dbSet;
}
public async Task<T> AddAsync(T entity)
{
var result = await _dbSet.AddAsync(entity);
return result.Entity;
}
}
}
public class CreateStudentCommand : IExecutableCommand<Student>
{
private readonly StudentDto _dto;
private CreateStudentCommand(StudentDto dto)
{
_dto = dto;
}
public async Task<CommandResult> TryExecute(IDbSet<Student> dbSet)
{
var student = new Student{FirstMidName=_dto.FirstMidName,LastName=_dto.LastName,EnrollmentDate=_dto.EnrollmentDate_};
// .. Validate the student has valid values etc.
CreatedStudent = await dbSet.AddAsync(student);
return new CommandResult();
}
public Student CreatedStudent { get; private set; }
}

A few things to note here

  • Notice that the IExecutableQuery takes an IQueryable<T> and not the DbSet<T> – we want to emphasise when we do writes
  • Callers should rely on the interface IUnitOfWork (as you generally should) since it is far beyond the scope of this blog post to roll our own DbContext
  • We no longer publish the DbSet’s on the DbContext – they are basically not needed. Should you want to limit which domain classes can be used (I like an IEntity interface) – generic constraints are the way forward
  • The DbSetFacade class is a replacement for Microsoft’s missing interface for DbSet<T>. Basically, we enforce that only the method that we consider commands are allowed to be called with a command
  • The TryWrapCommands method is to handle when you have more than one Command/DbSet/Query in play – and you want the transaction to span all of them
  • Entity Framework uses a transaction for a .SaveChanges() call unless you have your own transaction logic
  • The CreateStudentCommand hides the shortcoming of a missing ID-map by only having the student without an .Id as a transient variable, not accessible from the outside

Now, this is just the first step in making your code more Object Oriented, Clean, guess-the-next-bussword and by no means and end-all solution. I would very much like to have separation between the Read models and the Write models as the next step, but that is for a future post. And also, this is just an old man yelling at skies – so, feel free to correct any misconceptions, errors or better approaches.

  1. Tech