Historically, generating and returning a CSV file in .NET with CsvHelper might look something like this:
using var connection = new SqlConnection(".."); // Connection string
var query = "SELECT * FROM Users";
var data = await connection.QueryAsync<ReportRow>(query);
var memoryStream = new MemoryStream(); // not disposed here; File(...) takes ownership and disposes it
using var streamWriter = new StreamWriter(memoryStream, leaveOpen: true);
using var csvWriter = new CsvWriter(streamWriter, CultureInfo.InvariantCulture);
csvWriter.WriteRecords(data);
csvWriter.Flush();
streamWriter.Flush();
memoryStream.Seek(0, SeekOrigin.Begin);
return File(memoryStream, "text/csv", "my-report.csv");

This works well enough, but it has some important drawbacks that don't become apparent while files are small and traffic is low.
The Problem
Given the above snippet, we can make a few observations:

- All of the data is buffered from the database via connection.QueryAsync and materialized into the data collection. This blocks execution until everything has been read from the DB.
- The entire data collection is then written to the memoryStream sequentially, in a blocking fashion. Execution won't continue until all rows are written.
- The whole process is sequential. The HTTP file result doesn't start returning to the client browser until all prior steps are complete.
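To put rough, purely illustrative numbers on that (not from a benchmark): a report of 1,000,000 rows at around 200 bytes per CSV line means roughly 200 MB materialized in data, plus roughly the same again formatted inside memoryStream, so peak memory sits near double the final file size before the first byte ever reaches the client.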
Overall, this is a lot of waiting around as we gradually step through each part of the code.
The Solution
With a few small changes, we can gain some nice performance boosts here.
await using var sqlConnection = new SqlConnection(".."); // Connection string
var query = "SELECT * FROM Users";
var data = sqlConnection.QueryUnbufferedAsync<ReportRow>(query); // IAsyncEnumerable<ReportRow>; nothing is fetched yet
Response.StatusCode = 200;
Response.ContentType = "text/csv";
Response.Headers.Add(
    "Content-Disposition",
    "attachment; filename=\"my-report.csv\"");
await using (var stream = Response.Body)
await using (var streamWriter = new StreamWriter(stream))
await using (var csvWriter = new CsvWriter(streamWriter, CultureInfo.InvariantCulture))
{
    await foreach (var row in data)
    {
        csvWriter.WriteRecord(row);
        await csvWriter.NextRecordAsync();
    }
    await csvWriter.FlushAsync();
}
return new EmptyResult();

What we've changed:
- Swapped to QueryUnbufferedAsync from Dapper. This returns an IAsyncEnumerable<T> and lets execution continue without buffering the whole result set.
- Instead of writing to a new MemoryStream to hold our formatted output, we write directly to the Response.Body stream.
- Consumed our IAsyncEnumerable with an await foreach, writing each row straight to the CsvWriter and making full use of the async capability we've gained.
- Manually set the Response properties that were previously handled by return File(...), such as StatusCode, ContentType, and the Content-Disposition header.
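If several endpoints need this pattern, the streaming portion can be pulled out into a helper. The following is a minimal sketch, not from the original post; the name WriteCsvAsync is my own, and it assumes the same CsvHelper and ASP.NET Core APIs used above:

// A minimal sketch; WriteCsvAsync is an illustrative name, not a CsvHelper or ASP.NET Core API.
private static async Task WriteCsvAsync<T>(
    HttpResponse response, IAsyncEnumerable<T> rows, string fileName)
{
    response.StatusCode = 200;
    response.ContentType = "text/csv";
    response.Headers.Add(
        "Content-Disposition",
        $"attachment; filename=\"{fileName}\"");

    await using var streamWriter = new StreamWriter(response.Body);
    await using var csvWriter = new CsvWriter(streamWriter, CultureInfo.InvariantCulture);

    await foreach (var row in rows)
    {
        csvWriter.WriteRecord(row);
        await csvWriter.NextRecordAsync();
    }
    await csvWriter.FlushAsync();
}

A controller action then reduces to:

await WriteCsvAsync(Response, data, "my-report.csv");
return new EmptyResult();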
Why?
With the above changes...
- Execution flows through the method without blocking on any single line. The final return new EmptyResult() executes very quickly (~20ms after controller invocation) while the file continues streaming to completion afterwards.
- We don't allocate and re-allocate collections or memory streams, so there's no wasted memory. As an added side-effect, we never hold the entire data set in memory at once, which lets us generate and return files of effectively unlimited size.
How?
Under the hood, this relies on chunked transfer encoding (the Transfer-Encoding: chunked header). Because the total size isn't known up front, the Content-Length header is omitted, and the body is sent to the client as a series of chunks until the server signals completion.
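You can observe this from the client side. Here's a minimal sketch, assuming the endpoint above is hosted at a hypothetical local URL:

// Minimal client-side check of the streaming behaviour.
// ResponseHeadersRead returns as soon as headers arrive, before the body finishes streaming.
using var http = new HttpClient();
using var response = await http.GetAsync(
    "https://localhost:5001/reports/users", // hypothetical endpoint
    HttpCompletionOption.ResponseHeadersRead);

Console.WriteLine(response.Headers.TransferEncodingChunked); // True
Console.WriteLine(response.Content.Headers.ContentLength);   // blank: no Content-Length was sent

// The body can now be consumed incrementally while the server is still producing it.
await using var body = await response.Content.ReadAsStreamAsync();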
