Historically, generating and returning a CSV file in .NET with CsvHelper might look something like this:
using var connection = new SqlConnection(".."); // Connection string
var query = "SELECT * FROM Users";
var data = await connection.QueryAsync<ReportRow>(query);
var memoryStream = new MemoryStream(); // not disposed here; File(...) takes ownership and disposes it
using var streamWriter = new StreamWriter(memoryStream, leaveOpen: true);
using var csvWriter = new CsvWriter(streamWriter, CultureInfo.InvariantCulture);
csvWriter.WriteRecords(data);
csvWriter.Flush();
streamWriter.Flush();
memoryStream.Seek(0, SeekOrigin.Begin);
return File(memoryStream, "text/csv", "my-report.csv");

This works well enough, but it has some important drawbacks that don't become apparent while files are small and traffic is low.
The Problem
Given the above snippet, we can make a few observations:

- All of the data is buffered from the database via connection.QueryAsync and materialized into the data collection. This blocks execution until everything has been read from the DB.
- The entire data collection is then written to the memoryStream sequentially, in a blocking fashion. Execution won't continue until all rows are written.
- The whole process is sequential. The HTTP file result doesn't start returning to the client browser until all prior steps are complete.
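To put rough, purely illustrative numbers on that (not from a benchmark): a report of 1,000,000 rows at around 200 bytes per CSV line means roughly 200 MB materialized in data, plus roughly the same again formatted inside memoryStream, so peak memory sits near double the final file size before the first byte ever reaches the client.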
Overall, this is a lot of waiting around as we gradually step through each part of the code.
The Solution
With a few small changes, we can gain some nice performance boosts here.
await using var sqlConnection = new SqlConnection(".."); // Connection string
var query = "SELECT * FROM Users";
var data = sqlConnection.QueryUnbufferedAsync<ReportRow>(query); // IAsyncEnumerable<ReportRow>; nothing is fetched yet
Response.StatusCode = 200;
Response.ContentType = "text/csv";
Response.Headers.Add(
    "Content-Disposition",
    "attachment; filename=\"my-report.csv\"");
await using (var stream = Response.Body)
await using (var streamWriter = new StreamWriter(stream))
await using (var csvWriter = new CsvWriter(streamWriter, CultureInfo.InvariantCulture))
{
    await foreach (var row in data)
    {
        csvWriter.WriteRecord(row);
        await csvWriter.NextRecordAsync();
    }
    await csvWriter.FlushAsync();
}
return new EmptyResult();

What we've changed:
- Swapped to QueryUnbufferedAsync from Dapper. This returns an IAsyncEnumerable<T> and lets execution continue without buffering the whole result set.
- Instead of writing to a new MemoryStream to hold our formatted output, we write directly to the Response.Body stream.
- Consumed our IAsyncEnumerable with an await foreach, writing each row straight to the CsvWriter and making full use of the async capability we've gained.
- Manually set the Response properties that were previously handled by return File(...), such as StatusCode, ContentType, and the Content-Disposition header.
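If several endpoints need this pattern, the streaming portion can be pulled out into a helper. The following is a minimal sketch, not from the original post; the name WriteCsvAsync is my own, and it assumes the same CsvHelper and ASP.NET Core APIs used above:

// A minimal sketch; WriteCsvAsync is an illustrative name, not a CsvHelper or ASP.NET Core API.
private static async Task WriteCsvAsync<T>(
    HttpResponse response, IAsyncEnumerable<T> rows, string fileName)
{
    response.StatusCode = 200;
    response.ContentType = "text/csv";
    response.Headers.Add(
        "Content-Disposition",
        $"attachment; filename=\"{fileName}\"");

    await using var streamWriter = new StreamWriter(response.Body);
    await using var csvWriter = new CsvWriter(streamWriter, CultureInfo.InvariantCulture);

    await foreach (var row in rows)
    {
        csvWriter.WriteRecord(row);
        await csvWriter.NextRecordAsync();
    }
    await csvWriter.FlushAsync();
}

A controller action then reduces to:

await WriteCsvAsync(Response, data, "my-report.csv");
return new EmptyResult();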
Why?
With the above changes...
- Execution flows through the method without blocking on any single line. The final return new EmptyResult() executes very quickly (~20ms after controller invocation) while the file continues streaming to completion afterwards.
- We don't allocate and re-allocate collections or memory streams, so there's no wasted memory. As an added side-effect, we never hold the entire data set in memory at once, which lets us generate and return files of effectively unlimited size.
How?
Under the hood, this relies on chunked transfer encoding (the Transfer-Encoding: chunked header). Because the total size isn't known up front, the Content-Length header is omitted, and the body is sent to the client as a series of chunks until the server signals completion.
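You can observe this from the client side. Here's a minimal sketch, assuming the endpoint above is hosted at a hypothetical local URL:

// Minimal client-side check of the streaming behaviour.
// ResponseHeadersRead returns as soon as headers arrive, before the body finishes streaming.
using var http = new HttpClient();
using var response = await http.GetAsync(
    "https://localhost:5001/reports/users", // hypothetical endpoint
    HttpCompletionOption.ResponseHeadersRead);

Console.WriteLine(response.Headers.TransferEncodingChunked); // True
Console.WriteLine(response.Content.Headers.ContentLength);   // blank: no Content-Length was sent

// The body can now be consumed incrementally while the server is still producing it.
await using var body = await response.Content.ReadAsStreamAsync();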
