Large or asynchronous file uploads in ASP.NET MVC
Uploading large files to ASP.NET over HTTP is not a new challenge. It has been covered extensively by Jon Galloway, Stephen Walther and Milan Negovan, and in one of the most active ASP.NET forum threads in history. Beyond handling large files, users often want to see the progress of an upload as it happens. When you have either or both of these requirements, or simply need direct control over the stream of file data a browser uploads, you invariably hit a wall.
Most of the conclusions reached about best practices for large files in ASP.NET boil down to one of the following:
- Don’t do it; you’re better off embedding a Silverlight or Flash process on the page to move the file uploading process outside of the pipeline
- Don’t do it; HTTP wasn’t designed for large file uploads, rethink your feature
- Don’t do it; ASP.NET wasn’t designed to handle files larger than 2GB
- Buy a commercial product like SlickUpload, which uses an HttpModule to stream a file in chunks
- Use an open source product like NeatUpload, which uses an HttpModule to stream a file in chunks
Recently I needed to build an upload utility that accomplished the following tasks:
- It had to work with the HTTP protocol
- It had to allow very large file uploads (files of unusual size)
- It had to enable continuation of partial uploads in the event of a network failure
- It had to simultaneously attempt to upload the same stream to a cloud service
My requirements ruled out the first three recommendations right away, and the other two were, for me, either too heavy or too tied to the WebForms architecture to consider. So I set about solving the problem in an ASP.NET MVC-specific way.
If you’ve already read the background or have experience with this challenge, you know that most of the problem comes down to wresting control of the input stream from ASP.NET and its chain of request processing. Community literature will tell you that as soon as you access the HttpRequest’s InputStream property in your code, whether you’ve accounted for it or not, ASP.NET buffers the entire file upload before you have a chance to read from the stream. So if, like me, you need to forward the file to a cloud service, you have to wait for the entire, huge file to reach your server, and only then can you stream it to its intended destination. For the bean counters, that roughly doubles the time required, and when uploads of large files already take a long time, that’s a show stopper.
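The store-and-forward cost described above disappears if each chunk is forwarded as soon as it is read. The following is a minimal sketch in plain C# (not ASP.NET code): the hypothetical ChunkedCopy.CopyToAll helper writes every chunk read from a source stream to several destinations, for example a local file and a stream opened to a cloud service, so each destination starts receiving data immediately instead of after the whole upload has landed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkedCopy
{
    // Hypothetical helper: reads the source in fixed-size chunks and forwards each
    // chunk to every destination immediately, instead of buffering the whole source first.
    public static long CopyToAll(Stream source, IReadOnlyList<Stream> destinations, int chunkSize = 8192)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (destinations == null) throw new ArgumentNullException(nameof(destinations));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new byte[chunkSize];
        long total = 0;
        int read;
        // Read may return fewer bytes than requested; 0 means the source is exhausted.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            foreach (var destination in destinations)
            {
                destination.Write(buffer, 0, read);
            }
            total += read;
        }
        return total;
    }
}
```

Writing to the destinations one after another means the slowest destination sets the pace; a real simulcast might hand chunks to each destination asynchronously, which this sketch does not attempt.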
If you haven’t yet reached the point of uploading files through a view and inspecting the results, I recommend starting with Scott Hanselman’s excellent tutorial on file uploads in ASP.NET MVC. With that solution already in place, it would seem that large file uploads shouldn’t take much more work. In Scott’s code, the only change you’d really need to keep large files in check is one line in web.config: an httpRuntime element whose maxRequestLength and requestLengthDiskThreshold attributes (both measured in kilobytes) let ASP.NET accept files up to 2GB without eating your server memory for breakfast, buffering to disk instead once a request is larger than 256KB.
It’s a simple, supported solution for most situations, to be sure, but I couldn’t accept calling the SaveAs method and waiting for a huge file to finish uploading before sending it on to a cloud service. Any other attempt to read data from the stream incurs the same delay; in short, there’s no way around this behavior as-is, so we have to go deeper. I also wasn’t thrilled with the memory usage of this approach: even though the file is buffered to disk, the SaveAs method causes a large memory spike that I wasn’t comfortable with.
So how, then, do you upload large files in ASP.NET MVC with direct access to the stream, without triggering any buffering? The solution lies in staying as far away from ASP.NET as possible. Let’s look at the UploadController. It has three action methods: one simply serves the index page hosting our file upload markup, one is based on the buffering logic discussed previously, and the third is based on a live streaming approach.
public class UploadController : Controller
{
[AcceptVerbs(HttpVerbs.Get)]
[Authorize]
public ActionResult Index()
{
return View();
}
[AcceptVerbs(HttpVerbs.Post)]
public ActionResult BufferToDisk()
{
var path = Server.MapPath("~/Uploads");
foreach (string file in Request.Files)
{
var fileBase = Request.Files[file];
try
{
if (fileBase.ContentLength > 0)
{
fileBase.SaveAs(Path.Combine(path, fileBase.FileName));
}
}
catch (IOException)
{
// Skip files that could not be written; production code should at least log this
}
}
return RedirectToAction("Index", "Upload");
}
//[AcceptVerbs(HttpVerbs.Post)]
//[Authorize]
public void LiveStream()
{
var path = Server.MapPath("~/Uploads");
var context = ControllerContext.HttpContext;
var provider = (IServiceProvider)context;
var workerRequest = (HttpWorkerRequest)provider.GetService(typeof(HttpWorkerRequest));
//[AcceptVerbs(HttpVerbs.Post)]
var verb = workerRequest.GetHttpVerbName();
if(!verb.Equals("POST"))
{
Response.StatusCode = (int)HttpStatusCode.NotFound;
Response.SuppressContent = true;
return;
}
//[Authorize]
if(!context.User.Identity.IsAuthenticated)
{
Response.StatusCode = (int)HttpStatusCode.Unauthorized;
Response.SuppressContent = true;
return;
}
var encoding = context.Request.ContentEncoding;
var processor = new UploadProcessor(workerRequest);
processor.StreamToDisk(context, encoding, path);
//return RedirectToAction("Index", "Upload");
Response.Redirect(Url.Action("Index", "Upload"));
}
}
While a class or two is obviously missing here, the basic approach is illustrated. LiveStream doesn’t look much different from the buffering logic (we’re still streaming to disk), but it has some crucial differences. Namely, the method has no attributes: the verb and authorization constraints have been removed and replaced with manual equivalents. The reason for manipulating the response by hand, rather than using convenient attribute declarations such as [AcceptVerbs] and [Authorize], is that these attributes touch some important ASP.NET pipeline code that in turn triggers the request input stream handling. That’s even worse in this situation, because we’re intercepting the raw HttpWorkerRequest to do our own file uploading, and it can’t be doing two things at once. In other words, this controller action will stall indefinitely the moment you add a filter attribute. Inconvenient, yes, but for a specialized method like this, it’s tolerable.
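Because the attribute checks are replaced by hand-written ones, it can help to keep them in one small, pure function that is easy to test without a live request. The following sketch assumes a hypothetical UploadGuard helper; it uses the same status codes LiveStream sets (404 for a verb other than POST, 401 for an anonymous user) and checks them in the same order.

```csharp
using System;
using System.Net;

public static class UploadGuard
{
    // Hypothetical helper mirroring the manual checks in LiveStream:
    // the verb is checked first (the job [AcceptVerbs(HttpVerbs.Post)] would do),
    // then authentication (the job [Authorize] would do).
    // Returns null when the request may proceed, otherwise the status code to send.
    public static HttpStatusCode? Check(string verb, bool isAuthenticated)
    {
        if (!string.Equals(verb, "POST", StringComparison.Ordinal))
        {
            return HttpStatusCode.NotFound;
        }
        if (!isAuthenticated)
        {
            return HttpStatusCode.Unauthorized;
        }
        return null;
    }
}
```

In LiveStream, the two if blocks would collapse into one call, with a non-null result copied into Response.StatusCode before returning.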
The HttpWorkerRequest has VIP access to the incoming request. Normally ASP.NET itself enlists it to do the work, but here we are effectively kidnapping the request to do our bidding instead, and then tricking the rest of the pipeline into thinking the work has already been done (more on that later). To do that, we need the UploadProcessor class that is missing from the example above. Its responsibility is to read each chunk of data the browser sends and save it to disk. Because an upload arrives as a multipart form post, the processor must also locate each part’s content headers and strip them from the output, and decide when it is time to move on to a new file, since a single multipart post can carry several files.
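To make the multipart layout concrete before looking at the streaming processor, here is a hedged sketch that parses a complete body held in memory, using a hypothetical MultipartSketch.ReadFiles helper. Unlike UploadProcessor it does not stream, and it assumes the boundary has already been extracted from the Content-Type header (as SetByteMarkers does); it only shows where each part’s headers end and where each file’s content begins and ends.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class MultipartSketch
{
    // Simplified file name pattern; UploadProcessor's regex additionally requires name="file".
    static readonly Regex FileNamePattern =
        new Regex(@"filename\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);

    // Hypothetical helper: splits an in-memory multipart/form-data body into
    // (file name, content) pairs. Each part is "--boundary", its headers,
    // a blank line ("\r\n\r\n"), then content up to the "\r\n" before the next "--boundary".
    public static List<KeyValuePair<string, byte[]>> ReadFiles(byte[] body, string boundary)
    {
        var result = new List<KeyValuePair<string, byte[]>>();
        var delimiter = Encoding.ASCII.GetBytes("--" + boundary);
        var headerEnd = Encoding.ASCII.GetBytes("\r\n\r\n");
        var partStart = IndexOf(body, delimiter, 0);
        while (partStart >= 0)
        {
            var afterDelimiter = partStart + delimiter.Length;
            // "--boundary--" closes the whole body
            if (afterDelimiter + 1 < body.Length && body[afterDelimiter] == '-' && body[afterDelimiter + 1] == '-') break;
            var headersEnd = IndexOf(body, headerEnd, afterDelimiter);
            if (headersEnd < 0) break;
            var headers = Encoding.ASCII.GetString(body, afterDelimiter, headersEnd - afterDelimiter);
            var contentStart = headersEnd + headerEnd.Length;
            var nextDelimiter = IndexOf(body, delimiter, contentStart);
            if (nextDelimiter < 0) break;
            // The CRLF just before the next delimiter belongs to the delimiter, not the content
            var contentEnd = nextDelimiter - 2;
            if (contentEnd < contentStart) break;
            var match = FileNamePattern.Match(headers);
            if (match.Success)
            {
                var content = new byte[contentEnd - contentStart];
                Array.Copy(body, contentStart, content, 0, content.Length);
                result.Add(new KeyValuePair<string, byte[]>(match.Groups[1].Value, content));
            }
            partStart = nextDelimiter;
        }
        return result;
    }

    // Naive byte-pattern search, similar in spirit to UploadProcessor.IndexOf
    static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        for (var i = start; i <= data.Length - pattern.Length; i++)
        {
            var j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }
}
```

Non-file fields (parts without a filename) are skipped here, much as UploadProcessor strips their headers rather than opening a new file for them.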
internal class UploadProcessor
{
private byte[] _buffer;
private byte[] _boundaryBytes;
private byte[] _endHeaderBytes;
private byte[] _endFileBytes;
private byte[] _lineBreakBytes;
private const string _lineBreak = "\r\n";
private readonly Regex _filename =
new Regex(@"Content-Disposition:\s*form-data\s*;\s*name\s*=\s*""file""\s*;\s*filename\s*=\s*""(.*)""",
RegexOptions.IgnoreCase | RegexOptions.Compiled);
private readonly HttpWorkerRequest _workerRequest;
public UploadProcessor(HttpWorkerRequest workerRequest)
{
_workerRequest = workerRequest;
}
public void StreamToDisk(IServiceProvider provider, Encoding encoding, string rootPath)
{
var buffer = new byte[8192];
if (!_workerRequest.HasEntityBody())
{
return;
}
var total = _workerRequest.GetTotalEntityBodyLength();
var preloaded = _workerRequest.GetPreloadedEntityBodyLength();
var loaded = preloaded;
SetByteMarkers(_workerRequest, encoding);
var body = _workerRequest.GetPreloadedEntityBody();
if (body == null) // IE normally does not preload
{
body = new byte[8192];
preloaded = _workerRequest.ReadEntityBody(body, body.Length);
loaded = preloaded;
}
var text = encoding.GetString(body);
var fileName = _filename.Matches(text)[0].Groups[1].Value;
fileName = Path.GetFileName(fileName); // IE captures full user path; chop it
var path = Path.Combine(rootPath, fileName);
var files = new List<string> { fileName };
var stream = new FileStream(path, FileMode.Create);
if (preloaded > 0)
{
stream = ProcessHeaders(body, stream, encoding, preloaded, files, rootPath);
}
// Used to force further processing (i.e. redirects) to avoid buffering the files again
var workerRequest = new StaticWorkerRequest(_workerRequest, body);
var field = HttpContext.Current.Request.GetType().GetField("_wr", BindingFlags.NonPublic | BindingFlags.Instance);
field.SetValue(HttpContext.Current.Request, workerRequest);
if (!_workerRequest.IsEntireEntityBodyIsPreloaded())
{
var received = preloaded;
// Read the rest of the body in chunks until Content-Length bytes have arrived.
// ReadEntityBody may return fewer bytes than requested; a return of 0 means nothing
// more was read, so stop rather than looping forever or dropping a short final read.
while (received < total && _workerRequest.IsClientConnected())
{
var size = Math.Min(buffer.Length, total - received);
loaded = _workerRequest.ReadEntityBody(buffer, size);
if (loaded <= 0)
{
break;
}
stream = ProcessHeaders(buffer, stream, encoding, loaded, files, rootPath);
received += loaded;
}
}
stream.Flush();
stream.Close();
stream.Dispose();
}
private void SetByteMarkers(HttpWorkerRequest workerRequest, Encoding encoding)
{
var contentType = workerRequest.GetKnownRequestHeader(HttpWorkerRequest.HeaderContentType);
var bufferIndex = contentType.IndexOf("boundary=") + "boundary=".Length;
var boundary = String.Concat("--", contentType.Substring(bufferIndex));
_boundaryBytes = encoding.GetBytes(string.Concat(boundary, _lineBreak));
_endHeaderBytes = encoding.GetBytes(string.Concat(_lineBreak, _lineBreak));
_endFileBytes = encoding.GetBytes(string.Concat(_lineBreak, boundary, "--", _lineBreak));
_lineBreakBytes = encoding.GetBytes(string.Concat(_lineBreak, boundary, _lineBreak));
}
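private FileStream ProcessHeaders(byte[] buffer, FileStream stream, Encoding encoding, int count, ICollection<string> files, string rootPath)
{
// Prepend any bytes carried over from the previous chunk (an incomplete header),
// then trim to the valid bytes so that count also covers the carried-over data
// and the searches below never see stale bytes left over in a reused read buffer
var carried = _buffer == null ? 0 : _buffer.Length;
var appended = AppendBuffer(buffer, count);
count += carried;
buffer = new byte[count];
Buffer.BlockCopy(appended, 0, buffer, 0, count);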
var startIndex = IndexOf(buffer, _boundaryBytes, 0);
if (startIndex != -1)
{
var endFileIndex = IndexOf(buffer, _endFileBytes, 0);
if (endFileIndex != -1)
{
var precedingBreakIndex = IndexOf(buffer, _lineBreakBytes, 0);
if (precedingBreakIndex > -1)
{
startIndex = precedingBreakIndex;
}
endFileIndex += _endFileBytes.Length;
var modified = SkipInput(buffer, startIndex, endFileIndex, ref count);
stream.Write(modified, 0, count);
}
else
{
var endHeaderIndex = IndexOf(buffer, _endHeaderBytes, 0);
if (endHeaderIndex != -1)
{
endHeaderIndex += _endHeaderBytes.Length;
var text = encoding.GetString(buffer);
var match = _filename.Match(text);
var fileName = match.Success ? match.Groups[1].Value : null; // Regex.Match never returns null
fileName = Path.GetFileName(fileName); // IE captures full user path; chop it
if (!string.IsNullOrEmpty(fileName) && !files.Contains(fileName))
{
files.Add(fileName);
var filePath = Path.Combine(rootPath, fileName);
stream = ProcessNextFile(stream, buffer, count, startIndex, endHeaderIndex, filePath);
}
else
{
var modified = SkipInput(buffer, startIndex, endHeaderIndex, ref count);
stream.Write(modified, 0, count);
}
}
else
{
_buffer = buffer;
}
}
}
else
{
stream.Write(buffer, 0, count);
}
return stream;
}
private static FileStream ProcessNextFile(FileStream stream, byte[] buffer, int count, int startIndex, int endIndex, string filePath)
{
var fullCount = count;
var endOfFile = SkipInput(buffer, startIndex, count, ref count);
stream.Write(endOfFile, 0, count);
stream.Flush();
stream.Close();
stream.Dispose();
stream = new FileStream(filePath, FileMode.Create);
var startOfFile = SkipInput(buffer, 0, endIndex, ref fullCount);
stream.Write(startOfFile, 0, fullCount);
return stream;
}
private static int IndexOf(byte[] array, IList<byte> value, int startIndex)
{
var index = 0;
var start = Array.IndexOf(array, value[0], startIndex);
if (start == -1)
{
return -1;
}
while ((start + index) < array.Length)
{
if (array[start + index] == value[index])
{
index++;
if (index == value.Count)
{
return start;
}
}
else
{
// Resume one byte past the previous candidate so overlapping partial matches are not skipped
start = Array.IndexOf(array, value[0], start + 1);
if (start != -1)
{
index = 0;
}
else
{
return -1;
}
}
}
return -1;
}
private static byte[] SkipInput(byte[] input, int startIndex, int endIndex, ref int count)
{
var range = endIndex - startIndex;
var size = count - range;
var modified = new byte[size];
var modifiedCount = 0;
for (var i = 0; i < input.Length; i++)
{
if (i >= startIndex && i < endIndex)
{
continue;
}
if (modifiedCount >= size)
{
break;
}
modified[modifiedCount] = input[i];
modifiedCount++;
}
input = modified;
count = modified.Length;
return input;
}
private byte[] AppendBuffer(byte[] buffer, int count)
{
var input = new byte[_buffer == null ? buffer.Length : _buffer.Length + count];
if (_buffer != null)
{
Buffer.BlockCopy(_buffer, 0, input, 0, _buffer.Length);
}
Buffer.BlockCopy(buffer, 0, input, _buffer == null ? 0 : _buffer.Length, count);
_buffer = null;
return input;
}
}
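The carry-over that ProcessHeaders performs with the _buffer field exists because a boundary, or the blank line that ends a part’s headers, can be split across two ReadEntityBody calls. A minimal standalone sketch of that idea follows, using a hypothetical ChunkMarkerScanner class that is not part of the download. Rather than carrying the whole chunk, it keeps only the last (marker length minus one) bytes, which is enough to complete a marker that continues into the next chunk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: finds every occurrence of a byte marker in a stream that
// arrives in arbitrary chunks, including occurrences split across two chunks.
public sealed class ChunkMarkerScanner
{
    private readonly byte[] _marker;
    private byte[] _carry = new byte[0];
    private long _carryStart; // absolute stream offset of _carry[0]

    public ChunkMarkerScanner(byte[] marker)
    {
        if (marker == null || marker.Length == 0)
            throw new ArgumentException("Marker must not be empty.", nameof(marker));
        _marker = marker;
    }

    // Feeds the next chunk (its first count bytes are valid) and returns the absolute
    // offsets of marker occurrences that end inside this chunk.
    public List<long> Feed(byte[] chunk, int count)
    {
        var window = new byte[_carry.Length + count];
        Buffer.BlockCopy(_carry, 0, window, 0, _carry.Length);
        Buffer.BlockCopy(chunk, 0, window, _carry.Length, count);

        var found = new List<long>();
        for (var i = 0; i <= window.Length - _marker.Length; i++)
        {
            var j = 0;
            while (j < _marker.Length && window[i + j] == _marker[j]) j++;
            if (j == _marker.Length) found.Add(_carryStart + i);
        }

        // Keep the last (marker length - 1) bytes: too short to hold a whole marker
        // (so nothing is reported twice) but long enough to hold the start of one.
        var keep = Math.Min(_marker.Length - 1, window.Length);
        _carryStart += window.Length - keep;
        _carry = new byte[keep];
        Buffer.BlockCopy(window, window.Length - keep, _carry, 0, keep);
        return found;
    }
}
```

Feeding "abc\r\n" and then "\r\ndef" reports the header terminator "\r\n\r\n" at offset 3, even though neither chunk contains it on its own.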
In the middle of that processing code you’ll notice another class called StaticWorkerRequest. This class is responsible for tricking ASP.NET into thinking that no files were posted when the submit button was pressed. This is necessary because once the upload is complete, if we try to redirect to the desired page, ASP.NET will detect that there is still data in the HTTP entity body and once again attempt to buffer the entire upload, leaving us back at square one. To avoid this, we spoof the HttpWorkerRequest and inject it into the HttpContext, giving the StaticWorkerRequest just the preloaded first chunk of the request body, which it reports is the only data available (simply put: it lies). Unfortunately, this requires private reflection on the request’s non-public _wr field, so if your web host doesn’t grant that capability, you’ll need to elevate your application to full trust. I don’t know a way around this.
internal class StaticWorkerRequest : HttpWorkerRequest
{
private readonly HttpWorkerRequest _request;
private readonly byte[] _buffer;
public StaticWorkerRequest(HttpWorkerRequest request, byte[] buffer)
{
_request = request;
_buffer = buffer;
}
public override int ReadEntityBody(byte[] buffer, int size)
{
return 0;
}
public override int ReadEntityBody(byte[] buffer, int offset, int size)
{
return 0;
}
public override byte[] GetPreloadedEntityBody()
{
return _buffer;
}
public override int GetPreloadedEntityBody(byte[] buffer, int offset)
{
Buffer.BlockCopy(_buffer, 0, buffer, offset, _buffer.Length);
return _buffer.Length;
}
public override int GetPreloadedEntityBodyLength()
{
return _buffer.Length;
}
public override int GetTotalEntityBodyLength()
{
return _buffer.Length;
}
public override string GetKnownRequestHeader(int index)
{
return index == HeaderContentLength
? "0"
: _request.GetKnownRequestHeader(index);
}
// All other methods elided, they're just passthrough
}
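The trick StaticWorkerRequest plays is an ordinary decorator: wrap the real request, answer the body-related questions with "nothing here", and pass everything else through. Since HttpWorkerRequest is an abstract class with many members, the sketch below uses a hypothetical, much smaller IRequestBody interface (keyed by header name rather than HttpWorkerRequest’s integer header indexes) just to show the shape. Unlike StaticWorkerRequest, which still exposes the preloaded bytes, this version reports an empty body outright.

```csharp
using System;

// Hypothetical, simplified stand-in for the parts of HttpWorkerRequest that matter here.
public interface IRequestBody
{
    int ReadEntityBody(byte[] buffer, int size);
    int GetTotalEntityBodyLength();
    string GetKnownRequestHeader(string name);
}

// The "real" request: hands out body bytes until they are exhausted.
public sealed class LiveRequestBody : IRequestBody
{
    private readonly byte[] _body;
    private readonly string _contentType;
    private int _position;

    public LiveRequestBody(byte[] body, string contentType)
    {
        _body = body;
        _contentType = contentType;
    }

    public int ReadEntityBody(byte[] buffer, int size)
    {
        var count = Math.Min(size, _body.Length - _position);
        Buffer.BlockCopy(_body, _position, buffer, 0, count);
        _position += count;
        return count;
    }

    public int GetTotalEntityBodyLength() { return _body.Length; }

    public string GetKnownRequestHeader(string name)
    {
        if (name == "Content-Length") return _body.Length.ToString();
        if (name == "Content-Type") return _contentType;
        return null;
    }
}

// Decorator in the spirit of StaticWorkerRequest: once the upload has been consumed,
// later code asking for the body is told there is nothing left, while every other
// question is passed through to the wrapped request.
public sealed class DrainedRequestBody : IRequestBody
{
    private readonly IRequestBody _inner;

    public DrainedRequestBody(IRequestBody inner) { _inner = inner; }

    public int ReadEntityBody(byte[] buffer, int size) { return 0; }

    public int GetTotalEntityBodyLength() { return 0; }

    public string GetKnownRequestHeader(string name)
    {
        return name == "Content-Length" ? "0" : _inner.GetKnownRequestHeader(name);
    }
}
```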
With the StaticWorkerRequest serving up false claims, you can now upload large files in ASP.NET MVC with direct access to the stream. With this code as a starting point, you could save progress data and query it with an AJAX call to another controller action, buffer a large file to a temporary area to enable upload continuation, or simulcast an upload to a cloud service, all while the data is still arriving rather than after ASP.NET has buffered the entire file to disk. As an added bonus, saving the file this way does not cause the memory spike that the SaveAs method does.
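For the progress-reporting idea, one possible shape is a process-wide store that the read loop updates after each chunk and that a separate, AJAX-polled action reads. The UploadProgress class below is hypothetical, as is the upload id (how the browser and server agree on it is left open); it assumes the total comes from GetTotalEntityBodyLength or the Content-Length header.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical in-process progress store keyed by an upload id.
public static class UploadProgress
{
    private sealed class Entry
    {
        public long Received;
        public long Total;
    }

    private static readonly ConcurrentDictionary<string, Entry> Entries =
        new ConcurrentDictionary<string, Entry>();

    public static void Start(string uploadId, long totalBytes)
    {
        Entries[uploadId] = new Entry { Total = totalBytes };
    }

    // Called from the upload's read loop after each chunk has been processed.
    public static void Report(string uploadId, int bytesRead)
    {
        Entry entry;
        if (Entries.TryGetValue(uploadId, out entry))
        {
            Interlocked.Add(ref entry.Received, bytesRead);
        }
    }

    // Called from the polling action: percentage complete (0-100), or null if unknown.
    public static double? Percent(string uploadId)
    {
        Entry entry;
        if (!Entries.TryGetValue(uploadId, out entry) || entry.Total <= 0)
        {
            return null;
        }
        var received = Interlocked.Read(ref entry.Received);
        return Math.Min(100.0, received * 100.0 / entry.Total);
    }

    public static void Finish(string uploadId)
    {
        Entry removed;
        Entries.TryRemove(uploadId, out removed);
    }
}
```

In StreamToDisk, a call such as UploadProgress.Report(uploadId, loaded) would sit next to received += loaded. Because the store is static and in-process, it would not work across a web farm without moving the data to a shared store.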

You can download a complete example solution (VS2008 and ASP.NET MVC 2) containing the code in this post.
