Using .NET Core to Tee Streams and Buffer AWS S3 Uploads

An approach to enable .NET to write to multiple streams simultaneously

Introduction

In the process, the SVG and the converted PDF were saved as temp files. I asked if, instead of temp files, they had thought about piping streams. There turned out to be two reasons why they didn’t:

  1. TransferUtility only supports “seekable” streams where the content length is known. The STDOUT pipe from a Process call is not seekable.
  2. .NET doesn’t have built-in support for “Tee”-ing or multiplexing a stream.

I should have left it at that and moved on, but my curiosity got the best of me and I didn’t. Writing temp files is not a problem as long as you clean up after them, and the solution was “good enough”. Nonetheless, I started messing around with this topic. I had been porting code to AWS Lambda (serverless), and temp files are not the best thing to be doing in that universe. I was also interested in having more control over the memory footprint during transfer (I did not want to buffer entire image files in memory). I ended up writing an extension to TransferUtility that supports non-seekable streams, as well as a class that facilitates redirecting a stream to multiple streams (i.e. a “tee”).

Basically, what I wanted to do was to get from this:

Conversion to Thumbnails using Temp files

to this…

Conversion to Thumbnails using Streams

In doing so, I wanted something that would work with any “recent” .NET Core code base with minimal external dependencies and libraries. Below is a description of the two pieces of the solution that were created:

  1. An extension of AWS TransferUtility to support non-seekable streams, and the creation of a .NET Stream to write S3 objects
  2. A .NET Stream class to redirect data written to it to multiple outputs (TeeStream)

Adding Support for Non-Seekable Streams to AWS TransferUtility

tl;dr

Implementation

To improve performance, the stream buffers content until a configurable threshold is reached, and then writes the buffered content to S3. This reduces the number of calls made to S3.

Like TransferUpload, this class does not support concurrent uploads of file parts. For the most part, this is because we have to be able to identify the final part when using server-side encryption, and we don’t know which part is last until we reach the end. Additionally, since I plan on using this in Lambda, I am avoiding scaling out multiple threads, each with its own buffer; otherwise, I would need to size my Lambda function more aggressively. The trade-off is between speed and scalability/cost. For “reasonably sized” image files, this trade-off works for me.
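The buffering approach can be sketched with the AWS SDK’s low-level multipart API. This is a minimal illustration of the technique, not the actual S3BufferedUpload code; the class and member names here are made up for the example:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// Illustrative sketch only: buffer writes until a threshold is reached,
// then flush the buffer as one multipart-upload part. Part ETags are
// collected so the upload can be completed once the end is reached.
public class BufferedS3Writer
{
    private readonly IAmazonS3 _s3;
    private readonly string _bucket, _key;
    private readonly MemoryStream _buffer = new MemoryStream();
    private readonly List<PartETag> _partETags = new List<PartETag>();
    private readonly int _threshold;
    private string _uploadId;
    private int _partNumber = 1;

    public BufferedS3Writer(IAmazonS3 s3, string bucket, string key,
        int threshold = 5 * 1024 * 1024) // S3's minimum non-final part size is 5 MB
    { _s3 = s3; _bucket = bucket; _key = key; _threshold = threshold; }

    public async Task WriteAsync(byte[] data, int offset, int count)
    {
        if (_uploadId == null) // lazily start the multipart upload on first write
        {
            var init = await _s3.InitiateMultipartUploadAsync(
                new InitiateMultipartUploadRequest { BucketName = _bucket, Key = _key });
            _uploadId = init.UploadId;
        }
        _buffer.Write(data, offset, count);
        if (_buffer.Length >= _threshold)
            await FlushPartAsync();
    }

    private async Task FlushPartAsync()
    {
        _buffer.Position = 0;
        var resp = await _s3.UploadPartAsync(new UploadPartRequest
        {
            BucketName = _bucket, Key = _key, UploadId = _uploadId,
            PartNumber = _partNumber, InputStream = _buffer
        });
        _partETags.Add(new PartETag(_partNumber++, resp.ETag));
        _buffer.SetLength(0); // reuse the buffer for the next part
    }

    // Only at this point do we know which part was last.
    public async Task CompleteAsync()
    {
        if (_buffer.Length > 0) await FlushPartAsync(); // final, possibly short, part
        await _s3.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest
        {
            BucketName = _bucket, Key = _key, UploadId = _uploadId,
            PartETags = _partETags
        });
    }
}
```

Because the final part is only identified in CompleteAsync, parts have to be flushed sequentially, which is exactly the speed-for-memory trade-off described above.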

With the S3BufferedUploadStream class, I can now start to model my conversion and upload as a stream pipeline. But, I still want to be able to split/multiplex streams.

.NET TeeStream Class

tl;dr

The TeeStream class, at its most basic, simply forwards the bytes read from an Input stream to one or more Output streams. It does not buffer, provide backpressure handling, etc. If a TeeStream can’t write to an Output stream, it will wait until it can.

Simple Multiplexed Output
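The write-forwarding idea can be shown with a minimal, self-contained Stream subclass. This is a sketch of the concept only, not the TeeStreaming package’s actual implementation:

```csharp
using System;
using System.IO;

// Minimal illustration of "tee"-ing: a write-only Stream that forwards
// every Write call, in order, to each of its output streams.
public class SimpleTee : Stream
{
    private readonly Stream[] _outputs;
    public SimpleTee(params Stream[] outputs) => _outputs = outputs;

    public override void Write(byte[] buffer, int offset, int count)
    {
        foreach (var output in _outputs)
            output.Write(buffer, offset, count); // blocks until each output accepts the data
    }

    public override void Flush()
    {
        foreach (var output in _outputs) output.Flush();
    }

    public override bool CanWrite => true;
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count)
        => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin)
        => throw new NotSupportedException();
    public override void SetLength(long value)
        => throw new NotSupportedException();
}

// Usage: copy one input to two destinations in a single pass.
// using var a = new MemoryStream();
// using var b = new MemoryStream();
// sourceStream.CopyTo(new SimpleTee(a, b));
```

The real TeeStream goes further than this sketch: it can also be read from, which is what makes stream pipelines possible.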

What is a little more interesting is that a TeeStream object can itself be used as an input, which means we can send the data from the Input stream to anything that accepts a readable stream as an input, while still writing to one or more Output streams.

Tee Streams Can Be Used As Input Streams

Implementation

When data is written to a TeeStream, it is forwarded on to the Output streams. In the event that Output streams cannot be written to, the TeeStream will block until they can. You can set timeouts on the Output streams if you do not want to wait “indefinitely”.

When using the TeeStream as an input source (i.e. reading from it), a fixed-length buffer is created upon construction. This buffer should be large enough to cache data between reads by whatever is consuming the TeeStream itself. If the buffer fills up, the TeeStream will block writes until data is consumed.
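A usage sketch of that producer/consumer behavior follows. The constructor signature, buffer-size parameter, and helper names here are assumptions for illustration, not the package’s documented API:

```csharp
// Hypothetical usage: constructor and parameter names are assumptions.
// Writes go to the attached output stream(s) AND into the TeeStream's
// fixed internal buffer; the writer blocks when that buffer is full,
// and the reader blocks when it is empty.
var tee = new TeeStream(auditLog);                 // assumed constructor

// Producer: data flows both to auditLog and into the read buffer.
var producer = sourceStream.CopyToAsync(tee);

// Consumer: each read drains the internal buffer, unblocking the producer.
var chunk = new byte[8192];
int n;
while ((n = await tee.ReadAsync(chunk, 0, chunk.Length)) > 0)
    Consume(chunk, n);                             // hypothetical consumer
```

Sizing the buffer is a judgment call: larger buffers smooth out bursty consumers at the cost of a bigger fixed memory footprint.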

Dealing With the End of a Stream

Using CopyTo to send data to a TeeStream presents a problem: the TeeStream never “knows” when the Input stream is exhausted. This matters because the TeeStream needs to generate its own end-of-stream indicator (i.e. return zero from Read/ReadAsync).

Setting up a timeout isn’t a good option, because there may be something happening “upstream” that takes time. There are two ways of dealing with this:

  1. Call TeeStream’s SetAtEnd method after sending data to it using CopyTo/CopyToAsync.
  2. Use TeeStream’s CopyFrom/CopyFromAsync methods to “pull” data from an Input stream instead of “pushing” it using CopyTo.
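The two options look like this in code (SetAtEnd and CopyFromAsync are the method names given above; the surrounding setup is assumed):

```csharp
// Option 1: push with CopyToAsync, then mark the end explicitly.
// Without SetAtEnd, readers of the tee would block waiting for more
// data instead of seeing a zero-length (end-of-stream) read.
await inputStream.CopyToAsync(tee);
tee.SetAtEnd();

// Option 2: let the TeeStream pull. It observes the zero-length read
// from inputStream itself and propagates end-of-stream automatically.
await tee.CopyFromAsync(inputStream);
```

The pull form is less error-prone, since forgetting SetAtEnd in the push form produces a hang rather than an exception.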

Putting It Together

Save Full-Sized and Thumbnail to S3, return Thumbnail in Response.Body

Obviously, embedding the conversion code inline is sloppy, but it’s easier to show here as a single code block. I am demonstrating using ImageMagick convert instead of Inkscape, since it is a more common use case.

What’s going on in this code?

  1. Instantiate an AmazonS3Client object. If necessary, you could set up credentials, region, etc. here.
  2. Create S3BufferedUpload streams to create publicly readable S3 objects for the full-sized and thumbnail image files.
  3. Create a TeeStream to output to the S3 thumbnail object and the Response.Body
  4. Create another TeeStream to output to the S3 full-sized object and allow the TeeStream itself to be read from
  5. Create a Task that calls Convert, using the TeeStream in step #4 as input and the TeeStream in step #3 as output (if something goes wrong, an exception is thrown with the contents of STDERR used to populate the Exception)
  6. Copy the posted Request.Body to the TeeStream created in step #3 (this feeds the Request.Body image to the Convert task while saving it to S3)
  7. Wait for the task to finish, which will include the writing of thumbnail image data to S3 as well as the Response.Body

Admittedly this demands refactoring and dependency injection, but for demonstration purposes, it meets the need.
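The steps above can be sketched roughly as follows. This is a hedged reconstruction, not the article’s exact code: the S3BufferedUploadStream and TeeStream constructors are assumptions, the bucket/key names are placeholders, and the ImageMagick arguments are illustrative.

```csharp
// Step 1: instantiate the client (credentials/region could be set here).
var s3 = new AmazonS3Client();

// Step 2: buffered S3 upload streams for full-size and thumbnail
// (constructor shapes are assumptions for this sketch).
using var fullSize = new S3BufferedUploadStream(s3, "my-bucket", "image.png");
using var thumb    = new S3BufferedUploadStream(s3, "my-bucket", "thumb.png");

// Step 3: thumbnail bytes go to S3 and to the HTTP response.
using var thumbTee = new TeeStream(thumb, Response.Body);

// Step 4: input bytes go to S3; this tee is also read from by convert.
using var inputTee = new TeeStream(fullSize);

// Step 5: run ImageMagick convert, stdin -> stdout.
var convertTask = Task.Run(async () =>
{
    var psi = new ProcessStartInfo("convert", "png:- -thumbnail 128x128 png:-")
    {
        RedirectStandardInput = true,
        RedirectStandardOutput = true,
        RedirectStandardError = true
    };
    using var proc = Process.Start(psi);

    // Feed the tee'd input to convert, closing stdin when done.
    var feed = inputTee.CopyToAsync(proc.StandardInput.BaseStream)
        .ContinueWith(_ => proc.StandardInput.Close());

    // Convert's output is the thumbnail; tee it to S3 and the response.
    await proc.StandardOutput.BaseStream.CopyToAsync(thumbTee);
    await feed;
    proc.WaitForExit();
    if (proc.ExitCode != 0) // surface STDERR as the exception message
        throw new Exception(await proc.StandardError.ReadToEndAsync());
});

// Step 6: pull the posted body through the tee (end-of-stream propagates).
await inputTee.CopyFromAsync(Request.Body);

// Step 7: wait for conversion, thumbnail upload, and response writing.
await convertTask;
```

Note that step 6 uses CopyFromAsync rather than CopyToAsync, so the convert process sees end-of-input without an explicit SetAtEnd call on inputTee.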

Where to Get It

NuGet

Using dotnet to add these packages to a project…

dotnet add package TeeStreaming
dotnet add package S3BufferedUpload

GitHub
