S3 uploader high memory usage
How to prevent high memory usage when uploading many files via the Go S3 manager uploader.
TL;DR
When using the S3 manager uploader, pass a Body that implements io.ReadSeeker and io.ReaderAt (and not just a plain io.Reader) to prevent high memory usage.
There are 2 options to upload files to S3 using the Go V2 AWS SDK (besides using presigned URLs):
- The PutObject API on the s3.Client.
- The Upload method of the S3 manager uploader (the feature/s3/manager package).
Both accept PutObjectInput, where the Body must be an io.Reader.
From what I understand, option 2 is recommended when uploading many (large) files, because it:
- Safely uploads files concurrently across goroutines.
- Buffers large files into smaller chunks and uploads them in parallel.
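Roughly, the two call sites look like this (a minimal sketch assuming an already configured *s3.Client; the bucket and key names are placeholders):

```go
package main

import (
	"context"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// Option 1: a single PutObject call on the S3 client.
func putObject(ctx context.Context, client *s3.Client, body io.Reader) error {
	_, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("my-key"),
		Body:   body,
	})
	return err
}

// Option 2: the S3 manager uploader, which splits the body into parts
// and uploads them concurrently.
func uploadManaged(ctx context.Context, client *s3.Client, body io.Reader) error {
	uploader := manager.NewUploader(client)
	_, err := uploader.Upload(ctx, &s3.PutObjectInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("my-key"),
		Body:   body,
	})
	return err
}
```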
The problem
While working on a project that needed to upload zip archives containing lots of files, I chose the S3 manager uploader for its concurrent upload capabilities.
But I quickly ran into memory issues: when uploading zip archives that contain hundreds of files, my Go service would often run out of memory (OOM).
For example, uploading a zip archive with ~100 files caused the service's memory usage to spike consistently to ~500 MiB.
The code
Opening a file inside the zip archive returns an io.ReadCloser, and passing that as the Body when uploading should stream the file's contents efficiently to S3. So why the memory issues?
Profiling using a benchmark didn’t show any issues. So I started digging into the S3 manager code.
Root cause: default part size
The S3 manager uploader's memory behavior is controlled by the PartSize parameter. By default it's 5 MiB, and the same value sizes the buffers in the uploader's internal memory pool (so allocated buffer memory can be reused between uploads).
This is the interesting part: by default the uploader allocates a full 5 MiB buffer for every file being uploaded, regardless of the file's actual size.
This happens because the uploader has to split the Body into parts before it can send them. With a plain io.Reader, the only way to do that is to copy the data into those fixed-size buffers, so even a tiny file gets read into a 5 MiB buffer. But if the Body also implements io.ReadSeeker and io.ReaderAt, the uploader can determine the size by seeking and read each part directly from the underlying source, without buffering the content in memory first.
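For reference, PartSize (and the per-upload concurrency) are plain fields on the uploader that can be set when constructing it. Tuning them doesn't solve the per-file buffering problem, and 5 MiB is also the minimum part size S3 accepts for multipart uploads, so PartSize can be raised but not lowered. A small sketch of where the knobs live:

```go
package main

import (
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// newTunedUploader shows where the PartSize and Concurrency knobs live.
// The defaults (5 MiB parts, 5 concurrent part uploads per Upload call)
// come from the manager package.
func newTunedUploader(client *s3.Client) *manager.Uploader {
	return manager.NewUploader(client, func(u *manager.Uploader) {
		u.PartSize = 10 * 1024 * 1024 // e.g. 10 MiB parts; default is manager.DefaultUploadPartSize (5 MiB)
		u.Concurrency = 3             // default is manager.DefaultUploadConcurrency (5)
	})
}
```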
The fix
Opening a file inside a zip archive returns an io.ReadCloser. So it must be “converted” to something that implements io.ReadSeeker and io.ReaderAt to prevent the memory issues (while still using the S3 manager to upload files concurrently).
I think the simplest options to do this are (before uploading):
- Write each file in the zip archive to a temporary file.
- Read each file in the zip archive into memory using io.ReadAll() and bytes.NewReader().
After testing both, I found option 1 to be the (slightly) better choice. While both methods had similar memory overhead, writing to a temporary file used (slightly) less CPU and had (slightly) less garbage collector (GC) overhead.
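Here's a minimal sketch of option 1 (the helper name and temp-file handling are illustrative, not the exact code from the project):

```go
package main

import (
	"archive/zip"
	"context"
	"io"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// uploadViaTempFile copies one zip entry to a temporary file and uploads it.
// *os.File implements io.ReadSeeker and io.ReaderAt, so the uploader can read
// parts directly from disk instead of buffering 5 MiB per file in memory.
func uploadViaTempFile(ctx context.Context, uploader *manager.Uploader, f *zip.File, bucket string) error {
	rc, err := f.Open()
	if err != nil {
		return err
	}
	defer rc.Close()

	tmp, err := os.CreateTemp("", "s3-upload-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()

	if _, err := io.Copy(tmp, rc); err != nil {
		return err
	}
	// Rewind so the uploader reads from the start of the temporary file.
	if _, err := tmp.Seek(0, io.SeekStart); err != nil {
		return err
	}

	_, err = uploader.Upload(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(f.Name),
		Body:   tmp, // seekable Body: no per-file 5 MiB buffer
	})
	return err
}
```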