zlib compression fails on large IOBufs
author Nicholas Ormrod <njormrod@fb.com>
Wed, 26 Nov 2014 00:36:20 +0000 (16:36 -0800)
committer Dave Watson <davejwatson@fb.com>
Thu, 11 Dec 2014 15:58:58 +0000 (07:58 -0800)
Summary:
If a single IOBuf holds 2^32 bytes or more, then our zlib
compression algorithm fails. Specifically, zlib's z_stream.avail_in is
only 32 bits wide on our platforms (it is a uInt, i.e. unsigned int; see
http://www.gzip.org/zlib/zlib_faq.html#faq32), and so the size of a
too-big IOBuf is truncated when assigned to it, which causes data loss.
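
A minimal sketch of the truncation, assuming unsigned int is 32 bits
wide. The helper names are hypothetical and are not part of folly or
zlib; the cast stands in for the assignment to stream.avail_in.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for assigning a 64-bit range size to a 32-bit field such as
// z_stream.avail_in (assumption: unsigned int is 32 bits wide).
inline uint32_t truncatedAvailIn(uint64_t rangeSize) {
  return static_cast<uint32_t>(rangeSize);  // keeps only the low 32 bits
}

// Hypothetical check: a range of 2^32 + 5 bytes is seen as 5 bytes, and
// a range of exactly 2^32 bytes is seen as empty.
inline void checkTruncationExample() {
  const uint64_t big = (uint64_t{1} << 32) + 5;
  assert(truncatedAvailIn(big) == 5);
  assert(truncatedAvailIn(uint64_t{1} << 32) == 0);
}
```

Everything past the truncated length would never be handed to deflate().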

This diff breaks up large IOBufs into smaller chunks: each range is fed
to deflate() in steps of at most maxSingleStepLength bytes.
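
The chunking arithmetic can be sketched on its own, without zlib. This
is an illustration only: splitIntoSteps and its maxStep parameter are
hypothetical names, and maxStep merely plays the role of
maxSingleStepLength, whose actual value is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits a 64-bit byte count into step sizes that each fit in 32 bits.
// maxStep is a hypothetical stand-in for maxSingleStepLength.
std::vector<uint32_t> splitIntoSteps(uint64_t total, uint32_t maxStep) {
  assert(maxStep > 0);  // a zero step would never make progress
  std::vector<uint32_t> steps;
  uint64_t remaining = total;
  while (remaining > 0) {
    // Take a full step if more than maxStep bytes remain, else the rest.
    const uint32_t step =
        remaining > maxStep ? maxStep : static_cast<uint32_t>(remaining);
    steps.push_back(step);
    remaining -= step;
  }
  return steps;
}
```

For example, splitIntoSteps(10, 4) yields the steps 4, 4, 2. An empty
range yields no steps, so it needs no separate empty() check.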

Test Plan:
fbconfig -r folly && fbmake runtests

Also compressed biggrep's configerator blob, which is how this bug was
caught. It now works. See the associated task.

Reviewed By: robbert@fb.com

Subscribers: trunkagent, sdwilsh, njormrod, folly-diffs@

FB internal diff: D1702925

Tasks: 5648445

Signature: t1:1702925:1416958232:459d498ff1db13e1a20766855e6f2f97da8cde8c

folly/io/Compression.cpp

index 8c8fe61f11fdcf7ce5e01ff9a3104c74d765eae2..d7a0544dc3f094d371774f249607e04a9adc2717 100644
@@ -553,21 +553,25 @@ std::unique_ptr<IOBuf> ZlibCodec::doCompress(const IOBuf* data) {
        defaultBufferLength));
 
   for (auto& range : *data) {
-    if (range.empty()) {
-      continue;
-    }
-
-    stream.next_in = const_cast<uint8_t*>(range.data());
-    stream.avail_in = range.size();
-
-    while (stream.avail_in != 0) {
-      if (stream.avail_out == 0) {
-        out->prependChain(addOutputBuffer(&stream, defaultBufferLength));
+    uint64_t remaining = range.size();
+    uint64_t written = 0;
+    while (remaining) {
+      uint32_t step = (remaining > maxSingleStepLength ?
+                       maxSingleStepLength : remaining);
+      stream.next_in = const_cast<uint8_t*>(range.data() + written);
+      stream.avail_in = step;
+      remaining -= step;
+      written += step;
+
+      while (stream.avail_in != 0) {
+        if (stream.avail_out == 0) {
+          out->prependChain(addOutputBuffer(&stream, defaultBufferLength));
+        }
+
+        rc = deflate(&stream, Z_NO_FLUSH);
+
+        CHECK_EQ(rc, Z_OK) << stream.msg;
       }
-
-      rc = deflate(&stream, Z_NO_FLUSH);
-
-      CHECK_EQ(rc, Z_OK) << stream.msg;
     }
   }