String Compression (Part 2): LZ4

  This article is from cnblogs (博客园), written by T-BARBARIANS. If you repost it, please cite the original link: https://www.cnblogs.com/t-bar/p/16451185.html Thanks!

 

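  The previous post introduced ZSTD, Facebook's compression library: how to compress and decompress with it, how well it performs at both, and how to use its multithreaded compression.

  In this post we look at LZ4 from the same angles.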

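1. LZ4 compression and decompression

  LZ4 has two compression functions. The default compression function prototype:

  int LZ4_compress_default(const char* src, char* dst, int srcSize, int dstCapacity);

  The fast compression function prototype:

  int LZ4_compress_fast (const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);

  The acceleration parameter of the fast compression function takes values in [1, LZ4_ACCELERATION_MAX], where LZ4_ACCELERATION_MAX is 65537. Put simply, the larger the acceleration value, the faster the compression, but the lower the compression ratio. The experimental data later in this post illustrates this.

  Note also that LZ4_compress_default is simply LZ4_compress_fast with acceleration = 1, which is its default acceleration.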

 

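  LZ4 also has two decompression functions. The safe decompression function prototype:

  int LZ4_decompress_safe (const char* src, char* dst, int compressedSize, int dstCapacity);

  The fast decompression function prototype:

  int LZ4_decompress_fast (const char* src, char* dst, int originalSize);

  The fast decompression function is not recommended. LZ4_decompress_fast takes no compressed-size parameter, so it cannot tell where the input ends and is considered unsafe. LZ4 recommends LZ4_decompress_safe instead.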

  As before, let's first look at an example of compressing and decompressing with LZ4.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4.h>
#include <iostream>

using namespace std;

int main()
{
    // Sample text from the original post, kept as-is because the measured
    // sizes depend on the exact bytes. Adjacent literals are concatenated.
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot "
        "play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, "
        "run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy "
        "puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, "
        "George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having "
        "a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, "
        "George. There's a really big puddle.Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. "
        "Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. "
        "Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. "
        "You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy "
        "puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, "
        "it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, "
        "when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play "
        "in the garden.Narrator: Peppa and George are wearing their boots. "
        "Mummy and Daddy are "
        "wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping "
        "up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: "
        "It's only mud.";

    // compress
    // The LZ4 API works with int sizes, so convert once here.
    int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    char *com_ptr = (char *)malloc(com_space_size);
    if(NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    memset(com_ptr, 0, com_space_size);

    //int com_size = LZ4_compress_default(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size);
    int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    if(com_size <= 0) {
        // 0 means the destination buffer was too small or compression failed
        cout << "compress failed, error code:" << com_size << endl;
        free(com_ptr);
        return -1;
    }
    cout << "peppa pig text size:" << peppa_pig_text_size << endl;
    cout << "compress text size:" << com_size << endl;
    cout << "compress ratio:" << (float)peppa_pig_text_size / (float)com_size << endl << endl;

    // decompress
    char *decom_ptr = (char *)malloc(peppa_pig_text_size);
    if(NULL == decom_ptr) {
        cout << "decompress malloc failed" << endl;
        free(com_ptr);
        return -1;
    }

    int decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
    if(decom_size < 0) {
        // a negative value signals malformed input or a too-small buffer
        cout << "decompress failed, error code:" << decom_size << endl;
        free(com_ptr);
        free(decom_ptr);
        return -1;
    }
    cout << "decompress text size:" << decom_size << endl;

    // compare the decompressed buffer with the original buffer
    if(decom_size != peppa_pig_text_size || memcmp(peppa_pig_buf, decom_ptr, peppa_pig_text_size)) {
        cout << "decompress text is not equal peppa pig text" << endl;
    }

    free(com_ptr);
    free(decom_ptr);
    return 0;
}

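Output:

[Image: screenshot of the program output]

  The output shows that the Peppa Pig text is 1848 bytes before compression and 1125 bytes after (ZSTD produced 759 bytes in the previous post), a compression ratio of about 1.6, and that the decompressed length equals the original length. On the same text this is lower than ZSTD's ratio of about 2.4, so in terms of compressed size LZ4 does worse than ZSTD.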
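  The ratios quoted above are simply the original size divided by the compressed size. A tiny sketch of that arithmetic, with a hypothetical helper name, could look like this:

```cpp
#include <cassert>
#include <cstddef>

// Compression ratio as used in this post: original size / compressed size.
// A larger value means the data was compressed more.
double compression_ratio(std::size_t originalSize, std::size_t compressedSize) {
    assert(compressedSize > 0);
    return static_cast<double>(originalSize) / static_cast<double>(compressedSize);
}
```

  With the numbers reported in this post, compression_ratio(1848, 1125) is roughly 1.64 for LZ4 and compression_ratio(1848, 759) is roughly 2.43 for ZSTD.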

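  Figure 1 below shows how the compressed size changes as acceleration increases: the larger the acceleration, the longer the compressed output.

[Image: compressed size versus acceleration]

Figure 1

  Figure 2 shows how the compression ratio changes as acceleration increases: the larger the acceleration, the lower the compression ratio.

[Image: compression ratio versus acceleration]

 Figure 2

  Why is that? It is the same trade-off mentioned in the previous post: you cannot have both speed and compression ratio. What is gained in compression speed is lost in compression ratio.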

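2. Exploring LZ4 compression performance

  Next, let's explore LZ4's compression performance, and how it varies across acceleration levels.

  The method: use LZ4_compress_fast to compress the same text repeatedly for 10 seconds, using a different acceleration level in each round, and compute the average number of compressions per second for each level. The benchmark code is as follows: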

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <lz4.h>
#include <iostream>

using namespace std;

int main()
{
    // Same sample text as in the first example.
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot "
        "play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, "
        "run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy "
        "puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, "
        "George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having "
        "a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, "
        "George. There's a really big puddle.Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. "
        "Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. "
        "Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. "
        "You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy "
        "puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, "
        "it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, "
        "when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play "
        "in the garden.Narrator: Peppa and George are wearing their boots. "
        "Mummy and Daddy are "
        "wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping "
        "up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: "
        "It's only mud.";

    timeval st, et;
    char *com_ptr = NULL;

    // The LZ4 API works with int sizes, so convert once here.
    int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    int test_times = 6;
    int acceleration = 1;

    // compress performance test: one 10-second round per acceleration level
    while(test_times >= 1) {

        int cnt = 0;  // reset the counter for each acceleration level
        gettimeofday(&st, NULL);
        while(1) {

            com_ptr = (char *)malloc(com_space_size);
            if(NULL == com_ptr) {
                cout << "compress malloc failed" << endl;
                return -1;
            }

            int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, acceleration);
            if(com_size <= 0) {
                cout << "compress failed, error code:" << com_size << endl;
                free(com_ptr);
                return -1;
            }

            free(com_ptr);

            cnt++;
            gettimeofday(&et, NULL);
            if(et.tv_sec - st.tv_sec >= 10) {
                break;
            }
        }

        cout << "acceleration:" << acceleration << ", compress per second:" << cnt/10 << " times" << endl;

        ++acceleration;
        --test_times;
    }

    return 0;
}

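Output:

[Image: screenshot of the benchmark output]

  The results can be summarized in two points. First, with the default acceleration of 1 (the value LZ4_compress_default uses), throughput is more than 200,000 compressions per second. Second, throughput rises as acceleration increases, at the cost of a lower compression ratio.

  The relationship between acceleration and compression rate is shown below:

[Image: compressions per second versus acceleration]

 Figure 3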
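  The benchmark in this section checks elapsed time with gettimeofday at whole-second granularity. As an alternative sketch, not the method used for the numbers in this post, the same "repeat for a fixed duration and count" idea can be written with std::chrono::steady_clock, which is monotonic and has finer resolution. The function name ops_per_second is hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: call op() repeatedly for at least `duration`
// and return the average number of calls per second.
template <typename Op>
double ops_per_second(Op op, std::chrono::milliseconds duration) {
    assert(duration.count() > 0);
    using clock = std::chrono::steady_clock;
    const clock::time_point start = clock::now();
    clock::time_point now = start;
    long long count = 0;
    do {
        op();
        ++count;
        now = clock::now();
    } while (now - start < duration);
    // Divide by the time actually elapsed rather than the nominal duration.
    const std::chrono::duration<double> elapsed = now - start;
    return static_cast<double>(count) / elapsed.count();
}
```

  The callable passed as op would wrap one LZ4_compress_fast call on a preallocated buffer, which also keeps malloc and free out of the measured loop.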

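3. Exploring LZ4 decompression performance

  Next, let's look at LZ4's decompression performance.

  The method: first compress the text with LZ4_compress_fast (acceleration = 1), then use the safe decompression function LZ4_decompress_safe to decompress the same data repeatedly for 10 seconds, and compute the average number of decompressions per second. The benchmark code is as follows: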

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <lz4.h>
#include <iostream>

using namespace std;

int main()
{
    // Same sample text as in the first example.
    char peppa_pig_buf[2048] =
        "Narrator: It is raining today. So, Peppa and George cannot "
        "play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, "
        "run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles."
        "Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy."
        "Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy "
        "puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, "
        "George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having "
        "a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, "
        "George. There's a really big puddle.Narrator: George wants to jump into the big puddle first."
        "Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. "
        "Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles."
        "Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. "
        "Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?"
        "Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. "
        "You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy "
        "puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, "
        "it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, "
        "when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play "
        "in the garden.Narrator: Peppa and George are wearing their boots. "
        "Mummy and Daddy are "
        "wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping "
        "up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: "
        "It's only mud.";

    int cnt = 0;
    timeval st, et;

    // compress once; the LZ4 API works with int sizes
    int peppa_pig_text_size = (int)strlen(peppa_pig_buf);
    int com_space_size = LZ4_compressBound(peppa_pig_text_size);

    char *com_ptr = (char *)malloc(com_space_size);
    if(NULL == com_ptr) {
        cout << "compress malloc failed" << endl;
        return -1;
    }

    int com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    if(com_size <= 0) {
        cout << "compress failed, error code:" << com_size << endl;
        free(com_ptr);
        return -1;
    }

    // decompress performance test: repeat for 10 seconds
    char *decom_ptr = NULL;

    gettimeofday(&st, NULL);
    while(1) {

        decom_ptr = (char *)malloc(peppa_pig_text_size);
        if(NULL == decom_ptr) {
            cout << "decompress malloc failed" << endl;
            free(com_ptr);
            return -1;
        }

        int decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
        if(decom_size <= 0) {
            // a negative value signals malformed input or a too-small buffer
            cout << "decompress failed, error code:" << decom_size << endl;
            free(com_ptr);
            free(decom_ptr);
            return -1;
        }

        free(decom_ptr);

        cnt++;
        gettimeofday(&et, NULL);
        if(et.tv_sec - st.tv_sec >= 10) {
            break;
        }
    }

    free(com_ptr);
    cout << "decompress per second:" << cnt/10 << " times" << endl;

    return 0;
}

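Output:

[Image: screenshot of the benchmark output]

  The result shows that LZ4 decompresses this text about 540,000 times per second, which is an impressive rate.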
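  Operation counts are easier to compare with other benchmarks when converted to data throughput. The sketch below does that conversion. The helper name is hypothetical, and the inputs are the approximate figures quoted in this post, which include the cost of the malloc and free calls inside the loops, so the results are rough estimates only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: convert operations per second on a buffer of
// bytesPerOp bytes into megabytes per second (1 MB = 1,000,000 bytes).
double throughput_mb_per_s(double opsPerSecond, std::size_t bytesPerOp) {
    assert(opsPerSecond >= 0.0);
    return opsPerSecond * static_cast<double>(bytesPerOp) / 1e6;
}
```

  With the 1848-byte text, about 540,000 decompressions per second works out to roughly 998 MB/s of decompressed output, and about 200,000 compressions per second to roughly 370 MB/s of input.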

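4. LZ4 versus ZSTD

  Running the compression, decompression, compression-performance, and decompression-performance tests on the same input text with both ZSTD and LZ4 gives the data in Table 1.

Table 1

[Image: table comparing ZSTD and LZ4 results]

  

  Setting aside which algorithm is better, the experimental results show that ZSTD leans toward compression ratio, while LZ4 (acceleration = 1) leans toward compression speed.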

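5. Summary

  No algorithm can easily deliver both very fast compression and a very high compression ratio. You have to trade one for the other, or find a suitable balance point.

  If its performance is acceptable, ZSTD, with its higher compression ratio, saves more storage space (multithreaded compression through a thread pool can improve performance further). If compression ratio matters less and you want higher compression speed, LZ4 is also a good choice.

  Finally, having read this far, do you think strings of any length can be compressed well by algorithms such as ZSTD and LZ4? Find out in the next post! Writing takes effort, so if you enjoyed this, please give it a like.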

 
