View Issue Details

IDProjectCategoryView StatusLast Update
0000328JVT JM H.264/AVC reference softwaredecoderpublic2013-09-02 21:22
ReporterDag Ågren Assigned ToKarsten Suehring  
PrioritynormalSeveritycrashReproducibilityalways
Status resolvedResolutionfixed 
PlatformARM, DSPOSBare metal 
Product VersionJM 18.5 
Fixed in VersionJM 18.6 
Summary0000328: Loop filter relies on undefined behavior
DescriptionThe loop filter does a large number of casts from byte pointers to int and int64 pointers. It relies on ints containing four bytes in little-endian order, and unaligned reads and writes being possible. This is not the case on several platforms. It causes crashes on ARM, and incorrect decoding on TI TMS320C64+ DSPs.

In many places, the code casts a byte pointer to an int pointer in order to check four bytes in parallel. This does not work if an int does not contain exactly four bytes. This seems to be the case at least on the TI C64+ DSP platform, where this will cause incorrect decoding. There are a few similar cases with casts to int64, to process eight bytes in parallel.

In a few places, the code casts a byte pointer to an int pointer in order to write four bytes at a time. This also does not work under the same circumstances as above.

And in a few other places, the parallel writing code mentioned above has apparently been incorrectly copied into places where only a single byte needs to be written. This causes the three following bytes to be zeroed, which will write outside the bounds of the buffer. It will also not work on a big-endian platform, where it will just fill the buffer with zeroes. Furthermore, it will do unaligned 32-bit writes, which ARM does nor support, causing a crash.

(This crash seems not to happen when running under Linux on ARM, but only when we tried running on the bare metal. It may be that Linux is catching the exceptions caused by unaligned reads and emulating the reads in software. This would be very, very slow, however.)
Additional InformationA diff is included that removes all pointer casts, and also removes the use of memset and memcpy. We have not tested conclusively whether memset and memcpy cause problems, but these were changed just to be on the safe side. The code is also somewhat clearer without them, and a good compiler would optimize those anyway.

The comments for some of the corrected lines were also incorrect, and we have attempted to fix them, too.
TagsNo tags attached.

Activities

Dag Ågren

2013-07-11 13:33

reporter  

loopfilter.diff (16,097 bytes)   
diff -r 33318c0be44f ldecod/src/loopFilter.c
--- a/ldecod/src/loopFilter.c	Thu Jul 11 13:28:39 2013 +0300
+++ b/ldecod/src/loopFilter.c	Thu Jul 11 14:11:06 2013 +0300
@@ -203,7 +203,6 @@
     int           edge;
 
     byte Strength[16];
-    int64         *p_Strength64 = (int64 *) Strength;
     short         mb_x, mb_y;
 
     int           filterNon8x8LumaEdgesFlag[4] = {1,1,1,1};
@@ -266,7 +265,10 @@
         // Strength for 4 blks in 1 stripe
         get_strength_ver_MBAff(Strength, MbQ, edge << 2, mvlimit, p);
 
-        if ((p_Strength64[0]) || (p_Strength64[1])) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] !=0 ||
+        Strength[4] != 0 || Strength[5] != 0 || Strength[6] != 0 || Strength[7] !=0 ||
+        Strength[8] != 0 || Strength[9] != 0 || Strength[10] != 0 || Strength[11] !=0 ||
+        Strength[12] != 0 || Strength[13] != 0 || Strength[14] != 0 || Strength[15] !=0 ) // only if one of the 16 Strength bytes is != 0
         {
           if (filterNon8x8LumaEdgesFlag[edge])
           {
@@ -312,7 +314,10 @@
         // Strength for 4 blks in 1 stripe
         get_strength_hor_MBAff(Strength, MbQ, edge << 2, mvlimit, p);
 
-        if ((p_Strength64[0]) || (p_Strength64[1])) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] !=0 ||
+        Strength[4] != 0 || Strength[5] != 0 || Strength[6] != 0 || Strength[7] !=0 ||
+        Strength[8] != 0 || Strength[9] != 0 || Strength[10] != 0 || Strength[11] !=0 ||
+        Strength[12] != 0 || Strength[13] != 0 || Strength[14] != 0 || Strength[15] !=0 ) // only if one of the 16 Strength bytes is != 0
         {
           if (filterNon8x8LumaEdgesFlag[edge])
           {
@@ -554,7 +559,7 @@
       {      
         byte *Strength = MbQ->strength_ver[edge];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the 4 first Strength bytes is != 0
         {
           if (filterNon8x8LumaEdgesFlag[edge])
           {
@@ -599,7 +604,10 @@
       {
         byte *Strength = MbQ->strength_hor[edge];
 
-        if ((*((int64 *) Strength)) || ((*(((int64 *) Strength) + 1)))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] !=0 ||
+        Strength[4] != 0 || Strength[5] != 0 || Strength[6] != 0 || Strength[7] !=0 ||
+        Strength[8] != 0 || Strength[9] != 0 || Strength[10] != 0 || Strength[11] !=0 ||
+        Strength[12] != 0 || Strength[13] != 0 || Strength[14] != 0 || Strength[15] !=0 ) // only if one of the 16 Strength bytes is != 0
         {
           if (filterNon8x8LumaEdgesFlag[edge])
           {
diff -r 33318c0be44f ldecod/src/loop_filter_mbaff.c
--- a/ldecod/src/loop_filter_mbaff.c	Thu Jul 11 13:28:39 2013 +0300
+++ b/ldecod/src/loop_filter_mbaff.c	Thu Jul 11 14:11:06 2013 +0300
@@ -89,7 +89,7 @@
   short  blkP, blkQ, idx;
   //short  blk_x, blk_x2, blk_y, blk_y2 ;
 
-  int    StrValue;
+  int    StrValue, i;
   short  mb_x, mb_y;
 
   Macroblock *MbP;
@@ -126,7 +126,7 @@
         //printf("idx %d %d %d %d %d\n", idx, pixP.x, pixP.y, pixP.pos_x, pixP.pos_y);
         // Start with Strength=3. or Strength=4 for Mb-edge
         StrValue = (edge == 0) ? 4 : 3;
-        memset(Strength, (byte) StrValue, MB_BLOCK_SIZE * sizeof(byte));
+        for( i = 0; i < BLOCK_SIZE; i ++ ) Strength[i] = StrValue;
       }
       else
       {
@@ -189,7 +189,10 @@
               StrValue = 1;
             }                      
           }
-          *(int*)(Strength+idx) = StrValue * 0x01010101;
+          Strength[idx] = StrValue;
+          Strength[idx + 1] = StrValue;
+          Strength[idx + 2] = StrValue;
+          Strength[idx + 3] = StrValue;
           pixP.y += 4;
           pixP.pos_y += 4;
         }
@@ -295,7 +298,7 @@
   short  blkP, blkQ, idx;
   short  blk_x, blk_x2, blk_y, blk_y2 ;
 
-  int    StrValue;
+  int    StrValue, i;
   int    xQ, yQ = (edge < MB_BLOCK_SIZE ? edge : 1);
   short  mb_x, mb_y;
 
@@ -320,7 +323,10 @@
 
       StrValue = (edge == 0 && (!MbP->mb_field && !MbQ->mb_field)) ? 4 : 3;
       
-      *(int*)(Strength+idx) = StrValue * 0x01010101;
+      Strength[idx] = StrValue;
+      Strength[idx+1] = StrValue;
+      Strength[idx+2] = StrValue;
+      Strength[idx+3] = StrValue;
     }
   }
   else
@@ -333,7 +339,7 @@
     if (MbQ->is_intra_block == TRUE || MbP->is_intra_block == TRUE)
     {      
       StrValue = (edge == 0 && (!MbP->mb_field && !MbQ->mb_field)) ? 4 : 3;
-      memset(Strength, (byte) StrValue, MB_BLOCK_SIZE * sizeof(byte));
+      for( i = 0; i < BLOCK_SIZE; i ++ ) Strength[i] = StrValue;
     }
     else
     {
@@ -412,7 +418,10 @@
             }
           }
         }
-        *(int*)(Strength+idx) = StrValue * 0x01010101;
+        Strength[idx] = StrValue;
+        Strength[idx + 1] = StrValue;
+        Strength[idx + 2] = StrValue;
+        Strength[idx + 3] = StrValue;
       }
     }
   }
diff -r 33318c0be44f ldecod/src/loop_filter_normal.c
--- a/ldecod/src/loop_filter_normal.c	Thu Jul 11 13:28:39 2013 +0300
+++ b/ldecod/src/loop_filter_normal.c	Thu Jul 11 14:11:06 2013 +0300
@@ -94,14 +94,14 @@
 {
   byte *Strength = MbQ->strength_ver[edge];
   Slice *currSlice = MbQ->p_Slice;
-  int     StrValue;
+  int     StrValue, i;
   BlockPos *PicPos = MbQ->p_Vid->PicPos;
 
   if ((currSlice->slice_type==SP_SLICE)||(currSlice->slice_type==SI_SLICE) )
   {
     // Set strength to either 3 or 4 regardless of pixel position
     StrValue = (edge == 0) ? 4 : 3;
-    memset(Strength, (byte) StrValue, BLOCK_SIZE * sizeof(byte));
+    for( i = 0; i < BLOCK_SIZE; i ++ ) Strength[i] = StrValue;
   }
   else
   {    
@@ -116,7 +116,7 @@
       {
         if (edge && (currSlice->slice_type == P_SLICE && MbQ->mb_type == PSKIP))
         {
-          memset(Strength, 0, BLOCK_SIZE * sizeof(byte));
+          for( i = 0; i < BLOCK_SIZE; i ++ ) Strength[i] = 0;
         }
         else  if (edge && ((MbQ->mb_type == P16x16)  || (MbQ->mb_type == P16x8)))
         {
@@ -202,14 +202,14 @@
       {
         // Start with Strength=3. or Strength=4 for Mb-edge
         StrValue = (edge == 0) ? 4 : 3;
-        memset(Strength, (byte) StrValue, BLOCK_SIZE * sizeof(byte));
+        for( i = 0; i < BLOCK_SIZE; i ++ ) Strength[i] = StrValue;
       }      
     }
     else
     {
       // Start with Strength=3. or Strength=4 for Mb-edge
       StrValue = (edge == 0) ? 4 : 3;
-      memset(Strength, (byte) StrValue, BLOCK_SIZE * sizeof(byte));
+      for( i = 0; i < BLOCK_SIZE; i ++ ) Strength[i] = StrValue;
     }      
   }
 }
@@ -223,7 +223,7 @@
 static void get_strength_hor(Macroblock *MbQ, int edge, int mvlimit, StorablePicture *p)
 {  
   byte  *Strength = MbQ->strength_hor[edge];
-  int    StrValue;
+  int    StrValue, i;
   Slice *currSlice = MbQ->p_Slice;
   BlockPos *PicPos = MbQ->p_Vid->PicPos;
 
@@ -231,7 +231,7 @@
   {
     // Set strength to either 3 or 4 regardless of pixel position
     StrValue = (edge == 0 && (((p->structure==FRAME)))) ? 4 : 3;
-    memset(Strength, (byte) StrValue, BLOCK_SIZE * sizeof(byte));
+    for( i = 0; i < BLOCK_SIZE; i ++ ) Strength[i] = StrValue;
   }
   else
   {    
@@ -248,7 +248,7 @@
       {       
         if (edge && (currSlice->slice_type == P_SLICE && MbQ->mb_type == PSKIP))
         {
-          memset(Strength, 0, BLOCK_SIZE * sizeof(byte));
+          for( i = 0; i < BLOCK_SIZE; i ++ ) Strength[i] = 0;
         }
         else if (edge && ((MbQ->mb_type == P16x16)  || (MbQ->mb_type == P8x16)))
         {
@@ -264,7 +264,7 @@
             else
               StrValue = 0; // if internal edge of certain types, we already know StrValue should be 0
 
-            *(int*)(Strength + idx) = StrValue;
+            Strength[idx] = StrValue;
           }
         }
         else
@@ -329,7 +329,7 @@
               else
                 StrValue = 1;
             }
-            *(int*)(Strength + idx) = StrValue;
+            Strength[idx] = StrValue;
           }
         }
       }
@@ -337,14 +337,14 @@
       {
         // Start with Strength=3. or Strength=4 for Mb-edge
         StrValue = (edge == 0 && (p->structure == FRAME)) ? 4 : 3;
-        memset(Strength, (byte) StrValue, BLOCK_SIZE * sizeof(byte));
+        for( i = 0; i < BLOCK_SIZE; i ++ ) Strength[i] = StrValue;
       }      
     }
     else
     {
       // Start with Strength=3. or Strength=4 for Mb-edge
       StrValue = (edge == 0 && (p->structure == FRAME)) ? 4 : 3;
-      memset(Strength, (byte) StrValue, BLOCK_SIZE * sizeof(byte));
+      for( i = 0; i < BLOCK_SIZE; i ++ ) Strength[i] = StrValue;
     }      
   }
 }
@@ -983,7 +983,7 @@
       {      
         byte *Strength = MbQ->strength_ver[edge];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
         {
           edge_loop_luma_ver( PLANE_Y, imgY, Strength, MbQ, edge << 2);
           edge_loop_luma_ver(PLANE_U, imgUV[0], Strength, MbQ, edge << 2);
@@ -1009,7 +1009,7 @@
       {
         byte *Strength = MbQ->strength_hor[edge];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if (Strength[0]!=0 || Strength[1]!=0 || Strength[2]!=0 || Strength[3]!=0) // only if one of the 16 Strength bytes is != 0
         {
           edge_loop_luma_hor( PLANE_Y, imgY, Strength, MbQ, edge << 2, p) ;
           edge_loop_luma_hor(PLANE_U, imgUV[0], Strength, MbQ, edge << 2, p);
@@ -1039,7 +1039,7 @@
       {      
         byte *Strength = MbQ->strength_ver[edge];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
         {              
           edge_loop_luma_ver( PLANE_Y, imgY, Strength, MbQ, edge << 2);
           edge_loop_luma_ver(PLANE_U, imgUV[0], Strength, MbQ, edge << 2);
@@ -1067,7 +1067,7 @@
       {
         byte *Strength = MbQ->strength_hor[edge];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
         {
           edge_loop_luma_hor( PLANE_Y, imgY, Strength, MbQ, edge << 2, p) ;          
           edge_loop_luma_hor(PLANE_U, imgUV[0], Strength, MbQ, edge << 2, p);
@@ -1117,7 +1117,7 @@
       {      
         byte *Strength = MbQ->strength_ver[edge];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
         {
           edge_loop_luma_ver( PLANE_Y, imgY, Strength, MbQ, edge << 2);
 
@@ -1141,7 +1141,7 @@
       {
         byte *Strength = MbQ->strength_hor[edge];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
         {
           edge_loop_luma_hor( PLANE_Y, imgY, Strength, MbQ, edge << 2, p) ;
 
@@ -1167,7 +1167,7 @@
       {      
         byte *Strength = MbQ->strength_ver[0];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
         {
           edge_loop_luma_ver( PLANE_Y, imgY, Strength, MbQ, 0);                
 
@@ -1188,7 +1188,7 @@
       {
         byte *Strength = MbQ->strength_hor[0];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
         {
           edge_loop_luma_hor( PLANE_Y, imgY, Strength, MbQ, 0, p) ;
 
@@ -1211,7 +1211,7 @@
       {      
         byte *Strength = MbQ->strength_ver[0];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
         {
           edge_loop_luma_ver( PLANE_Y, imgY, Strength, MbQ, 0); 
 
@@ -1233,7 +1233,7 @@
         {
           byte *Strength = MbQ->strength_hor[edge];
 
-          if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+          if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
           {
             edge_loop_luma_hor( PLANE_Y, imgY, Strength, MbQ, edge << 2, p) ;
 
@@ -1260,7 +1260,7 @@
         {      
           byte *Strength = MbQ->strength_ver[edge];
 
-          if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+          if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
           {
             edge_loop_luma_ver( PLANE_Y, imgY, Strength, MbQ, edge << 2);                
 
@@ -1282,7 +1282,7 @@
       {
         byte *Strength = MbQ->strength_hor[0];
 
-        if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+        if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
         {
           edge_loop_luma_hor( PLANE_Y, imgY, Strength, MbQ, 0, p) ;
 
@@ -1307,7 +1307,7 @@
         {      
           byte *Strength = MbQ->strength_ver[edge];
 
-          if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+          if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
           {
             edge_loop_luma_ver( PLANE_Y, imgY, Strength, MbQ, edge << 2);                
 
@@ -1331,7 +1331,7 @@
         {
           byte *Strength = MbQ->strength_hor[edge];
 
-          if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+          if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
           {
             edge_loop_luma_hor( PLANE_Y, imgY, Strength, MbQ, edge << 2, p) ;
 
@@ -1358,7 +1358,7 @@
         {      
           byte *Strength = MbQ->strength_ver[edge];
 
-          if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+          if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
           {
             edge_loop_luma_ver( PLANE_Y, imgY, Strength, MbQ, edge << 2);                
 
@@ -1382,7 +1382,7 @@
         {
           byte *Strength = MbQ->strength_hor[edge];
 
-          if ((*((int *) Strength))) // only if one of the 16 Strength bytes is != 0
+          if ( Strength[0] != 0 || Strength[1] != 0 || Strength[2] != 0 || Strength[3] != 0 ) // only if one of the first 4 Strength bytes is != 0
           {
             edge_loop_luma_hor( PLANE_Y, imgY, Strength, MbQ, edge << 2, p) ;
 
loopfilter.diff (16,097 bytes)   

Karsten Suehring

2013-09-02 21:04

administrator   ~0000592

I'm inclined to apply the patch to prefer platform independence over coding efficiency.

The patch seems to be for the decoder only. I guess the same should be done for the encoder.

(ideally the deblocking filter code should be there only once, but that's a different issue)

Karsten Suehring

2013-09-02 21:22

administrator   ~0000593

I applied the patch. It seems there was only one related 64-bit cast in the encoder.

Issue History

Date Modified Username Field Change
2013-07-11 13:33 Dag Ågren New Issue
2013-07-11 13:33 Dag Ågren File Added: loopfilter.diff
2013-09-02 21:04 Karsten Suehring Note Added: 0000592
2013-09-02 21:04 Karsten Suehring Assigned To => Karsten Suehring
2013-09-02 21:04 Karsten Suehring Status new => acknowledged
2013-09-02 21:22 Karsten Suehring Note Added: 0000593
2013-09-02 21:22 Karsten Suehring Status acknowledged => resolved
2013-09-02 21:22 Karsten Suehring Fixed in Version => JM 18.6
2013-09-02 21:22 Karsten Suehring Resolution open => fixed