FreeSWITCH RaspberryPi GCC Compiler Bug

Page content

TL;DR:

VOIP calls to a Raspberry Pi failed where on all other architectures(amd64, i386) worked. GCC at fault, specifically the RTL definition for ARM CPU.

Filed against GCC as bug 61509.

Filed against Raspbian as a bug 1330270

Intro

The github repo that contains all of the output files referenced can be located here: https://github.com/quentusrex/FS-6495 and the FreeSWITCH repo can be cloned from: git://git.freeswitch.org/freeswitch.git

During a FreeSWITCH community conference call, two bugs were brought up for the developers to look at: FS-6494 and FS-6524. Brian and Tony described the issue as replicated on a Raspberry Pi in Brian’s lab, and that from the sip trace and a review of the code, this failure should not be happening. The call scenario was simply to make a call to FreeSWITCH running on the Raspberry Pi, and the call would fail to negotiate media due to no codec matches found even though there was a matching codec in the SIP SDP of the INVITE.

During the next 30-45 minutes, Travis and I dug into gdb and started to trace the ARM assembly instructions to track down the source of the issue, while Mike was able to generate a patch that worked around the problem. Also during the call it was determined that upgrading to gcc-4.7 resolved the bug without a patch, or at least worked around the symptom we were using to detect the bug.

Triage of bug positive and negative case and narrowing search scope

I’ve replicated the issue on one of my R-Pi’s, and have been able to replicate the results for both the positive and negative case. Instead of using Mike’s patch which added a function call, I’ve elected to use a patch that just sets max_bit_rate = 32099. The advantage here is that it won’t change the line numbers of the code, nor does it introduce a function call, but it does prevent the bug from replicating directly for the codec_ms variable. Hopefully the difference in assembly and GCC debug will be be small enough to easily determine the incorrect optimization branch. Now, time to dig into the root cause of the issue.

From the Wednesday call, Travis and I were able to determine that there was an improper assignment being made to both the codec_ms and max_bit_rate variables. The variable max_bit_rate was expected to be 64000 around line 3380, but was instead being assigned the value 32000. Since there were several spots in the code where this could be happening, and to get second confirmation, I changed each of the 32000 values to 32001, 32002, etc. Then after 15 minutes waiting for the Raspberry Pi to rebuild the FS source, I was able to get the second confirmation that the assignment at fault is line switch_core_media.c line 3358.

Here is a snippet of the logs with only the codec_ms and max_bit_rate constant values changed to identify which line was incorrectly doing the assignments:

#!console
EXECUTE sofia/internal/1000@192.168.100.124 hash(insert/192.168.100.124-spymap/1000/256dbe82-e548-11e3-bb16-1d9d0f58e5d7)
EXECUTE sofia/internal/1000@192.168.100.124 hash(insert/192.168.100.124-last_dial/1000/3001)
EXECUTE sofia/internal/1000@192.168.100.124 hash(insert/192.168.100.124-last_dial/global/256dbe82-e548-11e3-bb16-1d9d0f58e5d7)
EXECUTE sofia/internal/1000@192.168.100.124 export(RFC2822_DATE=Tue, 27 May 2014 02:39:38 +0000)
2014-05-27 02:39:38.045324 [DEBUG] switch_channel.c:1246 EXPORT (export_vars) [RFC2822_DATE]=[Tue, 27 May 2014 02:39:38 +0000]
EXECUTE sofia/internal/1000@192.168.100.124 answer()
2014-05-27 02:39:38.065276 [DEBUG] switch_core_media.c:3383 Audio Codec Compare [PCMU:0:8000:39:32001]/[PCMU:0:8000:20:64000]
2014-05-27 02:39:38.065276 [DEBUG] switch_core_media.c:3309 Set telephone-event payload to 101
2014-05-27 02:39:38.065276 [DEBUG] switch_core_media.c:3626 Set 2833 dtmf send/recv payload to 101
2014-05-27 02:39:38.065276 [DEBUG] switch_core_session.c:907 Send signal sofia/internal/1000@192.168.100.124 [BREAK]
2014-05-27 02:39:38.065276 [NOTICE] switch_channel.c:3721 Hangup sofia/internal/1000@192.168.100.124 [CS_EXECUTE] [INCOMPATIBLE_DESTINATION]

In the above log, take careful note of the line in the middle with ‘switch_core_media.c:3383 Audio Codec Compare’ because that print is on line 33 of the following snippet. Of key interest is why does map_bit_rate get set to 32001 when rm_encoding is “PCMU” and not “isac”.

#!c
 if (zstr(map->rm_fmtp)) {
     if (!strcasecmp(map->rm_encoding, "ilbc")) {
         codec_ms = 32;
         map_bit_rate = 13332;
     } else if (!strcasecmp(map->rm_encoding, "isac")) {
         codec_ms = 39;
         map_bit_rate = 32001;
     } else { map_bit_rate = 32099; }
 } else {
     if ((switch_core_codec_parse_fmtp(map->rm_encoding, map->rm_fmtp, map->rm_rate, &codec_fmtp)) 
          == SWITCH_STATUS_SUCCESS) {
         if (codec_fmtp.bits_per_second) {
             map_bit_rate = codec_fmtp.bits_per_second;
         }
         if (codec_fmtp.microseconds_per_packet) {
             codec_ms = (codec_fmtp.microseconds_per_packet / 1000);
         }
         if (codec_fmtp.actual_samples_per_second) {
             fmtp_remote_codec_rate = codec_fmtp.actual_samples_per_second;
         }
     }
 }

 for (i = 0; i < smh->mparams->num_codecs && i < total_codecs; i++) {
     const switch_codec_implementation_t *imp = codec_array[i];
     uint32_t bit_rate = imp->bits_per_second;
     uint32_t codec_rate = imp->samples_per_second;

     if (imp->codec_type != SWITCH_CODEC_TYPE_AUDIO) {
         continue;
     }

     switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_DEBUG, 
                       "Audio Codec Compare [%s:%d:%u:%d:%u]/[%s:%d:%u:%d:%u]\n",
                       rm_encoding, map->rm_pt, (int) remote_codec_rate, codec_ms, map_bit_rate,
                       imp->iananame, imp->ianacode, codec_rate, imp->microseconds_per_packet / 1000, bit_rate);

On the Wednesday call, Travis and I found that while GCC did include the debug symbols in the compiled binary, gdb was not able to reliably match the C source to the assembly output. This is because when GCC optimizes the instructions, at least on an ARM CPU, it isn’t able to align the now optimized assembly to the original source. So let’s just jump directly into the assembly and start reversing the instructions.

objdump -d ./src/libfreeswitch_la-switch_core_media.o > /usr/local/src/FS-6495/objdump.orig.txt

Then apply my patch and repeat:

objdump -d ./src/libfreeswitch_la-switch_core_media.o > /usr/local/src/FS-6495/objdump.patched.txt

Useful background knowledge

ARM Assembly

Wikipedia article about ARM architecture

  • mov
  • moveq: Conditional version of mov
  • movne: Conditional version of mov
  • cmp
  • ldr
  • b

Compilation theory

For some additional research into compiler theory used here, you can check out the lectures here: http://nptel.ac.in/courses/106108052/32#

Analyzing the compiled output

When dealing with assembly, since we know that we’re looking for the constant value 32001, it’ll likely be stored in decimal or hex format. Opening the objdump output file and searching for 32001 turned up nothing, other than landing on 2 random matches for memory addresses(if it were to match in ARM assembly it would look like #32001 as an argument for an instruction). So instead let’s search for the hex value which is 7d01. This does turn up something of value as you can see here:

    dc6c:       00000101        .word   0x00000101
    dc70:       000009ec        .word   0x000009ec
    dc74:       000009dc        .word   0x000009dc
    dc78:       000009e8        .word   0x000009e8
    dc7c:       00000e7f        .word   0x00000e7f
    dc80:       00007d01        .word   0x00007d01
    dc84:       000008e0        .word   0x000008e0
    dc88:       00000870        .word   0x00000870
    dc8c:       00000818        .word   0x00000818
    dc90:       00000804        .word   0x00000804
    dc94:       000007f8        .word   0x000007f8

At offset dc80(line 6) we see a word aligned constant value that is our 32001 magic number. Now it’s just a matter of searching(I am using emacs control+s) to locate where this offset is being used as an argument for an instruction. Lucky for us there is only a single reference to dc80, and that is in the middle of the following code block at offset d364 on line 23.

Unpatched ARM ASM:

#!asm
    d30c:       ebfffffe        bl      0 <switch_core_codec_parse_fmtp>
    d310:       e3500000        cmp     r0, #0
    d314:       158d4078        strne   r4, [sp, #120]  ; 0x78
    d318:       1afffe1d        bne     cb94 <switch_core_media_negotiate_sdp+0x2090>
    d31c:       e59d24a4        ldr     r2, [sp, #1188] ; 0x4a4
    d320:       e59d34a8        ldr     r3, [sp, #1192] ; 0x4a8
    d324:       e3520000        cmp     r2, #0
    d328:       e59d4060        ldr     r4, [sp, #96]   ; 0x60
    d32c:       11a04002        movne   r4, r2
    d330:       e3530000        cmp     r3, #0
    d334:       11a01fc3        asrne   r1, r3, #31
    d338:       159f2894        ldrne   r2, [pc, #2196] ; dbd4 <switch_core_media_negotiate_sdp+0x30d0>
    d33c:       e59dc4a0        ldr     ip, [sp, #1184] ; 0x4a0
    d340:       e58d4060        str     r4, [sp, #96]   ; 0x60
    d344:       10c38392        smullne r8, r3, r2, r3
    d348:       e58dc078        str     ip, [sp, #120]  ; 0x78
    d34c:       10611343        rsbne   r1, r1, r3, asr #6
    d350:       158d1068        strne   r1, [sp, #104]  ; 0x68
    d354:       eafffe0e        b       cb94 <switch_core_media_negotiate_sdp+0x2090>
    d358:       e59dc0dc        ldr     ip, [sp, #220]  ; 0xdc
    d35c:       e59d8060        ldr     r8, [sp, #96]   ; 0x60
    d360:       e35c0000        cmp     ip, #0
    d364:       e59f3914        ldr     r3, [pc, #2324] ; dc80 <switch_core_media_negotiate_sdp+0x317c>
    d368:       e59dc068        ldr     ip, [sp, #104]  ; 0x68
    d36c:       e3a04000        mov     r4, #0
    d370:       01a08003        moveq   r8, r3
    d374:       03a0c027        moveq   ip, #39 ; 0x27
    d378:       e58d4078        str     r4, [sp, #120]  ; 0x78
    d37c:       e58d8060        str     r8, [sp, #96]   ; 0x60
    d380:       e58dc068        str     ip, [sp, #104]  ; 0x68
    d384:       eafffe02        b       cb94 <switch_core_media_negotiate_sdp+0x2090>
    d388:       e59d4084        ldr     r4, [sp, #132]  ; 0x84
    d38c:       e58d4068        str     r4, [sp, #104]  ; 0x68
    d390:       eafffdd0        b       cad8 <switch_core_media_negotiate_sdp+0x1fd4>
    d394:       e59f18e8        ldr     r1, [pc, #2280] ; dc84 <switch_core_media_negotiate_sdp+0x3180>
    d398:       e1a02008        mov     r2, r8
    d39c:       e08f1001        add     r1, pc, r1
    d3a0:       e3a03001        mov     r3, #1
    d3a4:       e5960018        ldr     r0, [r6, #24]
    d3a8:       ebfffffe        bl      0 <switch_channel_set_variable_var_check>
    d3ac:       e59a05f8        ldr     r0, [sl, #1528] ; 0x5f8
    d3b0:       ebfffffe        bl      0 <switch_rtp_ready>

Compiling with the else case added produces assembly output that is ‘close’ to what was produced without the corrective patch. In the patched version our magic number is stored at offset dc8c, and also only shows up once in the assembly below at offset d380 on line 25.

#Patched ARM ASM:

#!asm
    d320:       ebfffffe        bl      0 <switch_core_codec_parse_fmtp>
    d324:       e3500000        cmp     r0, #0
    d328:       158d4074        strne   r4, [sp, #116]  ; 0x74
    d32c:       1afffe1b        bne     cba0 <switch_core_media_negotiate_sdp+0x209c>
    d330:       e59d249c        ldr     r2, [sp, #1180] ; 0x49c
    d334:       e59d34a0        ldr     r3, [sp, #1184] ; 0x4a0
    d338:       e3520000        cmp     r2, #0
    d33c:       e59d405c        ldr     r4, [sp, #92]   ; 0x5c
    d340:       11a04002        movne   r4, r2
    d344:       e3530000        cmp     r3, #0
    d348:       11a01fc3        asrne   r1, r3, #31
    d34c:       159f2888        ldrne   r2, [pc, #2184] ; dbdc <switch_core_media_negotiate_sdp+0x30d8>
    d350:       e59dc498        ldr     ip, [sp, #1176] ; 0x498
    d354:       e58d405c        str     r4, [sp, #92]   ; 0x5c
    d358:       10c38392        smullne r8, r3, r2, r3
    d35c:       e58dc074        str     ip, [sp, #116]  ; 0x74
    d360:       10611343        rsbne   r1, r1, r3, asr #6
    d364:       158d1064        strne   r1, [sp, #100]  ; 0x64
    d368:       eafffe0c        b       cba0 <switch_core_media_negotiate_sdp+0x209c>
    d36c:       e59f1914        ldr     r1, [pc, #2324] ; dc88 <switch_core_media_negotiate_sdp+0x3184>
    d370:       e1a00004        mov     r0, r4
    d374:       e08f1001        add     r1, pc, r1
    d378:       ebfffffe        bl      0 <strcasecmp>
    d37c:       e59dc064        ldr     ip, [sp, #100]  ; 0x64
    d380:       e59f3904        ldr     r3, [pc, #2308] ; dc8c <switch_core_media_negotiate_sdp+0x3188>
    d384:       e59f2904        ldr     r2, [pc, #2308] ; dc90 <switch_core_media_negotiate_sdp+0x318c>
    d388:       e3500000        cmp     r0, #0
    d38c:       11a03002        movne   r3, r2
    d390:       13a00000        movne   r0, #0
    d394:       03a0c027        moveq   ip, #39 ; 0x27
    d398:       e58d305c        str     r3, [sp, #92]   ; 0x5c
    d39c:       e58d0074        str     r0, [sp, #116]  ; 0x74
    d3a0:       e58dc064        str     ip, [sp, #100]  ; 0x64
    d3a4:       eafffdfd        b       cba0 <switch_core_media_negotiate_sdp+0x209c>
    d3a8:       e59d4080        ldr     r4, [sp, #128]  ; 0x80
    d3ac:       e58d4064        str     r4, [sp, #100]  ; 0x64
    d3b0:       eafffdcb        b       cae4 <switch_core_media_negotiate_sdp+0x1fe0>
    d3b4:       e59f18d8        ldr     r1, [pc, #2264] ; dc94 <switch_core_media_negotiate_sdp+0x3190>
    d3b8:       e1a02008        mov     r2, r8
    d3bc:       e08f1001        add     r1, pc, r1
    d3c0:       e3a03001        mov     r3, #1
    d3c4:       e5960018        ldr     r0, [r6, #24]
    d3c8:       ebfffffe        bl      0 <switch_channel_set_variable_var_check>
    d3cc:       e59a05f8        ldr     r0, [sl, #1528] ; 0x5f8
    d3d0:       ebfffffe        bl      0 <switch_rtp_ready>

The major differences here are that the unpatched version is missing the strcasecmp operation. Going back and reading the FreeSWITCH C source code, this would make sense if GCC is converting the if() else if() into an if() else statement based on the behavior we were seeing.

Now to collect the output from the GCC compilation optimization and rtl translation phases for later analysis

NOTE: thanks to parcs on irc.freenode.org #gcc for reminding about the -fdump-tree-all-details option

The quick way to collect the GCC debug logs we need is to build FreeSWITCH once, then make a change to the CFLAGS, then rebuild only the one file(switch_core_media.c) we are looking at. I found that adding the debugging causes the already slow compilation on the Raspberry Pi to take about 8x-10x more time.

Now the SWITCH_AM_CFLAGS line is changed from:

SWITCH_AM_CFLAGS = -I/usr/local/src/freeswitch/src/include
-I/usr/local/src/freeswitch/src/include
-I/usr/local/src/freeswitch/libs/libteletone/src -fPIC
-fvisibility=hidden -DSWITCH_API_VISIBILITY=1 -DHAVE_VISIBILITY=1 -g
-ggdb -DHAVE_OPENSSL

To:

SWITCH_AM_CFLAGS = -I/usr/local/src/freeswitch/src/include
-I/usr/local/src/freeswitch/src/include
-I/usr/local/src/freeswitch/libs/libteletone/src -fPIC
-fvisibility=hidden -fdump-tree-all-details -fdump-rtl-expand-details
-fdump-rtl-all -DSWITCH_API_VISIBILITY=1 -DHAVE_VISIBILITY=1 -g -ggdb
-DHAVE_OPENSSL

Then start the build process, and let it at least finish building the switch_core_media.c file. Then move all of the switch_core_media.c.* files into /usr/local/src/FS-6495/gcc_debug_orig/ and add the else case, and start building FS again. Once this completes move all of the switch_core_media.c.* files to /usr/local/src/FS-6495/gcc_debug_patched/

Now we have the output to start figuring how where during the compilation process the logic goes wrong. Unfortunately this is a small, likely one line, needle in a very large haystack. Since it’s highly likely that most of the generated files will be different only by a single number that will likely reference an index of the instructions, we can use the following to see how many lines are different:

$ cd /usr/local/src/FS-6495/gcc_debug_orig/
$ cp ../gcc_debug_patched/* ./
$ git diff --stat
 90 files changed, 997167 insertions(+), 997123 deletions(-)

So with about 2 million lines changed, that’s a decent size haystack. Now, GCC does output a statistics file which will give some useful counts of optimization decisions. So let’s check out what the differences are:

#!console
~/git/FS-6495/gcc_debug_orig$ git diff --word-diff-regex=.
diff --git a/gcc_debug_orig/switch_core_media.c.224t.statistics b/gcc_debug_orig/switch_core_media.c.224t.statistics
index afac8e8..5e91c7c 100644
--- a/gcc_debug_orig/switch_core_media.c.224t.statistics
+++ b/gcc_debug_orig/switch_core_media.c.224t.statistics
@@ -271,14 +271,14 @@
31 copyprop "Statements deleted" "switch_core_media_toggle_hold" 7
33 cddce "Statements deleted" "switch_core_media_toggle_hold" 1
41 release_ssa "SSA names released" "switch_core_media_toggle_hold" 15
27 ccp "Constants propagated" "switch_core_media_negotiate_sdp" 20[-4-]{+6+}
27 ccp "Statements folded" "switch_core_media_negotiate_sdp" 8[-7-]{+8+}
27 ccp "Statements deleted" "switch_core_media_negotiate_sdp" 41
31 copyprop "Copies propagated" "switch_core_media_negotiate_sdp" 4[-5-]{+4+}
31 copyprop "Statements deleted" "switch_core_media_negotiate_sdp" 12[-2-]{+3+}
33 cddce "Statements deleted" "switch_core_media_negotiate_sdp" 18
33 cddce "PHI nodes deleted" "switch_core_media_negotiate_sdp" 5
41 release_ssa "SSA names released" "switch_core_media_negotiate_sdp" 26[-7-]{+8+}
27 ccp "Constants propagated" "switch_core_media_proxy_remote_addr" 41
27 ccp "Statements folded" "switch_core_media_proxy_remote_addr" 17
27 ccp "Statements deleted" "switch_core_media_proxy_remote_addr" 2

The interesting lines are 10, 11, 13, 14, and 17. That at least will narrow down the haystack a bit. Basically that output means that for instance on line 10, the unpatched version had 204 “Constants Propagated” decisions while the patched version had 206 of them. Now, to copy over only the debug files where the decisions where different.

$ cp ../gcc_debug_patched/*release_ssa ./
$ cp ../gcc_debug_patched/*copyprop* ./                                                              
$ cp ../gcc_debug_patched/*.ccp* ./
$ git diff --stat
 gcc_debug_orig/switch_core_media.c.023t.ccp1        |66671 +++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------
 gcc_debug_orig/switch_core_media.c.027t.copyprop1   |42427 +++++++++++++++++++++++++++++++--------------------------------
 gcc_debug_orig/switch_core_media.c.037t.release_ssa | 9999 ++++++++-------
 gcc_debug_orig/switch_core_media.c.059t.ccp2        |59722 ++++++++++++++++++++++++++++++++++++++++++++--------------------------------------------
 gcc_debug_orig/switch_core_media.c.066t.copyprop2   |48317 +++++++++++++++++++++++++++++++++++------------------------------------
 gcc_debug_orig/switch_core_media.c.087t.ccp3        |66913 +++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------------------
 gcc_debug_orig/switch_core_media.c.088t.copyprop3   |47234 +++++++++++++++++++++++++++++++++++-----------------------------------
 gcc_debug_orig/switch_core_media.c.097t.copyprop4   |50184 +++++++++++++++++++++++++++++++++++++-------------------------------------
 gcc_debug_orig/switch_core_media.c.103t.copyprop5   |48255 ++++++++++++++++++++++++++++++++++++-----------------------------------
 gcc_debug_orig/switch_core_media.c.224t.statistics  |   10 +-

10 files changed, 219893 insertions(+), 219839 deletions(-)

So now the haystack is reduced by ~80%. Of the remaining files the smallest one that we know has changes due to our patch is the release_ssa file. Before digging into the file, we can keep in mind that there are special magic numbers of 32099 and 32001 which can be used to help zoom in on the correct section of code. We can also see from the log output in the patched case that the codec_ms was still set to 39, but the max_bit_rate was set to 32099 instead of 32001. So this should drastically reduce the amount of code we need to look through, since we’re looking for a code block that sets max_bit_rate to 32001 in the unpatched, and 32099 in the patched version, and in both cases sets codec_ms to 39.

After some searching(quick emacs ctrl+s) for 32001 in the original version:

#!console
<bb 213>:
  D.44889_604 = map_128->rm_encoding;
  D.44901_605 = strcasecmp (D.44889_604, "isac");
  if (D.44901_605 == 0)
    goto <bb 214>;
  else
    goto <bb 222>;

<bb 214>:
  # DEBUG codec_ms => 39
  # DEBUG map_bit_rate => 32001
  goto <bb 222>;

<bb 222>:
  # codec_ms_90 = PHI <codec_ms_88(213), codec_ms_88(215), codec_ms_89(220), codec_ms_89(221), 32(212), 39(214)>
  # fmtp_remote_codec_rate_99 = PHI <0(213), 0(215), 0(220), fmtp_remote_codec_rate_619(221), 0(212), 0(214)>
  # map_bit_rate_132 = PHI <map_bit_rate_1480(213), map_bit_rate_1480(215), map_bit_rate_131(220), map_bit_rate_131(221), 13332(212), 32001(214)>
  # DEBUG map_bit_rate => map_bit_rate_132
  # DEBUG fmtp_remote_codec_rate => fmtp_remote_codec_rate_99
  # DEBUG codec_ms => codec_ms_90
  # DEBUG i => 0
  goto <bb 257>;

and 32099 in the patched version we find:

#!console
<bb 213>:
  D.44889_604 = map_128->rm_encoding;
  D.44901_605 = strcasecmp (D.44889_604, "isac");
  if (D.44901_605 == 0)
    goto <bb 222>;
  else
    goto <bb 214>;

<bb 214>:
  # DEBUG map_bit_rate => 32099
  goto <bb 222>;

<bb 222>:
  # codec_ms_90 = PHI <39(213), codec_ms_88(215), codec_ms_89(220), codec_ms_89(221), 32(212), codec_ms_88(214)>
  # fmtp_remote_codec_rate_99 = PHI <0(213), 0(215), 0(220), fmtp_remote_codec_rate_620(221), 0(212), 0(214)>
  # map_bit_rate_132 = PHI <32001(213), map_bit_rate_1481(215), map_bit_rate_131(220), map_bit_rate_131(221), 13332(212), 32099(214)>
  # DEBUG map_bit_rate => map_bit_rate_132
  # DEBUG fmtp_remote_codec_rate => fmtp_remote_codec_rate_99
  # DEBUG codec_ms => codec_ms_90
  # DEBUG i => 0
  goto <bb 257>;

This is important because we now have the internal instruction reference ’D.44901_605’ for the strcasecmp that disappears. We can now use this reference to start tracking down where this condition gets incorrectly mapped. Now something to keep in mind is that since we saw in the assembly output earlier that the strcasecmp was optimized out, we know that we’re looking for where in the original version GCC decides that it no longer needs the condition ’D.44901_605’.

In the instruction scope element we can see that the variable representation for our code block is codec_ms_90 and map_bit_rate_132. With these three additional items we can more effectively grep the ~200,000 changed lines.

Now, taking a closer look at the unpatched version:

#!console
<bb 213>:
  D.44889_604 = map_128->rm_encoding;
  D.44901_605 = strcasecmp (D.44889_604, "isac");
  if (D.44901_605 == 0)
    goto <bb 214>;
  else
    goto <bb 222>;

<bb 214>:
  # DEBUG codec_ms => 39
  # DEBUG map_bit_rate => 32001
  goto <bb 222>;

<bb 222>:
  # codec_ms_90 = PHI <codec_ms_88(213), codec_ms_88(215), codec_ms_89(220), codec_ms_89(221), 32(212), 39(214)>
  # fmtp_remote_codec_rate_99 = PHI <0(213), 0(215), 0(220), fmtp_remote_codec_rate_619(221), 0(212), 0(214)>
  # map_bit_rate_132 = PHI <map_bit_rate_1480(213), map_bit_rate_1480(215), map_bit_rate_131(220), map_bit_rate_131(221), 13332(212), 32001(214)>
  # DEBUG map_bit_rate => map_bit_rate_132
  # DEBUG fmtp_remote_codec_rate => fmtp_remote_codec_rate_99
  # DEBUG codec_ms => codec_ms_90
  # DEBUG i => 0
  goto <bb 257>;

Line 15 shows the codec_ms_90 variable has the value of a PHI, which is GCC’s way of saying in bb 222, the variable codec_ms_90 is one of the values of the PHI, though GCC is unable to narrow down further the actual value. The values in the PHI reference the basic block that the value is from such as ‘39(214)’ means the constant value 39 could be assigned to codec_ms_90 in the basic block #214.

Next time I deal with a bug like this I’ll remember to check the final optimization debug file earlier in the process. After hours of confirming each of the compilation passes, and auditing the ccp, copyprop, release_ssa phases, it turns out that while the section of code moves around a bit, it’s still present in the compilation pass #143 in the final optimized stage. This would have been found earlier by just grepping the files for the presence of ‘44901_605’ in chunks like switch_coremedia.c.10* then switch_core_media.c.11* etc. Since it’s still present in pass #143, less time could have been spent on the previous passes. See here:

<bb 241>:
  D.44901_605 = strcasecmp (D.44889_600, "isac");
  if (D.44901_605 == 0)
    goto <bb 248>;
  else
    goto <bb 242>;

<bb 242>:
  goto <bb 248>;

<bb 248>:
  # codec_ms_90 = PHI <codec_ms_88(242), codec_ms_88(440), codec_ms_89(247), 39(241), 32(240)>
  # fmtp_remote_codec_rate_99 = PHI <0(242), 0(440), fmtp_remote_codec_rate_618(247), 0(241), 0(240)>
  # map_bit_rate_132 = PHI <map_bit_rate_1480(242), map_bit_rate_1480(440), map_bit_rate_131(247), 32001(241), 13332(240)>
  # DEBUG map_bit_rate => map_bit_rate_132
  # DEBUG fmtp_remote_codec_rate => fmtp_remote_codec_rate_99
  # DEBUG codec_ms => codec_ms_90
  # DEBUG i => 0
  # DEBUG i => 0
  # DEBUG nm_idx => nm_idx_1755
  # DEBUG m_idx => m_idx_792
  # DEBUG remote_codec_rate => remote_codec_rate_596
  # DEBUG match => 0
  D.44471_1501 = smh_186->mparams;
  D.44996_1498 = D.44471_1501->num_codecs;
  D.44997_1496 = D.44996_1498 > 0;
  D.44998_1609 = total_codecs_85 > 0;
  D.44999_1152 = D.44998_1609 & D.44997_1496;
  if (D.44999_1152 != 0)
    goto <bb 249>;
  else
    goto <bb 280>;

Digging into the RTL translation stage

Since GCC has finished optimizing the IR(internal representation) logic, now comes the translation to assembly specific for this architecture. In our case, the Raspberry Pi uses an ARM11 family processor which uses the ARMv6 instruction set. For anyone following along in the GCC source, the RTL definition files are located under /usr/local/src/gcc/gcc/config/arm/ assuming you’ve cloned the GCC repo to /usr/local/src/.

While in the earlier stages we were using the basic block number to track the code, the RTL translation stage uses INSN numbers to track the instructions. The first objective is to locate the insn number for the instruction we want to track, based on the last known basic block number and the internal condition identifier.

If you take a look at the file expand144.orig.txt , I’ve cut out the section for the function switch_core_media_negotiate_sdp from the first RTL translation pass.

Searching for D.44901_605 leads us to these insn blocks:


;; Generating RTL for gimple basic block 241

;; D.44901_605 = strcasecmp (D.44889_600, "isac");

(insn 2726 2725 2727 (set (reg:SI 1670)
        (unspec:SI [
                (const:SI (unspec:SI [
                            (symbol_ref/v/f:SI ("*.LC72") [flags 0x82]  <var_decl 0x41ea40c0 *.LC72>)
                            (const:SI (plus:SI (unspec:SI [
                                            (const_int 1050 [0x41a])
                                        ] 21)
                                    (const_int 8 [0x8])))
                        ] 27))
                (const_int 1050 [0x41a])
            ] 29)) src/switch_core_media.c:3356 -1
     (expr_list:REG_EQUAL (symbol_ref/v/f:SI ("*.LC72") [flags 0x82]  <var_decl 0x41ea40c0 *.LC72>)
        (nil)))

(insn 2727 2726 2728 (set (reg:SI 1669)
        (reg:SI 1670)) src/switch_core_media.c:3356 -1
     (expr_list:REG_EQUAL (symbol_ref/v/f:SI ("*.LC72") [flags 0x82]  <var_decl 0x41ea40c0 *.LC72>)
        (nil)))

(insn 2728 2727 2729 (set (reg:SI 0 r0)
        (reg/f:SI 498 [ D.44889 ])) src/switch_core_media.c:3356 -1
     (nil))

(insn 2729 2728 2730 (set (reg:SI 1 r1)
        (reg:SI 1669)) src/switch_core_media.c:3356 -1
     (nil))

(call_insn/i 2730 2729 2731 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:SI (symbol_ref:SI ("strcasecmp") [flags 0x41]  <function_decl 0x40442200 strcasecmp>) [0 S4 A32])
                    (const_int 0 [0])))
            (use (const_int 0 [0]))
            (clobber (reg:SI 14 lr))
        ]) src/switch_core_media.c:3356 -1
     (expr_list:REG_EH_REGION (const_int 0 [0])
        (nil))
    (expr_list:REG_DEP_TRUE (use (reg:SI 1 r1))
        (expr_list:REG_DEP_TRUE (use (reg:SI 0 r0))
            (nil))))

(insn 2731 2730 0 (set (reg:SI 500 [ D.44901 ])
        (reg:SI 0 r0)) src/switch_core_media.c:3356 -1
     (nil))

;; if (D.44901_605 == 0)

(insn 2732 2731 2733 (set (reg:CC 24 cc)
        (compare:CC (reg:SI 500 [ D.44901 ])
            (const_int 0 [0]))) src/switch_core_media.c:3356 -1
     (nil))

(jump_insn 2733 2732 0 (set (pc)
        (if_then_else (eq (reg:CC 24 cc)
                (const_int 0 [0]))
            (label_ref 0)
            (pc))) src/switch_core_media.c:3356 -1
     (expr_list:REG_BR_PROB (const_int 5000 [0x1388])
        (nil)))


Yes, the simple line ‘if (!strcasecmp(map->rm_encoding, “isac”)) {’ is now translated into 8 different insn blocks. Also, take note that the insn is printed as a doubly linked list where the first number is the identifier for this insn, the second is the previous insn and the third is the next one.

Specifically, the call that was missing from the objdump was the call out to strcasecmp, so we can start grepping for 2730. Luckly the insn doesn’t change for the call out to strcasecmp through each of the passes.

We find that in pass 200 the insn is removed. After a quick check of pass 199 to confirm that the insn hasn’t been changed:

(note 2731 2726 2728 261 NOTE_INSN_DELETED)

(insn 2728 2731 7022 261 (set (reg:SI 0 r0)
        (reg/f:SI 4 r4 [orig:498 D.44889 ] [498])) src/switch_core_media.c:3356 625 {*arm_movsi_vfp}
     (expr_list:REG_DEAD (reg/f:SI 4 r4 [orig:498 D.44889 ] [498])
        (nil)))

(insn 7022 2728 7023 261 (set (reg:SI 1 r1)
        (unspec:SI [
                (const:SI (unspec:SI [
                            (symbol_ref/v/f:SI ("*.LC72") [flags 0x82]  <var_decl 0x41ea40c0 *.LC72>)
                            (const:SI (plus:SI (unspec:SI [
                                            (const_int 1050 [0x41a])
                                        ] 21)
                                    (const_int 8 [0x8])))
                        ] 27))
            ] 3)) src/switch_core_media.c:3356 171 {pic_load_addr_32bit}
     (nil))

(insn 7023 7022 2730 261 (set (reg:SI 1 r1)
        (unspec:SI [
                (reg:SI 1 r1)
                (const_int 8 [0x8])
                (const_int 1050 [0x41a])
            ] 4)) src/switch_core_media.c:3356 174 {pic_add_dot_plus_eight}
     (expr_list:REG_EQUAL (symbol_ref/v/f:SI ("*.LC72") [flags 0x82]  <var_decl 0x41ea40c0 *.LC72>)
        (nil)))

(call_insn/i 2730 7023 6540 261 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:SI (symbol_ref:SI ("strcasecmp") [flags 0x41]  <function_decl 0x40442200 strcasecmp>) [0 S4 A32])
                    (const_int 0 [0])))
            (use (const_int 0 [0]))
            (clobber (reg:SI 14 lr))
        ]) src/switch_core_media.c:3356 236 {*call_value_symbol}
     (expr_list:REG_DEAD (reg:SI 1 r1)
        (expr_list:REG_UNUSED (reg:SI 0 r0)
            (expr_list:REG_EH_REGION (const_int 0 [0])
                (nil))))
    (expr_list:REG_DEP_TRUE (use (reg:SI 1 r1))
        (expr_list:REG_DEP_TRUE (use (reg:SI 0 r0))
            (nil))))

(insn 6540 2730 5637 261 (set (reg:SI 12 ip)
        (mem/c:SI (plus:SI (reg/f:SI 13 sp)
                (const_int 220 [0xdc])) [35 %sfp+-1060 S4 A32])) src/switch_core_media.c:3349 625 {*arm_movsi_vfp}
     (nil))

(insn 5637 6540 5638 261 (set (reg:CC 24 cc)
        (compare:CC (reg:SI 12 ip)
            (const_int 0 [0]))) src/switch_core_media.c:3349 199 {*arm_cmpsi_insn}
     (expr_list:REG_DEAD (reg:SI 12 ip)
        (nil)))

There are several instructions that have been removed and reordered between pass 144 and 199. The only direct references to 2730 are in this block of output around line 10603 in the file csa200.orig.txt:

processing block 2 lr out =  4 [r4] 5 [r5] 6 [r6] 7 [r7] 8 [r8] 9 [r9] 10 [sl] 11 [fp] 13 [sp] 14 [lr]
  Adding insn 201 to worklist
  Adding insn 193 to worklist
  Adding insn 192 to worklist
  Adding insn 191 to worklist
  Adding insn 184 to worklist
  Adding insn 186 to worklist
  Adding insn 185 to worklist
  Adding insn 2 to worklist
  Adding insn 7364 to worklist
  Adding insn 7363 to worklist
DCE: Deleting insn 6036
deleting insn with uid = 6036.
DCE: Deleting insn 6044
deleting insn with uid = 6044.
DCE: Deleting insn 6054
deleting insn with uid = 6054.
DCE: Deleting insn 2730
deleting insn with uid = 2730.
DCE: Deleting insn 7023
deleting insn with uid = 7023.
DCE: Deleting insn 7022
deleting insn with uid = 7022.
DCE: Deleting insn 2728
deleting insn with uid = 2728.


try_optimize_cfg iteration 1

Forwarding edge 68->69 to 479 failed.

Of note, pass #200 in this case is a DCE pass, which is short for Dead Code Elimination, which would produce the bug we started with if for some reason GCC didn’t see any changes between the two back to back strcasecmp operations.

Conclusion

While we’ve identified that GCC is in fact removing the second strcasecmp, and we’ve located where this is happening, digging futher into the why would require rebuilding the gcc-4.6 package with debugging patches. That will have to wait for a future post.

I’ve filed the bug upstream with GCC as 61509.