FreeSWITCH RaspberryPi GCC Compiler Bug
TL;DR:
VOIP calls to a Raspberry Pi failed where on all other architectures(amd64, i386) worked. GCC at fault, specifically the RTL definition for ARM CPU.
Filed against GCC as bug 61509.
Filed against Raspbian as a bug 1330270
Intro
The github repo that contains all of the output files referenced can be located here: https://github.com/quentusrex/FS-6495 and the FreeSWITCH repo can be cloned from: git://git.freeswitch.org/freeswitch.git
During a FreeSWITCH community conference call, two bugs were brought up for the developers to look at: FS-6494 and FS-6524. Brian and Tony described the issue as replicated on a Raspberry Pi in Brian’s lab, and that from the sip trace and a review of the code, this failure should not be happening. The call scenario was simply to make a call to FreeSWITCH running on the Raspberry Pi, and the call would fail to negotiate media due to no codec matches found even though there was a matching codec in the SIP SDP of the INVITE.
During the next 30-45 minutes, Travis and I dug into gdb and started to trace the ARM assembly instructions to track down the source of the issue, while Mike was able to generate a patch that worked around the problem. Also during the call it was determined that upgrading to gcc-4.7 resolved the bug without a patch, or at least worked around the symptom we were using to detect the bug.
Triage of bug positive and negative case and narrowing search scope
I’ve replicated the issue on one of my R-Pi’s, and have been able to replicate the results for both the positive and negative case. Instead of using Mike’s patch which added a function call, I’ve elected to use a patch that just sets max_bit_rate = 32099. The advantage here is that it won’t change the line numbers of the code, nor does it introduce a function call, but it does prevent the bug from replicating directly for the codec_ms variable. Hopefully the difference in assembly and GCC debug will be be small enough to easily determine the incorrect optimization branch. Now, time to dig into the root cause of the issue.
From the Wednesday call, Travis and I were able to determine that there was an improper assignment being made to both the codec_ms and max_bit_rate variables. The variable max_bit_rate was expected to be 64000 around line 3380, but was instead being assigned the value 32000. Since there were several spots in the code where this could be happening, and to get second confirmation, I changed each of the 32000 values to 32001, 32002, etc. Then after 15 minutes waiting for the Raspberry Pi to rebuild the FS source, I was able to get the second confirmation that the assignment at fault is line switch_core_media.c line 3358.
Here is a snippet of the logs with only the codec_ms and max_bit_rate constant values changed to identify which line was incorrectly doing the assignments:
#!console
EXECUTE sofia/internal/1000@192.168.100.124 hash(insert/192.168.100.124-spymap/1000/256dbe82-e548-11e3-bb16-1d9d0f58e5d7)
EXECUTE sofia/internal/1000@192.168.100.124 hash(insert/192.168.100.124-last_dial/1000/3001)
EXECUTE sofia/internal/1000@192.168.100.124 hash(insert/192.168.100.124-last_dial/global/256dbe82-e548-11e3-bb16-1d9d0f58e5d7)
EXECUTE sofia/internal/1000@192.168.100.124 export(RFC2822_DATE=Tue, 27 May 2014 02:39:38 +0000)
2014-05-27 02:39:38.045324 [DEBUG] switch_channel.c:1246 EXPORT (export_vars) [RFC2822_DATE]=[Tue, 27 May 2014 02:39:38 +0000]
EXECUTE sofia/internal/1000@192.168.100.124 answer()
2014-05-27 02:39:38.065276 [DEBUG] switch_core_media.c:3383 Audio Codec Compare [PCMU:0:8000:39:32001]/[PCMU:0:8000:20:64000]
2014-05-27 02:39:38.065276 [DEBUG] switch_core_media.c:3309 Set telephone-event payload to 101
2014-05-27 02:39:38.065276 [DEBUG] switch_core_media.c:3626 Set 2833 dtmf send/recv payload to 101
2014-05-27 02:39:38.065276 [DEBUG] switch_core_session.c:907 Send signal sofia/internal/1000@192.168.100.124 [BREAK]
2014-05-27 02:39:38.065276 [NOTICE] switch_channel.c:3721 Hangup sofia/internal/1000@192.168.100.124 [CS_EXECUTE] [INCOMPATIBLE_DESTINATION]
In the above log, take careful note of the line in the middle with ‘switch_core_media.c:3383 Audio Codec Compare’ because that print is on line 33 of the following snippet. Of key interest is why does map_bit_rate get set to 32001 when rm_encoding is “PCMU” and not “isac”.
#!c
if (zstr(map->rm_fmtp)) {
if (!strcasecmp(map->rm_encoding, "ilbc")) {
codec_ms = 32;
map_bit_rate = 13332;
} else if (!strcasecmp(map->rm_encoding, "isac")) {
codec_ms = 39;
map_bit_rate = 32001;
} else { map_bit_rate = 32099; }
} else {
if ((switch_core_codec_parse_fmtp(map->rm_encoding, map->rm_fmtp, map->rm_rate, &codec_fmtp))
== SWITCH_STATUS_SUCCESS) {
if (codec_fmtp.bits_per_second) {
map_bit_rate = codec_fmtp.bits_per_second;
}
if (codec_fmtp.microseconds_per_packet) {
codec_ms = (codec_fmtp.microseconds_per_packet / 1000);
}
if (codec_fmtp.actual_samples_per_second) {
fmtp_remote_codec_rate = codec_fmtp.actual_samples_per_second;
}
}
}
for (i = 0; i < smh->mparams->num_codecs && i < total_codecs; i++) {
const switch_codec_implementation_t *imp = codec_array[i];
uint32_t bit_rate = imp->bits_per_second;
uint32_t codec_rate = imp->samples_per_second;
if (imp->codec_type != SWITCH_CODEC_TYPE_AUDIO) {
continue;
}
switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_DEBUG,
"Audio Codec Compare [%s:%d:%u:%d:%u]/[%s:%d:%u:%d:%u]\n",
rm_encoding, map->rm_pt, (int) remote_codec_rate, codec_ms, map_bit_rate,
imp->iananame, imp->ianacode, codec_rate, imp->microseconds_per_packet / 1000, bit_rate);
On the Wednesday call, Travis and I found that while GCC did include the debug symbols in the compiled binary, gdb was not able to reliably match the C source to the assembly output. This is because when GCC optimizes the instructions, at least on an ARM CPU, it isn’t able to align the now optimized assembly to the original source. So let’s just jump directly into the assembly and start reversing the instructions.
objdump -d ./src/libfreeswitch_la-switch_core_media.o > /usr/local/src/FS-6495/objdump.orig.txt
Then apply my patch and repeat:
objdump -d ./src/libfreeswitch_la-switch_core_media.o > /usr/local/src/FS-6495/objdump.patched.txt
Useful background knowledge
ARM Assembly
Wikipedia article about ARM architecture
Compilation theory
For some additional research into compiler theory used here, you can check out the lectures here: http://nptel.ac.in/courses/106108052/32#
Analyzing the compiled output
When dealing with assembly, since we know that we’re looking for the constant value 32001, it’ll likely be stored in decimal or hex format. Opening the objdump output file and searching for 32001 turned up nothing, other than landing on 2 random matches for memory addresses(if it were to match in ARM assembly it would look like #32001 as an argument for an instruction). So instead let’s search for the hex value which is 7d01. This does turn up something of value as you can see here:
dc6c: 00000101 .word 0x00000101
dc70: 000009ec .word 0x000009ec
dc74: 000009dc .word 0x000009dc
dc78: 000009e8 .word 0x000009e8
dc7c: 00000e7f .word 0x00000e7f
dc80: 00007d01 .word 0x00007d01
dc84: 000008e0 .word 0x000008e0
dc88: 00000870 .word 0x00000870
dc8c: 00000818 .word 0x00000818
dc90: 00000804 .word 0x00000804
dc94: 000007f8 .word 0x000007f8
At offset dc80(line 6) we see a word aligned constant value that is our 32001 magic number. Now it’s just a matter of searching(I am using emacs control+s) to locate where this offset is being used as an argument for an instruction. Lucky for us there is only a single reference to dc80, and that is in the middle of the following code block at offset d364 on line 23.
Unpatched ARM ASM:
#!asm
d30c: ebfffffe bl 0 <switch_core_codec_parse_fmtp>
d310: e3500000 cmp r0, #0
d314: 158d4078 strne r4, [sp, #120] ; 0x78
d318: 1afffe1d bne cb94 <switch_core_media_negotiate_sdp+0x2090>
d31c: e59d24a4 ldr r2, [sp, #1188] ; 0x4a4
d320: e59d34a8 ldr r3, [sp, #1192] ; 0x4a8
d324: e3520000 cmp r2, #0
d328: e59d4060 ldr r4, [sp, #96] ; 0x60
d32c: 11a04002 movne r4, r2
d330: e3530000 cmp r3, #0
d334: 11a01fc3 asrne r1, r3, #31
d338: 159f2894 ldrne r2, [pc, #2196] ; dbd4 <switch_core_media_negotiate_sdp+0x30d0>
d33c: e59dc4a0 ldr ip, [sp, #1184] ; 0x4a0
d340: e58d4060 str r4, [sp, #96] ; 0x60
d344: 10c38392 smullne r8, r3, r2, r3
d348: e58dc078 str ip, [sp, #120] ; 0x78
d34c: 10611343 rsbne r1, r1, r3, asr #6
d350: 158d1068 strne r1, [sp, #104] ; 0x68
d354: eafffe0e b cb94 <switch_core_media_negotiate_sdp+0x2090>
d358: e59dc0dc ldr ip, [sp, #220] ; 0xdc
d35c: e59d8060 ldr r8, [sp, #96] ; 0x60
d360: e35c0000 cmp ip, #0
d364: e59f3914 ldr r3, [pc, #2324] ; dc80 <switch_core_media_negotiate_sdp+0x317c>
d368: e59dc068 ldr ip, [sp, #104] ; 0x68
d36c: e3a04000 mov r4, #0
d370: 01a08003 moveq r8, r3
d374: 03a0c027 moveq ip, #39 ; 0x27
d378: e58d4078 str r4, [sp, #120] ; 0x78
d37c: e58d8060 str r8, [sp, #96] ; 0x60
d380: e58dc068 str ip, [sp, #104] ; 0x68
d384: eafffe02 b cb94 <switch_core_media_negotiate_sdp+0x2090>
d388: e59d4084 ldr r4, [sp, #132] ; 0x84
d38c: e58d4068 str r4, [sp, #104] ; 0x68
d390: eafffdd0 b cad8 <switch_core_media_negotiate_sdp+0x1fd4>
d394: e59f18e8 ldr r1, [pc, #2280] ; dc84 <switch_core_media_negotiate_sdp+0x3180>
d398: e1a02008 mov r2, r8
d39c: e08f1001 add r1, pc, r1
d3a0: e3a03001 mov r3, #1
d3a4: e5960018 ldr r0, [r6, #24]
d3a8: ebfffffe bl 0 <switch_channel_set_variable_var_check>
d3ac: e59a05f8 ldr r0, [sl, #1528] ; 0x5f8
d3b0: ebfffffe bl 0 <switch_rtp_ready>
Compiling with the else case added produces assembly output that is ‘close’ to what was produced without the corrective patch. In the patched version our magic number is stored at offset dc8c, and also only shows up once in the assembly below at offset d380 on line 25.
#Patched ARM ASM:
#!asm
d320: ebfffffe bl 0 <switch_core_codec_parse_fmtp>
d324: e3500000 cmp r0, #0
d328: 158d4074 strne r4, [sp, #116] ; 0x74
d32c: 1afffe1b bne cba0 <switch_core_media_negotiate_sdp+0x209c>
d330: e59d249c ldr r2, [sp, #1180] ; 0x49c
d334: e59d34a0 ldr r3, [sp, #1184] ; 0x4a0
d338: e3520000 cmp r2, #0
d33c: e59d405c ldr r4, [sp, #92] ; 0x5c
d340: 11a04002 movne r4, r2
d344: e3530000 cmp r3, #0
d348: 11a01fc3 asrne r1, r3, #31
d34c: 159f2888 ldrne r2, [pc, #2184] ; dbdc <switch_core_media_negotiate_sdp+0x30d8>
d350: e59dc498 ldr ip, [sp, #1176] ; 0x498
d354: e58d405c str r4, [sp, #92] ; 0x5c
d358: 10c38392 smullne r8, r3, r2, r3
d35c: e58dc074 str ip, [sp, #116] ; 0x74
d360: 10611343 rsbne r1, r1, r3, asr #6
d364: 158d1064 strne r1, [sp, #100] ; 0x64
d368: eafffe0c b cba0 <switch_core_media_negotiate_sdp+0x209c>
d36c: e59f1914 ldr r1, [pc, #2324] ; dc88 <switch_core_media_negotiate_sdp+0x3184>
d370: e1a00004 mov r0, r4
d374: e08f1001 add r1, pc, r1
d378: ebfffffe bl 0 <strcasecmp>
d37c: e59dc064 ldr ip, [sp, #100] ; 0x64
d380: e59f3904 ldr r3, [pc, #2308] ; dc8c <switch_core_media_negotiate_sdp+0x3188>
d384: e59f2904 ldr r2, [pc, #2308] ; dc90 <switch_core_media_negotiate_sdp+0x318c>
d388: e3500000 cmp r0, #0
d38c: 11a03002 movne r3, r2
d390: 13a00000 movne r0, #0
d394: 03a0c027 moveq ip, #39 ; 0x27
d398: e58d305c str r3, [sp, #92] ; 0x5c
d39c: e58d0074 str r0, [sp, #116] ; 0x74
d3a0: e58dc064 str ip, [sp, #100] ; 0x64
d3a4: eafffdfd b cba0 <switch_core_media_negotiate_sdp+0x209c>
d3a8: e59d4080 ldr r4, [sp, #128] ; 0x80
d3ac: e58d4064 str r4, [sp, #100] ; 0x64
d3b0: eafffdcb b cae4 <switch_core_media_negotiate_sdp+0x1fe0>
d3b4: e59f18d8 ldr r1, [pc, #2264] ; dc94 <switch_core_media_negotiate_sdp+0x3190>
d3b8: e1a02008 mov r2, r8
d3bc: e08f1001 add r1, pc, r1
d3c0: e3a03001 mov r3, #1
d3c4: e5960018 ldr r0, [r6, #24]
d3c8: ebfffffe bl 0 <switch_channel_set_variable_var_check>
d3cc: e59a05f8 ldr r0, [sl, #1528] ; 0x5f8
d3d0: ebfffffe bl 0 <switch_rtp_ready>
The major differences here are that the unpatched version is missing the strcasecmp operation. Going back and reading the FreeSWITCH C source code, this would make sense if GCC is converting the if() else if() into an if() else statement based on the behavior we were seeing.
Now to collect the output from the GCC compilation optimization and rtl translation phases for later analysis
NOTE: thanks to parcs on irc.freenode.org #gcc for reminding about the -fdump-tree-all-details option
The quick way to collect the GCC debug logs we need is to build FreeSWITCH once, then make a change to the CFLAGS, then rebuild only the one file(switch_core_media.c) we are looking at. I found that adding the debugging causes the already slow compilation on the Raspberry Pi to take about 8x-10x more time.
Now the SWITCH_AM_CFLAGS line is changed from:
SWITCH_AM_CFLAGS = -I/usr/local/src/freeswitch/src/include
-I/usr/local/src/freeswitch/src/include
-I/usr/local/src/freeswitch/libs/libteletone/src -fPIC
-fvisibility=hidden -DSWITCH_API_VISIBILITY=1 -DHAVE_VISIBILITY=1 -g
-ggdb -DHAVE_OPENSSL
To:
SWITCH_AM_CFLAGS = -I/usr/local/src/freeswitch/src/include
-I/usr/local/src/freeswitch/src/include
-I/usr/local/src/freeswitch/libs/libteletone/src -fPIC
-fvisibility=hidden -fdump-tree-all-details -fdump-rtl-expand-details
-fdump-rtl-all -DSWITCH_API_VISIBILITY=1 -DHAVE_VISIBILITY=1 -g -ggdb
-DHAVE_OPENSSL
Then start the build process, and let it at least finish building the switch_core_media.c file. Then move all of the switch_core_media.c.* files into /usr/local/src/FS-6495/gcc_debug_orig/ and add the else case, and start building FS again. Once this completes move all of the switch_core_media.c.* files to /usr/local/src/FS-6495/gcc_debug_patched/
Now we have the output to start figuring how where during the compilation process the logic goes wrong. Unfortunately this is a small, likely one line, needle in a very large haystack. Since it’s highly likely that most of the generated files will be different only by a single number that will likely reference an index of the instructions, we can use the following to see how many lines are different:
$ cd /usr/local/src/FS-6495/gcc_debug_orig/
$ cp ../gcc_debug_patched/* ./
$ git diff --stat
90 files changed, 997167 insertions(+), 997123 deletions(-)
So with about 2 million lines changed, that’s a decent size haystack. Now, GCC does output a statistics file which will give some useful counts of optimization decisions. So let’s check out what the differences are:
#!console
~/git/FS-6495/gcc_debug_orig$ git diff --word-diff-regex=.
diff --git a/gcc_debug_orig/switch_core_media.c.224t.statistics b/gcc_debug_orig/switch_core_media.c.224t.statistics
index afac8e8..5e91c7c 100644
--- a/gcc_debug_orig/switch_core_media.c.224t.statistics
+++ b/gcc_debug_orig/switch_core_media.c.224t.statistics
@@ -271,14 +271,14 @@
31 copyprop "Statements deleted" "switch_core_media_toggle_hold" 7
33 cddce "Statements deleted" "switch_core_media_toggle_hold" 1
41 release_ssa "SSA names released" "switch_core_media_toggle_hold" 15
27 ccp "Constants propagated" "switch_core_media_negotiate_sdp" 20[-4-]{+6+}
27 ccp "Statements folded" "switch_core_media_negotiate_sdp" 8[-7-]{+8+}
27 ccp "Statements deleted" "switch_core_media_negotiate_sdp" 41
31 copyprop "Copies propagated" "switch_core_media_negotiate_sdp" 4[-5-]{+4+}
31 copyprop "Statements deleted" "switch_core_media_negotiate_sdp" 12[-2-]{+3+}
33 cddce "Statements deleted" "switch_core_media_negotiate_sdp" 18
33 cddce "PHI nodes deleted" "switch_core_media_negotiate_sdp" 5
41 release_ssa "SSA names released" "switch_core_media_negotiate_sdp" 26[-7-]{+8+}
27 ccp "Constants propagated" "switch_core_media_proxy_remote_addr" 41
27 ccp "Statements folded" "switch_core_media_proxy_remote_addr" 17
27 ccp "Statements deleted" "switch_core_media_proxy_remote_addr" 2
The interesting lines are 10, 11, 13, 14, and 17. That at least will narrow down the haystack a bit. Basically that output means that for instance on line 10, the unpatched version had 204 “Constants Propagated” decisions while the patched version had 206 of them. Now, to copy over only the debug files where the decisions where different.
$ cp ../gcc_debug_patched/*release_ssa ./
$ cp ../gcc_debug_patched/*copyprop* ./
$ cp ../gcc_debug_patched/*.ccp* ./
$ git diff --stat
gcc_debug_orig/switch_core_media.c.023t.ccp1 |66671 +++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------
gcc_debug_orig/switch_core_media.c.027t.copyprop1 |42427 +++++++++++++++++++++++++++++++--------------------------------
gcc_debug_orig/switch_core_media.c.037t.release_ssa | 9999 ++++++++-------
gcc_debug_orig/switch_core_media.c.059t.ccp2 |59722 ++++++++++++++++++++++++++++++++++++++++++++--------------------------------------------
gcc_debug_orig/switch_core_media.c.066t.copyprop2 |48317 +++++++++++++++++++++++++++++++++++------------------------------------
gcc_debug_orig/switch_core_media.c.087t.ccp3 |66913 +++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------------------
gcc_debug_orig/switch_core_media.c.088t.copyprop3 |47234 +++++++++++++++++++++++++++++++++++-----------------------------------
gcc_debug_orig/switch_core_media.c.097t.copyprop4 |50184 +++++++++++++++++++++++++++++++++++++-------------------------------------
gcc_debug_orig/switch_core_media.c.103t.copyprop5 |48255 ++++++++++++++++++++++++++++++++++++-----------------------------------
gcc_debug_orig/switch_core_media.c.224t.statistics | 10 +-
10 files changed, 219893 insertions(+), 219839 deletions(-)
So now the haystack is reduced by ~80%. Of the remaining files the smallest one that we know has changes due to our patch is the release_ssa file. Before digging into the file, we can keep in mind that there are special magic numbers of 32099 and 32001 which can be used to help zoom in on the correct section of code. We can also see from the log output in the patched case that the codec_ms was still set to 39, but the max_bit_rate was set to 32099 instead of 32001. So this should drastically reduce the amount of code we need to look through, since we’re looking for a code block that sets max_bit_rate to 32001 in the unpatched, and 32099 in the patched version, and in both cases sets codec_ms to 39.
After some searching(quick emacs ctrl+s) for 32001 in the original version:
#!console
<bb 213>:
D.44889_604 = map_128->rm_encoding;
D.44901_605 = strcasecmp (D.44889_604, "isac");
if (D.44901_605 == 0)
goto <bb 214>;
else
goto <bb 222>;
<bb 214>:
# DEBUG codec_ms => 39
# DEBUG map_bit_rate => 32001
goto <bb 222>;
<bb 222>:
# codec_ms_90 = PHI <codec_ms_88(213), codec_ms_88(215), codec_ms_89(220), codec_ms_89(221), 32(212), 39(214)>
# fmtp_remote_codec_rate_99 = PHI <0(213), 0(215), 0(220), fmtp_remote_codec_rate_619(221), 0(212), 0(214)>
# map_bit_rate_132 = PHI <map_bit_rate_1480(213), map_bit_rate_1480(215), map_bit_rate_131(220), map_bit_rate_131(221), 13332(212), 32001(214)>
# DEBUG map_bit_rate => map_bit_rate_132
# DEBUG fmtp_remote_codec_rate => fmtp_remote_codec_rate_99
# DEBUG codec_ms => codec_ms_90
# DEBUG i => 0
goto <bb 257>;
and 32099 in the patched version we find:
#!console
<bb 213>:
D.44889_604 = map_128->rm_encoding;
D.44901_605 = strcasecmp (D.44889_604, "isac");
if (D.44901_605 == 0)
goto <bb 222>;
else
goto <bb 214>;
<bb 214>:
# DEBUG map_bit_rate => 32099
goto <bb 222>;
<bb 222>:
# codec_ms_90 = PHI <39(213), codec_ms_88(215), codec_ms_89(220), codec_ms_89(221), 32(212), codec_ms_88(214)>
# fmtp_remote_codec_rate_99 = PHI <0(213), 0(215), 0(220), fmtp_remote_codec_rate_620(221), 0(212), 0(214)>
# map_bit_rate_132 = PHI <32001(213), map_bit_rate_1481(215), map_bit_rate_131(220), map_bit_rate_131(221), 13332(212), 32099(214)>
# DEBUG map_bit_rate => map_bit_rate_132
# DEBUG fmtp_remote_codec_rate => fmtp_remote_codec_rate_99
# DEBUG codec_ms => codec_ms_90
# DEBUG i => 0
goto <bb 257>;
This is important because we now have the internal instruction reference ’D.44901_605’ for the strcasecmp that disappears. We can now use this reference to start tracking down where this condition gets incorrectly mapped. Now something to keep in mind is that since we saw in the assembly output earlier that the strcasecmp was optimized out, we know that we’re looking for where in the original version GCC decides that it no longer needs the condition ’D.44901_605’.
In the instruction scope element
Now, taking a closer look at the unpatched version:
#!console
<bb 213>:
D.44889_604 = map_128->rm_encoding;
D.44901_605 = strcasecmp (D.44889_604, "isac");
if (D.44901_605 == 0)
goto <bb 214>;
else
goto <bb 222>;
<bb 214>:
# DEBUG codec_ms => 39
# DEBUG map_bit_rate => 32001
goto <bb 222>;
<bb 222>:
# codec_ms_90 = PHI <codec_ms_88(213), codec_ms_88(215), codec_ms_89(220), codec_ms_89(221), 32(212), 39(214)>
# fmtp_remote_codec_rate_99 = PHI <0(213), 0(215), 0(220), fmtp_remote_codec_rate_619(221), 0(212), 0(214)>
# map_bit_rate_132 = PHI <map_bit_rate_1480(213), map_bit_rate_1480(215), map_bit_rate_131(220), map_bit_rate_131(221), 13332(212), 32001(214)>
# DEBUG map_bit_rate => map_bit_rate_132
# DEBUG fmtp_remote_codec_rate => fmtp_remote_codec_rate_99
# DEBUG codec_ms => codec_ms_90
# DEBUG i => 0
goto <bb 257>;
Line 15 shows the codec_ms_90 variable has the value of a PHI, which is GCC’s way of saying in bb 222, the variable codec_ms_90 is one of the values of the PHI, though GCC is unable to narrow down further the actual value. The values in the PHI reference the basic block that the value is from such as ‘39(214)’ means the constant value 39 could be assigned to codec_ms_90 in the basic block #214.
Next time I deal with a bug like this I’ll remember to check the final optimization debug file earlier in the process. After hours of confirming each of the compilation passes, and auditing the ccp, copyprop, release_ssa phases, it turns out that while the section of code moves around a bit, it’s still present in the compilation pass #143 in the final optimized stage. This would have been found earlier by just grepping the files for the presence of ‘44901_605’ in chunks like switch_coremedia.c.10* then switch_core_media.c.11* etc. Since it’s still present in pass #143, less time could have been spent on the previous passes. See here:
<bb 241>:
D.44901_605 = strcasecmp (D.44889_600, "isac");
if (D.44901_605 == 0)
goto <bb 248>;
else
goto <bb 242>;
<bb 242>:
goto <bb 248>;
<bb 248>:
# codec_ms_90 = PHI <codec_ms_88(242), codec_ms_88(440), codec_ms_89(247), 39(241), 32(240)>
# fmtp_remote_codec_rate_99 = PHI <0(242), 0(440), fmtp_remote_codec_rate_618(247), 0(241), 0(240)>
# map_bit_rate_132 = PHI <map_bit_rate_1480(242), map_bit_rate_1480(440), map_bit_rate_131(247), 32001(241), 13332(240)>
# DEBUG map_bit_rate => map_bit_rate_132
# DEBUG fmtp_remote_codec_rate => fmtp_remote_codec_rate_99
# DEBUG codec_ms => codec_ms_90
# DEBUG i => 0
# DEBUG i => 0
# DEBUG nm_idx => nm_idx_1755
# DEBUG m_idx => m_idx_792
# DEBUG remote_codec_rate => remote_codec_rate_596
# DEBUG match => 0
D.44471_1501 = smh_186->mparams;
D.44996_1498 = D.44471_1501->num_codecs;
D.44997_1496 = D.44996_1498 > 0;
D.44998_1609 = total_codecs_85 > 0;
D.44999_1152 = D.44998_1609 & D.44997_1496;
if (D.44999_1152 != 0)
goto <bb 249>;
else
goto <bb 280>;
Digging into the RTL translation stage
Since GCC has finished optimizing the IR(internal representation) logic, now comes the translation to assembly specific for this architecture. In our case, the Raspberry Pi uses an ARM11 family processor which uses the ARMv6 instruction set. For anyone following along in the GCC source, the RTL definition files are located under /usr/local/src/gcc/gcc/config/arm/ assuming you’ve cloned the GCC repo to /usr/local/src/.
While in the earlier stages we were using the basic block number to track the code, the RTL translation stage uses INSN numbers to track the instructions. The first objective is to locate the insn number for the instruction we want to track, based on the last known basic block number and the internal condition identifier.
If you take a look at the file expand144.orig.txt , I’ve cut out the section for the function switch_core_media_negotiate_sdp from the first RTL translation pass.
Searching for D.44901_605 leads us to these insn blocks:
;; Generating RTL for gimple basic block 241
;; D.44901_605 = strcasecmp (D.44889_600, "isac");
(insn 2726 2725 2727 (set (reg:SI 1670)
(unspec:SI [
(const:SI (unspec:SI [
(symbol_ref/v/f:SI ("*.LC72") [flags 0x82] <var_decl 0x41ea40c0 *.LC72>)
(const:SI (plus:SI (unspec:SI [
(const_int 1050 [0x41a])
] 21)
(const_int 8 [0x8])))
] 27))
(const_int 1050 [0x41a])
] 29)) src/switch_core_media.c:3356 -1
(expr_list:REG_EQUAL (symbol_ref/v/f:SI ("*.LC72") [flags 0x82] <var_decl 0x41ea40c0 *.LC72>)
(nil)))
(insn 2727 2726 2728 (set (reg:SI 1669)
(reg:SI 1670)) src/switch_core_media.c:3356 -1
(expr_list:REG_EQUAL (symbol_ref/v/f:SI ("*.LC72") [flags 0x82] <var_decl 0x41ea40c0 *.LC72>)
(nil)))
(insn 2728 2727 2729 (set (reg:SI 0 r0)
(reg/f:SI 498 [ D.44889 ])) src/switch_core_media.c:3356 -1
(nil))
(insn 2729 2728 2730 (set (reg:SI 1 r1)
(reg:SI 1669)) src/switch_core_media.c:3356 -1
(nil))
(call_insn/i 2730 2729 2731 (parallel [
(set (reg:SI 0 r0)
(call (mem:SI (symbol_ref:SI ("strcasecmp") [flags 0x41] <function_decl 0x40442200 strcasecmp>) [0 S4 A32])
(const_int 0 [0])))
(use (const_int 0 [0]))
(clobber (reg:SI 14 lr))
]) src/switch_core_media.c:3356 -1
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil))
(expr_list:REG_DEP_TRUE (use (reg:SI 1 r1))
(expr_list:REG_DEP_TRUE (use (reg:SI 0 r0))
(nil))))
(insn 2731 2730 0 (set (reg:SI 500 [ D.44901 ])
(reg:SI 0 r0)) src/switch_core_media.c:3356 -1
(nil))
;; if (D.44901_605 == 0)
(insn 2732 2731 2733 (set (reg:CC 24 cc)
(compare:CC (reg:SI 500 [ D.44901 ])
(const_int 0 [0]))) src/switch_core_media.c:3356 -1
(nil))
(jump_insn 2733 2732 0 (set (pc)
(if_then_else (eq (reg:CC 24 cc)
(const_int 0 [0]))
(label_ref 0)
(pc))) src/switch_core_media.c:3356 -1
(expr_list:REG_BR_PROB (const_int 5000 [0x1388])
(nil)))
Yes, the simple line ‘if (!strcasecmp(map->rm_encoding, “isac”)) {’ is now translated into 8 different insn blocks. Also, take note that the insn is printed as a doubly linked list where the first number is the identifier for this insn, the second is the previous insn and the third is the next one.
Specifically, the call that was missing from the objdump was the call out to strcasecmp, so we can start grepping for 2730. Luckly the insn doesn’t change for the call out to strcasecmp through each of the passes.
We find that in pass 200 the insn is removed. After a quick check of pass 199 to confirm that the insn hasn’t been changed:
(note 2731 2726 2728 261 NOTE_INSN_DELETED)
(insn 2728 2731 7022 261 (set (reg:SI 0 r0)
(reg/f:SI 4 r4 [orig:498 D.44889 ] [498])) src/switch_core_media.c:3356 625 {*arm_movsi_vfp}
(expr_list:REG_DEAD (reg/f:SI 4 r4 [orig:498 D.44889 ] [498])
(nil)))
(insn 7022 2728 7023 261 (set (reg:SI 1 r1)
(unspec:SI [
(const:SI (unspec:SI [
(symbol_ref/v/f:SI ("*.LC72") [flags 0x82] <var_decl 0x41ea40c0 *.LC72>)
(const:SI (plus:SI (unspec:SI [
(const_int 1050 [0x41a])
] 21)
(const_int 8 [0x8])))
] 27))
] 3)) src/switch_core_media.c:3356 171 {pic_load_addr_32bit}
(nil))
(insn 7023 7022 2730 261 (set (reg:SI 1 r1)
(unspec:SI [
(reg:SI 1 r1)
(const_int 8 [0x8])
(const_int 1050 [0x41a])
] 4)) src/switch_core_media.c:3356 174 {pic_add_dot_plus_eight}
(expr_list:REG_EQUAL (symbol_ref/v/f:SI ("*.LC72") [flags 0x82] <var_decl 0x41ea40c0 *.LC72>)
(nil)))
(call_insn/i 2730 7023 6540 261 (parallel [
(set (reg:SI 0 r0)
(call (mem:SI (symbol_ref:SI ("strcasecmp") [flags 0x41] <function_decl 0x40442200 strcasecmp>) [0 S4 A32])
(const_int 0 [0])))
(use (const_int 0 [0]))
(clobber (reg:SI 14 lr))
]) src/switch_core_media.c:3356 236 {*call_value_symbol}
(expr_list:REG_DEAD (reg:SI 1 r1)
(expr_list:REG_UNUSED (reg:SI 0 r0)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil))))
(expr_list:REG_DEP_TRUE (use (reg:SI 1 r1))
(expr_list:REG_DEP_TRUE (use (reg:SI 0 r0))
(nil))))
(insn 6540 2730 5637 261 (set (reg:SI 12 ip)
(mem/c:SI (plus:SI (reg/f:SI 13 sp)
(const_int 220 [0xdc])) [35 %sfp+-1060 S4 A32])) src/switch_core_media.c:3349 625 {*arm_movsi_vfp}
(nil))
(insn 5637 6540 5638 261 (set (reg:CC 24 cc)
(compare:CC (reg:SI 12 ip)
(const_int 0 [0]))) src/switch_core_media.c:3349 199 {*arm_cmpsi_insn}
(expr_list:REG_DEAD (reg:SI 12 ip)
(nil)))
There are several instructions that have been removed and reordered between pass 144 and 199. The only direct references to 2730 are in this block of output around line 10603 in the file csa200.orig.txt:
processing block 2 lr out = 4 [r4] 5 [r5] 6 [r6] 7 [r7] 8 [r8] 9 [r9] 10 [sl] 11 [fp] 13 [sp] 14 [lr]
Adding insn 201 to worklist
Adding insn 193 to worklist
Adding insn 192 to worklist
Adding insn 191 to worklist
Adding insn 184 to worklist
Adding insn 186 to worklist
Adding insn 185 to worklist
Adding insn 2 to worklist
Adding insn 7364 to worklist
Adding insn 7363 to worklist
DCE: Deleting insn 6036
deleting insn with uid = 6036.
DCE: Deleting insn 6044
deleting insn with uid = 6044.
DCE: Deleting insn 6054
deleting insn with uid = 6054.
DCE: Deleting insn 2730
deleting insn with uid = 2730.
DCE: Deleting insn 7023
deleting insn with uid = 7023.
DCE: Deleting insn 7022
deleting insn with uid = 7022.
DCE: Deleting insn 2728
deleting insn with uid = 2728.
try_optimize_cfg iteration 1
Forwarding edge 68->69 to 479 failed.
Of note, pass #200 in this case is a DCE pass, which is short for Dead Code Elimination, which would produce the bug we started with if for some reason GCC didn’t see any changes between the two back to back strcasecmp operations.
Conclusion
While we’ve identified that GCC is in fact removing the second strcasecmp, and we’ve located where this is happening, digging futher into the why would require rebuilding the gcc-4.6 package with debugging patches. That will have to wait for a future post.
I’ve filed the bug upstream with GCC as 61509.