BGPD on Gentoo SPARC64 crashes and shuts ALL BGP sessions if the updates contain too many prepends.

The error is:

BGP: Received signal 10 at 1187568931 (si_addr 0x16eb2f); aborting...

Reproducible: Always

Steps to Reproduce:
1. Establish a BGP session and receive a prefix update with a hundred AS numbers in the path.
2. Using 'debug update' you can see that the sessions fail and BGPD crashes at the prefix prior to the one with the hundred AS numbers in its path.
3. The version is net-misc/quagga-0.98.6-r2 USE="bgpclassless fix-connected-rt pam tcpmd5 -ipv6 -multipath -ospfapi -realms -snmp -tcp-zebra" 0 kB
4. Replicated on two different servers on two different kernels, running 2.6.21-gentoo-sources-r4 and vanilla-sources-2.6.17.14.

Actual Results:
BGPD drops ALL sessions, no longer outputs anything and needs to be restarted.

Expected Results:
BGPD should have stayed up and ignored the massive number of AS numbers in the update's path. We suffered a prolonged outage trying to isolate where the problem was, and had to get our upstream provider to run debug at their side too and then filter this /24 prefix with 100-plus ASs.
Despite filtering the prefix with a hundred ASs in its path upstream, this issue arose again 24 hours later: BGPD crashed, losing all BGP sessions completely and requiring a restart.
Can you get:
a) A backtrace or core dump (required: *unstripped* binaries; use the 'file' command on the relevant Quagga libraries and executables)
b) The exact AS_PATH causing the problem?

If possible, could you try:
a) Stock Quagga (presumably TCP-MD5 is a requirement for you)
b) Quagga 0.99 (0.99 bgp_aspath.c was half-rewritten, and the unit test contains a test for a 250 ASN AS_SEQUENCE segment in an AS_PATH - i.e. this is handled just fine in 0.99)
Thanks, Paul. We will do so over the coming days. We can confirm that i686 Quagga of the same version is not affected by this bug and we have rolled out a new platform until this is resolved.
Ok, interesting. I wonder whether it is alignment (but Linux on SPARC64 traps and fixes up alignment faults, right? Unlike Solaris and BSD on SPARC).
Please see some of the requested info below. The AS path that caused the problem was: 3257 18747 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 27803 I make this 130 AS numbers long. This caused SIGBUS error (signal 10), probably due to the BGP path attribute (AS path in this case) having the extended length bit set and using bytes 3 & 4 of the attribute to hold the length information. Process dead after this of course. This is actually a legal path in BGP however, despite the length, though obviously it has no legitimate real world use. Quagga should have parsed this correctly without crashing (the SIGBUS causing a complete crash is never a desirable outcome), and perhaps should have a feature similar to the Cisco IOS command ‘bgp maxas-limit’ which discards paths longer than a set value. For an independent confirmation the Team Cymru Internet Monitor (http://www.cymru.com/BGP/max_aspath_len.html) recorded this path overnight on the 20th of August.
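To make the extended-length point concrete, here is a minimal sketch of how an AS_PATH attribute is laid out on the wire under RFC 4271 with 2-octet AS numbers. It assumes the whole path arrives as a single AS_SEQUENCE segment (the report does not say how the path was segmented), and the function name `encode_as_path` is purely illustrative, not anything from Quagga. With 130 ASNs the segment is 2 + 130 * 2 = 262 octets, which no longer fits in a one-octet length field, so the Extended Length flag is set and the length moves into octets 3 and 4 of the attribute.

```python
import struct

# Constants from RFC 4271; the helper below is a hypothetical illustration.
ATTR_AS_PATH = 2             # path attribute type code for AS_PATH
AS_SEQUENCE = 2              # path segment type
FLAG_TRANSITIVE = 0x40       # AS_PATH is well-known, hence transitive
FLAG_EXTENDED_LENGTH = 0x10  # attribute length is two octets, not one


def encode_as_path(asns):
    """Encode 2-octet ASNs as one AS_PATH attribute with a single AS_SEQUENCE."""
    if not 0 < len(asns) <= 255:
        raise ValueError("a single segment holds between 1 and 255 ASNs")
    # Segment: type (1 octet), ASN count (1 octet), then the ASNs themselves.
    segment = struct.pack("!BB", AS_SEQUENCE, len(asns))
    segment += struct.pack("!%dH" % len(asns), *asns)

    flags = FLAG_TRANSITIVE
    if len(segment) > 255:
        # Too long for a one-octet length: flags, type, 2-octet length.
        flags |= FLAG_EXTENDED_LENGTH
        header = struct.pack("!BBH", flags, ATTR_AS_PATH, len(segment))
    else:
        # Normal case: flags, type, 1-octet length.
        header = struct.pack("!BBB", flags, ATTR_AS_PATH, len(segment))
    return header + segment


# The reported path: 3257 18747 followed by 27803 prepends, 130 ASNs in total.
path = [3257, 18747] + [27803] * 128
attr = encode_as_path(path)

print("flags: 0x%02x" % attr[0])
print("header octets:", 4 if attr[0] & FLAG_EXTENDED_LENGTH else 3)
print("attribute octets:", len(attr))
```

Under the same single-segment assumption, a path of exactly 100 ASNs (202 octets) would still fit the one-octet length; the extended form is only needed from 127 ASNs upward.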
We didn't take a core in the end. The problem was that the binary contained no debugging symbols (i.e. was stripped) when it was run inside gdb. Is there anyone who can please assist?
SIGBUS? So an alignment fault rather than a crash, it seems, and yes it's likely due to the extended length (the 130 ASNs themselves aren't a problem, other than they need extended length in the attribute header - the extra byte throws off the alignment). I'm not sure where or how the old code does unaligned reads though - my vague memory is that it always did byte reads via a char *. (Also weird that Linux didn't fix up - I thought Linux always did on all platforms, where possible, unlike most other *NIXes. Linux did trap and fix up on AXP anyway.)

Can you test 0.99? Compile it, cd to the 'tests' directory, run make there and then run aspathtest. If it doesn't fault, and reports no errors, I would expect 0.99 bgpd to be immune to this bug.
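As an illustration of why byte-wise reads sidestep the alignment question, here is a small sketch of an AS_PATH decoder that only ever reads single octets and assembles 16-bit values by hand. It is not the Quagga code (neither the 0.98 parser nor the 0.99 rewrite); the names `read_u16` and `decode_as_path` are hypothetical, and it assumes 2-octet AS numbers as in RFC 4271. Note that the first ASN sits at offset 5 from the start of the attribute with the normal header and at offset 6 with the extended one; whether either is aligned in memory depends on where the attribute starts in the receive buffer, so this does not show where 0.98 actually faults.

```python
FLAG_EXTENDED_LENGTH = 0x10  # RFC 4271 attribute flag


def read_u16(buf, off):
    """Assemble a big-endian 16-bit value from two single-octet reads."""
    return (buf[off] << 8) | buf[off + 1]


def decode_as_path(attr):
    """Return a list of (segment_type, [asn, ...]) from one AS_PATH attribute."""
    if len(attr) < 3:
        raise ValueError("attribute too short")
    flags = attr[0]
    if flags & FLAG_EXTENDED_LENGTH:
        if len(attr) < 4:
            raise ValueError("attribute too short")
        length, off = read_u16(attr, 2), 4   # length in octets 3 and 4
    else:
        length, off = attr[2], 3             # length in octet 3 only
    end = off + length
    if end > len(attr):
        raise ValueError("truncated attribute")

    segments = []
    while off < end:
        if off + 2 > end:
            raise ValueError("truncated segment header")
        seg_type, count = attr[off], attr[off + 1]
        off += 2
        if off + 2 * count > end:
            raise ValueError("truncated segment body")
        asns = [read_u16(attr, off + 2 * i) for i in range(count)]
        off += 2 * count
        segments.append((seg_type, asns))
    return segments


# Short header: AS_SEQUENCE(8722, 4), 6 octets of segment data.
short_attr = bytes([0x40, 2, 6, 2, 2]) + (8722).to_bytes(2, "big") + (4).to_bytes(2, "big")

# Extended header: 130 ASNs in one AS_SEQUENCE, 262 octets of segment data.
long_path = [3257, 18747] + [27803] * 128
long_attr = (bytes([0x50, 2]) + (262).to_bytes(2, "big") + bytes([2, 130])
             + b"".join(asn.to_bytes(2, "big") for asn in long_path))

print(decode_as_path(short_attr))
print(len(decode_as_path(long_attr)[0][1]))
```

The bounds checks are there so that a malformed or truncated attribute raises an error the caller can handle instead of reading past the buffer.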
Thanks, Paul. We'll have to leave it for a few days because, as you can imagine, our customers still aren't too happy about the downtime we had. Even out-of-hours testing will need to be left for a few days. If anyone can simulate a 100-plus AS path length test in the meantime then please do! I can't believe we are the only ISP running Gentoo SPARC64 globally!
Hmm, you can run 'aspathtest' without affecting existing services. It's a standalone unit-test of the 0.99 bgp_aspath.c code. That would at least confirm this issue does not affect 0.99.
Hi Paul,

The results from quagga-0.99.8 are looking good. I'll leave it to your expert eye to confirm the output before marking the bug as resolved though.

Thanks for your help,
Chris

Server1 tests # ./aspathtest
seq1: seq(8466,3,52737,4096) aspath: 8466 3 52737 4096 validating...: OK empty prepend seq1: seq(8466,3,52737,4096) aspath: 8466 3 52737 4096 OK seq2: seq(8722) seq(4) aspath: 8722 4 validating...: OK empty prepend seq2: seq(8722) seq(4) aspath: 8722 4 OK seq3: seq(8466,3,52737,4096,8722,4) aspath: 8466 3 52737 4096 8722 4 validating...: OK empty prepend seq3: seq(8466,3,52737,4096,8722,4) aspath: 8466 3 52737 4096 8722 4 OK seqset: seq(8482,51457) set(5204) aspath: 8482 51457 {5204} validating...: OK empty prepend seqset: seq(8482,51457) set(5204) aspath: 8482 51457 {5204} OK seqset2: seq(8467, 59649) set(4196,48658) set(17322,30745) aspath: 8467 59649 {4196,48658} {17322,30745} validating...: OK empty prepend seqset2: seq(8467, 59649) set(4196,48658) set(17322,30745) aspath: 8467 59649 {4196,48658} {17322,30745} OK multi: seq(6435,59408,21665) set(2457,61697,4369), seq(1842,41590,51793) aspath: 6435 59408 21665 {2457,4369,61697} 1842 41590 51793 validating...: OK empty prepend multi: seq(6435,59408,21665) set(2457,61697,4369), seq(1842,41590,51793) aspath: 6435 59408 21665 {2457,4369,61697} 1842 41590 51793 OK confed: confseq(123,456,789) aspath: (123 456 789) validating...: OK empty prepend confed: confseq(123,456,789) aspath: (123 456 789) OK confed2: confseq(123,456,789) confseq(111,222) aspath: (123 456 789) (111 222) validating...: OK empty prepend confed2: confseq(123,456,789) confseq(111,222) aspath: (123 456 789) (111 222) OK confset: confset(456,123,789) aspath: [123,456,789] validating...: OK empty prepend confset: confset(456,123,789) aspath: [123,456,789] OK confmulti: confseq(123,456,789) confset(222,111) seq(8722) set(4196,48658) aspath: (123 456 789) [111,222] 8722 {4196,48658} validating...: OK empty prepend confmulti: 
confseq(123,456,789) confset(222,111) seq(8722) set(4196,48658) aspath: (123 456 789) [111,222] 8722 {4196,48658} OK seq4: seq(8466,2,52737,4096,8722,4) aspath: 8466 2 52737 4096 8722 4 validating...: OK empty prepend seq4: seq(8466,2,52737,4096,8722,4) aspath: 8466 2 52737 4096 8722 4 OK tripleseq1: seq(8466,2,52737) seq(4096,8722,4) seq(8722) aspath: 8466 2 52737 4096 8722 4 8722 validating...: OK empty prepend tripleseq1: seq(8466,2,52737) seq(4096,8722,4) seq(8722) aspath: 8466 2 52737 4096 8722 4 8722 OK someprivate: seq(8466,64512,52737,65535) aspath: 8466 64512 52737 65535 validating...: OK empty prepend someprivate: seq(8466,64512,52737,65535) aspath: 8466 64512 52737 65535 OK allprivate: seq(65534,64512,64513,65535) aspath: 65534 64512 64513 65535 validating...: OK empty prepend allprivate: seq(65534,64512,64513,65535) aspath: 65534 64512 64513 65535 OK long: seq(8466,3,52737,4096,34285,<repeated 49 more times>) aspath: 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34 285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 5 2737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 409 6 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34 285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 5 2737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 
34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 409 6 34285 validating...: OK empty prepend long: seq(8466,3,52737,4096,34285,<repeated 49 more times>) aspath: 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34 285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 5 2737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 409 6 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34 285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 5 2737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 409 6 34285 OK seq1extra: seq(8466,3,52737,4096,3456) aspath: 8466 3 52737 4096 3456 validating...: OK empty prepend seq1extra: seq(8466,3,52737,4096,3456) aspath: 8466 3 52737 4096 3456 OK empty: <empty> aspath: validating...: OK empty prepend empty: <empty> aspath: OK redundantset: seq(8466,3,52737,4096,3456) set(7099,8153,8153,8153) aspath: 8466 3 52737 4096 3456 {7099,8153,8153,8153} validating...: OK empty prepend redundantset: seq(8466,3,52737,4096,3456) set(7099,8153,8153,8153) aspath: 8466 3 52737 4096 3456 {7099,8153,8153,8153} OK prepend 
seq1: seq(8466,3,52737,4096) to seq2: seq(8722) seq(4) aspath: 8466 3 52737 4096 8722 4 OK prepend seq2: seq(8722) seq(4) to seqset: seq(8482,51457) set(5204) aspath: 8722 4 8482 51457 {5204} OK prepend seqset: seq(8482,51457) set(5204) to seqset2: seq(8467, 59649) set(4196,48658) set(17322,30745) aspath: 8482 51457 {5204} 8467 59649 {4196,48658} {17322,30745} OK prepend seqset2: seq(8467, 59649) set(4196,48658) set(17322,30745) to multi: seq(6435,59408,21665) set(2457,61697,4369), seq(1842,41590,51793) aspath: 8467 59649 {4196,48658} {17322,30745} 6435 59408 21665 {2457,4369,61697} 1842 41590 51793 OK prepend multi: seq(6435,59408,21665) set(2457,61697,4369), seq(1842,41590,51793) to confed: confseq(123,456,789) aspath: 6435 59408 21665 {2457,4369,61697} 1842 41590 51793 (123 456 789) OK prepend confed: confseq(123,456,789) to confed2: confseq(123,456,789) confseq(111,222) aspath: (123 456 789) (123 456 789) (111 222) OK prepend confed2: confseq(123,456,789) confseq(111,222) to confset: confset(456,123,789) aspath: (123 456 789) (111 222) [123,456,789] OK prepend confset: confset(456,123,789) to confmulti: confseq(123,456,789) confset(222,111) seq(8722) set(4196,48658) aspath: [123,456,789] (123 456 789) [111,222] 8722 {4196,48658} OK prepend confmulti: confseq(123,456,789) confset(222,111) seq(8722) set(4196,48658) to confset: confset(456,123,789) aspath: (123 456 789) [111,222] 8722 {4196,48658} [123,456,789] OK prepend long: seq(8466,3,52737,4096,34285,<repeated 49 more times>) to tripleseq1: seq(8466,2,52737) seq(4096,8722,4) seq(8722) aspath: 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34 285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 5 2737 4096 34285 8466 3 52737 
4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 409 6 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34 285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 5 2737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 4096 34285 8466 3 52737 409 6 34285 8466 2 52737 4096 8722 4 8722 OK aggregate seq1: seq(8466,3,52737,4096) with seq2: seq(8722) seq(4) aspath: {3,4,4096,8466,8722,52737} OK aggregate seq1: seq(8466,3,52737,4096) with seq3: seq(8466,3,52737,4096,8722,4) aspath: 8466 3 52737 4096 {4,8722} OK aggregate seq3: seq(8466,3,52737,4096,8722,4) with seq1: seq(8466,3,52737,4096) aspath: 8466 3 52737 4096 {4,8722} OK aggregate seq3: seq(8466,3,52737,4096,8722,4) with seq4: seq(8466,2,52737,4096,8722,4) aspath: 8466 {2,3,4,4096,8722,52737} OK aggregate seq4: seq(8466,2,52737,4096,8722,4) with seq3: seq(8466,3,52737,4096,8722,4) aspath: 8466 {2,3,4,4096,8722,52737} OK left cmp seq1: seq(8466,3,52737,4096) and seq2: seq(8722) seq(4) OK left cmp seq1: seq(8466,3,52737,4096) and seq3: seq(8466,3,52737,4096,8722,4) OK left cmp seq1: seq(8466,3,52737,4096) and tripleseq1: seq(8466,2,52737) seq(4096,8722,4) seq(8722) OK left cmp seq1: seq(8466,3,52737,4096) and seq1extra: seq(8466,3,52737,4096,3456) OK left cmp seq1: seq(8466,3,52737,4096) and empty: <empty> OK left cmp seq2: seq(8722) seq(4) and tripleseq1: seq(8466,2,52737) seq(4096,8722,4) seq(8722) OK left cmp confed: confseq(123,456,789) and confed2: confseq(123,456,789) confseq(111,222) OK left 
cmp confed: confseq(123,456,789) and confset: confset(456,123,789) OK left cmp confed2: confseq(123,456,789) confseq(111,222) and confset: confset(456,123,789) OK left cmp seq2: seq(8722) seq(4) and confmulti: confseq(123,456,789) confset(222,111) seq(8722) set(4196,48658) OK left cmp seq1: seq(8466,3,52737,4096) and confmulti: confseq(123,456,789) confset(222,111) seq(8722) set(4196,48658) OK left cmp seqset: seq(8482,51457) set(5204) and confmulti: confseq(123,456,789) confset(222,111) seq(8722) set(4196,48658) OK left cmp seq1: seq(8466,3,52737,4096) and confed: confseq(123,456,789) OK left cmp seq2: seq(8722) seq(4) and confed: confseq(123,456,789) OK left cmp seq1: seq(8466,3,52737,4096) and confset: confset(456,123,789) OK left cmp seq2: seq(8722) seq(4) and confset: confset(456,123,789) OK left cmp tripleseq1: seq(8466,2,52737) seq(4096,8722,4) seq(8722) and confed: confseq(123,456,789) OK left cmp tripleseq1: seq(8466,2,52737) seq(4096,8722,4) seq(8722) and confed2: confseq(123,456,789) confseq(111,222) OK left cmp tripleseq1: seq(8466,2,52737) seq(4096,8722,4) seq(8722) and confset: confset(456,123,789) OK left cmp confmulti: confseq(123,456,789) confset(222,111) seq(8722) set(4196,48658) and confed: confseq(123,456,789) OK left cmp confmulti: confseq(123,456,789) confset(222,111) seq(8722) set(4196,48658) and confed2: confseq(123,456,789) confseq(111,222) OK left cmp confmulti: confseq(123,456,789) confset(222,111) seq(8722) set(4196,48658) and confset: confset(456,123,789) OK empty_get_test, as: OK failures: 0 aspath count: 0
That strongly suggests 0.99 ought to be immune.
It looks like 0.99 will resolve this bug.