Airbus A320 Fly by wire corrupted by radiation in flight (viewfromthewing.com)
113 points by JohannMac 12 days ago | 66 comments




This is highly unusual, so there may be something more to it than only speculation about radiation. The emergency AD says:

> Before next flight after the effective date of this AD, replace or modify each affected ELAC with a serviceable ELAC in accordance with the instructions of the AOT.
>
> A ferry flight (up to 3 Flight Cycles, non-ETOPS, no passengers) is permitted to position the aeroplane to a location where the replacement or modification can be accomplished.

That's a very limiting AD. The "before the next flight" part is unusual; ADs often set a deadline of the next inspection, X flight hours, or similar, not immediate compliance.


Is some major new solar activity imminently expected that explains the urgency?


According to the airworthiness directive:

  Affected ELAC: Elevator aileron computer (ELAC) ELAC B L104

  Serviceable ELAC: ELAC B L103+

So it's a regression that affects decades-old aircraft. Of course Airbus is now also meddling with "AI":

https://www.airbus.com/en/innovation/digital-transformation/...

Obviously there is no direct connection here, but it seems that destabilizing perfectly working aircraft could be the product of a culture shift.


> Obviously there is no direct connection here, but it seems that destabilizing perfectly working aircraft could be the product of a culture shift.

A culture shift following a fad of the last couple of years caused "a regression" (whatever you mean by that) in an aircraft that was built years earlier and designed years before that? How would that work? They can stop selling aircraft if they have a time machine.


The regression is obviously ELAC B L104 vs. ELAC B L103+. The culture shift (if there was one) would obviously have affected the latest L104.

I begin to revise my opinion about LLMs. An LLM would not have misunderstood the comment that badly.


Can you expand on that? Do you have any knowledge about those parts except their part numbers?

From what I have heard of how much Airbus pay people . . “Much” is the wrong word.

https://en.wikipedia.org/wiki/Single-event_upset

Apparently it has happened to an Airbus once before.



Why do I get the feeling both of these are a diagnosis of exclusion, and there's not particularly any hard evidence that radiation is causing these uncommanded manoeuvres? How unlikely is it for Airbus models specifically to be vulnerable to radiation-induced data corruption twice in almost twenty years? Are there similar incidents recorded for Boeing jets?

So it's called the Icarus bug, right? Please tell me they're calling it the Icarus bug.

I can understand why this would be a priority for Airbus even without the incident of a flight losing altitude. I recently read that a major philosophical difference between Airbus and Boeing is that Airbus prioritises the safety of the aircraft through controls (hardware and software), whereas Boeing believes a human should always be the final decision maker and be able to override any control. An Airbus rarely allows a pilot to override a warning or exceed the specifications of the aircraft, whereas a Boeing will warn the pilot about an unsafe action but allow the pilot to override it. It will be interesting to see how things change when AI tech creeps into aviation tech.

It’s an oversimplification to position them as opposites. Airbus uses higher-level control, commanding more of a flight path than a control surface movement. But pilots can revert to direct law and have full control authority when required. Boeing aims for a more traditional control feel: you move control surfaces instead of commanding an outcome. But there are layers of substantial augmentation on top of it, up to and including, for example, the 737 MAX's MCAS.

In practice, both approaches blend automation and pilot authority rather than strict philosophical extremes. And the practical difference at the controls is also not as extreme as some people think it is.


There is no oversimplifying happening here. There is no documented procedure to switch to direct law in an Airbus.

In fact, the only way to get into direct law on a fully functional plane is to start pulling circuit breakers for the (redundant) flight computers and inertial reference units.


People might talk about it that way but it's not really the case. It's mechanical limits versus software limits, the latter being required because it's a software-driven system. The limits may seem artificial and set by humans, but they're no different from "natural" mechanical limitations, in the sense that they both stop the pilot from commanding what the machine cannot deliver.

Didn’t work out with the Boeing 737 Max, did it?

The issue could have been avoided if MCAS had been made properly redundant (and had not relied on a single sensor, wtf) and pilots had been trained on it. It was all about the money. The airframe is fine.

That was basically the first Boeing that went in the Airbus direction.

The 777 and 787 before it are true fly-by-wire designs like the Airbus in question here; the 737 MAX isn't and never was. It just had a computer that was supposed to add artificial inputs under a very specific condition, so it could continue to fly like the older models under the same type certificate and not require extra pilot training. It turns out that the condition could be triggered erroneously, and the logic to determine the artificial inputs was deeply flawed.

The 737NG already had computer controlled feedback to the control columns, the MAX added computer controlled spoiler deployment (like the 757 and 767) and elevator trim.

Crucially, while trying to convince everyone that it’s basically the same old 737 to save on pilot retraining costs, never mind the significantly larger engines.

Airbus is much more transparent about its automation. Pilots even learn about the procedure to fly an A320 with a complete fly-by-wire outage using only mechanical emergency elevator controls and differential thrust.


> Airbus is much more transparent about its automation.

Airbus is OK, but could be better. There is a long history of Airbus crews facing unexpected corner cases in their flight control laws, and fortunately only a few of them have had fatal outcomes. While there are only a few "major" modes, there are a surprisingly large number of edge cases that can be encountered.


That wasn't just automating... That was hiding the features from the manuals to avoid having to retrain the flight crews.

No it wasn't. See also: MD-11, 737NG, 777, 787.

The title sounds like speculative clickbait.

From https://www.airbus.com/en/newsroom/press-releases/2025-11-ai...:

  Analysis of a recent event involving an A320 Family aircraft has revealed that intense solar radiation may corrupt data critical to the functioning of flight controls.
This is different from the core claim that the incident was caused by radiation. What are the prior probabilities that the system was exposed to "intense radiation"? Vs some other mundane cause such as a faulty wire or mechanical issues? And what is the evidence supporting the former hypothesis?

> What are the prior probabilities

100% for electronics operating at altitude. Also on the ground, but we mostly act like it doesn't happen and are usually ignorant of the root cause when it does.


100% of what? Those things have ECC and redundancy to the hilt. The data corruption odds are real and higher than one would expect but still not very high.

Look at page 138 and the correlation with altitude: https://www.atsb.gov.au/sites/default/files/media/3532398/ao...

Yes, look at page 135 as well. They don't know, they have educated guesses. It could be SW, it could be hw.

EMI causing bugs is the equivalent of "bad juju".


My take was initially similar to yours but I have just updated it considering the affected units did not in fact have EDAC on them. With ECC, I would say the odds of having such an error decrease dramatically. But without it... yeah, I can believe it.

Where did you see that it didn't have ECC?

"As noted in section 3.5.2, the CPU module on units 4167 and 4122 did not incorporate EDAC, nor was it required by the aircraft manufacturer’s specification"

It was apparently added in a later HW revision

"The LTN-101 ADIRU’s CPU module was later redesigned to reduce costs and to include error detection and correction (EDAC). EDAC is used for detecting and correcting single-bit errors in RAM chips to give protection from single event effects (SEEs, see section 3.6.6). This change was a significant redesign and resulted in a new CPU module part number (466871-01). The EDAC was performed by a new ASIC, and all of the RAM chips used on the CPU module were replaced with a different chip."

https://www.atsb.gov.au/sites/default/files/media/3532398/ao...
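
For anyone who hasn't met EDAC before, here is a minimal sketch of single-bit error correction using a textbook Hamming(7,4) code. This is an illustration only: the quoted report does not say which code the redesigned module's ASIC uses, real memory EDAC typically works on wider words, and the function names below are made up.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword (parity bits at positions 1, 2, 4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]  # list index 0 is position 1
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(word):
    """Return (nibble, corrected_position); position 0 means no error was detected."""
    bits = [(word >> i) & 1 for i in range(7)]
    # XOR of the positions of all set bits is 0 for a valid codeword,
    # and equals the position of the flipped bit after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # correct the single flipped bit
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


# Simulate one upset: flip a single bit of a stored codeword, then read it back.
stored = hamming74_encode(0b1011)
upset = stored ^ (1 << 4)
print(hamming74_decode(upset))
```

A code like this corrects any single flipped bit per word but is fooled by two flips in the same word, which is why practical schemes usually add an extra parity bit to at least detect double errors.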


Thanks; hmm, but that means the version with EDAC has been in use since 2002, so it is less likely to be related to today's update? (hth did they design stuff without protection up to that point?)

It's not _only_ that, it's also the fact that the failure mitigation was not catching it, so even if they "fix" the random spikes issue, they still need to consider the "what if" issue if the thing _still_ happens. My money is that they will mitigate this:

"There was a limitation in the algorithm used by the A330/A340 flight control primary computers for processing angle of attack (AOA) data. This limitation meant that, in a very specific situation, multiple AOA spikes from only one of the three air data inertial reference units could result in a nose-down elevator command. [Significant safety issue]"

It's the only one marked "Significant safety issue" so my money is on that.


That's applicable for one specific model of ADIRU (basically determines where the aircraft is in 3d space in terms of position, rotation, velocities, and accelerations) from a single manufacturer (Litton). These aircraft have dozens of computers for different functions, many of them with multiple manufacturer options. There are at least 4 different ADIRU makers that airlines have been able to specify at different times including the Litton.

The ELACs (controlling the elevator and aileron actuators according to the demands computed by other functions) are made by Thales specifically for this aircraft type and probably have a quite different design.


The airworthiness directive replaces ELAC B L104 with ELAC B L103+, without giving a reason. Unless L103+ happens to have better shielding, it looks like another issue.

> What are the prior probabilities that the system was exposed to "intense radiation"?

I suggest trying to fly with a Geiger counter. At the cruise altitude you have something like 15-20x the normal background level, when flying over the pole it can rise to 30x.

It's actually not caused by the solar radiation, it's too weak to reach the flight level. It is caused by cosmic rays, and the solar activity modulates how much of the cosmic radiation reaches the lower levels of the atmosphere.


How can a ~bit flip cause something that bad? It would mean everything else like you mention would also be that bad. Bad RAM, bad hard disk, loose wire, bird hits the plane, everyone jumps at once, leaking military jamming.

Radiation should be covered under normal safety practice, and they already shield for it.

People often wrongly blame things on radiation, bit flips etc. when they don't know the real cause. A well known pattern.

There is a Hacker News item that was on high repeat where they eventually solved the ~'cosmic radiation bug', as they first called it. Cannot remember the link.

It will not be true no matter what the 'I know interesting facts', 'I have a wiki link' crowd tells you. Real life is boring (and amazing). See Heisenbugs: https://en.wikipedia.org/wiki/Heisenbug



Caused uncommanded pitch down, could exceed structural integrity of aircraft. There are redundant units - unknown why this can happen given redundancy.

Also not very clear how they attributed the failure to solar radiation.

In my humble opinion, whenever someone dropped the idea "Maybe it's solar radiation", it never was solar radiation. There was a subtle bug in the system or something. It's just such a cop-out to attribute it to solar radiation; it's our profession's variant of magic.


I can't find any further information on this Intel testing, like what altitude they perform the tests at.

AMD has performed testing at data centers at different altitudes, and there is some statistical significance in SRAM error rates. And that is typically only around 5000-6000 ft MSL.

https://dl.acm.org/doi/10.1145/2503210.2503257

Planes are much higher than that in operation so get larger amounts of unfiltered solar flux.

This may be one of the causes of higher cancer rates in pilots, but eliminating other environmental causes may be difficult

https://www.cdc.gov/niosh/aviation/prevention/aircrew-cancer...


These have specific error/data spike patterns. This document, from page 133 on, is a good example of such an investigation and its conclusions: https://www.atsb.gov.au/sites/default/files/media/3532398/ao...

It's an interesting document, but I am unconvinced that data spikes mean environmental radiation driven data corruption. The fact that they have a certain pattern suggests it's not random.

They certainly do put a chapter with potential triggers down there, and it's a good take, you can't just discard the possibility. But above, they also have SW bugs as a potential trigger, so... Essentially, they don't know for sure yet.


> But above, they also have SW bugs as a potential trigger, so

They also did extensive tests and analyses and came to the conclusion that a bug was highly unlikely (they would never say that something is impossible, but it still is exceedingly improbable).

> Essentially, they don't know for sure yet.

That’s not a really fair assessment. Their conclusion is that they could not estimate the likelihood of a radiation effect, so in that sense they don’t know for sure. But they still eliminated a lot of options. Almost all of them, actually.


I think it's quite a fair assessment. It's not an indictment of their engineering or anything, but they can't say for sure what caused the issue, and they analyzed all they could. The conclusion is "we don't know, we have some guesses". Probably it irks me the most because "cosmic rays" are impossible to prove. It's the perfect scapegoat. If I had a penny every time that someone put it out as the possible cause of a bug... I'd still be poor, but... well, I'd have a couple of pennies.

EDIT: On a deeper read, I am inclined to be a bit more charitable to this theory because, to my surprise: "As noted in section 3.5.2, the CPU module on units 4167 and 4122 did not incorporate EDAC"

I did not consider that these units were _this_ old, so they did not have error correction on them. Nowadays, almost every MCU has ECC. So, yes, without ECC the odds are quite a bit higher that they DID get a "bit flip".


CGMthrowaway has an interesting comment on the other thread about this subject, that it's likely not solar radiation. "failing solid state relay or contactor on the shared avionics power bus" [1] Related to the previous 2008 incident on Qantas 72 that had similar characteristics.

[1] https://news.ycombinator.com/item?id=46083560

  > On the Qantas 72 flight (2008), the ATSB report showed the same power spike that upset the ADIRU also left tidy 1-word corruptions in the flight data recorder. Those aligned with the clock cycle, shared the same amplitude and were confined to single ARINC words. That is pretty much exactly the signature of a failing solid state relay or contactor on the shared avionics power bus (upstream of both FDR and fly by wire).

  > Radiation-driven bit flips would be Poisson distributed in time and energy. So that is one way to find out
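
The Poisson point in that quote can be turned into a rough check. A minimal sketch, assuming you had a list of event timestamps (the data below is synthetic, not taken from any report, and `dispersion_index` is a made-up helper): for a Poisson process, the number of events per equal-width time bin has a variance roughly equal to its mean, so the variance-to-mean ratio should sit near 1. Events locked to a clock land far below 1, and bursty events far above. This only looks at timing, not energy.

```python
import random
from statistics import mean, pvariance


def dispersion_index(timestamps, bin_width, duration):
    """Variance-to-mean ratio of event counts per time bin (about 1 for Poisson)."""
    n_bins = int(duration // bin_width)
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    m = mean(counts)
    return pvariance(counts) / m if m else float("nan")


rng = random.Random(0)
duration = 100_000.0

# Synthetic Poisson arrivals: exponential gaps, rate 0.1 events per time unit.
poisson_events = []
t = rng.expovariate(0.1)
while t < duration:
    poisson_events.append(t)
    t += rng.expovariate(0.1)

# Synthetic clock-aligned events: exactly one every 10 time units.
periodic_events = [10.0 * k + 5.0 for k in range(10_000)]

print("poisson :", dispersion_index(poisson_events, 100.0, duration))
print("periodic:", dispersion_index(periodic_events, 100.0, duration))
```

With only a handful of real events, a ratio like this has very wide error bars, so it could support or weaken a hypothesis but not settle it.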

I don’t think they did. Their analysis indicates that it could, and this analysis happened as part of an investigation of an incident, but they don’t say that was definitely the cause of that incident.

Presumably the same affected units are part of the redundancy package. The article essentially says as much.

So the patch will be physical, I imagine.

("Apply this very expensive special tape from (e.g.) 3M here and here.")


I've read the EASA emergency airworthiness directive, it's only a software change.

A software change that remedies solar radiation issues? Cool.

Easy enough to imagine. Something was improperly judged not to need consensus, running the calculation twice, or some other software mitigation. Revise the process to include the mitigation.

"just run the calculation twice. surely the darned solar radiation will take care not to hit random parts of the memory or any critical registers in the cpu"
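
To make the disagreement concrete, here is a minimal sketch of "run it twice and compare". Nothing here reflects what the ELAC software actually does: `compute_command` is an invented stand-in for a control law, and the bit flip is injected by hand. It shows both sides of the argument: the comparison catches an upset that hits only one execution, but not a corrupted input that both executions read.

```python
def compute_command(aoa_units, gain=0.5):
    """Stand-in control law: a made-up proportional term, not a real flight law."""
    return -gain * aoa_units


def flip_bit(value, bit):
    """Simulate a single-event upset by flipping one bit of an integer."""
    return value ^ (1 << bit)


def run_twice(func, arg, upset_second_run=False):
    """Run func twice; return (first_result, ok). ok is False if the runs disagree."""
    first = func(arg)
    second = func(flip_bit(arg, 6) if upset_second_run else arg)
    return first, first == second


# Transient upset during one execution: the two results differ, so it is detected.
transient = run_twice(compute_command, 4, upset_second_run=True)
print(transient)

# Upset in the stored input before both runs: both agree on the wrong answer.
shared = run_twice(compute_command, flip_bit(4, 6))
print(shared)
```

Detection alone only tells you something went wrong; deciding what to output next still needs a third opinion or a safe fallback.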

What experience do you have with firmware that you're basing all this on?

It's okay to not be nationalistic all of the time. Airbus will survive this extreme edge case. None of these comments matter in the bigger picture.

Based on your ad hominem of a reply, I suppose it's safe to assume you don't have the experience, then.

Instead of 3 copies make 7 for majority vote?

Best explanation I've seen (and it claims to be from a published but not public report on it) is that their 3-way consensus didn't smooth over repeated wildly wrong outputs correctly. Consistency problem strikes again.
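
For illustration, a minimal sketch of one generic way a 3-way voter can cope with a channel that keeps producing wildly wrong values: take the median of the healthy channels, and latch a channel out after several consecutive outliers. This is not Airbus's algorithm; the class name, tolerance, and strike count are all invented for the example.

```python
from statistics import median


class MidValueVoter:
    """Median voter that latches out a channel after repeated outliers."""

    def __init__(self, tolerance, max_strikes):
        self.tolerance = tolerance
        self.max_strikes = max_strikes
        self.strikes = [0, 0, 0]
        self.failed = [False, False, False]

    def vote(self, samples):
        # Only channels not yet latched out take part in the vote.
        healthy = [s for s, bad in zip(samples, self.failed) if not bad]
        mid = median(healthy)
        for i, s in enumerate(samples):
            if self.failed[i]:
                continue
            if abs(s - mid) > self.tolerance:
                self.strikes[i] += 1
                if self.strikes[i] >= self.max_strikes:
                    self.failed[i] = True  # stays out until maintenance resets it
            else:
                self.strikes[i] = 0
        return mid


voter = MidValueVoter(tolerance=2.0, max_strikes=3)
frames = [
    (2.1, 2.0, 2.2),   # all channels agree
    (50.6, 2.0, 2.1),  # channel 0 starts spiking
    (50.6, 2.1, 2.2),
    (50.6, 2.0, 2.1),  # third consecutive outlier: channel 0 is latched out
    (50.6, 2.1, 2.0),
]
outputs = [voter.vote(f) for f in frames]
print(outputs, voter.failed)
```

The hard part in practice is everything this sketch skips: what to do when only two channels remain and they disagree, and how intermittent spikes interact with whatever filtering sits around the voter.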

I believe most A320s can do OTA software updates (including downgrades).

OTA? That sounds extremely implausible.

It's a thing, but it's not universal on the A320.

Original 1984 critical hardware: the box has an EEPROM module, you swap it on the plane.

FMS (which requires monthly nav data updates) and all modern hardware: the box can be updated over the ARINC 429 serial bus or Ethernet (newer systems/planes), called dataloading

Dataloading had different methods. On A320s through the 2000s, most airlines had a 3.5" floppy disk drive on board (Airbus FDDU), and a mechanic fed floppies in. It was slow. The evolution of that was a USB port that took a flash drive.

Most current planes of older models just got rid of on-board dataloading. The mechanic uses a laptop with a cable or purpose-built tablet and plugs into a port. The mechanic can download the software via Wi-Fi or cellular onto the device: https://www.teledynecontrols.com/products/hardware-systems/p...

Airlines can indeed buy an on-board box that connects to Wi-Fi and LTE at the gate and downloads software. This is standard for the latest models that produce more data (A350, 787), but optional for older models. The mechanic still needs to go to the plane and push the buttons to tell it to load.

https://www.teledynecontrols.com/products/dataloading/eadl-x... https://www.teledynecontrols.com/products/hardware-systems/g...


Wow, I stand corrected. I thought it was still all floppy disks (or emulators thereof) for software, if not for navigation data etc.

Thank you!



