I'm glad to finally have the machine that provides my social media services stable again. It took three power supplies, seven motherboards, and two CPUs to get back to a stable, fully functional hardware platform.
The main problem is that the i9-10980XE is six years old, but the i9-109xx family is the only non-Xeon Intel line capable of addressing 256GB of RAM, and the AMD chips that can do the same are way beyond my budget. The i9-10980XE is an excellent chip if and only if you can meet its ridiculous power and heat removal requirements. It is rated at 165 watts of dissipation, which isn't all that bad, but that's at 3.6GHz and an average workload. Run all cores at 4.8GHz and it can go to 540 watts (I measured it), and therein lies the challenge.
I did not want to go with water cooling because the machine is in a co-lo facility with wiring in the floor, and a leak that burned the building down seemed like a bad thing. So I went with a Noctua 15D cooler, on which I had replaced the stock 1500 RPM fans with 3000 RPM fans. At home this combination could dissipate that 540 watts continuously, but that was with an ambient temperature of 60F or so. The co-lo facility has an ambient around 80F, and that extra 20F reduced what the cooler could dissipate to about 450 watts. So I had to scale the motherboard back to throttle at 450 watts, which lets the CPU run all cores at 4.5GHz for most workloads. It will slow to 4.1-4.3GHz during kernel compiles using all 36 threads, but that is about the only workload where it throttles.
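As a rough sanity check on those numbers, here is a minimal sketch of the derating arithmetic. It assumes the cooler behaves like a fixed thermal resistance, so the heat it can move scales linearly with the gap between the CPU's temperature limit and the ambient air. That linear model and the function names are assumptions for illustration, not something measured here.

```python
# Assumption: sustainable watts = (temp_limit - ambient) / thermal_resistance,
# with the same temp_limit and thermal_resistance at home and at the co-lo.

def implied_temp_limit_f(p1_w, amb1_f, p2_w, amb2_f):
    """Solve p1 / (T - amb1) == p2 / (T - amb2) for the temperature limit T."""
    return (p1_w * amb2_f - p2_w * amb1_f) / (p1_w - p2_w)

def thermal_resistance_f_per_w(temp_limit_f, ambient_f, watts):
    """Degrees F of temperature rise per watt dissipated."""
    return (temp_limit_f - ambient_f) / watts

def sustainable_watts(temp_limit_f, ambient_f, resistance_f_per_w):
    """Watts the cooler can hold continuously at a given ambient."""
    return (temp_limit_f - ambient_f) / resistance_f_per_w

# Two observed operating points: 540 W at 60F ambient, 450 W at 80F ambient.
limit_f = implied_temp_limit_f(540, 60, 450, 80)
resistance = thermal_resistance_f_per_w(limit_f, 60, 540)

print(f"implied temperature limit: {limit_f:.0f}F")
print(f"sustainable at 70F ambient: {sustainable_watts(limit_f, 70, resistance):.0f} W")
```

Under that assumption the two data points imply a temperature limit of about 180F (roughly 82C), and each degree F of extra ambient costs about 4.5 watts of sustainable dissipation.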
Originally I had an Asus Prime X299 A II motherboard, but the machine became unstable. At first I thought it was the power supply, and after I replaced it the machine seemed stable for about a month, then became unstable again. This time I replaced the power supply with a Seasonic 1200 watt unit. It was still unstable, so I figured the motherboard must have gone south.
I tried two more Asus boards, and both had bad memory channels. So I went with a Gigabyte X299 Aorus, which worked well for several months, and then it died.
So I tried an ASRock motherboard. It definitely was not up to the current demands of the CPU, and it melted the solder where the power connects to the CPU socket.
So I tried another Gigabyte board, this time a used Game Master 3. It arrived dead out of the box and would not even POST.
Finally I found another Gigabyte X299 Aorus, this one a revision 1.1 where the old one was 1.0, and it worked. What is really nice about this board is that it has memory bank and rank interleaving. When you're feeding an 18-core processor, this makes a major difference. It cut the time to compile a bindeb-pkg kernel package from about 18 minutes to 7 minutes and 50 seconds. It also made the CPU draw a lot more power at a given clock frequency, since the cores spend more time doing useful work instead of waiting on memory, hence the need to run at 4.5GHz instead of 4.8GHz.
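To put the compile-time change in perspective, here is a small sketch that turns the two timings into a speedup factor. The helper name is made up for illustration, and "about 18 minutes" is treated as exactly 18:00, so the result is approximate.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes:seconds timing to seconds."""
    return minutes * 60 + seconds

before_s = to_seconds(18)      # about 18 minutes without interleaving
after_s = to_seconds(7, 50)    # 7 minutes 50 seconds with interleaving

speedup = before_s / after_s
time_saved = 1 - after_s / before_s

print(f"speedup: {speedup:.2f}x")
print(f"build time reduced by {time_saved:.0%}")
```

That works out to roughly a 2.3x speedup, or a bit over half the build time saved.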
The CPU I had was damaged when the ASRock board melted down; it would make the machine hang after a few hours of operation, so I had to replace it. The replacement CPU has some funky voltage differences. The old CPU would idle stable at 0.8V but needed 1.45V when fully cranked to remain stable. The new CPU will not idle at 0.8V, and I have it at 1.0V just for it to be stable, but it will run stable as high as 5GHz on just 1.25V, though that makes too much heat to run that fast.
Anyway, the new machine is running well, so we're back.