I have the Framework Desktop with 395+ 128gb RAM
Today I am pretty happy with it. LLMs are finally good enough (fast enough with MTP+MoE, but also just much better in capability) that I can fit local ones into real tasks, and I've used image generation with invokeAI to do some genuinely useful things like rendering concepts for a renovation.
I mostly use lemonade-server and invokeAI for my workloads. Previously I used llama-swap, but lemonade is just an easier system to manage. ROCm is finally usable.
Up until the end of Q1 2026 it felt like a total waste of money, largely due to AMD. ROCm was unusable all of last year; there was an entire month where PyTorch crashed just trying to multiply two matrices due to AMD Linux driver issues. kyuz0's toolboxes were really the only way to do anything on the machine.
Thankfully things are in a good state now, finally.
I probably only actually need ~64gb of ram. There aren't a ton of high-parameter-count MoEs with a small enough active set to feel nice to use. But it is nice that I can have many models or different modalities in memory at the same time, which is what the LMX Omni "models" do.
The numbers in the article for gptOSS feel a little irrelevant now. Prompt processing is definitely an issue, and diffusion is very very slow. PP speed hits you hard if you run an agent and try to compact context. Realistically most files are not large enough for it to be a huge deal, but it does make large-scale agentic work slow.
An underrated factor of the Strix Halo is that it's also just a "normal" x86 PC. If all you want to do is run AI, then these CPU+RAM based solutions tend to take a ton of $ to get lackluster performance. It does work, but you really have to know how it's going to work ahead of time. There's no driver/OS/ARM-vs-x86 compatibility hacking to worry about to get it to do anything else if you don't need a local LLM 100% of the time - it's just a typical small workstation.
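To put the prompt-processing point in rough numbers: if an agent has to re-read its whole context (for example when compacting), the wait is about the context length divided by the PP rate. A minimal sketch, assuming a steady prefill rate; the function name and every number below are hypothetical placeholders, not measurements from this or any other machine.

```python
def prefill_seconds(prompt_tokens: int, pp_tokens_per_s: float) -> float:
    """Seconds needed to process a prompt at a steady prefill (PP) rate."""
    if pp_tokens_per_s <= 0:
        raise ValueError("pp_tokens_per_s must be positive")
    return prompt_tokens / pp_tokens_per_s


# Hypothetical context sizes and PP rates, chosen only for illustration.
for tokens in (4_000, 32_000, 128_000):
    for rate in (250.0, 1_000.0):
        secs = prefill_seconds(tokens, rate)
        print(f"{tokens:>7} tokens at {rate:>6.0f} tok/s: {secs:7.1f} s")
```

A single file of a few thousand tokens barely registers, while a full-context compaction at a low PP rate lands in the minutes range, which matches the "fine for most files, slow for large-scale agentic work" experience.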
I believe Strix Halo can do much better than these numbers.
The first Spark sw update improved performance significantly. Maybe the AMD software team can get their act together and do the same? :)
I've been looking for a bench like this, thanks for sharing!
I've been pretty disappointed with how horrifically memory-bound the spark is. By all rights it feels like it should blow strix halo and apple hardware out of the water, but it's completely hobbled by the low memory bandwidth.
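A back-of-envelope way to see the memory-bound part: during token generation each new token has to stream roughly the active weights from memory once, so bandwidth divided by active-weight bytes gives a ceiling on tokens per second. A rough sketch under that assumption; it ignores KV-cache reads, compute, and software overhead, and the function name and example numbers are hypothetical placeholders rather than specs or benchmarks for any of these machines.

```python
def decode_ceiling_tok_s(
    bandwidth_gb_s: float,
    active_params_billion: float,
    bytes_per_param: float,
) -> float:
    """Upper bound on decode speed when every token must read all active weights once."""
    if min(bandwidth_gb_s, active_params_billion, bytes_per_param) <= 0:
        raise ValueError("all inputs must be positive")
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Placeholder inputs: substitute your own hardware bandwidth and model details.
# 0.5 bytes per parameter corresponds to roughly 4-bit quantized weights.
for active_b in (5.0, 20.0, 40.0):
    ceiling = decode_ceiling_tok_s(256.0, active_b, 0.5)
    print(f"{active_b:>5.1f}B active params: at most ~{ceiling:.1f} tok/s")
```

The same arithmetic is why the active parameter count matters more than total size for how a big MoE feels: more unified memory lets a larger model fit, but the ceiling only moves with bandwidth or a smaller active set.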
Wow the timing, I spent a few hours trying to get my head around these three choices last night. Got to roughly similar conclusions. And I still don't know what to do.
I have more money than sense. I don't even know (yet) how silly it is, but maybe a Strix Halo, with 2x 5090s with p2p pcie patch? I'd do more 5090s but the power consumption and need to water cool is too much for me.
I'm also itchy because it seems like Ryzen Pro 495 is coming soon with even more unified RAM... (thoughts very much appreciated on any of this...)
AMD ROCm has come a long long long ways since last year, but you'll probably be happier not dealing with AMD's software.
I posted another comment with my experience with the AMD 395+. I am happy overall and it's usable now, but it's only useful for models under 64gb of vram due to the active parameter counts on larger MoEs.
If you add 2x 5090s, do you actually need the base system?