Sponsored by

The Electrification of Heavy Machinery Has a Ground Floor

Tesla did it to cars. Now the same shift is coming for excavators, forklifts, cranes, and military equipment. The difference is that nobody has owned this moment yet — until RISE Robotics.

Their technology strips hydraulics out of heavy machinery entirely and replaces it with a patented electric actuator. No fluid. Full digital control. Built for the autonomous machines that are coming whether the industry is ready or not. The Pentagon is already a customer.

Last Round Oversubscribed. $9.7M in revenue already on the board. Dylan Jovine of ‘Behind the Markets’ spotted it early. The Wefunder community round lets anyone invest alongside institutional backers.

Dear Sentinels,

This week, we’re diving into what I like to call “The Silicon Synthesis”, the third instalment in our five-part adventure into the world of Field-Programmable Gate Arrays (FPGAs). But before we get our hands dirty with silicon, there’s a bit of drama to address. Anthropic, in a move worthy of a soap opera, announced its Mythos AI system, then promptly decided it was too dangerous for public consumption. Cue the usual scare stories and the inevitable “it’s just hype” brigade. Honestly, your guess is as good as mine. Meanwhile, Marcus Hutchins has a video where he talks about a 27-year-old zero-day found in an OS. Yes, it’s real, but it’s not quite the end of the world some headlines would have you believe. If you fancy a watch, the video is called “Is Cybersecurity Over”.

FPGAs are those clever, reconfigurable bits of hardware that let you build custom digital circuits from scratch, ideal if you fancy high-performance or real-time computing. They’re the Swiss army knives of the digital world, popping up everywhere from hulking data centres to tiny edge devices. Why? Because they can handle tricky jobs like cryptography and AI inference at speeds and energy levels that would make your average CPU blush. That’s the focus of our investigative article this week, which will also set the stage for an academic paper. The paper introduces MERINDA (Model Recovery in Dynamic Architecture), an FPGA-powered framework for real-time model recovery in dynamic systems (think medical wearables or autonomous vehicles). Impressively, it cuts energy use by a factor of 114 compared to GPU setups, so your battery-powered gadgets can keep learning without needing a nap every five minutes.

But first, let’s see what treasures the web has delivered to us this week!

News from around the web!

Hardware-Accelerated Artificial Intelligence

The world of computing is changing fast, mostly because everyone wants more speed and less waiting around. Enter FPGAs: once the preserve of hardware enthusiasts, now the backbone of modern AI infrastructure. As AI pops up everywhere from your phone to your fridge, the need for hardware that can adapt is more important than ever. Unlike your standard CPU, whose timing wobbles with caches, branch predictors and whatever the operating system fancies doing, FPGAs offer something called deterministic latency. In plain English, that means they’re reliably on time, crucial if you’re running real-time AI and can’t afford a hiccup. At their heart, FPGAs are all about programmable logic, letting you build any circuit you fancy. This low-level control means you can create custom hardware for things like cryptography or random number generation, and do it faster and more efficiently than a regular processor. All these theoretical perks are now being put to work in a world where big companies are snapping up the competition left, right and centre.

The push for FPGAs is coming from a market where a handful of big players call the shots, shaping what’s available from fancy data centres to tiny edge devices. Companies like Oracle keep banging on about distributed inference as the future, which is a posh way of saying ‘let’s do the clever stuff closer to where the data lives’. Building adaptive silicon takes years of tinkering and a lot of brainpower. No surprise, then, that the old guard is still firmly in charge of how fast things move.

The FPGA world has seen its fair share of corporate chess moves. Intel snapped up Altera, hoping for a boost, but Altera seems to have lost a bit of its mojo inside the Intel machine. Meanwhile, AMD’s acquisition of Xilinx has been a roaring success. Xilinx was already the gold standard for adaptive logic, think of it as the GPU of the FPGA world. By bringing Xilinx into the fold, AMD has built a pretty solid moat, mixing Xilinx logic into CPUs and rolling out AI-ready PCs. All these boardroom manoeuvres set the scene for engineers who now have to get to grips with ever more complicated systems.

Switching from software-based AI to custom hardware acceleration usually starts in the hallowed halls of academia or at some high-stakes competition, perhaps one run by a defence research organisation, if you’re feeling fancy. These projects, like speeding up neural networks on RISC-V chips, are a crash course in hardware-software teamwork. When you’re working professionally, building AI-on-FPGA systems is all about delivering the goods, on time and up to scratch. Moving from theory to real hardware means wrestling with the gritty details of manual integration and data wrangling. To really master these tools, you’ll need to rethink the maths behind machine learning from the ground up.

Before you can get a neural network running on an FPGA, you have to face the cold, hard truth: hardware has limits, and it’s not shy about reminding you. The usual software approach assumes you’ve got endless memory and processing power, but on a chip, you’re working with what you’ve got. Success means squeezing your neural network into the available logic gates and memory blocks. At the heart of it all is the Multiply-Accumulate (MAC) operation, which sounds simple but turns into a real resource hog when you use floating-point maths, as software likes to do.

Floating-point units are greedy little things in hardware, gobbling up silicon just to keep track of fractions and big numbers. The fix? Quantisation. That’s where you swap out fractions for good old integers, silicon’s favourite language. By sticking to integers, you cut down on resource use without sacrificing too much accuracy. Getting quantisation right is key, and it’s what connects your fancy model design to the specialised software you’ll need for hardware synthesis.
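
To make the idea concrete, here’s a minimal NumPy sketch of symmetric integer quantisation and an integer MAC. This is a toy illustration of the principle, not the Brevitas/FINN flow: one scale per tensor, signed 8-bit values, and a single rescale at the end of the dot product.

```python
import numpy as np

def quantise(x, n_bits=8):
    """Symmetric affine quantisation: map floats to signed integers
    with a single per-tensor scale factor."""
    qmax = 2 ** (n_bits - 1) - 1                 # 127 for 8-bit
    scale = np.abs(x).max() / qmax               # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w, a = rng.standard_normal(64), rng.standard_normal(64)  # weights and activations

qw, sw = quantise(w)
qa, sa = quantise(a)

float_mac = float(w @ a)              # what software normally computes
int_mac = int(qw @ qa) * sw * sa      # pure integer MACs, one final rescale

print(abs(float_mac - int_mac))       # small quantisation error
```

On an FPGA, the `qw @ qa` part is the bit that matters: integer multiply-accumulates map onto cheap DSP slices, while the floating-point rescale happens once rather than per element.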

High-level synthesis tools are the unsung heroes that turn your Python-based AI models into something an FPGA can actually understand. Without them, you’d be stuck translating neural networks by hand, which is not my idea of a good time. In the Xilinx world, FINN and Brevitas do the heavy lifting. Brevitas handles quantisation-aware training, making sure your model is ready for integer logic before it ever sets foot on the hardware. FINN acts as your translator, with a handy Python API that turns quantised models into neat hardware layers. This setup helps you keep your classifiers under control and lays the groundwork for fancier, less conventional AI projects. Still, even with all these clever tools, you’ll often find yourself knee-deep in the world of System on Chip architectures, where automation meets the stubborn realities of actual hardware.

The real headaches start when you’re trying to get models running on hardware that isn’t exactly standard. Here, you’re in charge of the whole subsystem, usually wrangling SoC platforms where ARM processors have to play nicely with programmable logic. That means writing custom firmware and making sure data flows smoothly from one bit to another. The main battle? Getting the FINN-generated model interface to talk to the co-processor that handles memory sometimes feels like refereeing a family argument.

To resolve these compatibility issues, you end up building custom logic blocks to bridge the interface gap and wiring in communication protocols such as UART so a human can actually talk to the thing. Hand-crafting blocks and writing your own firmware keeps the system robust outside the pre-configured guardrails of the automated frameworks. Doing it the hard way also means you own the entire data path, so when you evaluate the project’s real-world performance, you can actually trust the numbers.
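
For flavour, here’s a hypothetical framing scheme of the kind you might put over a raw UART byte stream: start-of-frame marker, one-byte length, payload, checksum. The marker value and frame layout are illustrative assumptions, not the protocol from any particular project.

```python
import struct

SOF = 0xA5  # hypothetical start-of-frame marker

def encode_frame(payload: bytes) -> bytes:
    """Wrap a payload for a raw UART link: [SOF][len][payload][checksum]."""
    if len(payload) > 255:
        raise ValueError("payload too long for a 1-byte length field")
    checksum = (SOF + len(payload) + sum(payload)) & 0xFF
    return struct.pack("BB", SOF, len(payload)) + payload + bytes([checksum])

def decode_frame(frame: bytes) -> bytes:
    """Validate and unwrap a frame; raises ValueError on corruption."""
    if len(frame) < 3 or frame[0] != SOF:
        raise ValueError("bad start-of-frame")
    length = frame[1]
    if len(frame) != 3 + length:
        raise ValueError("truncated or oversized frame")
    payload = frame[2:2 + length]
    if (SOF + length + sum(payload)) & 0xFF != frame[2 + length]:
        raise ValueError("checksum mismatch")
    return payload

# Round-trip a classifier result as it might cross the link.
msg = b"class=3 conf=0.97"
assert decode_frame(encode_frame(msg)) == msg
```

A serial line gives you no message boundaries, so this kind of framing (plus the checksum) is what stands between you and silently garbled results.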

When it comes to embedded systems, you have to juggle speed, power efficiency, and data security. High-end GPUs are great for raw power, but they’re not much use at the edge: they guzzle energy and want everything sent back to HQ. FPGAs, on the other hand, are perfect for production lines and privacy-focused IoT gadgets where you need instant results. By keeping AI close to where the data lives (edge computing, if you want to sound fancy), you get full control and sidestep the delays and risks of sending everything to the cloud.

The numbers speak for themselves: a custom FPGA at the edge can hit 650,000 frames per second and shift 510 megabytes every second, all while using just 20% of its hardware muscle. And it does this while sipping a mere 0.5 to 2 watts; try getting that from your average processor. FPGAs really are the Swiss army knives of digital computing. Sure, building ‘shady’ or non-standard AI setups is still a challenge (the tools aren’t quite there yet), but custom hardware acceleration is where the next wave of AI is heading.
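
If you combine those two figures, you get the implied payload per frame (assuming decimal megabytes, since the source doesn’t say):

```python
fps = 650_000              # frames per second quoted for the edge FPGA
throughput_bps = 510e6     # 510 MB/s, taken as decimal megabytes

bytes_per_frame = throughput_bps / fps
print(round(bytes_per_frame))   # roughly 785 bytes per frame
```

That works out to well under a kilobyte per frame, which is why such throughput is plausible on a chip drawing only a watt or two.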

Summary

MERINDA (Model Recovery (MR) in Dynamic Architecture) is an FPGA-accelerated framework developed to advance Physical AI at the edge. It replaces computationally intensive Neural Ordinary Differential Equation (ODE) components with highly parallelised Gated Recurrent Unit layers. MERINDA achieves a 114-fold reduction in energy consumption and a 28-fold decrease in memory footprint compared to GPU implementations, while preserving model recovery accuracy.

"This paper presents MERINDA... an FPGA-accelerated framework specifically designed to enable physical AI at the edge."

Background

Physical AI is transforming autonomous systems by enabling real-time interpretation of physical dynamics on resource-constrained devices. This capability is essential for mission-critical autonomous systems (MCAS), including automated insulin delivery and robotic platforms, which require high efficiency in both energy and latency. These systems often necessitate decision-making within milliseconds, rendering traditional cloud-based AI solutions unsuitable for edge deployments.

State-of-the-art model recovery methods, such as EMILY and PINN+SR, depend on Neural Ordinary Differential Equations (NODEs) that require iterative solvers. These solvers present a significant bottleneck, consuming substantial energy and resisting hardware acceleration due to their sequential processing. Although FPGAs provide a promising reconfigurable hardware platform, most existing ODE accelerators are incompatible with learning-based approaches where system dynamics evolve during training. MERINDA overcomes these limitations by substituting iterative ODEs with parallelisable neural flow architectures.
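
To see why this substitution helps, here is a minimal NumPy sketch of a single GRU cell with random weights. It is purely illustrative (not MERINDA’s actual architecture): each step is a fixed schedule of matrix MACs and elementwise gates, which pipelines neatly onto FPGA fabric, unlike an adaptive-step ODE solver whose work varies at run time.

```python
import numpy as np

def gru_step(x, h, W, U, b):
    """One GRU update h_t = GRU(x_t, h_{t-1}).
    W, U, b hold the update (z), reset (r) and candidate (n) parameters."""
    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))
    z = sigmoid(x @ W["z"] + h @ U["z"] + b["z"])         # update gate
    r = sigmoid(x @ W["r"] + h @ U["r"] + b["r"])         # reset gate
    n = np.tanh(x @ W["n"] + (r * h) @ U["n"] + b["n"])   # candidate state
    return (1 - z) * h + z * n

rng = np.random.default_rng(1)
d_in, d_h = 4, 8
W = {k: rng.standard_normal((d_in, d_h)) * 0.1 for k in "zrn"}
U = {k: rng.standard_normal((d_h, d_h)) * 0.1 for k in "zrn"}
b = {k: np.zeros(d_h) for k in "zrn"}

h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):  # five sensor samples
    h = gru_step(x, h, W, U, b)
print(h.shape)  # hidden state carrying the recovered dynamics
```

Every step costs exactly the same number of MACs, so the hardware can be sized once and run flat out, which is the property iterative solvers lack.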

"MR... extracts governing equations from sensor data, is critical for safe and explainable monitoring in mission-critical autonomous systems... operating under severe time, compute, and power constraints."

Use-case

A principal application of MERINDA is Automated Insulin Delivery (AID), where wearable pumps must continuously adapt to personalised glucose-insulin dynamics in individuals with Type 1 Diabetes. These devices require model updates every five minutes to accommodate physiological changes, meals, and physical activity, making energy efficiency critical. Conventional GPU-based solutions would exceed the energy capacity of a typical smartwatch battery, making on-device learning infeasible. MERINDA reduces energy consumption per update to 261.79 J, enabling approximately fifteen training cycles per battery charge.
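
A quick back-of-envelope check makes the battery claim concrete. The 300 mAh, 3.85 V cell below is an assumed typical smartwatch figure, not a number from the paper; the per-update energy is the paper’s reported 261.79 J.

```python
# Sanity-check the "approximately fifteen training cycles per charge" figure.
battery_wh = 0.300 * 3.85          # assumed 300 mAh cell at 3.85 V ≈ 1.155 Wh
battery_j = battery_wh * 3600      # ≈ 4158 J stored

energy_per_update_j = 261.79       # MERINDA's reported cost per model update
full_cycles = int(battery_j // energy_per_update_j)
print(full_cycles)                 # 15
```

Under those assumptions the arithmetic lands on fifteen full updates per charge, matching the paper’s claim.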


The framework is also applicable to other MCAS that operate under stringent constraints, including autonomous vehicles and robotic monitoring systems. Beyond physical systems, the authors propose that these optimisation principles may be generalised to enhance Large Language Models (LLMs). Implementing hardware-aware transformations in LLMs could substantially decrease the energy consumption and latency of on-device language AI, facilitating responsive and low-carbon language models suitable for edge hardware.

"A GPU-based solution... would require more than seven times the energy stored in a typical smartwatch battery... In contrast, MERINDA... [enables] approximately fifteen full training cycles per battery charge."

Conclusion

MERINDA demonstrates that hardware-algorithm co-design can achieve significant efficiency improvements by replacing iterative solvers with hardware-compatible, parallelisable alternatives. The authors suggest that these principles may be extended to domains such as LLMs, where replacing sequential decoding with speculative or parallel schemes could substantially reduce power consumption. The study concludes that the strategic selection of tasks and hardware, guided by latency, power, and accuracy requirements, is essential for adaptive AI deployment. This research establishes a foundation for efficient, physics-informed learning in real-time, resource-constrained autonomous applications.

"Strategic selection of task and hardware... enables adaptive and efficient deployment of AI in resource-constrained, real-time applications."

The report can be found here.

Keep Reading