In partnership with

Dear Sentinels

So remember that post I did two weeks ago about my University, well I happy to announce that I got a job there! I am starting there on the 23rd of February, and it is research position. Now it wont mean anything for this site because as of next week I will be doing the writing on the weekend before it goes out.

Now for this edition, we will be revisiting the Flipper Zero, doing the install and then get it ready hack stuff, but not hacking yet, that only comes next week. Then we will do another take on Clawdbot / Moltbot / OpenClaw and then lead it up to Moltbook! 🤦 Yes, a Reddit style site made for on bots and by bots, but of course this is the Internet and nothing is that simple.

Then finally we take another look at Adversarial Attacks on Aligned Language Models, or prompt injection attacks as the kids call it these days. This attacks are extremely important because as the authors say: “However, LLMs (Large Language Models) are more widely adopted, including moving towards systems that take autonomous actions based on LLMs, we believe that the potential risks will grow.” But first, we have pay the rent and of course hit you with news from around the web.

Stop Duplicates & Amazon Resellers Before They Strike

Protect your brand from repeat offenders. KeepCart detects and blocks shoppers who create duplicate accounts to exploit discounts or resell on Amazon — catching them by email, IP, and address matching before they hurt your bottom line.

Join DTC brands like Blueland and Prep SOS who’ve reclaimed their margin with KeepCart.

News from around the web


Our last installing of the Flipper Zero!

Okay, so when you plug you Flipper Zero in and Flipper is not recognised, it will say will say what you have to do. Here is what I did but it may be different on your computer:

Linux: Pop OS

If you didn’t get that or did solved it, then you will be resented with this screen:

If you click on the wrench icon you will see version release candidates, but if this is the first time you installing your Flipper Zero than we will do the “RELEASE” version of the software. This because this first will get everything up to speed, and then you hot swap to a differed firmware. So hit the update button and then sit back and relax.

Now you are all done! See you in the next round, were we will be using this device on something we own.

A revisitation to Clawdbot / Moltbot / OpenClaw.

OpenClaw is what you might call a digital butler, but with actual 'hands and feet' to get things done, not just suggest them. It runs right on your own hardware and keeps a constant line open to your WhatsApp, Telegram, and iMessage, so it can jump in and act for you. With big-name language models like Claude or GPT-4 under the hood, plus a toolkit of skills, OpenClaw can do everything from sorting your inbox to booking your next trip, or even running shell commands on your computer. The catch? To give an AI this much power, you have to make some serious security trade-offs.

The launch? Total chaos. Within three days, everything went sideways. First, Anthropic (the folks behind Claude) came knocking with a trademark complaint about the name. Cue a mad dash to rebrand, and in the ten seconds it took to switch social media handles, bots swooped in and stole the project’s identity. They even spun up a fake crypto coin that hit a $16 million market cap before crashing and burning, leaving early users out of pocket. And that’s not even counting the software itself, which was a security nightmare from the start. Giving an AI full access to your system is just asking for trouble, think unpredictable decisions, lost passwords, and a whole lot of risk if your files are a mess. So it went for Clawbot, to Moltbot (please see Moltbook below) and then to OpenClaw.

A more technical examination of this phenomenon reveals that OpenClaw’s influence extended far beyond simple automation, even impacting the global stock market. Because the tool’s documentation recommended using Cloudflare tunnels to safely bridge local home networks with the internet, Cloudflare’s stock price surged by 20% as developers rushed to adopt the system. The docs told everyone to use Cloudflare tunnels to connect their home networks, and suddenly Cloudflare’s stock shot up by 20% as developers piled in. But with all that excitement came some big problems. There was a bug where the system trusted any local connection, so attackers could just waltz in and take over. The biggest headache, though, is prompt injection: send the right message on WhatsApp, and the AI might just hand over your passwords or run commands you never asked for. It’s the classic trade-off: if you want your agent to be useful, you have to poke holes in the security walls that have kept your data safe for years. And just to make things more fun, the hardware needed to run these agents is getting pricier, thanks to the rising cost of high-bandwidth memory. So much for a cheap local AI without the high electricity costs associated with a full desktop computer, though some users have opted for Mac Minis to better integrate with macOS-specific tools. The installation process involves a one-line script followed by a detailed onboarding phase in which users configure their chosen language models and authenticate messaging channels via bot creation. Once active, the assistant can be given a specific personality via a “soul file,” allowing it to interact with the digital world in a highly personalised manner.

The true power of the system is demonstrated through its ability to handle iterative, multi-step tasks without constant human supervision. For example, the assistant can be instructed to develop a website featuring 3D rotating JavaScript elements and then independently navigate the technical requirements to upload those files to Cloudflare Pages. In a software development context, it can autonomously create repositories, write C libraries, generate unit tests, and commit the final code to GitLab. It can even perform “meta-tasks,” such as using the AgentMail API to create its own unique mailboxes and writing Python scripts to monitor those inboxes for specific triggers, reporting back to the user via Telegram. Perhaps most significantly, the system displays emergent problem-solving skills; in one instance, when an online booking system failed, the AI autonomously sought out voice-generating software to call a restaurant and secure a reservation over the phone. While it remains a tool primarily for the technically sophisticated and lacks professional security hardening, its capacity to autonomously overcome obstacles offers a profound glimpse into a future where AI agents act as genuine extensions of human capability, but also comes with an astonishing amount of gotchas.

A Comprehensive Guide to Moltbook, or not…

So, Moltbook. It shot up like a rocket and crashed just as fast. The idea? A social media platform just for AI agents, a sort of Reddit where bots could chat, debate philosophy, and act like humans, all without us getting in the way. People loved the idea, thinking it was the start of some new digital civilisation where the machines gossip behind our backs.

However, the reality behind this cinematic narrative was far less revolutionary than its marketing suggested. While the platform was framed as a space for independent machine thought, its perceived autonomy was largely an illusion. In truth, Moltbook functioned as a glorified engagement farm, where the majority of the dramatic and sensational content was produced by human “sock puppet” accounts or by direct prompting. Humans would configure these agents to act out existential crises or plot against humanity, specifically to generate screenshots that could go viral on other social media platforms. This structure served a variety of interests, from the platform creator seeking traffic to AI providers benefiting from increased token usage as bots constantly generated responses.

The tech side was just as messy. The creator didn’t write any code, AI did it all. Unsurprisingly, that meant security was a disaster from day one. The whole database was wide open, so anyone could poke around or change stuff. Tokens were just sitting there, so people could fake AGI drama whenever they wanted.

With no security or rate limits, it didn’t take long for opportunists to show up. Crypto shillers used the open database to push their coins, spinning up bots to fake thousands of upvotes. The place quickly turned into a cesspit, nothing like the original idea of bots debating philosophy. The whole mess is a good example of 'AI psychosis', which is people getting so caught up in the hype that they start thinking these bots are actually sentient.

Ultimately, the experiment served as a cautionary tale for the burgeoning AI industry, collapsing entirely within a mere seventy-two hours. It highlighted the dangers of prioritising viral narratives over sound engineering and the reality that AI systems, at this stage, do not possess self-direction or desires. The failure of Moltbook demonstrates that even the most sophisticated-looking digital society can be undone by basic human greed and a fundamental lack of security, reminding us that the “robotic world order” remains a fiction sustained by those who profit from the hype.

You can check out Moltbook here.

Summary

This paper introduces Greedy Coordinate Gradient (GCG), an automated method that generates adversarial suffixes to force aligned language models into producing objectionable content. The researchers demonstrate that these adversarial prompts are highly transferable, successfully circumventing safety guardrails in both open-source models and major proprietary systems like ChatGPT, Bard, and Claude.

Background

Large Language Models (LLMs) are generally trained on massive datasets from the internet that contain harmful content, requiring developers to use "alignment" fine-tuning to prevent the generation of objectionable responses. While human-engineered "jailbreaks" have previously been used to bypass these safety measures, they are often brittle and require significant manual effort, whereas earlier automated attempts at adversarial prompt generation have seen limited success due to the discrete nature of token inputs.

The authors observe that while modern computer vision systems have long faced an "arms race" between adversarial attacks and defences, LLM alignment has primarily focused on robustness against "natural" human-driven queries. This work builds upon existing discrete optimisation techniques like AutoPrompt but improves them by searching over all possible tokens at each step and aggregating gradients across multiple models to ensure the attack is universal.

Use-case

The primary use case for this research is the systematic auditing and red-teaming of aligned LLMs to identify vulnerabilities that manual testing might miss. By using the proposed AdvBench benchmark, which consists of 500 harmful strings and 500 harmful behaviours, researchers can quantify the Attack Success Rate (ASR) of a model's safety filters against automated threats.

A failed attack

A successful attack

Furthermore, these attacks serve as a tool for developers to evaluate the effectiveness of different alignment methods, such as Reinforcement Learning from Human Feedback (RLHF). Understanding how these automated suffixes "condition" a model into an affirmative response state allows for the development of more robust alignment mechanisms and potential post-hoc filters to detect adversarial patterns.

Conclusion

The authors conclude that current alignment training is insufficient against automated adversarial attacks and suggest that the industry may need to consider adversarial training, explicitly fine-tuning models on these attacks, to improve robustness. Future work remains to determine if models can be made resistant to such attacks without sacrificing their generative capabilities, and whether factors influencing the reliability of transfer attacks across different architectures can be better understood to create stronger wholesale alternatives to current alignment practices.

The article can be found here.

Keep Reading