In partnership with

Dear Sentinels,

Welcome to the new year! This year, we have surprises in store for you! But not yet... 😉

This year, we're kicking things off with an investigative report on "Vibe Coding". "Vibe" began as a derogatory term for coding with AI assistants, but instead of fading away, it got stuck in the collective consciousness! Now it looks like we are on the verge of an AI git-festation, where every other line of code is vulnerable to credential-stuffing attacks, buffer overflows, or whatever else. Luckily, there is more to the AI takeover than meets the eye at first glance. For instance, FlowViz uses AI to turn threat reports into attack flows mapped onto the MITRE ATT&CK framework.

In the academic article, we examine DistilBERT, a smaller, more agile version of the BERT model. It is 40% smaller than the model it is based on, yet retains 97% of its language understanding capabilities. But first, it's off on our tour around the web!

News from around the web

The Future of Shopping? AI + Actual Humans.

AI has changed how consumers shop by speeding up research. But one thing hasn’t changed: shoppers still trust people more than AI.

Levanta’s new Affiliate 3.0 Consumer Report reveals a major shift in how shoppers blend AI tools with human influence. Consumers use AI to explore options, but when it comes time to buy, they still turn to creators, communities, and real experiences to validate their decisions.

The data shows:

  • Only 10% of shoppers buy through AI-recommended links

  • 87% discover products through creators, blogs, or communities they trust

  • Human sources like reviews and creators rank higher in trust than AI recommendations

The most effective brands are combining AI discovery with authentic human influence to drive measurable conversions.

Affiliate marketing isn’t being replaced by AI; it’s being amplified by it.

The Alluring Simplicity of 'Vibe Coding'

In the evolving technological landscape, the recent explosion in AI-assisted development demands a new understanding. At the forefront of this shift is an impressive and seductive new style of programming dubbed "Vibe Coding." This approach is characterised by a relaxed, conversational interaction in which a developer engages with a Large Language Model (LLM) to generate a functional application (barely functional or not at all, depending on who you ask). This approach, which OpenAI co-founder Andrej Karpathy described as a process where "I just see stuff, say stuff, run stuff, and copy and paste stuff," can make a working system materialise from a simple chat. This conversational model dramatically lowers the barrier to entry, creating the powerful illusion that anyone can now become a developer. However, this alluring simplicity conceals profound risks, transforming a tool of creation into a potential vector for systemic insecurity and automated malice.

The Unseen Costs of Conversational Code

The generative AI revolution has been celebrated for its speed, but this velocity is a strategic trap. Beneath the surface of conversational code lies a series of liabilities. This exploration delves into the unseen costs of conversational code, from the weaponisation of its core principles by malicious actors to the fundamental insecurity of the code it produces and the long-term degradation of the AI models themselves.

The Automation of Malice: The Rise of AI Hackbots

The same principles that make Vibe Coding accessible to aspiring developers also make it a powerful force multiplier for malicious actors. This has given rise to the AI hackbot: an autonomous agent, or a swarm of agents, designed to hack applications and infrastructure on a previously unimaginable scale. These bots operate through a layered, agentic structure where an "overseer" agent coordinates specialised sub-agents. These sub-agents are tasked with running known security tools, parsing the output, and chaining together attacks in a fully automated pipeline.

This model allows attackers to easily circumvent the security controls built into commercial LLMs. A direct request to "make me ransomware" will be refused. However, an attacker can deconstruct the malicious goal into a series of seemingly benign prompts. They can first ask the AI to write an application that encrypts a single file. Once complete, they can follow up with a request to apply that function to every file on a hard drive. The resulting product is functionally identical to ransomware, yet no single prompt violated the AI's safety protocols. This represents a monumental shift not in the novelty of cyberattacks, but in their scale and accessibility. The barrier to entry for sophisticated, automated attacks has effectively collapsed, creating a threat landscape where the very quality of the code we deploy becomes our primary line of defence.

A Foundation of Flaws

While the accessibility of AI code generation is a concern, the fundamental quality of the code itself presents an even greater one. According to a comprehensive study by application security firm Veracode, while the syntactic correctness of AI-generated code has improved dramatically, its security posture has dangerously stagnated. Over 90% of code generated by modern LLMs compiles without error, a massive leap from less than 20% just a year prior. Yet, only 55% of that same code passes security scans.

Perhaps more surprisingly, the research reveals that the size of an LLM does not correlate with better security outcomes. Small models with fewer than 20 billion parameters perform just as well (or as poorly) as massive models with over 100 billion. The study also highlighted a severe underperformance in the generation of secure Java code, which achieved a dismal 29% security pass rate. Researchers theorise this is because LLMs are trained on a vast legacy of publicly available code, much of which was created before modern security practices were common, effectively baking old vulnerabilities into new applications. This reliance on flawed training data points to an even more insidious long-term risk: the systemic degradation of the AI models themselves.

The Compounding Dangers of Model Collapse

The long-term reliability of AI systems is threatened by a phenomenon known as "Model Collapse," where training an AI on AI-generated content causes mistakes to compound over time. A now-famous example occurred years ago when users discovered that feeding garbage text into Google Translate could produce bizarre, religious doomsday prophecies. The reason was not a sentient AI foretelling the apocalypse, but a flaw in its training data. For many smaller, less common languages, the Bible was the most frequently translated text available. When the AI became confused by the nonsensical input, it defaulted to the prophetic language that dominated its limited training set for those language pairs.

AI image generators more visibly demonstrate this principle. When trained on a dataset polluted with other AI-generated images, they began producing humans with too many fingers, amplifying an initial flaw with each successive generation. The same degradation occurs in text and code, even if it is less immediately apparent. As more AI-generated content floods the internet, future models will be trained on this synthetic, and often flawed, data. This creates a feedback loop that threatens the core reliability of the very systems we are becoming increasingly dependent on, a risk that exposes the profound irresponsibility of the simplistic 'Vibe' driving this new wave of development.
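The feedback loop is easy to demonstrate in miniature. The toy simulation below (an illustrative statistical sketch, not a claim about any production model) repeatedly fits a Gaussian to its data and then trains the next "generation" only on samples drawn from that fit; estimation error compounds until the learned distribution collapses:

```python
import numpy as np

def collapse_demo(n: int = 20, generations: int = 1000, seed: int = 0) -> list:
    """Simulate model collapse: each generation is fit only to
    synthetic data sampled from the previous generation's fit."""
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, n)       # generation 0: real data
    spreads = [float(data.std())]
    for _ in range(generations):
        mu, sigma = data.mean(), data.std()   # "train" a model (MLE fit)
        data = rng.normal(mu, sigma, n)       # next generation sees only synthetic data
        spreads.append(float(data.std()))
    return spreads

spreads = collapse_demo()
```

With only a small sample per generation, the fitted spread reliably shrinks toward zero over successive generations: the distribution loses its tails first, the statistical analogue of each generation of images losing fine detail.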

Decoding the Vibe

When Andrej Karpathy, a co-founder of OpenAI, coined the term "Vibe Coding," it was widely interpreted as an endorsement of a new, simplified programming paradigm. However, a closer reading of his original statement suggests it was likely intended as a sarcastic or cautionary observation. He described it as a "naive toy town approach to programming" and critiqued the dangerous idea that one can simply "forget that the code even exists." His concluding remark, "it's not really coding," should be read not as a celebration, but as a warning. It highlights the peril of abandoning the precision, structure, and deep problem-solving that are the hallmarks of true software engineering.

AI as Augmentation with FlowViz

The strategic error of the current AI discourse is the obsession with replacement. The true paradigm shift lies in augmentation, forging tools that amplify, rather than supplant, human expertise. This is not a theoretical ideal; it is an operational reality, perfectly exemplified by the cyber threat intelligence tool FlowViz. It is an open-source tool that not only visualises attack flows but also integrates directly into the CTI ecosystem by exporting to the industry-standard STIX 2.1 format.
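The STIX 2.1 export can be pictured as a bundle of standard objects. The sketch below is a hand-rolled illustration of generic STIX 2.1, not FlowViz's actual output schema (the Attack Flow ecosystem also defines richer extension objects); it links two ATT&CK techniques with a generic related-to relationship:

```python
import json
import uuid
from datetime import datetime, timezone

def stix_id(obj_type: str) -> str:
    # STIX 2.1 identifiers take the form "<object-type>--<UUIDv4>"
    return f"{obj_type}--{uuid.uuid4()}"

def attack_pattern(name: str, attack_id: str, ts: str) -> dict:
    """A minimal STIX 2.1 attack-pattern SDO referencing MITRE ATT&CK."""
    return {
        "type": "attack-pattern",
        "spec_version": "2.1",
        "id": stix_id("attack-pattern"),
        "created": ts,
        "modified": ts,
        "name": name,
        "external_references": [
            {"source_name": "mitre-attack", "external_id": attack_id}
        ],
    }

ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
phishing = attack_pattern("Phishing", "T1566", ts)
execution = attack_pattern("User Execution", "T1204", ts)

# A relationship SRO expressing that one technique led to the other
edge = {
    "type": "relationship",
    "spec_version": "2.1",
    "id": stix_id("relationship"),
    "created": ts,
    "modified": ts,
    "relationship_type": "related-to",
    "source_ref": phishing["id"],
    "target_ref": execution["id"],
}

bundle = {"type": "bundle", "id": stix_id("bundle"),
          "objects": [phishing, execution, edge]}
print(json.dumps(bundle, indent=2))
```

Because the output is plain STIX, any downstream CTI platform that speaks the standard can ingest the flow without a bespoke integration.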

From Static Lists to Dynamic Stories

FlowViz addresses a core problem in threat intelligence: the static, non-temporal nature of conventional threat modelling. Frameworks like the MITRE ATT&CK matrix are revolutionary for cataloguing adversary techniques, but they flatten causality. They present an attack like a "list of ingredients without the recipe," showing what techniques were used but not how they were sequentially connected. The operational reality of a cyberattack is a flow, a sequence of actions with loops, branches, and critical choke points. Identifying these choke points requires hours of manual analysis, a latency that CTI teams cannot afford.

FlowViz Screenshot

FlowViz solves this by using LLMs to automatically read unstructured threat reports and visualise the flow of an attack. It extracts the causal chain of events, mapping the temporal relationships between techniques. This represents a fundamental leap from simply knowing which ingredients an attacker used to understanding the exact recipe they followed to achieve their objective.
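The "recipe" FlowViz recovers is essentially a directed graph of techniques. The sketch below (a hypothetical hand-built flow, not actual FlowViz output) shows how such a graph makes choke points, techniques that every path must pass through, fall out almost for free:

```python
# Hypothetical attack flow: nodes are MITRE ATT&CK technique IDs,
# edges are "leads to" relationships extracted from a report.
flow = {
    "T1566 Phishing":            ["T1204 User Execution"],
    "T1204 User Execution":      ["T1059 Command Interpreter"],
    "T1059 Command Interpreter": ["T1003 Credential Dumping",
                                  "T1082 System Discovery"],
    "T1003 Credential Dumping":  ["T1021 Remote Services"],
    "T1082 System Discovery":    ["T1021 Remote Services"],
    "T1021 Remote Services":     ["T1486 Data Encrypted for Impact"],
    "T1486 Data Encrypted for Impact": [],
}

def all_paths(graph: dict, start: str, end: str, path=None) -> list:
    """Enumerate every path from initial access to impact."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    return [p for nxt in graph[start] for p in all_paths(graph, nxt, end, path)]

paths = all_paths(flow, "T1566 Phishing", "T1486 Data Encrypted for Impact")
# A choke point appears on every path: a high-value place for a control.
choke_points = set.intersection(*(set(p) for p in paths))
```

In this toy flow the lateral-movement step (T1021) sits on every path to impact, so a detection there covers both branches of the attack; a flat ATT&CK matrix view cannot surface that.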

The Human-in-the-Loop Imperative

Crucially, FlowViz is designed for human augmentation, not analyst replacement. Its operational philosophy is to serve as a force multiplier. An analyst provides a threat report via URL or raw text, and in less than 60 seconds, the tool generates what can be considered an "80% complete rough draft" of the attack flow. This draft would typically take hours to create manually. The tool's capabilities are enhanced by advanced features, such as using vision models to analyse screenshots for additional context and a "Story Mode" that transforms a complex graph into a cinematic animation. This feature allows analysts to play the attack like a movie, making it an invaluable tool for briefing executives and other non-technical stakeholders. By automating the most tedious aspects of analysis, FlowViz empowers human experts to operate at a strategic level, though this robust partnership is not without its own inherent limitations.

The Reality of Hallucination and Inference

While powerful, FlowViz is not infallible, and its limitations underscore its design as a human-in-the-loop tool. There is a non-zero risk of LLM hallucination, where the tool might invent a technique that was not present in the source report. Furthermore, it faces the challenge of "edge inference." Threat reports often imply causality rather than stating it explicitly, forcing the AI to infer the links (the "edges" in the graph) between events, an inference that is not always perfect. These limitations reinforce the necessity of expert human judgment to verify the output, confirming that the AI is a partner in the analytical process, not its replacement.

From Vibe to Vigilance

Our journey began with the seductive allure of "Vibe Coding," a paradigm that promised to democratise software development but is fraught with hidden dangers: from automating malware creation with AI hackbots, to producing fundamentally insecure code, to accelerating the long-term degradation of AI models through model collapse. We saw how the term's originator likely intended it as a critique, a warning against abandoning rigour for conversational ease. In stark contrast, we explored the responsible, strategically sound application of AI embodied by FlowViz, a tool that augments rather than replaces human intelligence. The future of effective AI integration, in complex fields like software engineering and cybersecurity, lies not in a vague "Vibe" but in a clear-eyed partnership. The goal is to build tools that handle mechanical labour, freeing human experts to focus on verification, creativity, and strategic decision-making.

Summary

DistilBERT is introduced as a significantly smaller and faster language representation model, reducing the size of the large BERT model by 40% while maintaining 97% of its language understanding capabilities. This efficiency is achieved by leveraging knowledge distillation during the pre-training phase, incorporating a novel triple loss function for effective knowledge transfer.

Background

The last two years have been dominated by the rise of Transfer Learning in Natural Language Processing, establishing large-scale pre-trained language models like BERT as fundamental tools across many tasks. Models such as BERT often involve several hundred million parameters, and researchers are continuously pushing toward training even larger models, based on the principle that size correlates with performance gains. This scaling trend is exemplified by models like MegatronLM, which boasts 8.3 billion parameters. However, this massive scaling trend introduces serious issues, including the high environmental cost due to the exponential rise in computational requirements, as highlighted in related studies. Moreover, the high computational and memory demands restrict the ability to operate these powerful models on-device in real-time, which is essential for enabling novel language processing applications. Thus, while large models offer improved performance, their growing requirements may ultimately prevent their broad adoption.

To address these concerns, this work introduces DistilBERT, a smaller general-purpose language representation model pre-trained using knowledge distillation. The goal is to achieve comparable performance on various downstream tasks while drastically reducing the model's size, inference time, and computational training budget. Knowledge distillation is a compression technique where a compact "student" model is trained to reproduce the behaviour of a larger "teacher" model. DistilBERT achieves a 40% reduction in BERT's size and makes it 60% faster at inference time. The training process utilises a triple loss function that combines language modelling loss, distillation loss, and a cosine-distance loss to effectively transfer the inductive biases learned by the larger teacher model. This specialised pre-training approach results in a lighter, faster model that retains the flexibility required for fine-tuning across a wide range of tasks.
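The triple loss can be sketched numerically. The toy function below is a simplified single-token illustration in NumPy, not the paper's actual training code, and the weights w_ce, w_kd, w_cos are hypothetical; it combines the masked-LM cross-entropy, the temperature-scaled distillation loss, and the cosine-distance term for one masked position:

```python
import numpy as np

def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    z = logits / T
    z = z - z.max()              # numerical stability
    e = np.exp(z)
    return e / e.sum()

def triple_loss(student_logits, teacher_logits, true_id,
                student_hidden, teacher_hidden,
                T=2.0, w_ce=1.0, w_kd=1.0, w_cos=1.0) -> float:
    # Distillation loss: KL divergence between temperature-softened
    # teacher and student distributions (scaled by T^2, after Hinton et al.)
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kd = float(np.sum(p_t * (np.log(p_t) - np.log(p_s)))) * T * T
    # Masked-LM loss: cross-entropy against the true token id
    ce = -float(np.log(softmax(student_logits)[true_id]))
    # Cosine-distance loss aligning the direction of the hidden states
    cos = 1.0 - float(student_hidden @ teacher_hidden /
                      (np.linalg.norm(student_hidden) *
                       np.linalg.norm(teacher_hidden)))
    return w_ce * ce + w_kd * kd + w_cos * cos
```

When the student exactly matches the teacher, the distillation and cosine terms vanish and only the cross-entropy against the ground-truth token remains, which is the sanity check one would expect.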

Use-case

"Our compressed models are small enough to run on the edge, e.g. on mobile devices."

A primary use case for DistilBERT is enabling real-time language processing applications on resource-constrained platforms, often referred to as "on-the-edge" devices. Since the model is significantly lighter and faster, it successfully addresses the issues of memory and computational limitations that larger models pose for mobile environments. The authors demonstrated this capability by conducting a proof-of-concept experiment and a comparative study focused on on-device computations. For instance, testing a mobile application for question answering on an iPhone 7 Plus revealed that DistilBERT is 71% faster than the original BERT-base model when excluding the tokenisation step. Furthermore, the distilled model has a small memory footprint, weighing only 207 MB, which could be reduced even further through quantisation techniques. This strong performance and small size make DistilBERT a compelling option for integrating sophisticated NLP capabilities directly into mobile and edge applications.

DistilBERT is designed as a general-purpose pre-trained language representation model, ensuring it remains highly flexible for fine-tuning across a variety of traditional NLP tasks, similar to its larger counterpart, BERT. Its overall performance was rigorously assessed using the General Language Understanding Evaluation (GLUE) benchmark, which comprises a collection of nine different natural language understanding datasets. Results showed that DistilBERT consistently performed on par with or better than the ELMo baseline and successfully retained 97% of BERT's overall macro-score performance. Beyond GLUE, the model demonstrated robust capabilities on specific downstream tasks, including achieving high test accuracy on the IMDb sentiment classification benchmark.

Conclusion

The conclusion confirms that DistilBERT successfully introduced a general-purpose pre-trained version of BERT that is 40% smaller and 60% faster, while retaining 97% of the original model's language understanding capabilities. The work validates that general-purpose language models can be effectively trained using distillation and underscores DistilBERT's viability for edge applications. While the paper does not lay out explicit future research avenues, it notes that the current model's size (207 MB) could be reduced further through orthogonal compression techniques such as quantisation. The authors explicitly state that pruning and quantisation are separate lines of study, suggesting these methods could be integrated with DistilBERT in subsequent work to achieve even greater compression.

You can download the article here.

Keep Reading