Techobras

The new DarkBert AI was trained using dark web data from hackers and cybercriminals

Security

Following the success of OpenAI's ChatGPT, Microsoft's Bing Chat, and Google Bard, researchers have created a new AI model with a darker twist.

While the large language models (LLMs) that drive ChatGPT and Google Bard were trained on data from the open web, DarkBERT was trained solely on data from the dark web. Yes, this new AI model was trained using data from hackers, cybercriminals, and other fraudsters.

A team of Korean researchers published a paper (PDF) detailing how they created DarkBERT using data from the Tor network, which is commonly used to access the dark web. By crawling the dark web and filtering the raw data, they were able to create a dark web database.

Remarkably, DarkBERT has already outperformed other large language models, despite being trained on data from a very unlikely place.
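To give a rough idea of what "filtering the raw data" can involve, here is a minimal Python sketch. It is not the researchers' pipeline: the function name `filter_pages`, the `min_words` threshold, and the sample pages are all made up for illustration. It only drops very short pages and exact duplicates (ignoring case and whitespace).

```python
import hashlib
import re


def filter_pages(pages, min_words=5):
    """Keep pages with enough text that are not exact duplicates.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    seen = set()
    kept = []
    for text in pages:
        # Collapse runs of whitespace so trivially different copies compare equal.
        normalized = re.sub(r"\s+", " ", text).strip()
        if len(normalized.split()) < min_words:
            continue  # too short to be useful as training text
        digest = hashlib.sha256(normalized.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate once case and whitespace are ignored
        seen.add(digest)
        kept.append(normalized)
    return kept


# Made-up sample pages standing in for crawled text.
raw_pages = [
    "Welcome   to the forum, please read the rules before posting.",
    "welcome to the forum, please read the rules before posting.",
    "404 not found",
    "Selling access to a list of accounts, contact the vendor for details.",
]
clean_pages = filter_pages(raw_pages)
print(len(clean_pages))
```

A real crawl would need far more than this, such as language detection and near-duplicate removal, but the basic shape of "crawl, then filter" is the same.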

DarkBERT is a new AI model, but it is actually based on the RoBERTa architecture, an AI approach developed by Facebook researchers in 2019, according to our sister site Tom's Hardware.

In a research paper detailing the inner workings of RoBERTa, Meta AI describes it as a "robustly optimized method for pre-training natural language processing (NLP) systems" that improves on BERT (Bidirectional Encoder Representations from Transformers), which Google released in 2018. Because the search giant open-sourced BERT, Facebook researchers were able to improve its performance in a replication study.

Thanks to this optimized method, Facebook's RoBERTa was able to produce state-of-the-art results on the General Language Understanding Evaluation (GLUE) NLP benchmark.

Now, however, the Korean researchers behind DarkBERT have shown that RoBERTa can do even more, as it was under-trained when it was first released. The researchers created DarkBERT by feeding RoBERTa data from the dark web, using two datasets (one raw, one preprocessed), for almost 16 days.

Fortunately, the researchers have no plans to release DarkBERT to the public. However, according to Dexerto, they are accepting requests for academic purposes. Still, DarkBERT will provide law enforcement and researchers with a deeper understanding of the dark web as a whole.

As with any software or online service, caution should be exercised when using AI chatbots, as you could be infected with malware from a fake ChatGPT app or leak sensitive data, as Samsung employees recently did.

For this reason, when using these popular AI chatbots, you should make sure you are actually going to the correct website. If you are looking for ChatGPT, Bing Chat, or Google Bard apps, you won't find them yet, as OpenAI, Microsoft, and Google have not yet released official apps for their AI chatbots.
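For readers who want to automate the "correct website" check, the idea can be sketched in a few lines of Python. This is only an illustration: the function name `is_official_chatbot_url` is hypothetical, and the hostnames in `OFFICIAL_HOSTS` are an assumption about where these services live, which you should verify yourself before relying on them. The point is to compare the exact hostname rather than look for a familiar name somewhere in the link.

```python
from urllib.parse import urlparse

# Assumed allowlist for illustration only; confirm the current official
# domains with OpenAI, Microsoft, and Google before relying on it.
OFFICIAL_HOSTS = {"chat.openai.com", "www.bing.com", "bard.google.com"}


def is_official_chatbot_url(url):
    """Return True only for HTTPS links whose exact hostname is allowlisted."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False  # reject plain HTTP and other schemes
    host = (parsed.hostname or "").lower()
    # Exact match: "chat.openai.com.example.net" must not pass.
    return host in OFFICIAL_HOSTS


print(is_official_chatbot_url("https://chat.openai.com/"))
print(is_official_chatbot_url("https://chat.openai.com.example.net/login"))
```

The second link contains a familiar name but belongs to a different domain, which is exactly the kind of lookalike address phishing pages rely on.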

Similarly, you should not click on links in suspicious emails that direct you to AI chatbots or offer immediate access. Scammers are well aware of the current AI chatbot craze and are using it for attacks right now. At the same time, ads about AI chatbots should also be avoided, as cybercriminals often exploit Google ads and other advertising services to direct unsuspecting users to phishing sites.

When trying AI chatbots, you should also use one of the best antivirus software solutions on a PC, the best Mac antivirus software on a Mac, or one of the best Android antivirus apps on a smartphone for additional protection. That way, even if a link to an AI chatbot leads to malware, the antivirus software can catch it before your device is infected.

DarkBERT may represent the future of AI models, in which they are trained on, and specialized in, one particular area. Given its success to date, it would not be surprising to see similar AI models developed in this manner in the future.