PewDiePie Unveils Extensive AI Experiment Claiming It Outperforms ChatGPT

PewDiePie has revealed that he took on an ambitious project to build his own AI model, which he claims outperformed ChatGPT on a coding benchmark. The project, chronicled in a detailed video, involved months of testing, setbacks, and technical challenges.

PewDiePie’s project began as a personal learning exercise and was not about creating an AI from scratch. Instead, he fine-tuned an existing large language model on custom datasets and used specific coding benchmarks to measure its progress.

His goal was to improve the model’s performance in a coding format heavily used by AI coding agents. Early on, the model scored well below leading competitors. Over multiple retraining cycles and changes to the datasets, however, PewDiePie saw gradual improvements.

Training Methodology and Benchmark Insights

In his video, PewDiePie elaborated on the coding benchmarks employed to gauge the model’s efficacy. He compared its performance against renowned models, including DeepSeek, Meta’s Llama, and ChatGPT.

Starting from a score of just 8% on the benchmark, PewDiePie adjusted the format, which raised performance to 16%. After incorporating reasoning data and further fine-tuning, he reported that one iteration reached 19.6%, temporarily surpassing ChatGPT’s result at the time.

Despite these achievements, PewDiePie discovered contamination within his dataset, where training data had overlapped with benchmark questions, ultimately invalidating the results. This issue necessitated a comprehensive retraining of the model.
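
The article does not say how the overlap was found. As an illustration only, one common way to look for this kind of contamination is to flag training examples that share a long run of words with any benchmark question. The sketch below assumes plain-text examples; the function names and the 8-word window are hypothetical choices, not details from the video.

```python
import re


def ngrams(text, n=8):
    """Return the set of word n-grams in text, lowercased with punctuation removed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_contaminated(train_examples, benchmark_questions, n=8):
    """Return indices of training examples that share an n-gram with any benchmark question.

    Hypothetical helper for illustration; not the method used in the video.
    """
    benchmark_ngrams = set()
    for question in benchmark_questions:
        benchmark_ngrams |= ngrams(question, n)
    return [
        i for i, example in enumerate(train_examples)
        if ngrams(example, n) & benchmark_ngrams
    ]


if __name__ == "__main__":
    train = [
        "Write a function that returns the sum of two integers a and b",
        "Explain how a hash map handles collisions",
    ]
    bench = ["Write a function that returns the sum of two integers a and b."]
    print(find_contaminated(train, bench))
```

Flagged examples would be dropped before retraining. Exact n-gram matching like this misses paraphrased questions, so it gives a lower bound on overlap rather than a guarantee of a clean dataset.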

After focusing on a coding-specific adaptation of the base model, PewDiePie reported further success, with scores climbing to 36% after addressing the benchmark contamination and ultimately reaching 39.1% with additional post-training adjustments.

https://www.youtube.com/watch?v=aV4j5pXLP-I

Throughout the development process, PewDiePie encountered numerous technical difficulties, including software crashes, overheating, and hardware failures. At one point, a GPU malfunction halted training, and he had to grapple with power supply issues due to the demanding computational requirements.

He described his setup as extensively modified and often had to rebuild components to ensure continued training progress. Despite these hurdles, PewDiePie’s experience enriched his understanding of machine learning workflows, data preparation, and model training intricacies.

In discussing the benchmark results, PewDiePie exercised caution, highlighting that excelling in a single benchmark does not guarantee overall superiority. He indicated plans to validate the model further against additional coding benchmarks before determining its public release potential.

Moreover, he acknowledged the emergence of newer models, like Qwen 3, which have registered better scores on the same benchmark, suggesting that ongoing development would be crucial to maintain competitive advantage.

Concluding the video, PewDiePie emphasized that the primary objective of this venture was experiential learning through trial and error, leaving open the possibility of either continuing with the model’s development or pursuing new projects in the future.
