news-28072024-135516

A former Twitter engineer who was at the company during its 2022 acquisition has been sharing a story about finding 700 Nvidia V100 GPUs. Tim Zaman, who now works at Google DeepMind, discovered the GPUs in a Twitter data center after the company was bought: they were powered on but sitting idle, left over from an earlier attempt to build a GPU cluster at what he calls Twitter 1.0. He shared his surprise on Twitter, pointing out how things have changed over time.

Zaman was amused to find that the 700 Nvidia V100s were PCIe cards rather than the faster SXM2 form-factor variant, which connects GPUs over NVLink. It’s unclear why Twitter chose PCIe GPUs for such a large installation back in 2017. Zaman also commented on Elon Musk’s new ‘Gigafactory of Compute,’ where 100,000 GPUs are to run on a single fabric. He noted that managing failures is crucial at that scale, and that separating resources into different failure domains can keep one fault from taking down the whole system.

Zaman also found the question of how many GPUs can be connected on a single fabric intriguing. As tech companies compete to build ever larger AI training clusters, they will inevitably run into limits, both expected and unexpected, on the number of GPUs that can work together effectively.

The story of the 700 unused Nvidia GPUs is a reminder of how quickly GPU technology has advanced and of how much efficient resource management matters in high-performance computing. As the hardware continues to evolve, it will be interesting to see how companies like Twitter and Google adapt and optimize their use of GPU power.