The Apple community is still waiting for the release of a Mac Pro based on the Apple M line of chips. So what should we expect from Apple?
Siri will need to learn a lot to match ChatGPT's quality of answers, and Apple Glasses will need to learn a lot to recognise objects. Such training is done in AI data centers.
AI is an ever more important USP for Apple: in mobile devices, in Pro computing, and in data centers, Apple's own as well as those of its customers. The current Mac Pro already comes in a rack-mountable edition. Apple has started to highlight machine-learning improvements alongside CPU and GPU performance gains. Facebook, Tesla and Google reportedly plan 5, 6 and 9 ExaOpsAI respectively for 2025.
Apple AI Professional Hardware
Apple needs at least two exaflop-class computers in house to train the Apple Car and to bring Siri to the level of an improved ChatGPT-3. Apple likes vertical integration, so it makes sense for the company to eat its own dog food and use its own chips. On top of that, it simply makes financial sense to get a second use out of the necessary investment in AI data centers by advancing the M-series chips and the Mac lineup along the way. So how could that look?
AI is very dependent on:
- specific computing, as seen in the neural engine cores,
- low power consumption per operation, and
- access to a lot of RAM to hold big models.
All those points are already present in the M line of chips, as seen in the M1 Ultra.
At which level does Apple have to play?
- Nvidia's DGX-A100 rack module delivers 3.6 Peta-opsAI.
- Tesla builds rack modules out of its Tiles with 54 Peta-opsAI.
- The Cerebras CS-2 packs 62.5 Peta-opsAI into half a rack.
The M1 Ultra chip's internal interconnect resembles the communication bandwidth of Tesla's D1 processors. PCIe Gen 6 x16 provides 128 GB/s of connectivity. The current Mac Pro already has four slots of an older PCIe generation, two of which are Apple's special MPX slots. The best current Mac Pro add-on, a duo graphics card, delivers 30 Tflops32 in one of the two MPX slots.
The natural step for Apple would be an M2 Mac Pro that is great for video, is rack-mountable, and on top provides extendability for amazing AI. Let's call it the Mac Pro AI for the sake of this article.
What would a Mac Pro AI look like?
A quadruple-chip M2 (sometimes called the M2 extreme) optimised for AI could perform as high as 60 Tflops32, or roughly 0.9 Peta-opsAI. It could fit nicely on an MPX card. Apple could choose to put one M2 extreme on the motherboard and offer up to five PCIe x16 slots.
The result would be an (imaginary) Mac Pro AI with up to five M2 extreme MPX cards: 5.4 Peta-opsAI in total. That is an order of magnitude more than the Mac Studio offers today, and reason enough why the wait for a new Mac Pro is taking a bit longer than initially expected.
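The arithmetic behind that headline number can be sketched quickly; the per-chip figure is the speculative estimate from above, not a confirmed Apple specification:

```python
# Speculative per-chip figure from the text above, not a confirmed Apple spec.
m2_extreme_pops_ai = 0.9          # Peta-opsAI per M2 extreme

# One M2 extreme on the motherboard plus up to five MPX cards.
chips = 1 + 5
total_pops_ai = chips * m2_extreme_pops_ai
print(round(total_pops_ai, 1))    # 5.4 Peta-opsAI
```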
Apple's products would profit from the M line's advantages: low power consumption, a very large unified memory and a low cost per Peta-opsAI. Considering the US$ 200,000 price of an Nvidia DGX-A100, Apple would also start to play in the data-center league.
What can you do?
- Play with AI computing, e.g. ChatGPT and MidJourney.
- Ask Google full-sentence questions and compare the quality of Google's snippets with ChatGPT's answers.
- Understand what augmented-reality object recognition is already present in the iPhone Pro and iPad Pro.
- Check out the 3D-video technology behind Avatar 2.
Such a Mac Pro AI would be the future video machine, with lots of AI power for 3D-video creation, and a great AI training machine for the desktop or for small on-premise data centers.
AI as a technology will be a growing topic in the Apple user story. As new topics as well as Mac Pros are often introduced at Apple's Worldwide Developers Conference, we may expect such a Mac Pro AI in June 2023.
For reference, the Tesla Dojo building blocks:
- 25 D1 processors (each 22.6 Tflops32 / 360 TopsAI) fit into a Dojo Tile.
- 6 Tiles (each 565 Tflops32 / 9 PopsAI) fit into a node/rack module.
- Two rack modules (each 3.4 Pflops32 / 54 PopsAI) fit into a cabinet.
- 10 cabinets (each 6.8 Pflops32 / 108 PopsAI) make an ExaOpsAI computer.
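Multiplying out the Dojo hierarchy above confirms the per-level figures; this is a rough sanity check using the numbers from the list, not official Tesla data:

```python
# Tesla Dojo hierarchy, using the per-unit figures from the list above.
d1_tflops32, d1_tops_ai = 22.6, 360

tile_tflops32 = 25 * d1_tflops32          # ~565 Tflops32 per Tile
tile_pops_ai = 25 * d1_tops_ai / 1000     # 9 PopsAI per Tile

module_pops_ai = 6 * tile_pops_ai         # 54 PopsAI per rack module
cabinet_pops_ai = 2 * module_pops_ai      # 108 PopsAI per cabinet
total_pops_ai = 10 * cabinet_pops_ai      # 1080 PopsAI ~ 1.08 ExaOpsAI
print(round(tile_tflops32), round(total_pops_ai))
```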
And the Nvidia building blocks:
- 16 AI cores (each 19.2 Tflops32) fit into one A100.
- 8 A100s (each 312 Tflops32) go into a DGX node/rack module.
- 5 DGX-A100 nodes (each 2.5 Pflops32 / 3.6 PopsAI) go into a cabinet.
- 50 cabinets (each 12.5 Pflops32 / 18 PopsAI) make an ExaOpsAI computer.
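The same sanity check works for the Nvidia hierarchy, again using the figures from the list above rather than official Nvidia data:

```python
# Nvidia DGX-A100 hierarchy, using the per-unit figures from the list above.
a100_tflops32 = 312
node_pops_ai = 3.6                        # per DGX-A100 node, from the text

node_tflops32 = 8 * a100_tflops32         # 2496 Tflops32 ~ 2.5 Pflops32 per node
cabinet_pops_ai = 5 * node_pops_ai        # 18 PopsAI per cabinet
total_pops_ai = 50 * cabinet_pops_ai      # 900 PopsAI, roughly ExaOpsAI scale
print(node_tflops32, round(total_pops_ai))
```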
Floating-point operations per second, "flops" (FP32; abbreviated Tflops32 in this text), measures single-precision 32-bit performance, which is most relevant for graphics work. It was the headline figure for GPUs before chips were optimised for AI work.
Operations per second, "ops" (BF16/CFP8; abbreviated TopsAI, PopsAI and EopsAI in this text), measures performance optimised for AI work using 16-bit or smaller number formats. In some systems the TopsAI figures are roughly 15 times the FP32 numbers.
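That "roughly 15 times" ratio can be checked against Tesla's D1 figures listed above:

```python
# Ratio of AI ops (BF16/CFP8) to FP32 flops for Tesla's D1, figures from the list above.
d1_tflops32 = 22.6     # FP32 performance
d1_tops_ai = 360       # AI-ops performance
print(round(d1_tops_ai / d1_tflops32, 1))   # 15.9, close to "roughly 15 times"
```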