Transformers have enabled generative AI (Gen AI), which uses transformer models to generate new data, such as text, images, or even music, based on learned patterns. Their ability to understand and generate complex data has made transformers the backbone of AI applications such as ChatGPT. These models demand enormous processing power and large-scale data manipulation. While the cloud offers all of these capabilities, it is not the ideal place to run these technologies. One of the reasons for this is latency. Applications like autonomous driving, real-time translation, and voice assistants require ins