In the realm of artificial intelligence, the deployment of Large Language Model (LLM) and Generative AI (GenAI) applications on personal desktop environments marks a transformative shift towards democratizing AI technology. By harnessing the power of NVIDIA GPUs and the open-source TensorRT-LLM library, GenAI applications that were traditionally confined to cloud-based platforms can now be tailored for local desktop deployment. This post walks through the detailed steps for deploying and running inference with the Llama 2 13B model on a Windows desktop equipped with an NVIDIA RTX 4090 GPU.
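To give a flavor of the end result, below is a minimal sketch of local inference using TensorRT-LLM's high-level Python LLM API. It is not the article's exact workflow (the full story covers the detailed build and run steps); the model identifier and sampling settings here are assumptions chosen purely for illustration.

```python
# Minimal sketch: local inference with TensorRT-LLM's high-level Python API.
# Assumes TensorRT-LLM is installed on the Windows machine with a supported
# NVIDIA GPU (e.g. an RTX 4090) and that the Llama 2 13B weights are available
# locally or via the Hugging Face Hub (Llama 2 access requires approval).

from tensorrt_llm import LLM, SamplingParams


def main():
    # The model ID below is an assumption for illustration; point it at the
    # local Llama 2 13B checkpoint or pre-built engine you actually use.
    llm = LLM(model="meta-llama/Llama-2-13b-chat-hf")

    prompts = ["Explain in one sentence why local LLM inference is useful."]
    sampling = SamplingParams(temperature=0.8, top_p=0.95)

    # Generate completions locally on the GPU and print the generated text.
    for output in llm.generate(prompts, sampling):
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```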
Read the full story on Medium: link
"Every individual is afforded the opportunity to access these technologies, possesses the skills to control them, and holds the right to decide where and how their generative models are stored and utilized, particularly concerning their personal data." (from the article)
Happy reading!