.. _installation_neuron:

Installation with Neuron
========================

vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Inferentia with Neuron SDK.
At the moment Paged Attention is not supported in Neuron SDK, but naive continuous batching is supported in transformers-neuronx.
Data types currently supported in Neuron SDK are FP16 and BF16.

Requirements
------------

* OS: Linux
* Python: 3.8 -- 3.11
* Accelerator: NeuronCore_v2 (in trn1/inf2 instances)
* PyTorch 2.0.1/2.1.1
* AWS Neuron SDK 2.16/2.17 (Verified on python 3.8)

Installation steps:

- :ref:`Build from source <build_from_source_neuron>`

  - :ref:`Step 0. Launch Trn1/Inf2 instances <launch_instances>`
  - :ref:`Step 1. Install drivers and tools <install_drivers>`
  - Step 2. Install transformers-neuronx and its dependencies
  - :ref:`Step 3. Install vLLM from source <install_vllm>`

.. _build_from_source_neuron:

Build from source
-----------------

The following instructions apply to Neuron SDK 2.16 and later.

.. _launch_instances:

Step 0. Launch Trn1/Inf2 instances
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Follow these steps to launch a trn1/inf2 instance, in order to carry out the *PyTorch Neuron ("torch-neuronx") Setup on Ubuntu 22.04 LTS*.

- Follow the instructions in *Launch an Amazon EC2 Instance* to launch an instance. When choosing the instance type in the EC2 console, make sure to select a Trn1 or Inf2 instance type.
- For more information about instance sizes and pricing, see the Trn1 web page and the Inf2 web page.
- Select the Ubuntu Server 22.04 LTS AMI.
- When launching a Trn1/Inf2 instance, set the primary EBS volume size to at least 512GB.
- After launching the instance, follow the instructions in *Connect to your instance* to connect to it.

.. _install_drivers:

Step 1. Install drivers and tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Installing the drivers and tools is not necessary if the Deep Learning AMI Neuron is used.
If the drivers and tools are not already installed on the operating system, follow the steps below:

.. code-block:: console

    # Configure Linux for Neuron repository updates
    . /etc/os-release
    sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <

Step 2. Install transformers-neuronx and its dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

transformers-neuronx will be the backend to support inference on trn1/inf2 instances.
Follow the steps below to install the transformers-neuronx package and its dependencies.

.. code-block:: console

    # Install Python venv
    sudo apt-get install -y python3.10-venv g++

    # Create Python venv
    python3.10 -m venv aws_neuron_venv_pytorch

    # Activate Python venv
    source aws_neuron_venv_pytorch/bin/activate

    # Install Jupyter notebook kernel
    pip install ipykernel
    python3.10 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name "Python (torch-neuronx)"
    pip install jupyter notebook
    pip install environment_kernels

    # Set pip repository pointing to the Neuron repository
    python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

    # Install wget, awscli
    python -m pip install wget
    python -m pip install awscli

    # Update Neuron Compiler and Framework
    python -m pip install --upgrade neuronx-cc==2.* --pre torch-neuronx==2.1.* torchvision transformers-neuronx

.. _install_vllm:

Step 3. Install vLLM from source
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the neuronx-cc and transformers-neuronx packages are installed, vLLM can be installed as follows:

.. code-block:: console

    $ git clone https://github.com/vllm-project/vllm.git
    $ cd vllm
    $ pip install -U -r requirements-neuron.txt
    $ VLLM_TARGET_DEVICE="neuron" pip install .

If the Neuron packages are detected correctly during installation, ``vllm-0.3.0+neuron212`` will be installed.
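
As a quick sanity check after the steps above, the following minimal sketch tests whether the active interpreter falls in the supported Python range and whether the installed packages can be located for import. It is an illustration and not part of vLLM or the Neuron SDK. The import names ``torch_neuronx``, ``transformers_neuronx`` and ``vllm`` are assumed to mirror the pip package names, and the helper names ``python_supported`` and ``missing_modules`` are hypothetical.

.. code-block:: python

    import importlib.util
    import sys

    # Supported interpreter range, taken from the Requirements list (3.8 -- 3.11).
    MIN_PY, MAX_PY = (3, 8), (3, 11)

    # Assumed import names: pip package names with "-" replaced by "_".
    MODULES = ("torch_neuronx", "transformers_neuronx", "vllm")


    def python_supported(version=None):
        """Return True if (major, minor) lies within the supported range."""
        major, minor = (version or sys.version_info)[:2]
        return MIN_PY <= (major, minor) <= MAX_PY


    def missing_modules(names=MODULES):
        """Return the top-level module names that cannot be located for import."""
        return [name for name in names if importlib.util.find_spec(name) is None]


    if __name__ == "__main__":
        print("Python version supported:", python_supported())
        print("Missing modules:", missing_modules() or "none")

Run the script inside the activated ``aws_neuron_venv_pytorch`` environment. An empty list of missing modules only shows that the packages can be found; it does not confirm that the Neuron driver or device is working.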