
Creating a Dedicated AI Cluster in Generative AI for Hosting Models

Create a dedicated AI cluster resource in OCI Generative AI to host endpoints for pretrained base models and custom models.
In the navigation bar of the Console, select a region with Generative AI, for example, US Midwest (Chicago) or UK South (London). See which models are offered in your region.
Open the navigation menu and click Analytics & AI. Under AI Services, click Generative AI.
Select a compartment in which you want to host the models.
Ensure that you have permission to use or manage generative-ai-family and object-family resources in this compartment.
In the left navigation, choose a compartment that you have permission to work in.
Click Dedicated AI clusters.
Click Create dedicated AI cluster.
Select a compartment to create the dedicated AI cluster in. The default
 compartment is the one you selected in step 3, but you can select any
 compartment that you have permission to work in.
(Optional) Enter a name and description for the cluster. If you don't enter a name, the system generates one that you can change later.
The generated name has the format generativeaidedicatedaicluster<timestamp>. For example:
 generativeaidedicatedaicluster20240601202357
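As a rough illustration of the generated-name format, the following Python sketch builds a name of the same shape. It assumes the timestamp is laid out as YYYYMMDDHHMMSS, which is inferred from the single example above and is not stated in the documentation; the function name generated_cluster_name is hypothetical and is not part of any OCI tool.

```python
from datetime import datetime

# Prefix taken from the documented format.
NAME_PREFIX = "generativeaidedicatedaicluster"


def generated_cluster_name(created_at: datetime) -> str:
    """Return a name shaped like the documented example.

    Assumption: the timestamp is YYYYMMDDHHMMSS (inferred from the
    example name, not confirmed by the documentation).
    """
    return NAME_PREFIX + created_at.strftime("%Y%m%d%H%M%S")


if __name__ == "__main__":
    # Reproduces the example name from the documentation.
    print(generated_cluster_name(datetime(2024, 6, 1, 20, 23, 57)))
```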
For Cluster type, click Hosting.
For Base model, select the base model for the models that you want to host on this cluster:
Llama-3-70b-instruct: Provisions one or more Large Generic units
Llama-2-70b-chat: Provisions one or more Llama2 70 units
Cohere.command: Provisions one or more Large Cohere units
Cohere.command-light: Provisions one or more Small Cohere units
Cohere.embed: Provisions one or more Embed Cohere units
Cohere.command-r-plus: Provisions one or more Large Cohere V2 units
Cohere.command-r-16k: Provisions one or more Small Cohere V2 units
The model list only includes the supported version of the base models.
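The base-model-to-unit-type pairing above can be captured as a small lookup table, which may help when planning clusters outside the Console. This is a minimal sketch: the strings are the display names from the list above, not API identifiers, and the names HOSTING_UNIT_TYPES and unit_type_for are hypothetical.

```python
# Display names copied from the Base model list; these are not API enum values.
HOSTING_UNIT_TYPES = {
    "Llama-3-70b-instruct": "Large Generic",
    "Llama-2-70b-chat": "Llama2 70",
    "Cohere.command": "Large Cohere",
    "Cohere.command-light": "Small Cohere",
    "Cohere.embed": "Embed Cohere",
    "Cohere.command-r-plus": "Large Cohere V2",
    "Cohere.command-r-16k": "Small Cohere V2",
}


def unit_type_for(base_model: str) -> str:
    """Look up the unit type that a hosting cluster provisions for a base model."""
    try:
        return HOSTING_UNIT_TYPES[base_model]
    except KeyError:
        raise ValueError(f"Base model not in the hosting list: {base_model!r}") from None


if __name__ == "__main__":
    print(unit_type_for("Cohere.command-r-plus"))
```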
 Important: When you create a cluster for hosting models for inference, one unit is created by default for the base model that you choose. To increase the throughput, increase the number of instances in the Model replica field now, or later when you edit the cluster. For example, creating two model replicas on this cluster requires two units and doubles the throughput.
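The replica arithmetic can be sketched in a few lines of Python. The documentation states only the two-replica case (two units, double the throughput); treating every replica as one unit and throughput as scaling linearly for larger counts is an assumption made here for illustration. The function names are hypothetical.

```python
def units_required(model_replicas: int, units_per_replica: int = 1) -> int:
    """Units a hosting cluster needs for a given number of model replicas.

    Assumption: each replica consumes one unit, generalizing the documented
    "two replicas require two units" example.
    """
    if model_replicas < 1:
        raise ValueError("a hosting cluster needs at least one model replica")
    return model_replicas * units_per_replica


def relative_throughput(model_replicas: int) -> float:
    """Throughput relative to one replica, assuming linear scaling."""
    if model_replicas < 1:
        raise ValueError("a hosting cluster needs at least one model replica")
    return float(model_replicas)


if __name__ == "__main__":
    # The documented case: two replicas, two units, double the throughput.
    print(units_required(2), relative_throughput(2))
```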
Read the commitment unit hours for the hosting cluster and select the
 checkbox to agree to the commitment.
(Optional) Click Show advanced options and assign tags to this cluster.
Click Create.
 Note: Clusters take a few minutes to create. After the cluster is in an active state, you can select it to host a model when you create an endpoint for that model.
