
About the Chat Models in Generative AI

Prompt the OCI
Generative AI chat models to generate text. You can ask
 questions in natural language and optionally submit text, such as documents, emails, and
 product reviews, to the chat models. Each model reasons over the text and provides
 intelligent answers. The chat models keep the context of your previous prompts, so you can
 continue the chat with follow-up questions. Here's an example:
User prompt: "Hi, can you tell me how to fish?"
Output for previous prompt: "Sure! I'd be happy to help. Here are the basic steps to start
 fishing..." And this sentence is followed by several steps and a follow-up question such
 as "Please let me know if you have any specific questions about fishing gear, techiques, or
 tips for a successful fishing trip and I'd be happy to help!"
User follow-up prompt: "Can you tell me more about step 3?"
The chat model's response: "Certainly! Step 3 talks about the important considerations for
 setting up your fishing rod and tackle, which is a crucial aspect of preparing for a
 successful fishing trip. Here's a deeper dive into this step to help you understand the
 various components and choices involved in setting up your fishing gear" and so on.
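Conceptually, this multi-turn behavior amounts to resending the growing conversation history with each request. The following Python sketch illustrates that pattern; generate_reply is a hypothetical placeholder for a call to a chat model, not an actual OCI API.

```python
# Minimal sketch: multi-turn chat state kept as a growing message list.
# `generate_reply` is a hypothetical stand-in for a chat model call.

def generate_reply(history):
    # Placeholder: a real implementation would send `history` to the model.
    return f"(model reply to: {history[-1]['content']})"

history = []

def chat(user_message):
    # Append the user turn, get a reply, and retain both as context
    # for future follow-up questions.
    history.append({"role": "USER", "content": user_message})
    reply = generate_reply(history)
    history.append({"role": "CHATBOT", "content": reply})
    return reply

chat("Hi, can you tell me how to fish?")
chat("Can you tell me more about step 3?")
```

Because the full history is passed on every call, the model can resolve references such as "step 3" against the earlier turns.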
Following are some example use cases for chat models:
Chat: Chat with the model to brainstorm ideas, solve problems, explain concepts,
 and answer questions.
Generate text: Use the chat models for text generation, summarization, translation,
 or text-based classification.
Retrieve information from text: Provide text to the chat model and use the chat
 model for question-answering, sentiment analysis, and information retrieval.
Write new content: Ask the chat model to draft marketing copy, emails, blog posts,
 product descriptions, documents, and so on.
Enhance existing content: Rewrite content in a different style or language.
Selecting a Model for Chat
Select a model for your chat conversation based on the model size, your project goal,
 cost, and the style of the model's response. Use the playground's provided examples with each
 listed model to get a feel for how each model responds to the same prompt and then decide which
 model's response style goes well with your use case. 
The following pretrained foundational models are available in OCI Generative AI for chat:
cohere.command-r-plus and cohere.command-r-16k
The cohere.command-r-plus (Command R+) and
 cohere.command-r-16k (Command R) chat models both belong to the
 Cohere family of LLMs, but they differ in their capabilities, use cases, and pricing.
Here’s a detailed breakdown of their differences:
Model Size and Performance: Command R is a smaller-scale language model than Command
 R+. While Command R offers high-quality responses, it might not possess the same level
 of sophistication and depth as Command R+. Command R+ employs a larger model,
 resulting in enhanced performance and more sophisticated understandings.
Use Cases: Command R is suited for a wide range of applications requiring text
 generation, summarization, translation, or text-based classification. It’s an
 excellent choice for building conversational AI agents, chat-based conversational
 applications, and more. Command R+ is tailored for demanding language tasks requiring
 deeper understanding, complexity, and nuance. It excels in applications like text
 generation, question-answering, sentiment analysis, and information retrieval.
Capacity and Scalability: Command R can handle a good number of concurrent
 users and has more accessible and flexible pricing. Command R+ is designed to handle a
 higher volume of requests and supports more complex use cases. Accordingly, it may
 have a higher cost associated with its increased capacity and performance.
In summary, Command R is an excellent choice for those looking for a more affordable
 and flexible option for general language tasks. On the other hand, Command R+ is
 designed for power users who require advanced language understanding, higher capacity,
 and more nuanced responses. The choice between the two would depend on the specific
 requirements and budget of your application.
meta.llama-3-70b-instruct
This 70-billion-parameter model is highly performant and has broad general
 knowledge, making it suitable for various tasks, from generating ideas to refining text
 analysis and drafting written content, such as emails, blog posts, and descriptions.
 Tip
If the base models don't respond well to your use case, you can fine-tune a pretrained
 chat model with your own dataset. See each chat
 model's key features to find out which model is available for fine-tuning.
Learn to calculate cost with examples.
Chat Model Parameters
When using the chat models, you can vary the output by changing the following parameters.
Maximum output tokens
The maximum number of tokens that you want the model to generate for each
 response. Estimate four characters per token. Because you are prompting a chat model,
 the response depends on the prompt and each response doesn't necessarily use up the
 maximum allocated tokens. 
 Note For the cohere.command-r-16k model, the input token limit is 16k
 and output token limit is 4k. For the meta.llama-3-70b-instruct, the
 input plus output token limit is 8k. If you don't set a value for the Maximum
 output tokens parameter, the conversation might use the model's full
 context length which is the input plus output token limit.
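As a rough illustration of the four-characters-per-token estimate, this Python sketch converts a prompt's character count into an approximate token count. The function name and the precision of the rule of thumb are only illustrative; real tokenizers vary by model.

```python
# Rough token estimate using the "four characters per token" rule of thumb.
def estimate_tokens(text, chars_per_token=4):
    # Every non-empty text costs at least one token.
    return max(1, round(len(text) / chars_per_token))

prompt = "Summarize the attached product review in two sentences."
print(estimate_tokens(prompt))  # 14 tokens for this 55-character prompt
```

A budget check like this helps confirm that the input plus the Maximum output tokens value stays under a model's context length, such as the 8k combined limit for meta.llama-3-70b-instruct.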
Preamble override
An initial context or guiding message for a chat model. When you don't give a preamble
 to a chat model, the default preamble for that model is used. You can assign a preamble
 in the Preamble override parameter for the
 cohere.command-r-plus and cohere.command-r-16k
 models. The default preamble for the cohere.command-r-plus and
 cohere.command-r-16k models is:
You are Command.
You are an extremely capable large language model built by Cohere. 
You are given instructions programmatically via an API
that you follow to the best of your ability.
Overriding the default preamble is optional. When specified, the preamble
 override replaces the default Cohere preamble. When adding a preamble, for
 best results, give the model context, instructions, and a conversation style. 
 Tip For chat models without the preamble override
 parameter, you can include a preamble in the chat conversation and directly ask the
 model to answer in a certain way. For example, "Answer the following question in a
 pirate tone. Where's the best place to go sailing?"
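The tip above can be sketched in a few lines of Python: for models without a preamble override parameter, prepend the guiding instructions to the user message yourself. The helper name is hypothetical.

```python
# Sketch: emulating a preamble for models without a preamble override
# parameter by prepending the instructions to the user message.
def with_preamble(preamble, question):
    return f"{preamble} {question}"

prompt = with_preamble(
    "Answer the following question in a pirate tone.",
    "Where's the best place to go sailing?",
)
```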
Example prompt: Where in Italy can I find the best cannolis?
Example answer: Many people consider Sicily, the birthplace of the cannoli, to
 be the best place in Italy to sample this delicious dessert. The island has a rich
 history and culture around the cannoli, and you'll find numerous variations of the
 traditional dessert at pasticcerie (pastry shops) throughout Palermo, Syracuse,
 and other Sicilian cities and towns. In Palermo, try the iconic Pasticceria
 Cappello, a family-run business since 1910, known for their mouthwatering cannolis
 and other pastries. In Syracuse, Pasticceria Coroglio and Caffè del Forte are also
 highly regarded for their cannoli offerings.
Example preamble: You are a travel advisor. Answer with a pirate tone.
Example answer using the preceding preamble: Argh, me heartie! Ye be wantin'
 some of the finest cannolis in the land, do ye? Ye should set sail for the lush
 lands of Sicily, where the streets be lined with delicious pastries and the smells
 of sweet ricotta fill the air. In the port town of Palermo, ye can find some
 o' the finest cannoli kitchens. The locals be claimin' that the best cannolis be
 found at Pastificio Graffeo, where the shells be crispy and filled with creamy
 ricotta.
Example preamble: You are a seasoned marketing professional with a deep
 understanding of consumer behavior and market trends. Answer with a friendly and
 informative tone, sharing industry insights and best practices.
Example response to the preceding preamble: Cannolis are a delicious Italian
 dessert, and finding the best ones really depends on your personal preferences and
 what makes the perfect cannoli for you! Sicily is often regarded as the
 birthplace of this tasty treat, so that could be a great place to start your
 culinary adventure. The island is famous for its traditional ricotta-filled
 cannolis, and you'll find many variations across the region. Head to the capital,
 Palermo, or the historic city of Catania, and you won't be short of pasticcerie
 (pastry shops) offering mouth-watering options. In fact, Sicily boasts some
 renowned cannoli hotspots. Caffe Sierra in Palermo has a reputation for
 exceptional cannolis, as does Pasticceria Cappello, which has multiple branches
 across the city. These spots are known for their traditional recipes and
 high-quality ingredients, using fresh ricotta cheese and thin, crispy pastry
 shells.
Temperature
The level of randomness used to generate the output text. 
 Tip Start with the temperature set to 0 or a low value, and increase the
 temperature as you regenerate the prompts for a more creative output. High
 temperatures can introduce hallucinations and factually incorrect information.
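A minimal sketch of how temperature typically works under the hood, assuming the common formulation of dividing logits by the temperature before the softmax (the service does not expose the model internals, so this is illustrative only): values below 1 sharpen the distribution toward the most likely token, and values above 1 flatten it.

```python
import math

# Sketch: temperature reshapes next-token probabilities by scaling
# logits before the softmax. T < 1 sharpens, T > 1 flattens.
def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.2)  # near-greedy
flat = softmax_with_temperature(logits, 2.0)   # more random
```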
Top p
A sampling method that controls the cumulative probability of the top tokens to
 consider for the next token. Assign p a decimal number between 0 and 1
 for the probability. For example, enter 0.75 for the top 75 percent to be considered.
 Set p to 1 to consider all tokens.
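The nucleus idea can be sketched as follows, assuming the standard formulation of keeping the smallest set of most-likely tokens whose cumulative probability reaches p and renormalizing. The function name is illustrative.

```python
# Sketch of top-p (nucleus) filtering: keep the smallest set of tokens
# whose cumulative probability reaches p, then renormalize.
def top_p_filter(probs, p):
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}

probs = [0.5, 0.3, 0.15, 0.05]
top_p_filter(probs, 0.75)  # keeps only the top tokens covering 75 percent
```

With p set to 1, every token survives the filter, matching the "consider all tokens" behavior described above.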
Top k
A sampling method in which the model chooses the next token randomly from the
 top k most likely tokens. A higher value for k
 generates more random output, which makes the output text sound more natural. The
 default value for k is 0 for command models and -1 for
 Llama models, which means that the model should consider all tokens
 and not use this method.
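A sketch of top-k sampling under the same assumptions, treating k of 0 or less as "consider all tokens" to mirror the defaults described above. The helper name and the fixed random seed are illustrative.

```python
import random

# Sketch of top-k sampling: restrict the draw to the k most likely
# tokens; k <= 0 is treated as "consider all tokens".
def sample_top_k(probs, k, rng=random.Random(0)):
    if k <= 0:
        candidates = list(range(len(probs)))
    else:
        candidates = sorted(
            range(len(probs)), key=lambda i: probs[i], reverse=True
        )[:k]
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights)[0]

probs = [0.5, 0.3, 0.15, 0.05]
sample_top_k(probs, 2)  # draws only from the two most likely tokens
```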
Frequency penalty
A penalty that is assigned to a token when that token appears frequently. High
 penalties encourage fewer repeated tokens and produce a more random output.
For the meta.llama-3-70b-instruct model, positive numbers encourage
 the model to use new tokens and negative numbers encourage the model to repeat the
 tokens. Set to 0 to disable.
Presence penalty
A penalty that is assigned to each token when it appears in the output to encourage
 generating outputs with tokens that haven't been used.
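One common way the two penalties are formulated, shown here as an illustrative sketch rather than the service's documented formula: the frequency penalty scales with how often a token has already appeared, while the presence penalty is a flat deduction applied once a token has appeared at all.

```python
# Sketch: apply frequency and presence penalties to logits before
# sampling. The frequency penalty grows with each repetition; the
# presence penalty is deducted once per previously used token.
def penalize(logits, generated, freq_penalty, presence_penalty):
    counts = {}
    for t in generated:
        counts[t] = counts.get(t, 0) + 1
    return [
        logit
        - counts.get(i, 0) * freq_penalty
        - (presence_penalty if i in counts else 0)
        for i, logit in enumerate(logits)
    ]

logits = [1.0, 1.0, 1.0]
penalize(logits, generated=[0, 0, 1], freq_penalty=0.5, presence_penalty=0.2)
# token 0 (used twice) is penalized most; token 2 is untouched
```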
 Note The Cohere chat models have the following three additional parameters that you can use
 only through the API.
Tools
Tools Results
Documents
