
About the Generation Models in Generative AI
Prompt the OCI
Generative AI generation models to generate text. 
Important: The text generation feature will be removed from the OCI
Generative AI playground, API, and CLI when the
cohere.command v15.6 and cohere.command-light v15.6 models
are retired. Use the chat models instead. For retirement dates, see Retiring the Models.
You can ask questions in natural language and optionally submit text, such as documents,
emails, and product reviews, to the generation models. The models reason over the text and
provide intelligent answers.
Prompt style: Write an email to Susan thanking her for…
Output style for previous prompt: Dear Susan, Thanks for…
Tip
Unlike the chat models, the text generation models don't keep the context of previous
prompts. For follow-up questions in the generation models, include the previous
responses in the next prompt.
Following are some example use cases for text generation models:
Copy generation: Draft marketing copy, emails, blog posts, product descriptions,
 documents, and so on.
Ask questions: Ask the models to explain concepts, brainstorm ideas, solve
 problems, and answer questions on information that the models have been trained on.
Stylistic conversion: Edit your text or rewrite content in a different style or
 language.
Selecting a Generation Model
Select a model to generate text based on the model size, your project goal, cost, and
the style of the model's responses. Use the examples provided in the playground for each listed
model to get a feel for how each model responds to the same prompt, and then decide which
model's response style suits your use case.
cohere.command
A highly performant generation model with 50 billion parameters and broad general
knowledge of the world. Use this model for anything from brainstorming to tasks that
demand accuracy, such as text extraction and sentiment analysis, and for complex
instructions, such as drafting marketing copy, emails, blog posts, and product
descriptions that you can then review and use.
cohere.command-light
A quick and light generation model. Use this model for tasks that require a basic
knowledge of the world and simple instructions, when speed and cost are important. For
best results, give the model clear instructions. The more specific your prompt,
the better this model performs. For example, instead of the prompt, "What is the
following tone?", write, "What is the tone of this product review? Answer with
either the word positive or negative."
meta.llama-2-70b-chat
This 70-billion-parameter model was trained on a dataset of 1.2 trillion tokens that
includes text from the internet, books, and other sources. Use this model for text
generation, language translation, summarization, question answering based on the content
of a given text or topic, and content generation such as articles, blog posts, and
social media updates.
 Tip
If the generation models don't respond well to your use case, you can fine-tune a
pretrained generation model with your own dataset. See each generation model's key
features to find out which models are available for fine-tuning.
Learn to calculate cost with examples.
Generation Model Parameters
When using the generation models, you can vary the output by changing the following
parameters.
Maximum output tokens
The maximum number of tokens that you want the model to generate for each response.
Estimate four characters per token.
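The four-characters-per-token estimate can be sketched as a quick helper. The ratio is a rough heuristic, not an exact tokenizer, and the function name is illustrative:

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate using the ~4 characters-per-token heuristic."""
    return max(1, round(len(text) / chars_per_token))

# A 40-character prompt is roughly 10 tokens under this heuristic.
print(estimate_tokens("a" * 40))
```

Use an estimate like this only for sizing the maximum output tokens parameter; the model's actual tokenizer determines the real count.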
Temperature
The level of randomness used to generate the output text. 
Tip: Start with a low temperature (0 or below 1), and increase the
temperature as you regenerate the prompts for more creative output. High
temperatures can introduce hallucinations and factually incorrect information.
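Conceptually, temperature rescales the model's token scores before they are turned into probabilities. This is a minimal sketch of that standard mechanism, not the service's internal implementation:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Temperature divides the logits before softmax: a low temperature
    sharpens the distribution toward the most likely token, while a high
    temperature flattens it toward uniform (more random) sampling."""
    t = max(temperature, 1e-6)  # guard against division by zero at T = 0
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# With T = 0.1 the top token dominates; with T = 100 the choices are near-uniform.
print(softmax_with_temperature([2.0, 1.0, 0.0], 0.1))
print(softmax_with_temperature([2.0, 1.0, 0.0], 100.0))
```

This is why a temperature near 0 produces repeatable, focused output and a high temperature produces more varied, creative output.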
Top k
A sampling method in which the model chooses the next token randomly from the
 top k most likely tokens. A higher value for k
 generates more random output, which makes the output text sound more natural. The
 default value for k is 0 for command models and -1 for
 Llama models, which means that the models should consider all tokens
 and not use this method.
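The top k method can be sketched as follows. This is an illustration of the general technique, using hypothetical token probabilities, not the service's internal code:

```python
import random

def top_k_sample(token_probs, k, seed=None):
    """Pick the next token at random from the k most likely candidates.

    k <= 0 is treated as "consider all tokens", mirroring the defaults
    described above (0 for command models, -1 for Llama models).
    """
    if seed is not None:
        random.seed(seed)
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    pool = ranked if k <= 0 else ranked[:k]
    tokens, weights = zip(*pool)
    return random.choices(tokens, weights=weights, k=1)[0]

probs = {"food": 0.5, "book": 0.3, "movie": 0.15, "zebra": 0.05}
print(top_k_sample(probs, k=2, seed=0))
```

With k=2, only "food" and "book" can ever be chosen; raising k lets less likely tokens such as "zebra" appear, which makes the output more random.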
Top p
A sampling method that controls the cumulative probability of the top tokens to
 consider for the next token. Assign p a decimal number between 0 and 1
 for the probability. For example, enter 0.75 for the top 75 percent to be considered.
 Set p to 1 to consider all tokens.
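Top p (nucleus) sampling can be sketched the same way. Again, this is a generic illustration with made-up probabilities, not the service's implementation:

```python
import random

def top_p_sample(token_probs, p, seed=None):
    """Keep the smallest set of top-ranked tokens whose cumulative
    probability reaches p, then sample from that set.
    p = 1 considers all tokens."""
    if seed is not None:
        random.seed(seed)
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    pool, cumulative = [], 0.0
    for token, prob in ranked:
        pool.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    tokens, weights = zip(*pool)
    return random.choices(tokens, weights=weights, k=1)[0]

probs = {"food": 0.5, "book": 0.25, "movie": 0.2, "zebra": 0.05}
print(top_p_sample(probs, p=0.75, seed=1))
```

With p=0.75 here, the pool is "food" (0.5) plus "book" (0.25), because together they reach the 75 percent cutoff; "movie" and "zebra" are excluded.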
Stop sequences
A sequence of characters—such as a word, a phrase, a newline (\n), or
 a period—that tells the model when to stop the generated output. If you have more than
 one stop sequence, then the model stops when it reaches any of those sequences.
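The stop-sequence behavior described above, stopping at the earliest match among several sequences, can be sketched as a post-processing step (the function name is illustrative):

```python
def truncate_at_stop(text, stop_sequences):
    """Cut generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("Dear Susan, thanks for the gift.\nBest,", ["\n", "."]))
```

Because the period appears before the newline in this example, generation stops at the period.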
Frequency penalty
A penalty that is assigned to a token when that token appears frequently. High
 penalties encourage fewer repeated tokens and produce a more random output.
Presence penalty
A penalty that is assigned to each token when it appears in the output to encourage
 generating outputs with tokens that haven't been used.
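One common way such penalties are applied, and this is an assumption for illustration since the exact formula is model-specific, is to subtract them from a token's score before sampling: the frequency penalty once per prior occurrence, and the presence penalty once if the token has appeared at all.

```python
from collections import Counter

def penalize(scores, generated_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the scores of tokens that have already been generated.

    Illustrative scheme (model-specific in practice):
      score - frequency_penalty * count - presence_penalty * (count > 0)
    """
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, score in scores.items():
        n = counts.get(token, 0)
        adjusted[token] = score - frequency_penalty * n - presence_penalty * (n > 0)
    return adjusted

scores = {"great": 2.0, "good": 1.5, "nice": 1.0}
print(penalize(scores, ["great", "great", "good"],
               frequency_penalty=0.5, presence_penalty=0.2))
```

Here "great" loses the frequency penalty twice (it appeared twice) plus the presence penalty once, while the unused "nice" keeps its original score, so previously unused tokens become relatively more likely.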
Show likelihoods
Every time a new token is to be generated, a number between -15 and 0 is assigned to
 all tokens, where tokens with higher numbers are more likely to follow the current
 token. For example, it's more likely that the word favorite is followed by the
 word food or book rather than the word zebra. This parameter is
 available only for the cohere models.
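A number in the -15 to 0 range is consistent with a log-likelihood, the natural log of a probability, clamped at a floor. This sketch assumes that interpretation for illustration; the playground's exact computation isn't specified here:

```python
import math

def log_likelihoods(token_probs, floor=-15.0):
    """Convert token probabilities to log-likelihoods, clamped to a floor.

    A probability of 1 maps to 0 (most likely); vanishingly small
    probabilities are clamped to the floor (least likely)."""
    return {t: max(floor, math.log(p)) if p > 0 else floor
            for t, p in token_probs.items()}

print(log_likelihoods({"food": 0.5, "book": 0.3, "zebra": 1e-8}))
```

Under this reading, "food" after "favorite" scores near 0, while "zebra" is pinned at the -15 floor.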
