
Concepts for Generative AI

To help you understand OCI Generative AI, review some concepts and terms related to the service.
Generative AI Model
An AI model trained on large amounts of data that takes inputs it hasn't seen before and generates new content.
Retrieval-Augmented Generation (RAG)
A program that retrieves data from given sources and augments large language model (LLM)
 responses with the given information to generate grounded responses.
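The retrieve-augment-generate flow can be sketched as follows. The retriever here is a toy word-overlap ranker and the prompt format is illustrative; a real RAG pipeline would use an embedding-based retriever and the service's own API.

```python
# Minimal sketch of the RAG pattern: retrieve passages, then augment the prompt.
# The retriever below is a toy word-overlap ranker, not a real search index.

def retrieve(query, documents, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, passages):
    """Augment the prompt with retrieved passages so the LLM answers from them."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The summer solstice is the longest day of the year.",
    "Oak trees can live for hundreds of years.",
]
query = "What is the summer solstice?"
prompt = build_prompt(query, retrieve(query, docs))
```

The augmented prompt grounds the model's answer in the retrieved text instead of relying only on what the model learned during training.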
Prompts and Prompt Engineering
Prompts
Strings of text in natural language used to instruct or extract information from a large
 language model. For example,
What is the summer solstice?
Write a poem about trees swaying in the breeze.
Rewrite the previous text in a lighter tone.
Prompt Engineering
The iterative process of crafting specific requests in natural language to extract optimized responses from a large language model (LLM). Based on the exact language used, the prompt engineer can guide the LLM to provide better or different outputs.
Inference
The ability of a large language model (LLM) to generate a response based on instructions and context provided by the user in the prompt. An LLM can generate new data, make predictions, or draw conclusions based on its learned patterns and relationships in the training data, without having been explicitly programmed.
Inference is a key feature of natural language processing (NLP) tasks such as question answering, summarizing text, and translating. You can use the foundational models in Generative AI for inference.
Streaming
Generation of content by a large language model (LLM) where the user can see the tokens being
 generated one at a time instead of waiting for a complete response to be generated before
 returning the response to the user.
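A streaming response can be modeled as a generator that yields one token at a time, which is a sketch of the behavior rather than the service's actual transport:

```python
def stream_tokens(tokens):
    """Yield tokens one at a time, as a streaming response would deliver them."""
    for token in tokens:
        yield token

# The consumer can display each chunk as it arrives instead of
# waiting for the full response.
chunks = list(stream_tokens(["The", " sun", " rises", "."]))
```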
Embedding
A numerical representation that has the property of preserving the meaning of a piece of text. This text can be a phrase, a sentence, or one or more paragraphs. The Generative AI embedding models transform each phrase, sentence, or paragraph that you input, into an array with 384 or 1024 numbers, depending on the embedding model that you choose. You can use these embeddings for finding similarity in phrases that are similar in context or category. Embeddings are typically stored in a vector database. Embeddings are mostly used for semantic searches where the search function focuses on the meaning of the text that it's searching through rather than finding results based on keywords. To create the embeddings, you can input phrases in English and other languages.
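Semantic search over embeddings usually relies on cosine similarity between vectors. This sketch uses tiny 4-dimensional toy vectors in place of the 384- or 1024-number arrays a real embedding model returns:

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: closer to 1.0 means closer in meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 4-dimensional embeddings; real models return 384 or 1024 numbers.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.2]
invoice = [0.0, 0.1, 0.9, 0.1]
```

A vector database performs this comparison at scale, returning the stored texts whose embeddings are nearest to the query's embedding.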
Playground
An interface in the Oracle Cloud Console for exploring the
 hosted pretrained and custom models without writing a single line of code. Use the playground
 to test your use cases and refine prompts and parameters. When you're happy with the results,
 copy the generated code or use the model's endpoint to integrate Generative AI into your applications.
Custom Model
A model that you create by using a pretrained model as a base and using your own dataset to
 fine-tune that model.
Tokens
A token is a word, part of a word, or a punctuation mark. For example, apple is one token, friendship is two tokens (friend and ship), and don't is two tokens (don and 't). When you run a model in the playground, you can set the maximum number of output tokens. Estimate four characters per token.
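The four-characters-per-token rule of thumb can be expressed as a one-line estimator. This is only the heuristic stated above, not a real tokenizer, so it won't match the model's exact token counts:

```python
def estimate_tokens(text):
    """Rough token estimate using the four-characters-per-token rule of thumb."""
    return max(1, round(len(text) / 4))
```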
Temperature
The level of randomness used to generate the output text. To generate a similar output for a
 prompt every time that you run that prompt, use 0. To generate a random new text for that
 prompt, increase the temperature.
Tip Start with the temperature set to 0 and increase the temperature as you regenerate the prompts to refine the output. High temperatures can introduce hallucinations and factually incorrect information.
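Conceptually, temperature rescales the model's scores before they become token probabilities. This sketch treats temperature 0 as greedy decoding (all probability on the most likely token), which matches the "same output every time" behavior described above:

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw scores into token probabilities.

    Low temperature sharpens the distribution toward the most likely token;
    temperature 0 is treated as greedy decoding (deterministic output).
    """
    if temperature == 0:
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [score / temperature for score in logits]
    peak = max(scaled)                          # subtract max for numeric stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```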
Top k
A sampling method in which the model chooses the next token randomly from the top
 k most likely tokens. A higher value for k generates more random
 output, which makes the output text sound more natural. The default value for k is 0 for
 command models and -1 for Llama models, which means that
 the models should consider all tokens and not use this method.
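Top-k sampling can be sketched directly from the description above: keep the k most likely tokens, then sample among them by probability. Values of 0 or less are treated as "consider all tokens," matching the command-model and Llama-model defaults:

```python
import random

def top_k_sample(token_probs, k, rng=random):
    """Sample the next token from the k most likely tokens (k <= 0 means all)."""
    items = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    if k > 0:
        items = items[:k]                      # restrict to the top k candidates
    tokens = [tok for tok, _ in items]
    weights = [prob for _, prob in items]
    return rng.choices(tokens, weights=weights)[0]
```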
Top p
A sampling method that controls the cumulative probability of the top tokens to consider for
 the next token. Assign p a decimal number between 0 and 1 for the
 probability. For example, enter 0.75 for the top 75 percent to be considered. Set
 p to 1 to consider all tokens.
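The candidate set for top-p (nucleus) sampling is the smallest group of most-likely tokens whose probabilities add up to at least p, which can be sketched as:

```python
def top_p_tokens(token_probs, p):
    """Keep the smallest set of most-likely tokens whose cumulative probability reaches p."""
    items = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in items:
        kept.append(token)
        cumulative += prob
        if cumulative >= p:                    # stop once the nucleus covers p
            break
    return kept
```

With p set to 1, every token survives the cut, as the definition above states.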
Frequency Penalty
A penalty that is assigned to a token when that token appears frequently. High penalties
 encourage fewer repeated tokens and produce a more random output.
Presence Penalty
A penalty that is assigned to each token when it appears in the output to encourage
 generating outputs with tokens that haven't been used.
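The two penalties differ in how they scale: frequency penalty grows with each repeat, while presence penalty is a flat one-time cost. A minimal sketch of applying both to a model's raw scores, assuming a simple subtractive formulation:

```python
def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the score of tokens already in the output.

    Frequency penalty scales with how often a token has appeared;
    presence penalty is a flat cost for appearing at all.
    """
    counts = {}
    for tok in generated:
        counts[tok] = counts.get(tok, 0) + 1
    adjusted = {}
    for tok, score in logits.items():
        count = counts.get(tok, 0)
        adjusted[tok] = (
            score
            - count * frequency_penalty
            - (presence_penalty if count else 0.0)
        )
    return adjusted
```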
Likelihood
In the output of a large language model (LLM), how likely it is for a token to follow the
 current generated token. When an LLM generates a new token for the output text, a likelihood
 is assigned to all tokens, where tokens with higher likelihoods are more likely to follow the
 current token. For example, it's more likely that the word favorite is followed by the
 word food or book rather than the word zebra. Likelihood is defined by a
 number between -15 and 0 and the more negative the number,
 the less likely it is that the token follows the current token.
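If the likelihood value is read as a natural-log probability (an assumption, since the scale is only described as -15 to 0 here), converting it back to a probability is a single exponential:

```python
import math

def likelihood_to_probability(likelihood):
    """Interpret a likelihood in [-15, 0] as a natural-log probability.

    Assumption: the scale is natural-log; 0 maps to probability 1.0 and
    more negative values map to smaller probabilities.
    """
    return math.exp(likelihood)
```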
Preamble
An initial context or guiding message for a chat model. When you don't give a preamble to a
 chat model, the default preamble for that model is used. The default preamble for the
 cohere.command-r-plus and cohere.command-r-16k models
 is:
You are Command.
You are an extremely capable large language model built by Cohere. 
You are given instructions programmatically via an API that you follow to the best of your ability.
Giving a preamble is optional. If you want to use your own preamble, for best results, give the model context, instructions, and a conversation style. Here are some examples:
You are a seasoned marketing professional with a deep understanding of consumer behavior
 and market trends. Answer with a friendly and informative tone, sharing industry insights
 and best practices.
You are a travel advisor that focuses on fun activities. Answer with a sense of humor and a pirate tone.
 Note You can also include a preamble in a chat conversation and directly ask the model to
 answer in a certain way. For example, "Answer the following question in a marketing tone.
 Where's the best place to go sailing?"
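A chat request with a preamble might be structured like the following sketch. The field names here are hypothetical placeholders, not the exact OCI Generative AI API schema:

```python
# Illustrative request shape only; field names ("preamble", "messages", "role")
# are hypothetical, not the exact OCI Generative AI API schema.
request = {
    "preamble": (
        "You are a travel advisor that focuses on fun activities. "
        "Answer with a sense of humor and a pirate tone."
    ),
    "messages": [
        {"role": "USER", "content": "Where's the best place to go sailing?"},
    ],
}
```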
Model Endpoint
A designated point on a dedicated AI cluster where a large language model (LLM) can accept
 user requests and send back responses such as the model's generated text.
In OCI
Generative AI, you can create endpoints for ready-to-use
 pretrained models and custom models. Those endpoints are listed in the playground for testing
 the models. You can also reference those endpoints in applications.
Content Moderation
A feature that removes biased, toxic, violent, abusive, derogatory, hateful, threatening,
 insulting, and harassing phrases from generated responses in large language models (LLMs). In
 OCI
Generative AI, content moderation is divided into the
 following four categories. 
Hate and harassment, such as identity attacks, insults, threats of violence, and sexual
 aggression
Self-inflicted harm, such as self-harm and eating-disorder promotion
Ideological harm, such as extremism, terrorism, organized crime, and misinformation
Exploitation, such as scams and sexual abuse
By default, OCI Generative AI does not add a content moderation layer on top of the ready-to-use pretrained models. However, pretrained models have some level of built-in content moderation that filters the output responses. To incorporate content moderation into models, you must enable content moderation when creating an endpoint for a pretrained or a fine-tuned model. Learn more about Creating an Endpoint in Generative AI.
Dedicated AI Clusters
Compute resources that you can use for fine-tuning custom models or for hosting endpoints for pretrained and custom models. The clusters are dedicated to your models and not shared with other customers.
