Rayan Mostovoi 8a8e7afa96 small fix on examples/python-simplechat/client.py to actually get a streamed response and get tokens printed as we receive it (#4671) 7 months ago
..
client.py 8a8e7afa96 small fix on examples/python-simplechat/client.py to actually get a streamed response and get tokens printed as we receive it (#4671) 7 months ago
readme.md e8aaea030e Update 'llama2' -> 'llama3' in most places (#4116) 8 months ago
requirements.txt 5a85070c22 Update readmes, requirements, packagejsons, etc for all examples (#1452) 1 year ago

readme.md

Simple Chat Example

The chat endpoint is one of two ways to generate text from an LLM with Ollama, and is introduced in version 0.1.14. At a high level, you provide the endpoint an array of objects with a role and content specified. Then with each output and prompt, you add more of those role/content objects, which builds up the history.

Running the Example

  1. Ensure you have the llama3 model installed:
   ollama pull llama3
  1. Install the Python Requirements.
   pip install -r requirements.txt
  1. Run the example:
   python client.py

Review the Code

You can see in the chat function that actually calling the endpoint is done simply with:

r = requests.post(
  "http://0.0.0.0:11434/api/chat",
  json={"model": model, "messages": messages, "stream": True},
)

With the generate endpoint, you need to provide a prompt. But with chat, you provide messages. And the resulting stream of responses includes a message object with a content field.

The final JSON object doesn't provide the full content, so you will need to build the content yourself.

In the main function, we collect user_input and add it as a message to our messages and that is passed to the chat function. When the LLM is done responding the output is added as another message.

Next Steps

In this example, all generations are kept. You might want to experiment with summarizing everything older than 10 conversations to enable longer history with less context being used.