A Python wrapper for Ollama that simplifies managing and interacting with LLMs on Colab, with support for multi-model workflows and reasoning models.
QuickLlama automates Ollama server setup, model management, and interaction with LLMs, providing an effortless developer experience.
🚀 Colab-Ready: Easily run and experiment with QuickLlama on Google Colab for hassle-free, cloud-based development!
Note: don't forget to select a GPU runtime if you actually want it to perform well!
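Not sure whether your runtime actually has a GPU attached? A quick sanity check before pulling a large model (a minimal sketch; `nvidia-smi` ships with Colab's GPU runtimes):

```python
import subprocess

# Prints GPU details if a GPU runtime is attached; fails otherwise
try:
    print(subprocess.check_output(["nvidia-smi"], text=True))
except (FileNotFoundError, subprocess.CalledProcessError):
    print("No GPU detected -- switch the Colab runtime type to GPU.")
```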
```bash
pip install quick-llama
```
```python
from quick_llama import QuickLlama

model = 'gemma3'

# Start the Ollama server and make the model available
quick_llama = QuickLlama(model_name=model, verbose=True)
quick_llama.init()
```
```python
from quick_llama import QuickLlama
from ollama import chat, ChatResponse

# Defaults to gemma3
model = 'gemma3'
quick_llama = QuickLlama(model_name=model, verbose=True)
quick_llama.init()

response: ChatResponse = chat(model=model, messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])

# or access fields directly from the response object
print(response.message.content)

quick_llama.stop()
```
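For longer replies you may prefer to stream tokens as they arrive instead of waiting for the full response. The `ollama` client supports this with `stream=True`; a minimal sketch, assuming the server from the example above is still running:

```python
from ollama import chat

# stream=True yields response chunks as they are generated
stream = chat(
    model='gemma3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
```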
```python
import os

import requests
from ollama import chat
from quick_llama import QuickLlama

model = 'gemma3'
quick_llama = QuickLlama(model_name=model, verbose=True)
quick_llama.init()

# Step 1: Download the image
img_url = "https://raw.githubusercontent.com/nuhmanpk/quick-llama/main/images/llama-image.webp"  # quick-llama cover photo
img_path = "temp_llama_image.webp"
with open(img_path, "wb") as f:
    f.write(requests.get(img_url).content)

# Step 2: Send the image to the model
response = chat(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Describe what you see in this photo.",
            "images": [img_path],
        }
    ],
)

# Step 3: Print the result
print(response['message']['content'])

# Step 4: Clean up the image file
os.remove(img_path)
```
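If you'd rather skip the temporary file, the `ollama` client also accepts raw image bytes in the `images` list, so you can pass the downloaded content directly. A variant of the example above (same model and URL):

```python
import requests
from ollama import chat

img_url = "https://raw.githubusercontent.com/nuhmanpk/quick-llama/main/images/llama-image.webp"

# Pass the raw bytes instead of writing a temp file to disk
response = chat(
    model='gemma3',
    messages=[
        {
            "role": "user",
            "content": "Describe what you see in this photo.",
            "images": [requests.get(img_url).content],
        }
    ],
)
print(response['message']['content'])
```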
```python
from quick_llama import QuickLlama
from ollama import chat, ChatResponse

# Defaults to gemma3
quick_llama = QuickLlama(model_name="gemma3")
quick_llama.init()

response: ChatResponse = chat(model='gemma3', messages=[
    {
        'role': 'user',
        'content': 'What is 6 times 5?',
    },
])
print(response['message']['content'])

# or access fields directly from the response object
print(response.message.content)
```
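`chat` is stateless, so multi-turn conversations work by appending each reply to the `messages` list yourself. A minimal sketch of a follow-up question, assuming the server above is still running:

```python
from ollama import chat

messages = [
    {'role': 'user', 'content': 'What is 6 times 5?'},
]
first = chat(model='gemma3', messages=messages)

# Append the assistant's answer, then ask a follow-up in the same context
messages.append({'role': 'assistant', 'content': first.message.content})
messages.append({'role': 'user', 'content': 'Now divide that by 3.'})
second = chat(model='gemma3', messages=messages)
print(second.message.content)
```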
```python
from quick_llama import QuickLlama
from langchain_ollama import OllamaLLM

model_name = "gemma3"
quick_llama = QuickLlama(model_name=model_name, verbose=True)
quick_llama.init()

model = OllamaLLM(model=model_name)
model.invoke("Come up with 10 names for a song about parrots")
```
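Because `OllamaLLM` is a standard LangChain runnable, it also composes with prompt templates via LCEL. A minimal sketch, assuming `langchain-core` is installed alongside `langchain-ollama`:

```python
from langchain_core.prompts import PromptTemplate
from langchain_ollama import OllamaLLM

prompt = PromptTemplate.from_template("Come up with 10 names for a song about {topic}")
model = OllamaLLM(model="gemma3")

# Pipe the prompt into the model to build a simple chain
chain = prompt | model
print(chain.invoke({"topic": "parrots"}))
```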
```python
quick_llama = QuickLlama()  # Defaults to gemma3
quick_llama.init()

# Custom model
# Supports any model from https://ollama.com/search
quick_llama = QuickLlama(model_name="custom-model-name")
quick_llama.init()
```
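Model names follow Ollama's `name:tag` convention, so you can pin a specific size or quantization, e.g. (assuming the tag exists on https://ollama.com/search):

```python
# Pull a specific tag from the Ollama library
quick_llama = QuickLlama(model_name="llama3.2:1b")
quick_llama.init()
```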
```python
quick_llama.list_models()         # List models available locally
quick_llama.stop_model("gemma3")  # Stop a specific running model
quick_llama.stop()                # Stop the Ollama server
```
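Since the server keeps running until you stop it, wrapping a session in `try`/`finally` makes sure `stop()` runs even if a cell raises. A minimal sketch:

```python
from quick_llama import QuickLlama
from ollama import chat

quick_llama = QuickLlama(model_name="gemma3")
quick_llama.init()
try:
    response = chat(model="gemma3", messages=[{"role": "user", "content": "Hello!"}])
    print(response.message.content)
finally:
    # Always shut the server down, even if the chat call fails
    quick_llama.stop()
```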
Made with ❤️ by Nuhman. Happy Coding 🚀