vision-models

Here are 26 public repositories matching this topic...

LMMMEng / OverLoCK

[CVPR 2025 Oral] OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

convolutional-neural-networks dynamic-convolution vision-models top-down-attention cvpr2025

Updated Jun 12, 2025
Python

MDGrey33 / pyvisionai

The PyVisionAI Official Repo

python open-source ocr computer vision openai llama vlm vision-models localllm ollama claude-3-5-sonnet

Updated Mar 6, 2025
Python

itsqyh / Awesome-LMMs-Mechanistic-Interpretability

A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository aggregates surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally.

generative-model generative paperlist vision-models large-language-models mechanistic-interpretability large-vision-language-models large-multimodal-models vision-foundation-model

Updated Jun 18, 2025

afondiel / computer-vision-challenge

A hands-on collection of computer vision projects for everyone.

computer-vision image-processing cnn image-classification image-generation image-detection lvm vlm computer-vision-algorithms computer-vision-tools computer-vision-opencv computer-vision-datasets vision-models vision-transformer computer-vision-python computer-vision-projects computer-vision-hello-world cv-challenge computer-vision-challenge

Updated Nov 1, 2024
Jupyter Notebook

D2I-Group / awesome-vision-time-series

The official repository for "Harnessing Vision Models for Time Series Analysis: A Survey".

time-series vision-models large-multimodal-models vision-language-models large-vision-models

Updated Jun 12, 2025
Python

kyegomez / VisionLLaMA

Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta

ai deep-learning vit multi-modal vision-models vision-transformers

Updated Nov 11, 2024
Python

The-Swarm-Corporation / swarm-models

A simple-to-use package for calling various model providers, such as OpenAI, Anthropic, and others, with utmost reliability, security, and performance.

library ai computer-vision tool ml usage production-ready swarms agents enterprise-grade vision-models llms

Updated Apr 4, 2025
Python

Pavansomisetty21 / Image-Caption-Generation-using-LLMs-GEMINI-

Generates captions for user-supplied images using prompt engineering and generative AI.

Updated Aug 24, 2024
Jupyter Notebook

ksm26 / Prompt-Engineering-for-Vision-Models

Enhance your skills in prompt engineering for vision models. Learn to effectively prompt, fine-tune, and track experiments for models like SAM, OWL-ViT, and Stable Diffusion 2.0 to achieve precise image generation, segmentation, and object detection.

machine-learning sam image-generation object-detection image-segmentation hyperparameter-tuning comet-library fine-tuning diffusion-models vision-models in-painting prompt-engineering stable-diffusion dreambooth owl-vit visual-workflows

Updated May 13, 2024
Jupyter Notebook

AstraZeneca / PerfCam

PoC Code for PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models

computer-vision yolo object-detection 3d-reconstruction digital-twin vision-models gaussian-splatting

Updated May 13, 2025
Jupyter Notebook

kyegomez / Midas

Implementation of Midas from [Towards Robust Monocular Depth Estimation] in PyTorch and Zeta

python ai tensorflow parallel ml pytorch artificial-intelligence multi-modal vision-models

Updated Mar 11, 2024
Shell

EthanBnntt / tinygrad-gmlp

An implementation of gated MLPs in tinygrad, as an alternative to transformers.

machinelearning vision-models tinygrad gmlp

Updated Sep 6, 2024
Python

iBz-04 / reeltek

A small VLM that sees everything

python ocr gpu-acceleration scene-understanding real-time-detection local-models tts-js huggingface vision-models vision-language-model llamacpp llm-inference vlms smolvlm

Updated Jun 2, 2025
HTML

major196512 / vistem

General Vision Model Training Template

pytorch vision-models

Updated Nov 12, 2020
Python

The-Swarm-Corporation / DART

DART (Diffusion-Autoregressive Recursive Transformer) is a novel hybrid architecture that combines diffusion-based and autoregressive approaches for text generation.