IDEA
Shenzhen, China
https://rentainhe.github.io/
@Tianhe_Ren
Lists (10)
Detection Transformer: detrex extension works
Generation: GAN, Diffusion, etc.
IDEA-CVR work
Label-Convert-Tools: Convert labels into different formats: yolo2coco, coco2yolo
LLM-Engineer: Large Language Modeling
Open Vocabulary
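The Label-Convert-Tools list above covers yolo2coco/coco2yolo label conversion. As a minimal sketch of what the core of such a conversion does (the function name and signature here are illustrative assumptions, not the actual API of any listed tool): YOLO stores a box as normalized center coordinates plus width/height, while COCO stores absolute top-left coordinates plus width/height.

```python
def yolo_to_coco(box, img_w, img_h):
    """Convert one YOLO box (cx, cy, w, h; all normalized to [0, 1])
    to a COCO box (x_min, y_min, width, height; absolute pixels).

    Illustrative helper, not part of any specific tool's API.
    """
    cx, cy, w, h = box
    bw = w * img_w          # absolute box width
    bh = h * img_h          # absolute box height
    x_min = cx * img_w - bw / 2  # shift from center to top-left corner
    y_min = cy * img_h - bh / 2
    return [x_min, y_min, bw, bh]


# Example: a centered box covering half the image in each dimension.
print(yolo_to_coco([0.5, 0.5, 0.5, 0.5], 640, 480))  # [160.0, 120.0, 320.0, 240.0]
```

The reverse (coco2yolo) direction simply inverts these steps: add half the width/height to recover the center, then divide by the image dimensions to normalize.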
Starred repositories
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
[ICML 2024] Selecting High-Quality Data for Training Language Models
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
Run Segment Anything Model 2 on a live video stream
[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
Python Library to evaluate VLM models' robustness across diverse benchmarks
This repository compiles a list of papers on applications of video technology in robotics.
A curated list of video object segmentation (vos) papers, datasets, and projects.
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
A Collection on Large Language Models for Optimization
Run PyTorch LLMs locally on servers, desktop and mobile
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO and SAM 2
Official inference repo for FLUX.1 models
Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
MINT-1T: A one trillion token multimodal interleaved dataset.
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[ECCV2024] Adaptive Parametric Activation
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception