Projects

AIBA — Attention-based Instrument Band Alignment

  • Conference: NeurIPS 2025 Workshop (AI for Music)
  • What it does: Aligns attention maps from text-to-audio diffusion models with instrument frequency bands, enabling interpretable control (see the sketch below).
  • Links: Workshop · arXiv · PDF
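
For intuition, a minimal sketch of the band-alignment idea: given an attention map over time and frequency bins, score how much attention mass falls inside each instrument's band. The band ranges, array shapes, and function name are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical instrument bands as (low, high) indices into the frequency axis.
INSTRUMENT_BANDS = {
    "bass": (0, 40),
    "guitar": (40, 160),
    "cymbals": (160, 512),
}

def band_alignment(attn: np.ndarray, bands: dict) -> dict:
    """Fraction of attention mass falling inside each instrument band.

    attn: (time, freq_bins) non-negative attention map from a
          text-to-audio diffusion model.
    """
    total = attn.sum() + 1e-9
    return {name: float(attn[:, lo:hi].sum() / total)
            for name, (lo, hi) in bands.items()}

# Toy usage: a random map whose mass is concentrated in the low bins.
rng = np.random.default_rng(0)
attn = rng.random((100, 512)) * np.linspace(1.0, 0.1, 512)
print(band_alignment(attn, INSTRUMENT_BANDS))
```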

Jamendo-QA — Large-Scale Music Question Answering Dataset

  • Status: Submitted to ICASSP 2026
  • What it does: Builds a large-scale QA dataset from Jamendo music tracks, combining captions, tags, and questions for training multimodal LLMs (an illustrative record is sketched below).
  • Links: arXiv · PDF · HuggingFace
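
To make the data shape concrete, here is a minimal sketch of what a single QA record could look like; the field names are illustrative assumptions, not the released schema.

```python
from dataclasses import dataclass, field

@dataclass
class MusicQARecord:
    # Field names are hypothetical; see the dataset card for the real schema.
    track_id: str                              # Jamendo track identifier
    caption: str                               # free-text track description
    tags: list = field(default_factory=list)   # genre / mood / instrument tags
    question: str = ""                         # question about the track
    answer: str = ""                           # reference answer

record = MusicQARecord(
    track_id="jamendo-0001",
    caption="An upbeat acoustic folk track with light percussion.",
    tags=["folk", "acoustic", "upbeat"],
    question="Which instruments are most prominent?",
    answer="Acoustic guitar with light percussion.",
)
print(record)
```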

MINO (Music-DINO)

  • Status: Research in progress
  • What it does: Adapts DINO self-distillation to music using CQT spectrograms, harmonic-aware positional encoding, and dual-axis attention to capture pitch, harmony, and tempo (see the sketch below).
  • Links: (coming soon: code + paper)
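
A minimal PyTorch sketch of the dual-axis attention idea, assuming a CQT input of shape (batch, time, freq, dim): self-attention runs first across frequency bins (pitch and harmony), then across time frames (tempo and rhythm). Module and dimension choices are assumptions, not MINO's implementation.

```python
import torch
import torch.nn as nn

class DualAxisAttention(nn.Module):
    """Self-attention along the frequency axis, then along the time axis."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.freq_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, f, d = x.shape
        # Attend across frequency bins within each time frame.
        xf = x.reshape(b * t, f, d)
        xf, _ = self.freq_attn(xf, xf, xf)
        x = xf.reshape(b, t, f, d)
        # Attend across time frames within each frequency bin.
        xt = x.permute(0, 2, 1, 3).reshape(b * f, t, d)
        xt, _ = self.time_attn(xt, xt, xt)
        return xt.reshape(b, f, t, d).permute(0, 2, 1, 3)

# Toy usage: 84 CQT bins, 64-dim embeddings per time-frequency cell.
x = torch.randn(2, 16, 84, 64)
print(DualAxisAttention(64)(x).shape)  # torch.Size([2, 16, 84, 64])
```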

Let Triggers Control — Frequency-aware Dropout

  • Status: Under Review
  • What it does: Introduces a frequency-aware dropout method for token control, improving how generative models handle trigger tokens (one possible reading is sketched below).
  • Links: (preprint link coming soon)
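
Because the preprint is not yet public, the sketch below is only one plausible reading of frequency-aware dropout: tokens are dropped at a rate scaled by their corpus frequency, so rare trigger tokens are more likely to survive training. Every name and the exact rule here are assumptions.

```python
import torch

def frequency_aware_dropout(token_ids: torch.Tensor,
                            token_freq: torch.Tensor,
                            max_drop: float = 0.5) -> torch.Tensor:
    """Mask tokens with probability scaled by their relative corpus frequency.

    token_ids:  (seq_len,) integer token ids
    token_freq: (vocab_size,) corpus frequency per token id
    """
    rel = token_freq[token_ids] / token_freq.max()    # in [0, 1]; high = common
    drop_prob = max_drop * rel                        # common tokens dropped more
    keep = torch.rand_like(drop_prob) >= drop_prob
    # Replace dropped tokens with id 0 (assumed pad/mask id).
    return torch.where(keep, token_ids, torch.zeros_like(token_ids))

# Toy usage: id 1 is very common, id 3 is a rare trigger token.
freq = torch.tensor([1.0, 900.0, 50.0, 2.0])
ids = torch.tensor([1, 2, 3, 1, 1])
print(frequency_aware_dropout(ids, freq))
```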

Illustrious — Open Advanced Illustration Model

  • Status: Technical Report (2024)
  • What it does: Presents a large-scale illustration generation model, openly released for research and creative use.
  • Links: arXiv · PDF · HuggingFace · CivitAI

Contrastive Adapter Training (CAT) — Personalized Image Generation

  • Conference: CVPR 2024 Workshop (Generative Models for Computer Vision)
  • What it does: Proposes a contrastive adapter training strategy to personalize diffusion models (see the sketch below).
  • Links: Workshop · arXiv · PDF
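
A minimal sketch of the contrastive idea, assuming a standard diffusion denoising setup: the adapter is fit to the personal concept while a second term keeps its noise predictions close to the frozen base model on generic prompts. The loss form and all names are illustrative assumptions, not the paper's exact objective.

```python
import torch

def cat_style_loss(adapted_pred: torch.Tensor,   # adapter model's noise prediction
                   base_pred: torch.Tensor,      # frozen base model's prediction
                   target_noise: torch.Tensor,   # true noise added at this step
                   is_concept: torch.Tensor,     # (batch,) bool: prompt has concept
                   lam: float = 1.0) -> torch.Tensor:
    w = is_concept.float().view(-1, 1, 1, 1)
    # Denoising loss on concept samples: match the true noise.
    concept_loss = (w * (adapted_pred - target_noise) ** 2).mean()
    # Preservation loss on generic samples: stay near the frozen base model.
    preserve_loss = ((1 - w) * (adapted_pred - base_pred.detach()) ** 2).mean()
    return concept_loss + lam * preserve_loss

# Toy usage with (batch, channels, height, width) latents.
b = 4
pred_a, pred_b, noise = (torch.randn(b, 4, 8, 8) for _ in range(3))
flags = torch.tensor([True, False, True, False])
print(cat_style_loss(pred_a, pred_b, noise, flags))
```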