Microsoft Lens in ComfyUI — Basic Video Guide

A quick, high-points guide to the Veteran AI video about Microsoft Lens: what it is, the main model/file names, the ComfyUI settings mentioned, and the test scenes shown.

1. Overview

Microsoft Lens is presented as a small but capable text-to-image model. The video’s main point is that Lens has only 3.8B parameters, yet can still produce detailed images at high resolutions and in flexible aspect ratios inside ComfyUI.

Simple summary: Lens is not pitched as the biggest or most powerful image model. It is pitched as an efficient model that can create surprisingly detailed images for its size.

2. Key names, tools, and locations

Model and files

  • Model: Microsoft Lens
  • Model size: 3.8B parameters
  • Dataset mentioned: Lens-800M
  • Main download location: Comfy-Org/Lens on Hugging Face
  • Versions: normal Lens and Turbo Lens

Workflow/tools mentioned

  • ComfyUI — local node workflow
  • RunningHub — online ComfyUI workflow platform
  • UNETLoader — loads the main model
  • GPT-OSS — text encoder
  • FLUX2 VAE — VAE used in the workflow

3. Basic ComfyUI settings mentioned

  • Use a recent ComfyUI build, because Lens support appears in newer versions.
  • Set CLIP type to Lens; otherwise encoding may not work correctly.
  • Normal Lens: about 20 sampling steps.
  • Turbo Lens: about 4 steps for faster preview-style output.
  • Sampler/scheduler used: Euler sampler and simple scheduler.
  • Example CFG: 5.0.
  • Example denoise: 1.0.
  • Square test resolution: 1440 × 1440.
  • Vertical test resolution: 1024 × 1536.
  • Important nodes to watch: ModelSamplingFlux and CFGNorm.
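
The sampler values in the list above can be gathered into one place, which makes it harder to mix up the normal and Turbo step counts. The following is a minimal Python sketch, not something shown in the video. The `SamplerSettings` class and the `lens_settings` helper are hypothetical names, the field names mirror the standard KSampler inputs in ComfyUI, and reusing the example CFG of 5.0 for both variants is an assumption, since the guide gives only a single example value.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SamplerSettings:
    """Sampler values mentioned in the guide (field names follow KSampler inputs)."""
    steps: int
    cfg: float
    sampler_name: str
    scheduler: str
    denoise: float


# Approximate step counts from the guide: about 20 for normal, about 4 for Turbo.
STEPS_BY_VARIANT = {"normal": 20, "turbo": 4}


def lens_settings(variant: str) -> SamplerSettings:
    """Return the example settings for 'normal' or 'turbo' Lens."""
    key = variant.lower()
    if key not in STEPS_BY_VARIANT:
        raise ValueError(f"unknown Lens variant: {variant!r}")
    return SamplerSettings(
        steps=STEPS_BY_VARIANT[key],
        cfg=5.0,  # assumption: the single example CFG is reused for both variants
        sampler_name="euler",
        scheduler="simple",
        denoise=1.0,
    )


if __name__ == "__main__":
    for name in STEPS_BY_VARIANT:
        print(name, asdict(lens_settings(name)))
```

Calling `asdict(lens_settings("turbo"))` gives a plain dictionary that could be merged into a KSampler node's inputs in an API-format workflow. Treat the numbers as starting points from the video rather than tuned values.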

Easy-to-miss point: when changing the image size, do not change EmptyLatentImage alone. ModelSamplingFlux also takes the width and height, so both nodes should be kept on the same values.
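
One way to keep EmptyLatentImage and ModelSamplingFlux in step is to set the size in a single place. The sketch below is an illustration in Python, assuming the workflow has been exported from ComfyUI in API format (a dictionary of node ids, each with a `class_type` and an `inputs` dictionary). The `set_resolution` helper, the node ids, and the two-node fragment are hypothetical. A real Lens workflow has more nodes and may use a different latent node, in which case `SIZE_NODE_TYPES` would need adjusting.

```python
import copy

# Node types whose width/height inputs should always match (per the guide).
SIZE_NODE_TYPES = {"EmptyLatentImage", "ModelSamplingFlux"}


def set_resolution(workflow: dict, width: int, height: int) -> dict:
    """Return a copy of the workflow with width/height set on every size-aware node."""
    updated = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    for node in updated.values():
        if node.get("class_type") in SIZE_NODE_TYPES:
            inputs = node.setdefault("inputs", {})
            inputs["width"] = width
            inputs["height"] = height
    return updated


# Hypothetical fragment of an API-format workflow; other nodes and inputs omitted.
workflow = {
    "1": {
        "class_type": "EmptyLatentImage",
        "inputs": {"width": 1440, "height": 1440, "batch_size": 1},
    },
    "2": {
        "class_type": "ModelSamplingFlux",
        "inputs": {"width": 1440, "height": 1440},
    },
}

# Switch from the square test size to the vertical test size in one call.
portrait = set_resolution(workflow, 1024, 1536)
```

After the call, both nodes in `portrait` carry 1024 × 1536, while the original `workflow` still holds the 1440 × 1440 square size. Inside the ComfyUI graph editor, the same effect comes from wiring one pair of width and height values into both nodes.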

4. Test scenes and locations from the video

  • Realistic photo test: a small independent watch repair shop at midnight, with an old watchmaker, tiny golden gears, rain outside, and red/blue neon reflections.
  • Chinese prompt/location test: a rainy night in Chongqing, China, with wet stairs, old residential buildings, street food stalls, lanterns, neon signs, a delivery rider, an electric scooter, river lights, and ferries.
  • English text test: a travel notebook cover with readable title text, hand-drawn map elements, seashells, watercolor tape, and sunlight.
  • Product text test: a black perfume bottle with a minimal label, reflective tabletop, and soft spotlighting.
  • Fantasy large-scene test: an ancient floating harbor above the clouds with wooden docks, sailing ships, mechanical cranes, travelers, and whale-like airships.
  • Sci-fi concept art test: a futuristic data cathedral with magnetic panels, glowing data pillars, glass floors, engineers in black robes, holograms, and optical fibers.
  • Object-count test: an overhead desktop scene with exactly four pencils, two ceramic cups, one silver laptop, and one square sketchbook.

5. Main takeaways

  • Lens seems strong at detail density and long prompt understanding for a small model.
  • It is not limited to square output and handles a range of aspect ratios.
  • English text generation is described as strong.
  • Chinese prompts can work for overall atmosphere, but English prompts are still recommended when possible.
  • Chinese text inside generated images is not reliable.
  • Exact object counts are still unreliable, which is common for text-to-image models.
  • The normal model is slower but higher quality; Turbo is the faster option.

6. Sources

Source notes sidecar: microsoft-lens-comfyui-basic-guide-2026-05-28.sources.md