Data Types
Before looking at individual nodes, it's helpful to know what the data they work with looks like, and how it behaves.
Also refer to Comfy docs:
Basic Types
| | int | float / double | string | bool |
|---|---|---|---|---|
| Semantic Meaning | Whole numbers like 12345 | Decimals like 3.14 | Text like "I like trains" | True or False |
The most complex out of these is float - besides being a specific type, it also encompasses the idea of the "floating point format/number", which is a smart scheme to store a very wide range of numbers with a fixed bit budget.
Usually, float means a 32-bit floating point number, while double means a 64-bit floating point number. You may also see some people / programming languages use other names, e.g. float16, float32, float64, where the number is the bit count.
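To make the bit counts concrete, here is a minimal sketch using Python's standard `struct` module. It stores 0.1 in 16, 32 and 64 bits and reads it back. The helper name `roundtrip` and the `FORMATS` table are made up for this illustration; `"<e"`, `"<f"` and `"<d"` are `struct`'s format codes for half, single and double precision.

```python
import struct

# struct format codes: e = 16-bit, f = 32-bit, d = 64-bit floating point.
FORMATS = {"float16": "<e", "float32": "<f", "float64": "<d"}

def roundtrip(value, fmt):
    """Store `value` in the given format, then read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

for name, fmt in FORMATS.items():
    bits = struct.calcsize(fmt) * 8
    # Fewer bits means the stored value drifts further from 0.1.
    print(name, bits, "bits ->", roundtrip(0.1, fmt))
```

The fewer bits a format has, the further the stored value ends up from the 0.1 you asked for.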
Floating Point Numbers
A number stored in the floating point format is split into 2 parts, the "mantissa" and the "exponent," in base 2 because computers.
For demonstration, I'll show what that may look like using our every day normal numbers (in other words, base 10):
\(12345=\underbrace{1.2345}_{\text{mantissa}}\times\underbrace{10}_{\text{base}}\!\!\!\!\!\!^{\overbrace{4}^\text{exponent}}\)
Assume we have a fixed 32-bit budget. To represent massive / tiny numbers, you can allocate more of those 32 bits to the exponent and fewer to the mantissa, giving it greater range at the cost of accuracy (usually a worthwhile trade).
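The same split can be done in base 2. This is a small sketch built on Python's standard `math.frexp`; the helper name `decompose` is made up here, and it is only meant for nonzero, finite numbers. For reference, the standard 32-bit format spends 1 bit on the sign, 8 on the exponent and 23 on the mantissa, while the 64-bit format spends 1, 11 and 52.

```python
import math

def decompose(x):
    """Split a nonzero finite float into (mantissa, exponent) such that
    x == mantissa * 2**exponent and 1 <= |mantissa| < 2."""
    m, e = math.frexp(x)  # frexp returns 0.5 <= |m| < 1
    return m * 2, e - 1   # shift into the usual 1.xxx form

mantissa, exponent = decompose(12345.0)
print(mantissa, exponent)        # 1.5069580078125 13
print(mantissa * 2 ** exponent)  # 12345.0
```

So 12345 is 1.5069580078125 × 2^13, the base 2 counterpart of 1.2345 × 10^4.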
JavaScript, the programming language powering ComfyUI (and most of the internet's) interactivity, uses double to store all numbers. This leads to potentially surprising behavior. For example, a double starts to lose the ability to distinguish between \(n\) and \(n+1\) at \(n=2^{53}=9007199254740992.\)
To see this in action, take an Int (utils > primitive) node and try inputting 9007199254740993. Alternatively, input 9007199254740992 and try to increase it by pressing the arrow.
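The same limit can be reproduced outside the browser. This sketch assumes Python's `float` is a 64-bit double (true for standard CPython builds), so it behaves like a JavaScript number here. Python's `int` is exact, which is why the values are converted with `float()` first.

```python
# Assumption: Python's float is a 64-bit double, like a JavaScript number.
LIMIT = 2 ** 53  # 9007199254740992

below = float(LIMIT - 1)
at = float(LIMIT)
above = float(LIMIT + 1)  # not representable as a double, gets rounded

print(below == at)     # False: still distinguishable below the limit
print(at == above)     # True: n and n + 1 collapse into the same double
print(at + 1.0 == at)  # True: adding 1 no longer changes the value
```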
Tensors
Tensors probably won't be the direct input or output of nodes, but understanding them will nevertheless prove immensely helpful imo.
| | 0D Tensor | 1D Tensor | 2D Tensor | 3D Tensor | 4D Tensor |
|---|---|---|---|---|---|
| Also known as | scalar | vector | matrix | | |
| Explanation | 1 number | a list of numbers | a grid of numbers | a cube of numbers | |
| ComfyUI example | seed, cfg | sigmas (represents noise levels) | pooled conditioning | raw conditioning | image, latent |
In other words, a tensor is simply a multi-dimensional array of numbers (including 0D, a single number).
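As a rough illustration, plain nested Python lists can stand in for tensors. This is only a sketch: ComfyUI itself works with PyTorch tensors, and the `shape` helper and all example values below are made up. It assumes every list is non-empty and rectangular.

```python
def shape(t):
    """Return the shape of a rectangular nested list.
    A bare number counts as a 0D tensor with shape ()."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]  # assumes non-empty, rectangular nesting
    return tuple(dims)

scalar = 7.5                                         # 0D: 1 number
vector = [14.6, 7.1, 2.9, 0.0]                       # 1D: a list of numbers
matrix = [[1, 2, 3], [4, 5, 6]]                      # 2D: a grid of numbers
cube = [[[0] * 4 for _ in range(3)] for _ in range(2)]  # 3D: a cube of numbers
batch = [[[[0.0] * 3 for _ in range(8)] for _ in range(8)] for _ in range(2)]  # 4D

for t in (scalar, vector, matrix, cube, batch):
    print(len(shape(t)), "D tensor with shape", shape(t))
```

The number of entries in the shape is the number of dimensions.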
Images, Latents and Masks
image and latent are both 4D tensors, while mask is a 3D tensor. The semantic meanings of the dimensions are:
- N (batch size): How many images/latents there are.
- C (channels): How many channels. For example, an image may have 3 channels, RGB.
- H (height)
- W (width)
latents are in the "channel first" format - NCHW, meaning the first dimension is N, the second is C, the third is H, and the last is W.
images are in "channel last" - NHWC.
mask doesn't have the channel dimension - NHW.
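To see what the two layouts mean in practice, here is a sketch that reorders a tiny "channel last" batch into "channel first" using nested lists. The function name `nhwc_to_nchw` and the pixel values are made up for the example. With real PyTorch tensors the same reordering would normally be a single `permute(0, 3, 1, 2)` call.

```python
def nhwc_to_nchw(images):
    """Reorder a nested-list batch from NHWC (channel last)
    to NCHW (channel first)."""
    return [
        [
            # For channel c, keep the H x W grid but pick only value c of each pixel.
            [[pixel[c] for pixel in row] for row in img]
            for c in range(len(img[0][0]))
        ]
        for img in images
    ]

# N=1 image, H=2, W=2, C=3 (RGB): red, green / blue, white
image_batch = [[
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
]]

channel_first = nhwc_to_nchw(image_batch)
print(channel_first[0][0])  # red channel:   [[1.0, 0.0], [0.0, 1.0]]
print(channel_first[0][1])  # green channel: [[0.0, 1.0], [0.0, 1.0]]
print(channel_first[0][2])  # blue channel:  [[0.0, 0.0], [1.0, 1.0]]
```

In NHWC the three colour values of a pixel sit next to each other. In NCHW each channel is its own complete H x W grid.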
"Frequency"
For something more familiar like audio, the frequency is the number of vibrations per unit of time. Higher frequency = more vibrations = sounds like a higher pitch.
For images (and by extension, latents), it's the change in pixel values with respect to distance (X Y coordinates).
- High frequency = rapid change in color/brightness/etc in a small area; Corresponds to fine details.
- Low frequency = gradual change in color/brightness/etc in a large area; Corresponds to image structure.
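A crude way to put a number on this is to measure how much neighbouring pixels differ along a single row. This is only a sketch with made-up pixel values and a made-up helper name `mean_abs_change`. It is a rough proxy for "change with respect to distance", not a real frequency analysis such as a Fourier transform.

```python
def mean_abs_change(row):
    """Average absolute difference between neighbouring pixel values."""
    diffs = [abs(b - a) for a, b in zip(row, row[1:])]
    return sum(diffs) / len(diffs)

smooth_row = [0, 1, 2, 3, 4, 5, 6, 7]    # gradual ramp: low frequency
detailed_row = [0, 7, 0, 7, 0, 7, 0, 7]  # rapid flicker: high frequency

print(mean_abs_change(smooth_row))    # 1.0
print(mean_abs_change(detailed_row))  # 7.0
```

Both rows cover the same range of values, but the second one changes much faster per pixel, which is what "high frequency" refers to.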
Common Errors
KSampler - Input type (double) and bias type (float) should be the same:
- Try switching your Preview method from TAESD to something else.