If a key in a WeightsMapping maps to a value of None, the corresponding weight is ignored.
 Helper class to load weights into a torch.nn.Module. It is able to automatically detect child modules and parameters while iterating over the weights only once.
The weight loading logic for individual modules can be overridden by defining a load_weights method.
Similarly, the weight loading logic for individual parameters can be overridden by defining a weight_loader method.
Detailed weight loading information can be viewed by setting the environment variable VLLM_LOGGING_LEVEL=DEBUG.
Source code in vllm/model_executor/models/utils.py
 class-attribute instance-attribute  ¶
 ROTARY_EMBEDS_UNUSED_WEIGHTS = [
    "rotary_emb.inv_freq",
    "rotary_emb.cos_cached",
    "rotary_emb.sin_cached",
]
 __init__(
    module: Module,
    *,
    skip_prefixes: list[str] | None = None,
    skip_substrs: list[str] | None = None,
    ignore_unexpected_prefixes: list[str] | None = None,
    ignore_unexpected_suffixes: list[str] | None = None,
) -> None
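The parameter names suggest two kinds of filtering: weights that are skipped outright, and unexpected weights that are tolerated rather than reported. The following is a minimal sketch of that idea on plain strings. The helper names `can_skip` and `can_ignore_unexpected` are hypothetical, and the exact matching rules are an assumption based only on the parameter names above.

```python
def can_skip(name: str, skip_prefixes: list[str], skip_substrs: list[str]) -> bool:
    """Hypothetical check: skip a weight if its name starts with any
    skip prefix or contains any skip substring."""
    return any(name.startswith(p) for p in skip_prefixes) or any(
        s in name for s in skip_substrs
    )


def can_ignore_unexpected(
    name: str, ignore_prefixes: list[str], ignore_suffixes: list[str]
) -> bool:
    """Hypothetical check: tolerate an unexpected weight if its name
    starts with an ignored prefix or ends with an ignored suffix."""
    return any(name.startswith(p) for p in ignore_prefixes) or any(
        name.endswith(s) for s in ignore_suffixes
    )


names = [
    "lm_head.weight",
    "model.layers.0.rotary_emb.inv_freq",
    "model.embed_tokens.weight",
]
# Drop the output head and the unused rotary embedding buffer.
kept = [n for n in names if not can_skip(n, ["lm_head."], ["rotary_emb.inv_freq"])]
```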
Add tensor names that are not in the model params but may be present in the safetensors checkpoint, e.g., batch normalization stats.
 _groupby_prefix(
    weights: Iterable[tuple[str, Tensor]],
) -> Iterable[tuple[str, Iterable[tuple[str, Tensor]]]]
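The signature suggests that weights are grouped by the first dotted component of their names, so that each child module or parameter receives only its own weights. A minimal sketch of that grouping, assuming weights for the same prefix arrive consecutively and using strings as stand-ins for tensors (the function name `groupby_prefix` and the eager list output are illustrative choices, not the library's implementation):

```python
import itertools
from collections.abc import Iterable, Iterator
from typing import Any


def groupby_prefix(
    weights: Iterable[tuple[str, Any]],
) -> Iterator[tuple[str, list[tuple[str, Any]]]]:
    """Group (name, data) pairs by the first dotted component of the name.

    itertools.groupby only merges consecutive items, so the input is
    consumed in a single pass without sorting.
    """
    split = ((name.split(".", 1), data) for name, data in weights)
    for prefix, group in itertools.groupby(split, key=lambda item: item[0][0]):
        # Keep the remainder of the name (empty if there was no dot).
        yield prefix, [
            (parts[1] if len(parts) > 1 else "", data) for parts, data in group
        ]


weights = [
    ("embed.weight", "w0"),
    ("layers.0.weight", "w1"),
    ("layers.1.weight", "w2"),
    ("norm", "w3"),
]
grouped = dict(groupby_prefix(weights))
```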
 _load_module(
    base_prefix: str,
    module: Module,
    weights: Iterable[tuple[str, Tensor]],
) -> Iterable[str]
 _load_param(
    base_prefix: str,
    param: Parameter,
    weights: Iterable[tuple[str, Tensor]],
) -> Iterable[str]
 load_weights(
    weights: Iterable[tuple[str, Tensor]],
    *,
    mapper: WeightsMapper | None = None,
) -> set[str]
  Bases: Identity
A placeholder layer for missing layers in a pipeline parallel model.
  dataclass  ¶
 Maps the name of each weight if it matches one of the following patterns.
  class-attribute instance-attribute  ¶
 orig_to_new_prefix: WeightsMapping = field(
    default_factory=dict
)
 class-attribute instance-attribute  ¶
 orig_to_new_substr: WeightsMapping = field(
    default_factory=dict
)
 class-attribute instance-attribute  ¶
 orig_to_new_suffix: WeightsMapping = field(
    default_factory=dict
)
 
 __init__(
    orig_to_new_substr: WeightsMapping = dict(),
    orig_to_new_prefix: WeightsMapping = dict(),
    orig_to_new_suffix: WeightsMapping = dict(),
) -> None
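To illustrate how the three mappings might be applied to a weight name, here is a minimal sketch. `SketchMapper` and `map_name` are hypothetical names, and the order of application (substring, then prefix, then suffix) is an assumption. The only behavior taken from this page is that a value of None causes the weight to be ignored.

```python
from dataclasses import dataclass, field
from typing import Optional

Mapping = dict[str, Optional[str]]


@dataclass
class SketchMapper:
    """Hypothetical stand-in for a mapper with substr/prefix/suffix rules."""

    orig_to_new_substr: Mapping = field(default_factory=dict)
    orig_to_new_prefix: Mapping = field(default_factory=dict)
    orig_to_new_suffix: Mapping = field(default_factory=dict)

    def map_name(self, name: str) -> Optional[str]:
        """Return the renamed weight, or None if the weight is ignored."""
        for old, new in self.orig_to_new_substr.items():
            if old in name:
                if new is None:
                    return None
                name = name.replace(old, new, 1)
        for old, new in self.orig_to_new_prefix.items():
            if name.startswith(old):
                if new is None:
                    return None
                name = new + name[len(old):]
        for old, new in self.orig_to_new_suffix.items():
            if name.endswith(old):
                if new is None:
                    return None
                name = name[: len(name) - len(old)] + new
        return name


mapper = SketchMapper(
    orig_to_new_prefix={"model.": "language_model.model."},
    orig_to_new_suffix={".inv_freq": None},
)
renamed = mapper.map_name("model.layers.0.mlp.weight")
dropped = mapper.map_name("model.layers.0.rotary_emb.inv_freq")
```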
 
 __or__(other: WeightsMapper) -> WeightsMapper
Combine two WeightsMappers by merging their mappings.
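A minimal sketch of such a merge, assuming that each of the three mappings is merged independently and that entries from the right-hand operand win when both sides define the same key. `MergeableMapper` is a hypothetical stand-in, not the library class.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MergeableMapper:
    """Hypothetical mapper that supports `a | b` by merging its dicts."""

    orig_to_new_substr: dict[str, Optional[str]] = field(default_factory=dict)
    orig_to_new_prefix: dict[str, Optional[str]] = field(default_factory=dict)
    orig_to_new_suffix: dict[str, Optional[str]] = field(default_factory=dict)

    def __or__(self, other: "MergeableMapper") -> "MergeableMapper":
        # Build new dicts so that neither operand is mutated.
        return MergeableMapper(
            orig_to_new_substr={**self.orig_to_new_substr, **other.orig_to_new_substr},
            orig_to_new_prefix={**self.orig_to_new_prefix, **other.orig_to_new_prefix},
            orig_to_new_suffix={**self.orig_to_new_suffix, **other.orig_to_new_suffix},
        )


base = MergeableMapper(orig_to_new_prefix={"model.": "lm.model."})
extra = MergeableMapper(
    orig_to_new_prefix={"vision.": "vit."},
    orig_to_new_suffix={".inv_freq": None},
)
combined = base | extra
```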
 _embedding_count_expression(
    embeddings: NestedTensors,
) -> str
Constructs a debugging representation of the number of embeddings in the NestedTensors.
 _flatten_embeddings(embeddings: NestedTensors) -> Tensor
Recursively flattens and concatenates NestedTensors on all but the last dimension.
 _merge_multimodal_embeddings(
    inputs_embeds: Tensor,
    multimodal_embeddings: NestedTensors,
    is_multimodal: Tensor,
) -> Tensor
Merge multimodal_embeddings into inputs_embeds by overwriting the positions in inputs_embeds corresponding to placeholder tokens in input_ids.
Note
This updates inputs_embeds in place.
Extract the layer index from the module name.
Examples:
- "encoder.layers.0" -> 0
- "encoder.layers.1.self_attn" -> 1
- "2.self_attn" -> 2
- "model.encoder.layers.0.sub.1" -> ValueError if num_attn_module == 1
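The examples can be reproduced with a short sketch that covers only the num_attn_module == 1 case. The function name `extract_single_layer_index` is hypothetical, and raising ValueError when the name does not contain exactly one integer component is an assumption taken from the last example.

```python
def extract_single_layer_index(layer_name: str) -> int:
    """Return the single integer component of a dotted module name.

    Sketch for the num_attn_module == 1 case only: names with zero or
    more than one integer component are rejected.
    """
    int_parts = [int(part) for part in layer_name.split(".") if part.isdecimal()]
    if len(int_parts) != 1:
        raise ValueError(
            f"expected exactly one integer in {layer_name!r}, found {len(int_parts)}"
        )
    return int_parts[0]


index = extract_single_layer_index("encoder.layers.1.self_attn")
```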
  Optimized topk implementation that uses torch.max for k=1 case.
This function provides better performance for the common case of k=1 by using torch.max instead of the more general torch.topk.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| values | Tensor | Input tensor to find top-k values from | required | 
| topk | int | Number of top values to return (k). Must be > 0. | required | 
| dim | int | Dimension along which to compute topk | required | 
Returns:
| Type | Description | 
|---|---|
| tuple[Tensor, Tensor] | Tuple of (values, indices), where values are the top-k values and indices are their corresponding indices in the input tensor | 
 Flatten the B and N dimensions of batched multimodal inputs.
The input tensor should have shape (B, N, ...).
  Get the names of the missing layers in a pipeline parallel model.
 init_vllm_registered_model(
    vllm_config: VllmConfig,
    *,
    prefix: str = "",
    hf_config: PretrainedConfig | None = None,
    architectures: list[str] | None = None,
) -> Module
Helper function to initialize an inner model registered to vLLM, based on the arguments passed to the outer vLLM model.
  Check if a parameter is missing in a pipeline parallel model.
 make_layers(
    num_hidden_layers: int, layer_fn: LayerFn, prefix: str
) -> tuple[int, int, ModuleList]
Make a list of layers with the given layer function, taking pipeline parallelism into account.
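One way to picture this is a sketch in which each pipeline rank builds only its own contiguous slice of layers and leaves placeholders elsewhere. Everything below is an assumption for illustration: the explicit `pp_rank` and `pp_size` arguments (the real function takes neither), the even partitioning rule, and the use of None as a stand-in for a placeholder layer.

```python
from typing import Callable, Optional


def partition_layers(num_hidden_layers: int, pp_rank: int, pp_size: int) -> tuple[int, int]:
    """Hypothetical even split: earlier ranks absorb any remainder."""
    base, extra = divmod(num_hidden_layers, pp_size)
    start = pp_rank * base + min(pp_rank, extra)
    end = start + base + (1 if pp_rank < extra else 0)
    return start, end


def make_layers_sketch(
    num_hidden_layers: int,
    layer_fn: Callable[..., str],
    prefix: str,
    pp_rank: int,
    pp_size: int,
) -> tuple[int, int, list[Optional[str]]]:
    """Build only the layers owned by this rank; others become None."""
    start, end = partition_layers(num_hidden_layers, pp_rank, pp_size)
    layers = [
        layer_fn(prefix=f"{prefix}.{idx}") if start <= idx < end else None
        for idx in range(num_hidden_layers)
    ]
    return start, end, layers


start, end, layers = make_layers_sketch(
    4, lambda prefix: f"layer<{prefix}>", "model.layers", pp_rank=1, pp_size=2
)
```

Returning the start and end indices alongside the full-length list keeps layer numbering identical on every rank, so weight names still line up with the checkpoint.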
  Add a prefix to a name if the prefix is non-empty.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| prefix | str | The prefix to add. If empty, no prefix will be added. | required | 
| name | str | The name to potentially prefix. | required | 
Returns:
| Type | Description | 
|---|---|
| str | The string "prefix.name" if prefix was non-empty, otherwise just "name". | 
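The behavior described in the table fits in one line. A minimal sketch under a hypothetical name, `maybe_prefix_sketch`:

```python
def maybe_prefix_sketch(prefix: str, name: str) -> str:
    """Join prefix and name with a dot, unless the prefix is empty."""
    return f"{prefix}.{name}" if prefix else name


nested = maybe_prefix_sketch("model", "layers")
top_level = maybe_prefix_sketch("", "layers")
```

This keeps module names free of a leading dot when a model is constructed at the top level.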
 merge_multimodal_embeddings(
    input_ids: Tensor,
    inputs_embeds: Tensor,
    multimodal_embeddings: NestedTensors,
    placeholder_token_id: int | list[int],
) -> Tensor
Merge multimodal_embeddings into inputs_embeds by overwriting the positions in inputs_embeds corresponding to placeholder tokens in input_ids.
placeholder_token_id can be a list of token ids (e.g., the token ids of the img_start, img_break, and img_end tokens) when needed. In that case, the order of these tokens in input_ids MUST MATCH the order of their embeddings in multimodal_embeddings, because the embeddings are slice-merged rather than individually scattered.
For example, suppose input_ids is "TTTTTSIIIBIIIBIIIETTT", where:
- T is a text token
- S is the image start token
- I is an image embedding token
- B is the image break token
- E is the image end token
Then the image embeddings (which correspond to the I's) from the vision encoder must be padded with the embeddings of S, B, and E in the same order as in input_ids for the merge to be correct.
Note
This updates inputs_embeds in place.
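The ordering requirement is easiest to see on plain lists. The sketch below uses integers for token ids and strings for embeddings. The function name `merge_embeddings_sketch`, the token id values, and the length check are all hypothetical. It only illustrates the in-order overwrite, not the tensor implementation.

```python
from typing import Union


def merge_embeddings_sketch(
    input_ids: list[int],
    inputs_embeds: list[str],
    multimodal_embeddings: list[str],
    placeholder_token_id: Union[int, list[int]],
) -> list[str]:
    """Overwrite placeholder positions, in order, with multimodal embeddings."""
    if isinstance(placeholder_token_id, int):
        placeholder_ids = {placeholder_token_id}
    else:
        placeholder_ids = set(placeholder_token_id)
    positions = [i for i, tok in enumerate(input_ids) if tok in placeholder_ids]
    if len(positions) != len(multimodal_embeddings):
        raise ValueError(
            f"{len(positions)} placeholder positions but "
            f"{len(multimodal_embeddings)} multimodal embeddings"
        )
    # The k-th placeholder position receives the k-th embedding, so the
    # embeddings must already be in input_ids order.
    for pos, emb in zip(positions, multimodal_embeddings):
        inputs_embeds[pos] = emb  # in-place update
    return inputs_embeds


T, S, I, B, E = 0, 1, 2, 3, 4  # hypothetical token ids
input_ids = [T, S, I, I, B, I, I, E, T]
inputs_embeds = ["t"] * len(input_ids)
# Image embeddings padded with S, B, and E embeddings in input_ids order.
multimodal_embeddings = ["s", "i0", "i1", "b", "i2", "i3", "e"]
merged = merge_embeddings_sketch(
    input_ids, inputs_embeds, multimodal_embeddings, [S, I, B, E]
)
```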