strategy_adapters¶
Fine-Tuning Scheduler Strategy Adapters¶
- class finetuning_scheduler.strategy_adapters.FSDPStrategyAdapter(awp_overrides=None, *args, **kwargs)[source]¶
A StrategyAdapter that extends FinetuningScheduler (FTS) to support flexible, multi-phase, scheduled fine-tuning with the Fully Sharded Data Parallel (FSDP) strategy (FSDPStrategy).
As with standard FSDP usage, FSDP wrapping of a LightningModule can be performed either by providing an auto_wrap_policy or (for maximal control) by overriding the configure_model method of LightningModule and manually wrapping the module.
In order to support multi-phase scheduled fine-tuning with FSDP in use_orig_params=False mode, FTS's key precondition is that the defined fine-tuning schedule phases have disjoint sets of FSDP-flattened parameters (i.e. FlatParameters, which are created when wrapping a set of modules in an FSDP instance/unit). This constraint derives from the fact that (in use_orig_params=False mode) the requires_grad attribute must be the same for all parameters flattened into the same FlatParameter.
To facilitate module wrapping in alignment with fine-tuning schedule phases, FTS provides the awp_overrides feature, which allows users to provide module name-based complements to a given auto_wrap_policy. See the Example: Multi-Phase Scheduled Fine-Tuning with FSDP tutorial for a concrete example and additional guidance.
FTS will attempt to validate that the module is wrapped in a manner that aligns with the defined fine-tuning schedule phases prior to the start of training and will provide detailed feedback to the user if a misalignment is discovered.
Note
The no_decay attribute that FTS supports on LightningModule with the base StrategyAdapter is not currently supported in the context of FSDP fine-tuning.
Tip
Because of inter-module dependencies (among other reasons), wrapping every submodule in its own separate FSDP instance is often not a viable approach to ensuring fine-tuning schedule/module wrapping alignment. Starting with a provided auto_wrap_policy (e.g. transformer_auto_wrap_policy) and providing module name-based complements as needed using awp_overrides is often the most expedient approach to auto-wrapping in alignment with a fine-tuning schedule. As always, if needed, one can override configure_model and manually wrap a given LightningModule to align with a desired fine-tuning schedule.
The only user-facing configuration for FSDPStrategyAdapter is awp_overrides, an optional list of module names that should be wrapped in separate FSDP instances, complementing the modules that would be individually wrapped by an auto_wrap_policy provided in the FSDPStrategy strategy configuration.
- Parameters:
awp_overrides¶ (Optional[List]) – A list of module names to wrap in separate FSDP instances (i.e., auto_wrap_policy overrides). Only applicable when complementing/overriding an auto_wrap_policy provided in the FSDPStrategy strategy configuration. Override lists will be ignored when manually wrapping modules via a configure_model method. If the named modules cannot be found, an exception will be thrown. Defaults to None.
- awp_overrides¶
A list of module names to wrap in separate FSDP instances.
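For orientation, the following is a minimal configuration sketch of combining an auto_wrap_policy with awp_overrides. It is not a verbatim recipe: the schedule path, the transformer layer class, and the module names passed via awp_overrides are placeholders for your own model, and it assumes adapter options are supplied through the FinetuningScheduler callback's strategy_adapter_cfg argument (verify against your installed FTS version).

```python
# Hedged sketch: schedule path, layer class, and awp_overrides module names are hypothetical.
from functools import partial

import torch.nn as nn
from lightning.pytorch import Trainer
from lightning.pytorch.strategies import FSDPStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

from finetuning_scheduler import FinetuningScheduler

# Auto-wrap each transformer block in its own FSDP instance/unit ...
policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={nn.TransformerEncoderLayer})

# ... and complement the policy by name so that modules thawed in different schedule phases
# do not end up flattened into the same FlatParameter.
fts = FinetuningScheduler(
    ft_schedule="my_schedule.yaml",  # hypothetical schedule path
    strategy_adapter_cfg={"awp_overrides": ["model.pooler", "model.classifier"]},
)

trainer = Trainer(strategy=FSDPStrategy(auto_wrap_policy=policy), callbacks=[fts])
```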
- fsdp_param_transform(orig_thaw_pl, inspect_only)[source]¶
The parameter transformation function currently used by fts_optim_transform() to transform original parameter lists for optimizer operations.
- Parameters:
orig_thaw_pl¶ (List) – The original parameter name list before FSDP’s transformation of them.
inspect_only¶ (bool) – Whether to use the specified transform in read-only (i.e. inspect_only) mode, avoiding any persistent state transformation that may accompany normal usage. Typically useful for state inspection and validation contexts.
- Returns:
- A transformed parameter name list that matches the current optimizer's view of them after FSDP's transformation of the original parameter names.
- Return type:
List
- fts_optim_transform(orig_pl, inspect_only=False)[source]¶
Because FSDP performs parameter transformations that cause the current optimizer’s view of parameter names to diverge from the original parameter names, this parameter transformation is required for optimizer operations.
- Parameters:
orig_pl¶ (List) – The original parameter name list before FSDP’s transformation of them.
inspect_only¶ (bool) – Whether to use the specified transform in read-only (i.e. inspect_only) mode, avoiding any persistent state transformation that may accompany normal usage. Typically useful for state inspection and validation contexts.
- Returns:
- A transformed parameter name list that matches the current optimizer's view of them after FSDP's transformation of the original parameter names.
- Return type:
List
- load_optimizer_state_dict(checkpoint_connector)[source]¶
Override the default load_optimizer_state_dict method so that we can allow FSDP to manage the movement of restored optimizer states to the relevant devices.
- logical_param_translation(param_names)[source]¶
Effectively the reverse transformation of fts_optim_transform().
- Parameters:
param_names¶ (List) – A parameter name list from the current optimizer’s view of them after FSDP’s transformation of the original parameter names.
- Returns:
The original parameter name list before a given FSDP’s transformation.
- Return type:
List
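To make the relationship between these two transformations concrete, here is a small, hedged sketch. The helper names are illustrative only, and adapter is assumed to be the active FSDPStrategyAdapter instance (e.g. reachable from the FinetuningScheduler callback after setup).

```python
# Hedged sketch: ``adapter`` is assumed to be the active FSDPStrategyAdapter.
def optimizer_view_of_phase(adapter, phase_param_names):
    # Map the schedule's original parameter names to the names the optimizer holds after
    # FSDP flattening; inspect_only=True avoids any persistent adapter state changes.
    return adapter.fts_optim_transform(phase_param_names, inspect_only=True)


def user_view_of_optimizer(adapter, optimizer_param_names):
    # The reverse mapping: recover the original (logical) parameter names from the
    # optimizer's post-flattening view.
    return adapter.logical_param_translation(optimizer_param_names)
```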
- on_after_init_fts()[source]¶
To accommodate FSDP, we defer executing the first fine-tuning phase that would otherwise be executed in this hook, which fires in FinetuningScheduler setup immediately after init_fts().
- Return type:
- on_before_fts_fit_start()[source]¶
In this hook executed immediately before the FinetuningScheduler on_fit_start() hook begins, we ensure the provided fine-tuning schedule and FSDP-wrapped LightningModule are appropriately aligned and valid. If the fine-tuning schedule and wrapped module are detected to be incompatible, detailed feedback is provided to the user (which is why multiple checks are aggregated before returning any alignment exceptions).
- Raises:
MisconfigurationException – If any FTS FSDP fine-tuning schedule/module wrapping alignment exceptions are thrown. The provided exceptions provide detailed feedback for the user to address the misalignment.
- Return type:
- on_before_init_fts()[source]¶
In this hook executed immediately before init_fts(), to accommodate FSDP we:
- Disable Lightning's restoration of the optimizer to allow us to implement special handling
- Prune the no_decay specification since it is not currently supported in the context of FSDP fine-tuning
- Validate the awp_overrides configuration
- Configure FTS wrapping of the provided LightningModule to either use the provided LightningModule.configure_model method (if present) or a provided auto_wrap_policy
- Return type:
None
- on_before_restore_optimizers_and_lrs()[source]¶
Allow the FSDPStrategyAdapter to override the default load_optimizer_state_dict method. This is necessary so we can allow FSDP to manage the movement of restored optimizer states to the relevant devices.
- Return type:
- class finetuning_scheduler.strategy_adapters.ModelParallelStrategyAdapter(fsdp_default_kwargs=None, fsdp_plan=None, *args, **kwargs)[source]¶
A StrategyAdapter that extends FinetuningScheduler (FTS) to support flexible, multi-phase, scheduled fine-tuning with PyTorch's composable distributed (e.g. fully_shard) and Tensor Parallelism APIs. FTS augments Lightning's Model Parallel strategy (ModelParallelStrategy) by allowing users to apply the fully_shard API using module name/pattern-based configuration instead of manually inspecting modules and applying the API in LightningModule.configure_model (see fsdp_plan).
See the FTS Distributed Composable API Training Examples tutorial for a concrete example and additional guidance.
Warning
ModelParallelStrategyAdapter is in BETA and subject to change. The interface can bring breaking changes and new features with the next release of PyTorch.
Note
fsdp_plan module name/pattern-based fully_shard directives are applied after any preceding Tensor Parallel or explicit fully_shard directives in LightningModule.configure_model. FTS will only apply fully_shard to a specified module if it was not already applied to that module.
Note
In addition to all valid fully_shard API kwargs, fsdp_plan also supports act_ckpt and cpu_offload_policy kwargs.
For specified modules/patterns (or fsdp_default_kwargs), act_ckpt allows one to pass a string alias specifying the desired activation checkpointing API (e.g. “composable”, “wrapped”, “wrapped_offload”) as well as an optional Dict of activation checkpointing kwargs. The specified checkpointing APIs will be applied to the matching module(s) before fully_shard. cpu_offload_policy is a convenience alias that will apply CPUOffloadPolicy to the matching module(s) along with any provided Dict of policy kwargs.
The only user-facing configurations for ModelParallelStrategyAdapter are fsdp_plan and fsdp_default_kwargs.
- Parameters:
fsdp_plan¶ (Optional[Dict]) –
An optional dictionary of module names or regex pattern keys with associated fully_shard composable distributed API kwargs to apply to matching modules.
Allows users to apply the fully_shard API using module name/pattern-based configuration instead of manually inspecting modules and applying the API in LightningModule.configure_model. fsdp_plan directives can also be composed with explicit fully_shard calls in LightningModule.configure_model, as the fsdp_plan directives will only invoke fully_shard on a specified module if it was not already applied to that module.
All valid fully_shard API kwargs are supported. fsdp_plan directives are applied in the order provided in the fsdp_plan dictionary.
Additionally, fsdp_plan supports act_ckpt and cpu_offload_policy kwargs. For specified modules/patterns (or fsdp_default_kwargs):
act_ckpt (Sequence[str, Dict | None] | ActCkptCfg): pass an alias specifying the desired activation checkpointing API (e.g. “composable”, “wrapped”, “wrapped_offload”) as well as an optional Dict of activation checkpointing kwargs. The specified checkpointing APIs will be applied to the matching module(s) before fully_shard.
cpu_offload_policy (Dict[Optional[str, Any]]): a convenience alias that will apply CPUOffloadPolicy to the matching module(s) along with any provided Dict of policy kwargs.
Defaults to None.
fsdp_default_kwargs¶ (Optional[Dict]) – An optional dictionary of default fully_shard API kwargs to apply to each matching module in fsdp_plan. Module-name/pattern-specific kwargs will take precedence over these. All kwargs valid for fsdp_plan above are supported. Defaults to None.
- fsdp_plan¶
An optional dictionary of module names or regex pattern keys with associated fully_shard composable distributed API kwargs to apply to matching modules.
Allows users to apply the fully_shard API using module name/pattern-based configuration instead of manually inspecting modules and applying the API in LightningModule.configure_model. fsdp_plan directives can also be composed with explicit fully_shard calls in LightningModule.configure_model, as the fsdp_plan directives will only invoke fully_shard on a specified module if it was not already applied to that module.
All valid fully_shard API kwargs are supported. fsdp_plan directives are applied in the order provided in the fsdp_plan dictionary.
Additionally, fsdp_plan supports act_ckpt and cpu_offload_policy kwargs. For specified modules/patterns (or fsdp_default_kwargs):
act_ckpt (Sequence[str, Dict | None] | ActCkptCfg): pass an alias specifying the desired activation checkpointing API (e.g. “composable”, “wrapped”, “wrapped_offload”) as well as an optional Dict of activation checkpointing kwargs. The specified checkpointing APIs will be applied to the matching module(s) before fully_shard.
cpu_offload_policy (Dict[Optional[str, Any]]): a convenience alias that will apply CPUOffloadPolicy to the matching module(s) along with any provided Dict of policy kwargs.
- fsdp_default_kwargs¶
An optional dictionary of default fully_shard API kwargs to apply to each matching module in fsdp_plan. Module-name/pattern-specific kwargs will take precedence over these. All kwargs valid for fsdp_plan above are supported.
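As a rough illustration of the shape of these options, a plan might look like the sketch below. The module names/patterns and schedule path are hypothetical and must match your own LightningModule, it assumes adapter options are supplied via the FinetuningScheduler callback's strategy_adapter_cfg argument, and it requires a Lightning release that provides ModelParallelStrategy (verify both against your installed versions).

```python
# Hedged configuration sketch; names/patterns are placeholders, act_ckpt shown in the
# (alias, optional kwargs Dict) form described above.
from lightning.pytorch import Trainer
from lightning.pytorch.strategies import ModelParallelStrategy

from finetuning_scheduler import FinetuningScheduler

fts = FinetuningScheduler(
    ft_schedule="my_schedule.yaml",  # hypothetical schedule path
    strategy_adapter_cfg={
        # Default fully_shard kwargs merged into every module matched by ``fsdp_plan``.
        "fsdp_default_kwargs": {"reshard_after_forward": True},
        "fsdp_plan": {
            # Apply composable activation checkpointing before fully_shard to each match.
            "model.layers.*": {"act_ckpt": ("composable", None)},
            # Convenience alias applying CPUOffloadPolicy to this module.
            "model.classifier": {"cpu_offload_policy": {}},
        },
    },
)

trainer = Trainer(strategy=ModelParallelStrategy(), callbacks=[fts])
```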
- on_before_fts_fit_start()[source]¶
In this hook executed immediately before the FinetuningScheduler on_fit_start() hook begins, we ensure the provided fine-tuning schedule and FSDP2-composed LightningModule are appropriately aligned. If the fine-tuning schedule and composed modules yield parameter group configurations that may not be supported by some optimizer group operations, detailed feedback on potential remediation is provided to the user.
- Return type:
- on_before_init_fts()[source]¶
In this hook executed immediately before init_fts(), to accommodate enhanced Model Parallel functionality, we:
- Validate the fsdp_plan configuration
- Configure FTS wrapping of the provided LightningModule to either use the provided LightningModule.configure_model method (if present) or a provided fsdp_plan
- Return type:
None
- class finetuning_scheduler.strategy_adapters.StrategyAdapter[source]¶
Base class for all strategy adapters. Implements the default FinetuningScheduler hooks. Can be subclassed to extend FinetuningScheduler support for a complex or custom Strategy via an associated StrategyAdapter.
Warning
StrategyAdapter is in BETA and subject to change. The interface can bring breaking changes and new features with the next release of FTS.
Tip
If you want to extend FTS to use a custom, currently unsupported strategy or override current FTS behavior in the context of a given training strategy, subclassing StrategyAdapter is a way to do so. See FSDPStrategyAdapter for an example implementation, and the brief subclassing sketch at the end of this section.
The default fine-tuning phase execution function is set on StrategyAdapter initialization. This can be overridden by StrategyAdapter subclasses to adapt fine-tuning phase execution to meet strategy-specific requirements.
- static base_ft_phase(module, thaw_pl, translation_func=None, init_thaw=False)[source]¶
Thaw/unfreeze the provided list of parameters in the provided Module.
- Parameters:
- Returns:
- A Tuple of two lists:
The list of newly thawed/unfrozen parameters thawed by this function
A list of all currently thawed/unfrozen parameters in the target Module
- Return type:
Tuple[List, List]
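Conceptually, a thaw phase simply flips requires_grad for the scheduled parameters. The following is an illustrative, simplified stand-in (it ignores translation_func and the additional bookkeeping the real implementation performs) and assumes the names in thaw_pl exactly match module.named_parameters().

```python
# Illustrative only: a naive stand-in for what a thaw phase does.
from typing import List, Tuple

import torch


def naive_thaw(module: torch.nn.Module, thaw_pl: List[str]) -> Tuple[List[str], List[str]]:
    newly_thawed, currently_thawed = [], []
    for name, param in module.named_parameters():
        if name in thaw_pl and not param.requires_grad:
            param.requires_grad = True  # unfreeze the scheduled parameter
            newly_thawed.append(name)
        if param.requires_grad:
            currently_thawed.append(name)
    return newly_thawed, currently_thawed
```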
- connect(fts_parent)[source]¶
Create a handle for the associated FinetuningScheduler instance.
- Parameters:
fts_parent¶ (Callback) – The associated FinetuningScheduler instance
- Return type:
- fts_optim_transform(orig_pl, inspect_only=False)[source]¶
A method that can be overridden by a StrategyAdapter if a Strategy performs parameter transformations that cause the current optimizer's view of parameter names to diverge from the original parameter names. By default, no transformation of schedule parameter names is required for optimizer operations.
- Parameters:
orig_pl¶ (List) – The original parameter name list before a given Strategy's transformation of them.
inspect_only¶ (bool) – Whether to use the specified transform in read-only (i.e. inspect_only) mode, avoiding any persistent state transformation that may accompany normal usage. Typically useful for state inspection and validation contexts.
- Returns:
- A transformed parameter name list that matches the current optimizer's view of them after a given Strategy's transformation of the original parameter names.
- Return type:
List
- logical_param_translation(param_names)[source]¶
Effectively the reverse transformation of fts_optim_transform(). Can be overridden by a StrategyAdapter if a Strategy performs parameter transformations that cause the original user view of parameter names to diverge from the current optimizer's view. By default, no transformation of optimizer parameter names is required.
- on_after_init_fts()[source]¶
Hook executed in FinetuningScheduler setup immediately after init_fts().
- Return type:
- on_before_fts_fit_start()[source]¶
Hook executed immediately before the FinetuningScheduler on_fit_start() hook begins.
- Return type:
- on_before_init_fts()[source]¶
Hook executed in FinetuningScheduler setup immediately before init_fts().
- Return type:
- on_before_restore_optimizers_and_lrs()[source]¶
Hook executed immediately before FinetuningScheduler restores optimizers and schedulers.
- Return type:
- phase0_optimizer_override()[source]¶
Reconfigure the user-configured optimizer (configured via configure_optimizers) to optimize the parameters (and only those parameters) scheduled to be optimized in phase 0 of the current fine-tuning schedule.
Reconfiguration only takes place here if FTS discovers that the set of parameters initially thawed and present in the optimizer differs from the parameters specified in phase 0. Only the parameters included in the optimizer are affected; the choice of optimizer, lr_scheduler, etc. remains unaltered.
- Return type:
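The user-side pattern this override complements is an optimizer initially configured over only the currently trainable (phase 0) parameters. A hedged sketch of such a configure_optimizers method excerpt (to be placed inside your LightningModule; hyperparameters are placeholders) might look like:

```python
# Method excerpt sketch: pass only currently trainable parameters to the optimizer; if the
# optimizer's parameter set still differs from phase 0 of the schedule, FTS adjusts the
# parameter groups in phase0_optimizer_override while leaving the optimizer choice untouched.
import torch


def configure_optimizers(self):
    trainable = (p for p in self.parameters() if p.requires_grad)
    return torch.optim.AdamW(trainable, lr=1e-5)
```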
- property pl_module¶
Convenient access to the LightningModule being fine-tuned.
- Returns:
The user's LightningModule
- Return type:
LightningModule
- property pls_handle¶
Convenient access to the current Strategy in use.
- Returns:
The Strategy in use.
- Return type:
Strategy
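Finally, a brief sketch of the subclassing pattern mentioned in the Tip above. MyCustomStrategyAdapter is hypothetical and only indicates which extension points a subclass would typically override; see FSDPStrategyAdapter for a complete, real implementation.

```python
# Hypothetical subclass: illustrates the extension points only; it is not a working adapter
# for any particular strategy.
from typing import List

from finetuning_scheduler.strategy_adapters import StrategyAdapter


class MyCustomStrategyAdapter(StrategyAdapter):
    def on_before_init_fts(self) -> None:
        # Validate/prepare any strategy-specific configuration before FTS initializes the
        # fine-tuning schedule.
        super().on_before_init_fts()

    def fts_optim_transform(self, orig_pl: List, inspect_only: bool = False) -> List:
        # Map scheduled (original) parameter names to the names the optimizer will see if
        # the custom strategy renames or restructures parameters; identity by default.
        return orig_pl

    def logical_param_translation(self, param_names: List) -> List:
        # The reverse mapping back to the user's original parameter names.
        return param_names
```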