This guide introduces how to provide parameters to assets and jobs using the experimental Pythonic configuration APIs, which make defining and passing this configuration more lightweight.
It's often useful to provide user-chosen values to Dagster jobs or software-defined assets at runtime. For example, you might want to choose what dataset an op runs against, or provide a connection URL for a database resource. Dagster exposes this functionality through a configuration API.
Defining Pythonic configuration on assets and ops#
To specify the configurable parameters accepted by an op or asset, define a config model that subclasses Config and add a config parameter, annotated with that model, to the corresponding op or asset function. Under the hood, these config models use Pydantic, a popular Python library for data validation and serialization.
During execution, the specified config is accessed within the body of the op or asset using the config parameter, which is reserved specifically for this purpose.
Here, we define a subclass of Config holding a single string value representing the name of a user. We can access the config through the config parameter in the asset body.
from dagster import asset, Config

class MyAssetConfig(Config):
    person_name: str

@asset
def greeting(config: MyAssetConfig) -> str:
    return f"hello {config.person_name}"
Here, we define a subclass of Config holding a single string value representing the name of a user. We can access the config through the config parameter in the op body.
Defining and accessing Pythonic configuration for a resource#
Configurable parameters for a resource are defined by specifying attributes on a resource class that subclasses ConfigurableResource. The resource below defines a configurable connection URL, which can be accessed in any method defined on the resource.
from dagster import ConfigurableResource

class MyDatabaseResource(ConfigurableResource):
    connection_url: str

    def query(self, query: str):
        # get_engine is assumed to be defined elsewhere; it is not part of Dagster.
        return get_engine(self.connection_url).execute(query)
To execute a job or materialize an asset that specifies config, you'll need to provide values for its parameters. When specifying config from the Python API, we can use the run_config argument for JobDefinition.execute_in_process or materialize. This takes a RunConfig object, within which we can supply config on a per-op or per-asset basis. The config is specified as a dictionary, with the keys corresponding to the op/asset names and the values corresponding to the config values.
Ops and assets can be configured using environment variables by passing an EnvVar when constructing your config object. This is useful when the value may vary based on environment or is sensitive. If you're using Dagster Cloud, environment variables can be set up directly in the UI.
In some cases, you may want to define a more complex config schema. For example, you may want to define a config schema that takes in a list of files or complex data. Below we'll walk through some common patterns for defining more complex config schemas.
Config fields can be annotated with metadata using the Pydantic Field class, which provides additional information about the field.
For example, we can annotate a config field with a description, which will be displayed in the documentation for the config field. We can also add a value range to a field, which is validated when config is specified.
from dagster import Config
from pydantic import Field

class MyMetadataConfig(Config):
    # Here, the ellipsis `...` indicates that the field is required and has no default value.
    person_name: str = Field(..., description="The name of the person to greet")
    age: int = Field(..., gt=0, lt=100, description="The age of the person to greet")

# errors! An age of 200 is outside the allowed range (greater than 0, less than 100).
MyMetadataConfig(person_name="Alice", age=200)
Config fields can be marked as optional by specifying a default value. For example, we can mark the greeting_phrase field as optional by giving it a default of "hello". Optional fields, such as person_name, can be given a default value of None.
Union types are supported using Pydantic discriminated unions. Each union type must be a subclass of Config. The discriminator argument to Field specifies the field that will be used to determine which union type to use.
Here, we define a config schema which takes in a pet field, which can be either a Cat or a Dog, as indicated by the pet_type field.
from dagster import asset, materialize, Config, RunConfig
from pydantic import Field
from typing import Union
from typing_extensions import Literal

class Cat(Config):
    pet_type: Literal["cat"] = "cat"
    meows: int

class Dog(Config):
    pet_type: Literal["dog"] = "dog"
    barks: float

class ConfigWithUnion(Config):
    # Here, the ellipsis `...` indicates that the field is required and has no default value.
    pet: Union[Cat, Dog] = Field(..., discriminator="pet_type")

@asset
def pet_stats(config: ConfigWithUnion):
    if isinstance(config.pet, Cat):
        return f"Cat meows {config.pet.meows} times"
    else:
        return f"Dog barks {config.pet.barks} times"

result = materialize(
    [pet_stats],
    run_config=RunConfig(
        {"pet_stats": ConfigWithUnion(pet=Cat(meows=10))}
    ),
)