
Pandas (dagster-pandas)

The dagster_pandas library provides utilities for using pandas with Dagster and for implementing validation on pandas DataFrames. A good place to start with dagster_pandas is the validation guide.

dagster_pandas.create_dagster_pandas_dataframe_type
beta

This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.

Constructs a custom pandas dataframe dagster type.

Parameters:

  • name (str) – Name of the dagster pandas type.
  • description (Optional[str]) – A markdown-formatted string, displayed in tooling.
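  • columns (Optional[List[PandasColumn]]) – A list of PandasColumn objects which express dataframe column schemas and constraints.
  • metadata_fn (Optional[Callable[[pandas.DataFrame], Dict[str, Union[str, float, int, Dict, MetadataValue]]]]) – A callable which takes your dataframe and returns a dict with string label keys and metadata values (str, float, int, Dict, or MetadataValue).
  • dataframe_constraints (Optional[List[DataFrameConstraint]]) – A list of objects that inherit from DataFrameConstraint. Use these to express dataframe-level constraints.
  • loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader.
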
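
The parameters above fit together roughly as follows. This plain-Python sketch checks each declared column, then each dataframe-level constraint, and returns the metadata computed by metadata_fn. Several things here are assumptions, not the library's documented behavior: FrameTypeSketch is hypothetical, a dict of lists stands in for a pandas DataFrame, and the order of checks is illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Frame = Dict[str, List[Any]]


@dataclass
class FrameTypeSketch:
    """Hypothetical model of a dataframe type; not the dagster_pandas implementation."""

    name: str
    # Column name -> predicate that every value in that column must satisfy.
    columns: Dict[str, Callable[[Any], bool]] = field(default_factory=dict)
    # Dataframe-level checks, each receiving the whole frame.
    dataframe_constraints: List[Callable[[Frame], bool]] = field(default_factory=list)
    # Optional summary of the frame, returned alongside the check result.
    metadata_fn: Optional[Callable[[Frame], Dict[str, Any]]] = None

    def type_check(self, frame: Frame) -> Tuple[bool, List[str], Dict[str, Any]]:
        errors: List[str] = []
        # Column-level checks: presence first, then every value.
        for col, predicate in self.columns.items():
            if col not in frame:
                errors.append(f"missing column {col!r}")
            elif not all(predicate(v) for v in frame[col]):
                errors.append(f"column {col!r} failed its check")
        # Dataframe-level constraints run on the whole frame.
        for i, constraint in enumerate(self.dataframe_constraints):
            if not constraint(frame):
                errors.append(f"dataframe constraint #{i} failed")
        metadata = self.metadata_fn(frame) if self.metadata_fn else {}
        return (not errors, errors, metadata)


trips = FrameTypeSketch(
    name="TripFrame",
    columns={"bike_id": lambda v: isinstance(v, int) and v >= 0},
    dataframe_constraints=[lambda f: len(f.get("bike_id", [])) == 2],
    metadata_fn=lambda f: {"row_count": len(f.get("bike_id", []))},
)
```

For example, `trips.type_check({"bike_id": [-1]})` fails twice: once for the negative ID and once for the row-count constraint.
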
class dagster_pandas.RowCountConstraint
beta

This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.

A dataframe constraint that validates the expected count of rows.

Parameters:

  • num_allowed_rows (int) – The number of allowed rows in your dataframe.
  • error_tolerance (Optional[int]) – How far the row count may deviate from num_allowed_rows, in either direction, and still pass. Defaults to 0.

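
The pass/fail rule for RowCountConstraint can be pictured as a window around num_allowed_rows. This plain-Python sketch assumes the tolerance applies symmetrically in both directions. `row_count_within_tolerance` is a hypothetical helper, not part of dagster_pandas, and its guard against a negative tolerance is an assumption of the sketch.

```python
def row_count_within_tolerance(
    row_count: int, num_allowed_rows: int, error_tolerance: int = 0
) -> bool:
    """Hypothetical helper: True if row_count lies within
    [num_allowed_rows - error_tolerance, num_allowed_rows + error_tolerance]."""
    # Assumption of this sketch: a negative tolerance is meaningless.
    if error_tolerance < 0:
        raise ValueError("error_tolerance must be non-negative")
    lower = num_allowed_rows - error_tolerance
    upper = num_allowed_rows + error_tolerance
    return lower <= row_count <= upper
```

Under this assumption, num_allowed_rows=4 with error_tolerance=1 accepts frames of 3, 4, or 5 rows. With the default tolerance of 0, only exactly 4 rows pass.
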
class dagster_pandas.StrictColumnsConstraint
beta

This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.

A dataframe constraint that validates column existence and ordering.

Parameters:

  • strict_column_list (List[str]) – The exact list of columns that your dataframe must have.
  • enforce_ordering (Optional[bool]) – If True, the dataframe's columns must also appear in the same order as strict_column_list. Defaults to False.

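
The two modes of StrictColumnsConstraint can be made concrete by comparing a dataframe's column names against strict_column_list in plain Python. `strict_columns_violations` is a hypothetical helper. The messages it returns are illustrative and are not the ones dagster_pandas reports.

```python
from typing import List, Sequence


def strict_columns_violations(
    actual: Sequence[str], expected: Sequence[str], enforce_ordering: bool = False
) -> List[str]:
    """Hypothetical helper: return a list of problems (empty means the check passes)."""
    if enforce_ordering:
        # Ordered mode: the column lists must match element for element.
        if list(actual) != list(expected):
            return [f"expected columns {list(expected)} in this order, got {list(actual)}"]
        return []
    # Unordered mode: the same names, nothing missing and nothing extra.
    missing = [c for c in expected if c not in actual]
    extra = [c for c in actual if c not in expected]
    problems = []
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    return problems
```

Because the list is "exact", the unordered mode treats both a missing column and an unexpected extra column as violations.
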
class dagster_pandas.PandasColumn
beta

This API is currently in beta, and may have breaking changes in minor version releases, with behavior changes in patch releases.

The main API for expressing column-level schemas and constraints for your custom dataframe types.

Parameters:

  • name (str) – Name of the column. This must match the column name in the dataframe you expect to receive.
  • is_required (Optional[bool]) – Flag indicating whether the column must be present. If the column exists, it is validated against its constraints. Defaults to True.
  • constraints (Optional[List[Constraint]]) – List of constraint objects that indicate the validation rules for the pandas column.

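
The interplay between is_required and constraints can be sketched without pandas. ColumnSpecSketch and non_nullable are hypothetical stand-ins for PandasColumn and a column constraint, and a dict of lists stands in for a DataFrame. The real classes have their own constructors and error types.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# A column constraint here is any callable returning an error message or None.
ColumnConstraint = Callable[[List[Any]], Optional[str]]


@dataclass
class ColumnSpecSketch:
    """Hypothetical stand-in for PandasColumn; not the dagster_pandas class."""

    name: str
    is_required: bool = True
    constraints: List[ColumnConstraint] = field(default_factory=list)

    def validate(self, frame: Dict[str, List[Any]]) -> List[str]:
        # Absent column: an error only when the column is required.
        if self.name not in frame:
            return [f"required column {self.name!r} is missing"] if self.is_required else []
        # Present column: run every constraint and collect the failures.
        errors = []
        for constraint in self.constraints:
            message = constraint(frame[self.name])
            if message is not None:
                errors.append(message)
        return errors


def non_nullable(values: List[Any]) -> Optional[str]:
    """Hypothetical constraint: reject columns containing None."""
    return "column contains None" if any(v is None for v in values) else None


station = ColumnSpecSketch("station", constraints=[non_nullable])
tip = ColumnSpecSketch("tip", is_required=False, constraints=[non_nullable])
```

An optional column such as `tip` can be absent without error. Once present, it is still held to its constraints.
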
dagster_pandas.DataFrame = <dagster._core.types.dagster_type.DagsterType object>

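dagster_pandas.DataFrame is a DagsterType instance for pandas DataFrames. DagsterType defines a type in Dagster; such types can be used in the inputs and outputs of ops. The parameters below are those of the DagsterType constructor.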

Parameters:

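  • type_check_fn (Callable[[TypeCheckContext, Any], Union[bool, TypeCheck]]) – The function that defines the type check. It takes the value flowing through the input or output of the op. If the check passes, return either True or a TypeCheck with success set to True. If it fails, return either False or a TypeCheck with success set to False.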

  • key (Optional[str]) –

    The unique key to identify types programmatically. The key property always has a value: if you omit key when constructing the type, it takes the value of name. If neither key nor name is provided, a CheckError is thrown.

    In the case of a generic type such as List or Optional, this is generated programmatically based on the type parameters.

  • name (Optional[str]) – A unique name given by a user. If key is None, key becomes this value.

  • description (Optional[str]) – A markdown-formatted string, displayed in tooling.

  • loader (Optional[DagsterTypeLoader]) – An instance of a class that inherits from DagsterTypeLoader and can map config data to a value of this type.

  • required_resource_keys (Optional[Set[str]]) – Resource keys required by the type_check_fn.

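  • is_builtin (bool) – Defaults to False. This is used by tools to display or filter built-in types (such as String and Int) to visually distinguish them from user-defined types. Meant for internal use.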

  • kind (DagsterTypeKind) – Defaults to None. This is used to determine the kind of runtime type for InputDefinition and OutputDefinition type checking.

  • typing_type – Defaults to None. A valid Python typing type (e.g. Optional[List[int]]) for the value contained within the DagsterType. Meant for internal use.
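
Two behaviors described above can be sketched in plain Python: key defaulting to name, and type_check_fn returning either a bool or a TypeCheck. TypeSketch and CheckResult are hypothetical stand-ins for DagsterType and TypeCheck. ValueError stands in for Dagster's CheckError, and the context argument is simply passed through.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


@dataclass
class CheckResult:
    """Hypothetical stand-in for Dagster's TypeCheck."""

    success: bool
    description: Optional[str] = None


class TypeSketch:
    """Hypothetical stand-in for DagsterType; not Dagster's implementation."""

    def __init__(
        self,
        type_check_fn: Callable[[Any, Any], Union[bool, CheckResult]],
        key: Optional[str] = None,
        name: Optional[str] = None,
        description: Optional[str] = None,
    ) -> None:
        # Mirrors the documented rule: key falls back to name; neither is an error.
        if key is None and name is None:
            raise ValueError("either key or name must be provided")
        self.key = key if key is not None else name
        self.name = name
        self.description = description
        self._type_check_fn = type_check_fn

    def type_check(self, context: Any, value: Any) -> CheckResult:
        # Normalize a bare bool into a result object, since either is allowed.
        outcome = self._type_check_fn(context, value)
        if isinstance(outcome, bool):
            return CheckResult(success=outcome)
        return outcome


def even_int_check(_context: Any, value: Any) -> Union[bool, CheckResult]:
    """Example type_check_fn: accepts even ints and explains failures."""
    if isinstance(value, int) and value % 2 == 0:
        return True
    return CheckResult(success=False, description=f"{value!r} is not an even int")


EvenInt = TypeSketch(even_int_check, name="EvenInt")
```

Returning a result object rather than a bare False lets the check explain why a value was rejected, which is the main reason to prefer TypeCheck over a plain bool.
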