crv.io.validate
Experimental API
Schema validation utilities for crv.io.
Purpose
- Validate Polars DataFrames against canonical table descriptors from crv.core.tables.
- Apply pragmatic checks with safe casting for scalar dtypes.
Source of truth (core)
- crv.core.tables.TableDescriptor describes columns, dtypes, required and nullable columns, and partitioning.
- crv.core.grammar.TableName enumerates canonical table names.
- crv.core.schema defines row-level combination rules (not enforced here).
- crv.core.errors provides domain exceptions (SchemaError, GrammarError). The IO layer raises IoSchemaError for validation failures at the IO boundary.
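To make the descriptor's role concrete, here is a minimal stand-in holding the kinds of fields listed above. It is hypothetical: `DescriptorSketch`, its field names, and the example table name are assumptions for illustration, and the real `crv.core.tables.TableDescriptor` may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorSketch:
    """Hypothetical stand-in for crv.core.tables.TableDescriptor (field names assumed)."""

    name: str
    # Column name -> dtype token such as "i64", "f64", "str", "struct", "list[struct]".
    columns: dict[str, str]
    # Columns that must be present in every frame.
    required: frozenset[str]
    # Columns that may appear alongside the required ones.
    nullable: frozenset[str] = frozenset()
    # Partitioning columns; not consulted by validation.
    partitioning: tuple[str, ...] = ()

    def allowed(self) -> frozenset[str]:
        # Under strict=True, only required ∪ nullable columns are permitted.
        return self.required | self.nullable


# Example descriptor for a hypothetical table.
desc = DescriptorSketch(
    name="example_table",
    columns={"id": "i64", "score": "f64", "meta": "struct"},
    required=frozenset({"id", "score"}),
    nullable=frozenset({"meta"}),
)
```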
Checks performed (Phase 1)
- Required columns are present.
- When strict=True, no columns outside (required ∪ nullable) are allowed.
- Dtype compatibility:
  - Scalar types ("i64", "f64", "str") are safely cast when possible.
  - "struct" accepts pl.Struct or pl.Object (no deep validation).
  - "list[struct]" accepts pl.List (inner type not enforced yet).
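The two column-set checks can be sketched with plain set arithmetic. `check_columns` is a hypothetical helper, not part of crv.io; it returns a list of problems rather than raising IoSchemaError, which keeps the logic easy to inspect.

```python
def check_columns(
    columns: list[str],
    required: set[str],
    nullable: set[str],
    *,
    strict: bool = True,
) -> list[str]:
    """Return human-readable problems; an empty list means the column set passes."""
    present = set(columns)
    problems: list[str] = []
    # Check 1: every required column must be present.
    missing = sorted(required - present)
    if missing:
        problems.append(f"missing required columns: {missing}")
    # Check 2 (strict only): nothing outside required ∪ nullable.
    if strict:
        extras = sorted(present - (required | nullable))
        if extras:
            problems.append(f"unexpected columns: {extras}")
    return problems
```

With `required={"id", "score"}` and `nullable={"meta"}`, a frame with columns `["id", "score", "debug"]` fails under strict mode because of `debug`, but passes with `strict=False`.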
Notes
- File protocol baseline only; this module depends on polars and crv.core descriptors.
crv.io.validate.validate_frame_against_descriptor
validate_frame_against_descriptor(
df: polars.DataFrame, desc: crv.core.tables.TableDescriptor, *, strict: bool = True
) -> pl.DataFrame
Validate a Polars DataFrame against a TableDescriptor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | polars.DataFrame | Frame to validate. | required |
| desc | crv.core.tables.TableDescriptor | Canonical descriptor from crv.core.tables. | required |
| strict | bool | Enforce exact column set (no extras) when True. | True |
Returns:
| Type | Description |
|---|---|
| polars.DataFrame | The validated frame, possibly with safe casts applied for scalar types. |
Raises:
| Type | Description |
|---|---|
| crv.io.errors.IoSchemaError | If required columns are missing, extras are present under strict mode, or dtypes are incompatible and cannot be safely cast. |
Notes
- Scalar columns ("i64","f64","str"): attempt non-strict casts.
- Struct-like columns accept pl.Struct or pl.Object.
- list[struct] accepts any pl.List inner type in Phase 1.
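The dtype rules in these notes can be modelled as a small decision function over polars dtype names (for example the string form of a schema entry, `str(df.schema[col])`). This is a hypothetical sketch, not the crv.io implementation: `dtype_action` and its "ok"/"cast"/"error" outcomes are illustrative, and treating nested-to-scalar mismatches as errors is an assumption the notes do not state.

```python
# Hypothetical model of the Phase 1 dtype rules; dtype names are plain strings.
SCALAR_TARGETS: dict[str, set[str]] = {
    "i64": {"Int64"},
    "f64": {"Float64"},
    "str": {"String", "Utf8"},  # polars renamed Utf8 to String; accept both names
}
NESTED_BASES = {"Struct", "List", "Object"}


def dtype_action(token: str, dtype_name: str) -> str:
    """Return "ok", "cast", or "error" for a descriptor token vs. a column dtype name."""
    base = dtype_name.split("(", 1)[0]  # "List(Struct(...))" -> "List"
    if token in SCALAR_TARGETS:
        if base in SCALAR_TARGETS[token]:
            return "ok"
        # Assumption: nested values are never cast to scalars; any other
        # mismatch is flagged for a non-strict cast attempt.
        return "error" if base in NESTED_BASES else "cast"
    if token == "struct":
        # Struct or Object accepted; struct fields are not validated.
        return "ok" if base in ("Struct", "Object") else "error"
    if token == "list[struct]":
        # Any List is accepted; the inner type is not enforced in Phase 1.
        return "ok" if base == "List" else "error"
    return "error"  # unknown descriptor token
```

Returning a classification rather than casting directly keeps the rule table separate from the polars calls that would act on it.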
Source code in src/crv/io/validate.py
crv.io.validate.validate_frame_for_table
validate_frame_for_table(
df: polars.DataFrame,
table: crv.core.grammar.TableName | str,
*,
strict: bool = True
) -> pl.DataFrame
Validate a DataFrame against the descriptor for a given table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | polars.DataFrame | DataFrame to validate. | required |
| table | crv.core.grammar.TableName \| str | Canonical table name (enum or lower_snake string). | required |
| strict | bool | Enforce exact column set when True. | True |
Returns:
| Type | Description |
|---|---|
| polars.DataFrame | The validated frame, possibly with casts applied. |
Raises:
| Type | Description |
|---|---|
| crv.io.errors.IoSchemaError | On validation failure. |
Notes
- Resolves the descriptor from crv.core.tables via the TableName enum.
- Row-level semantic checks (e.g., combination rules) live in crv.core.schema.
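Accepting either an enum member or its lower_snake string maps naturally onto calling the enum class, since `Enum(value)` returns the member for both inputs. The sketch below is hypothetical: `TableNameSketch`, its members, `DESCRIPTORS`, and `resolve_descriptor` are invented for illustration and do not reflect crv.core's actual table names or registry.

```python
from __future__ import annotations

from enum import Enum


class TableNameSketch(str, Enum):
    """Hypothetical stand-in for crv.core.grammar.TableName; members are invented."""

    EXAMPLE_EVENTS = "example_events"
    EXAMPLE_METRICS = "example_metrics"


# Hypothetical registry: table name -> column dtype tokens (descriptor simplified).
DESCRIPTORS: dict[TableNameSketch, dict[str, str]] = {
    TableNameSketch.EXAMPLE_EVENTS: {"id": "i64", "payload": "struct"},
    TableNameSketch.EXAMPLE_METRICS: {"id": "i64", "value": "f64"},
}


def resolve_descriptor(table: TableNameSketch | str) -> dict[str, str]:
    # Enum(value) accepts a member or its lower_snake value string.
    try:
        name = TableNameSketch(table)
    except ValueError as exc:
        raise KeyError(f"unknown table: {table!r}") from exc
    return DESCRIPTORS[name]
```

Both `resolve_descriptor("example_metrics")` and `resolve_descriptor(TableNameSketch.EXAMPLE_METRICS)` return the same entry; an unrecognised string fails before any frame-level checks run.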