crv.io.paths
Experimental API
Path and layout helpers for crv.io.
Overview (file protocol baseline)
-
Source of truth
- Canonical table names: crv.core.grammar.TableName (lower_snake).
- Partitioning discipline: ["bucket"], with bucket computed in IO as tick // IoSettings.tick_bucket_size (defaults sourced from crv.core.constants).
- Schema/table descriptors: crv.core.tables (columns/dtypes/required/nullable).

Import DAG discipline
- stdlib + crv.io.config only. No imports from higher-level packages.

Notes
- This module focuses solely on path construction and run/table directory layout.
crv.io.paths.PartPaths
dataclass
Container for a part's temporary and final file paths.
Attributes:
| Name | Type | Description |
|---|---|---|
| tmp_path | str | Temporary file path used for initial write (e.g., "*.parquet.tmp"). |
| final_path | str | Final file path after atomic rename (e.g., "*.parquet"). |
Source code in src/crv/io/paths.py
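
The two attributes suggest a write-then-rename pattern: data is written to tmp_path first and only becomes visible at final_path after a rename. The following is a minimal sketch of that pattern, not the library's writer. `PartPathsSketch` and `write_part_atomically` are hypothetical names introduced for illustration, and the use of `os.fsync` plus `os.replace` is an assumption about how such a write could be made durable and atomic.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PartPathsSketch:
    """Hypothetical stand-in for crv.io.paths.PartPaths (same two fields)."""

    tmp_path: str
    final_path: str


def write_part_atomically(paths: PartPathsSketch, payload: bytes) -> None:
    """Write payload to tmp_path, flush it to disk, then rename to final_path."""
    with open(paths.tmp_path, "wb") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes reach the disk before renaming
    # os.replace is atomic when both paths live on the same filesystem.
    os.replace(paths.tmp_path, paths.final_path)
```

With this shape, a reader that only looks for "*.parquet" files never observes a half-written part, because the temporary name carries a different suffix until the rename completes.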
crv.io.paths.bucket_id_for_tick
Compute the bucket id from a tick using floor division.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tick | int | Simulation tick (>= 0). | required |
| bucket_size | int | Number of ticks per bucket (>= 1). | required |

Returns:
| Name | Type | Description |
|---|---|---|
| int | int | Non-negative bucket id. |

Raises:
| Type | Description |
|---|---|
| ValueError | If bucket_size < 1 or tick < 0. |
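
The documented contract (floor division, with ValueError on out-of-range inputs) can be illustrated with a short sketch. `bucket_id_for_tick_sketch` is a hypothetical re-implementation written from the description above, not the code in src/crv/io/paths.py; the error messages are invented.

```python
def bucket_id_for_tick_sketch(tick: int, bucket_size: int) -> int:
    """Return tick // bucket_size after checking the documented preconditions."""
    if bucket_size < 1:
        raise ValueError(f"bucket_size must be >= 1, got {bucket_size}")
    if tick < 0:
        raise ValueError(f"tick must be >= 0, got {tick}")
    return tick // bucket_size


# With 100 ticks per bucket, ticks 0..99 fall in bucket 0 and tick 100 starts bucket 1.
assert bucket_id_for_tick_sketch(99, 100) == 0
assert bucket_id_for_tick_sketch(100, 100) == 1
```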
crv.io.paths.format_bucket_dir
Format a bucket directory name as 'bucket=000123'.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket_id | int | Non-negative bucket id. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Formatted directory name. |

Raises:
| Type | Description |
|---|---|
| ValueError | If bucket_id < 0. |
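
The example name 'bucket=000123' implies a zero-padded decimal id. A minimal sketch follows, assuming a fixed pad width of six digits (inferred from that one example; the real width and the behavior for ids with more than six digits are not stated here). `format_bucket_dir_sketch` is a hypothetical name.

```python
def format_bucket_dir_sketch(bucket_id: int) -> str:
    """Format a bucket id as 'bucket=NNNNNN' (pad width of 6 is an assumption)."""
    if bucket_id < 0:
        raise ValueError(f"bucket_id must be >= 0, got {bucket_id}")
    return f"bucket={bucket_id:06d}"


assert format_bucket_dir_sketch(123) == "bucket=000123"
```

Zero padding keeps directory names in numeric order under a plain lexicographic listing, which is one plausible reason for the format.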
crv.io.paths.run_root
Root directory for a run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| settings | crv.io.config.IoSettings | IO settings containing root_dir. | required |
| run_id | str | Run identifier (see crv.core.ids.RunId; stored as string on disk). | required |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Path " |
crv.io.paths.tables_root
Tables root for a run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| settings | crv.io.config.IoSettings | IO settings. | required |
| run_id | str | Run identifier. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Path " |
crv.io.paths.table_dir
Table directory path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| settings | crv.io.config.IoSettings | IO settings. | required |
| run_id | str | Run identifier. | required |
| table_name | str | Canonical table name (lower_snake; see crv.core.grammar.TableName). | required |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Path " |
crv.io.paths.manifest_path
Path to table manifest.json.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| settings | crv.io.config.IoSettings | IO settings. | required |
| run_id | str | Run identifier. | required |
| table_name | str | Canonical table name. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Path " |
crv.io.paths.bucket_dir
bucket_dir(
settings: crv.io.config.IoSettings, run_id: str, table_name: str, bucket_id: int
) -> str
Bucket directory path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| settings | crv.io.config.IoSettings | IO settings. | required |
| run_id | str | Run identifier. | required |
| table_name | str | Canonical table name. | required |
| bucket_id | int | Non-negative bucket id. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Path " |
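
Since the returned path is cut off above, the sketch below does not try to reproduce the full layout. It only illustrates one plausible composition: a bucket directory name appended to an already-known table directory. Both the `table_dir` argument (the real function takes settings, run_id, and table_name) and the assumption that buckets sit directly under the table directory are hypothetical, as is the six-digit padding.

```python
import posixpath


def bucket_dir_sketch(table_dir: str, bucket_id: int) -> str:
    """Join a table directory with a 'bucket=NNNNNN' child (assumed layout)."""
    if bucket_id < 0:
        raise ValueError(f"bucket_id must be >= 0, got {bucket_id}")
    # posixpath keeps forward slashes regardless of the host OS.
    return posixpath.join(table_dir, f"bucket={bucket_id:06d}")


assert bucket_dir_sketch("some/table/dir", 7) == "some/table/dir/bucket=000007"
```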
crv.io.paths.part_paths
part_paths(
settings: crv.io.config.IoSettings,
run_id: str,
table_name: str,
bucket_id: int,
uuid_str: str,
) -> PartPaths
Compute temporary and final part file paths for a given bucket and UUID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| settings | crv.io.config.IoSettings | IO settings. | required |
| run_id | str | Run identifier. | required |
| table_name | str | Canonical table name. | required |
| bucket_id | int | Bucket id. | required |
| uuid_str | str | Hex string used to build a unique part name. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| PartPaths | crv.io.paths.PartPaths | Paths for .parquet.tmp and final .parquet files. |
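
To make the relationship between uuid_str and the two returned paths concrete, here is a hedged sketch. The file name pattern `part-<uuid>.parquet` is an assumption made for illustration (this page only states the ".parquet" and ".parquet.tmp" suffixes), and `part_paths_sketch`, `PartPathsSketch`, and the `bucket_dir` argument are hypothetical stand-ins for the real signature.

```python
import posixpath
import uuid
from typing import NamedTuple


class PartPathsSketch(NamedTuple):
    """Hypothetical stand-in for crv.io.paths.PartPaths."""

    tmp_path: str
    final_path: str


def part_paths_sketch(bucket_dir: str, uuid_str: str) -> PartPathsSketch:
    """Build final and temporary part paths inside a bucket directory."""
    # "part-<uuid>.parquet" is an assumed naming scheme, not the documented one.
    final_path = posixpath.join(bucket_dir, f"part-{uuid_str}.parquet")
    return PartPathsSketch(tmp_path=final_path + ".tmp", final_path=final_path)


# A caller would typically supply a fresh hex UUID for each part.
paths = part_paths_sketch("some/table/dir/bucket=000007", uuid.uuid4().hex)
assert paths.tmp_path == paths.final_path + ".tmp"
```

Deriving both names from the same UUID keeps the temporary and final files in the same directory, so the later rename stays on one filesystem.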
crv.io.paths.validate_run_id
Validate that a run_id is safe for filesystem paths.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| run_id | str | Proposed run identifier. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Same value if valid. |

Raises:
| Type | Description |
|---|---|
| ValueError | If run_id contains disallowed characters or is empty. |
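
The allowed character set is not listed on this page, so the sketch below assumes one: ASCII letters, digits, dot, underscore, and hyphen. It also rejects "." and ".." as an extra precaution against path traversal, which is likewise an assumption. `validate_run_id_sketch` is a hypothetical name.

```python
import re

# Assumed allow-list; the real module may permit a different set of characters.
_RUN_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def validate_run_id_sketch(run_id: str) -> str:
    """Return run_id unchanged if it is non-empty and path-safe, else raise."""
    if not _RUN_ID_PATTERN.fullmatch(run_id) or run_id in {".", ".."}:
        raise ValueError(f"invalid run_id: {run_id!r}")
    return run_id


assert validate_run_id_sketch("run_2024-01") == "run_2024-01"
```

Returning the input on success lets callers validate inline, for example when building a path from the result.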
crv.io.paths.normalize_run_id
Normalize a free-form run id by trimming and replacing spaces with underscores, then validate against the allowed character set.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| run_id | str | Candidate run identifier. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Normalized and validated run id. |

Raises:
| Type | Description |
|---|---|
| ValueError | If the normalized run_id remains invalid. |
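
The described steps (trim, replace spaces with underscores, then validate) can be sketched as follows. As with validation, the allowed character set is an assumption, and `normalize_run_id_sketch` is a hypothetical name rather than the module's implementation.

```python
import re

# Assumed allow-list; the real module may permit a different set of characters.
_RUN_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def normalize_run_id_sketch(run_id: str) -> str:
    """Trim whitespace, turn spaces into underscores, then validate the result."""
    candidate = run_id.strip().replace(" ", "_")
    if not _RUN_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"run_id is still invalid after normalization: {candidate!r}")
    return candidate


assert normalize_run_id_sketch("  my run 01 ") == "my_run_01"
```

Characters other than spaces (for example a slash) are not rewritten in this sketch, so such inputs still raise ValueError.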