crv.io.manifest
Experimental API
Per-table manifest data structures and helpers.
Manifest layout (JSON at
Notes:
- Part paths are stored relative to the bucket directory to keep the manifest relocatable.
- Partition keys are zero-padded to 6 digits, e.g. "000123" (same format as bucket=000123).
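The key and directory conventions above can be sketched as follows. The helper names (`bucket_key`, `bucket_dir_name`, `parse_bucket_dir`) are hypothetical and not part of `crv.io.manifest`; the sketch only assumes the documented 6-digit zero padding and the `bucket=000123` directory form.

```python
def bucket_key(bucket_id: int) -> str:
    """Zero-pad a numeric bucket id to the 6-digit manifest key."""
    return f"{bucket_id:06d}"


def bucket_dir_name(bucket_id: int) -> str:
    """Directory name for a bucket, reusing the same zero-padded key."""
    return f"bucket={bucket_key(bucket_id)}"


def parse_bucket_dir(name: str) -> int:
    """Recover the numeric bucket id from a 'bucket=NNNNNN' directory name."""
    prefix = "bucket="
    if not name.startswith(prefix):
        raise ValueError(f"not a bucket directory: {name!r}")
    return int(name[len(prefix):])
```

For example, `bucket_key(123)` returns `"000123"` and `parse_bucket_dir("bucket=000123")` returns `123`, so the manifest key and the directory name stay interchangeable.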
crv.io.manifest.PartMeta
dataclass
Per-part metadata recorded in the table manifest.
Attributes:
| Name | Type | Description |
|---|---|---|
| path | str | Relative file name under the bucket directory (e.g., "part- |
| rows | int | Row count for this part. |
| bytes | int | File size in bytes for this part. |
| tick_min | int | Minimum tick present in this part. |
| tick_max | int | Maximum tick present in this part. |
| created_at | str | ISO-8601 timestamp indicating when the part was written. |
Source code in src/crv/io/manifest.py
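A minimal sketch of how a `PartMeta` record might be populated right after a part file is written. The `PartMeta` class here is a local stand-in with the documented fields, and `part_meta_for_file` is a hypothetical helper; row counts and tick bounds are assumed to be known by the writer, and the UTC timezone for `created_at` is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class PartMeta:
    """Local stand-in with the documented fields (not the library class)."""
    path: str
    rows: int
    bytes: int
    tick_min: int
    tick_max: int
    created_at: str


def part_meta_for_file(
    file: Path, bucket_dir: Path, rows: int, tick_min: int, tick_max: int
) -> PartMeta:
    """Build part metadata for a file that has just been written."""
    return PartMeta(
        # relative to the bucket directory so the manifest stays relocatable
        path=file.relative_to(bucket_dir).as_posix(),
        rows=rows,
        bytes=file.stat().st_size,  # on-disk size of the written file
        tick_min=tick_min,
        tick_max=tick_max,
        created_at=datetime.now(timezone.utc).isoformat(),
    )
```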
crv.io.manifest.PartitionMeta
dataclass
Aggregated metadata for a single partition (bucket).
Attributes:
| Name | Type | Description |
|---|---|---|
| bucket_id | int | Numeric bucket identifier (e.g., 123). |
| state | crv.io.manifest.State | Partition state, typically "ready" after a successful append. |
| tick_min | int | Minimum tick across all parts in this bucket. |
| tick_max | int | Maximum tick across all parts in this bucket. |
| row_count | int | Total number of rows across parts in this bucket. |
| byte_size | int | Total bytes across parts in this bucket. |
| parts | list[crv.io.manifest.PartMeta] | Ordered list of part metadata objects. |
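The per-bucket aggregates are derivable from the parts list. This sketch uses a hypothetical `PartSummary` stand-in and `aggregate` helper to show that relationship; it is not the library's code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PartSummary:
    rows: int
    bytes: int
    tick_min: int
    tick_max: int


def aggregate(parts: List[PartSummary]) -> Dict[str, int]:
    """Derive PartitionMeta-style totals from a bucket's parts."""
    if not parts:
        raise ValueError("a partition needs at least one part")
    return {
        "tick_min": min(p.tick_min for p in parts),
        "tick_max": max(p.tick_max for p in parts),
        "row_count": sum(p.rows for p in parts),
        "byte_size": sum(p.bytes for p in parts),
    }
```

For example, `aggregate([PartSummary(10, 100, 0, 4), PartSummary(5, 60, 5, 9)])` returns `{"tick_min": 0, "tick_max": 9, "row_count": 15, "byte_size": 160}`.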
crv.io.manifest.TableManifest
dataclass
Manifest model persisted at
Attributes:
| Name | Type | Description |
|---|---|---|
| table | str | Canonical table name (lower_snake). |
| version | int | Manifest schema version (independent of core SCHEMA_V). |
| created_at | str | ISO-8601 creation timestamp. |
| updated_at | str | ISO-8601 timestamp of the last update. |
| partitions | dict[str, crv.io.manifest.PartitionMeta] | Mapping of zero-padded bucket key (e.g., "000123") to PartitionMeta. |
Notes
- Part paths are stored relative to the bucket directory to keep the manifest relocatable.
- Partition keys are zero-padded to 6 digits (e.g., "000123").
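To illustrate the JSON round trip that `to_json_obj` and `from_json_obj` support, here is a sketch built from local stand-in dataclasses with the documented fields. It assumes the JSON layout mirrors the attribute names one-to-one and that `state` serializes as a plain string; the library's actual methods may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class PartMeta:
    path: str
    rows: int
    bytes: int
    tick_min: int
    tick_max: int
    created_at: str


@dataclass
class PartitionMeta:
    bucket_id: int
    state: str  # assumed to serialize as a plain string such as "ready"
    tick_min: int
    tick_max: int
    row_count: int
    byte_size: int
    parts: List[PartMeta] = field(default_factory=list)


@dataclass
class TableManifest:
    table: str
    version: int
    created_at: str
    updated_at: str
    partitions: Dict[str, PartitionMeta] = field(default_factory=dict)

    def to_json_obj(self) -> dict:
        # asdict recurses into nested dataclasses, lists, and dicts
        return asdict(self)

    @classmethod
    def from_json_obj(cls, obj: dict) -> "TableManifest":
        # rebuild nested dataclasses from plain JSON dicts
        partitions = {
            key: PartitionMeta(**{**p, "parts": [PartMeta(**part) for part in p["parts"]]})
            for key, p in obj.get("partitions", {}).items()
        }
        return cls(
            table=obj["table"],
            version=obj["version"],
            created_at=obj["created_at"],
            updated_at=obj["updated_at"],
            partitions=partitions,
        )
```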
crv.io.manifest.TableManifest.partitions
instance-attribute
crv.io.manifest.TableManifest.to_json_obj
crv.io.manifest.TableManifest.from_json_obj
classmethod
crv.io.manifest.load_manifest
```python
load_manifest(
    settings: crv.io.config.IoSettings, run_id: str, table_name: str
) -> TableManifest | None
```
Load a table's manifest.json if present.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| settings | crv.io.config.IoSettings | IoSettings used to resolve layout/root directory. | required |
| run_id | str | Run identifier (stored as string on disk). | required |
| table_name | str | Canonical table name (lower_snake). | required |

Returns:

| Type | Description |
|---|---|
| crv.io.manifest.TableManifest \| None | Parsed manifest model, or None if not found. |
Notes
- This function does not validate against core descriptors; it simply loads the persisted manifest metadata for pruning/inspection.
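Because the manifest path is resolved from `settings`, `run_id`, and `table_name` by layout logic not shown here, this sketch takes the resolved path directly. `load_manifest_obj` is a hypothetical helper that returns the raw JSON object rather than a `TableManifest`, and treats a missing file as the "not found" case.

```python
import json
from pathlib import Path
from typing import Optional


def load_manifest_obj(manifest_path: Path) -> Optional[dict]:
    """Return the parsed manifest JSON, or None when the file does not exist."""
    try:
        text = manifest_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None  # no manifest yet: not an error for inspection/pruning
    return json.loads(text)
```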
crv.io.manifest.write_manifest
```python
write_manifest(
    settings: crv.io.config.IoSettings,
    run_id: str,
    table_name: str,
    manifest: crv.io.manifest.TableManifest,
) -> None
```
Persist manifest.json atomically.
The write path is: serialize JSON → write to "
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| settings | crv.io.config.IoSettings | IoSettings used to resolve layout/root directory. | required |
| run_id | str | Run identifier. | required |
| table_name | str | Canonical table name. | required |
| manifest | crv.io.manifest.TableManifest | Manifest model to serialize. | required |

Raises:

| Type | Description |
|---|---|
| OSError | If filesystem operations fail (caller may wrap in IoManifestError). |
Notes
- Ensures the target directory exists before writing.
- Keeps the manifest relocatable by using relative part paths.
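The description of the write path is cut off above, so the following is only a sketch of the common temp-file-then-rename pattern for atomic JSON writes. It assumes the temporary file is created in the same directory as the target so that `os.replace` is an atomic rename on POSIX systems; `write_json_atomic` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, obj: dict) -> None:
    """Write JSON to a temp file in the target directory, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)  # ensure the directory exists
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(obj, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp_name, path)  # readers see either the old or the new file
    except BaseException:
        # remove the partial temp file, then re-raise (OSError propagates)
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Readers never observe a half-written `manifest.json`: until `os.replace` runs, the previous version (if any) remains in place.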
crv.io.manifest.update_with_new_part
```python
update_with_new_part(
    manifest: crv.io.manifest.TableManifest,
    bucket_id: int,
    part_meta: crv.io.manifest.PartMeta,
) -> None
```
Update the manifest with a newly written part for the given bucket.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| manifest | crv.io.manifest.TableManifest | Manifest instance to mutate. | required |
| bucket_id | int | Numeric bucket id (e.g., 123). | required |
| part_meta | crv.io.manifest.PartMeta | Metadata of the newly written Parquet part. | required |
Notes
- Creates a new PartitionMeta if the bucket is not yet present.
- Updates per-bucket aggregates (tick range, row/byte totals).
- Sets partition state to "ready" under single-writer semantics.
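The update rules in the Notes can be sketched over a JSON-shaped manifest (plain dicts). `update_with_new_part_sketch` is hypothetical; it assumes a part dict carries the `PartMeta` fields and that parts are appended in write order.

```python
def update_with_new_part_sketch(manifest: dict, bucket_id: int, part: dict) -> None:
    """Mutate a JSON-shaped manifest in place with one newly written part."""
    key = f"{bucket_id:06d}"  # zero-padded partition key
    partitions = manifest.setdefault("partitions", {})
    pm = partitions.get(key)
    if pm is None:
        # first part for this bucket: start from empty aggregates
        pm = {
            "bucket_id": bucket_id,
            "state": "ready",
            "tick_min": part["tick_min"],
            "tick_max": part["tick_max"],
            "row_count": 0,
            "byte_size": 0,
            "parts": [],
        }
        partitions[key] = pm
    pm["parts"].append(part)  # keep parts in write order
    pm["tick_min"] = min(pm["tick_min"], part["tick_min"])
    pm["tick_max"] = max(pm["tick_max"], part["tick_max"])
    pm["row_count"] += part["rows"]
    pm["byte_size"] += part["bytes"]
    pm["state"] = "ready"  # single writer: nothing concurrent to reconcile
```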
crv.io.manifest.new_manifest
Create a fresh TableManifest with no partitions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | Canonical table name (lower_snake). | required |
| version | int | Manifest schema version (default: 1). | 1 |

Returns:

| Name | Type | Description |
|---|---|---|
| TableManifest | crv.io.manifest.TableManifest | Newly created manifest with timestamps set to now. |
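A sketch of what `new_manifest` produces, expressed as the JSON-shaped object. It assumes "now" means a timezone-aware UTC timestamp and that `created_at` and `updated_at` start out equal; `new_manifest_obj` is a hypothetical name.

```python
from datetime import datetime, timezone


def new_manifest_obj(table_name: str, version: int = 1) -> dict:
    """Empty manifest with both timestamps set to the same UTC instant."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "table": table_name,
        "version": version,
        "created_at": now,
        "updated_at": now,
        "partitions": {},  # no buckets written yet
    }
```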
crv.io.manifest.rebuild_manifest_from_fs
```python
rebuild_manifest_from_fs(
    settings: crv.io.config.IoSettings, run_id: str, table_name: str
) -> TableManifest
```
Rebuild a manifest by walking the table directory and scanning each part to compute rows and tick ranges. This is a slower, best-effort operation intended for recovery or validation.
Requires polars. We import it lazily to avoid import overhead elsewhere.
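A best-effort rebuild can be sketched as a directory walk plus a per-file scan. This assumes part files end in `.parquet`, tick values live in a column named `tick`, and a recent polars release that provides `pl.len()`; none of these details are confirmed by the text above. The scanner is injectable so the walk can be exercised without polars, and `created_at` falls back to the file's modification time because the original write time is not recoverable from the file system. `polars_part_stats` and `rebuild_partitions` are hypothetical names.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict, List, Tuple

PartStats = Tuple[int, int, int]  # (rows, tick_min, tick_max) for one part file


def polars_part_stats(file: Path) -> PartStats:
    """Scan one Parquet part with polars (assumes a "tick" column)."""
    import polars as pl  # lazy import: only paid for when rebuilding

    df = (
        pl.scan_parquet(file)
        .select(
            pl.len().alias("rows"),
            pl.col("tick").min().alias("tick_min"),
            pl.col("tick").max().alias("tick_max"),
        )
        .collect()
    )
    return int(df["rows"][0]), int(df["tick_min"][0]), int(df["tick_max"][0])


def rebuild_partitions(
    table_dir: Path, part_stats: Callable[[Path], PartStats] = polars_part_stats
) -> Dict[str, dict]:
    """Walk bucket=NNNNNN directories and rebuild per-bucket metadata."""
    partitions: Dict[str, dict] = {}
    for bucket_dir in sorted(table_dir.glob("bucket=*")):
        if not bucket_dir.is_dir():
            continue
        bucket_id = int(bucket_dir.name.split("=", 1)[1])
        parts: List[dict] = []
        for file in sorted(bucket_dir.glob("*.parquet")):
            rows, tick_min, tick_max = part_stats(file)
            st = file.stat()
            parts.append({
                "path": file.name,  # relative to the bucket directory
                "rows": rows,
                "bytes": st.st_size,
                "tick_min": tick_min,
                "tick_max": tick_max,
                # original write time is unknown; fall back to mtime
                "created_at": datetime.fromtimestamp(st.st_mtime, timezone.utc).isoformat(),
            })
        if not parts:
            continue  # skip bucket directories with no part files
        partitions[f"{bucket_id:06d}"] = {
            "bucket_id": bucket_id,
            "state": "ready",
            "tick_min": min(p["tick_min"] for p in parts),
            "tick_max": max(p["tick_max"] for p in parts),
            "row_count": sum(p["rows"] for p in parts),
            "byte_size": sum(p["bytes"] for p in parts),
            "parts": parts,
        }
    return partitions
```

Passing `part_stats` explicitly keeps polars out of the import path entirely, in the same spirit as the lazy import described above.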