Dataset
fricon stores datasets in the Arrow IPC format. Basic knowledge of Arrow
data structures helps in understanding how fricon works.
Apache Arrow
You may be familiar with pandas, a widely used data manipulation library in Python. Arrow is a similar library but with much stricter data type requirements. Each Arrow table comes with a schema that specifies the data type of each column. The following are some key classes in the Python bindings of Arrow:
- pyarrow.RecordBatch: A record batch is a collection of arrays with the same length. Each record batch is associated with a schema.
- pyarrow.Array: An array is a sequence of values with the same data type.
- pyarrow.Scalar: A scalar is a single value with a data type.
- pyarrow.Schema: A schema is a collection of fields. Each field corresponds to a column in a table.
- pyarrow.Field: A field is a data type with a name.
- pyarrow.DataType
- pyarrow.Table: A helper type to unify representations of single and collection of record batches with the same schema.
How are datasets stored?
A dataset is exactly one Arrow table stored in Arrow IPC format. When a dataset is created, the schema of the table is automatically inferred from the first row of data written. This allows for flexible data collection without requiring manual schema definition.
Type inference
fricon MVP currently supports a focused set of data types optimized for scientific measurements and signal processing. The following table lists the supported types:
| Python type | Dataset data type | Description |
|---|---|---|
| float | Float64 | 64-bit floating point numbers |
| complex | Complex128 | 128-bit complex numbers (real + imaginary) |
| fricon.Trace | Trace | Time series data with various x-axis formats |
Note: The MVP version intentionally limits type support to float, complex, and trace types for simplicity. Additional types (bool, int, str) will be supported in future releases.
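The mapping in the table can be pictured as a small dispatch on the Python type of each value in the first row. The sketch below is illustrative only: `infer_dataset_type`, `infer_schema`, and the local `Trace` class are hypothetical stand-ins, not fricon's API, and rejecting `int` and `bool` outright is an assumption based on the note above.

```python
from typing import Any


class Trace:
    """Hypothetical stand-in for fricon.Trace, used only in this sketch."""


def infer_dataset_type(value: Any) -> str:
    """Map one Python value to the dataset data type name from the table."""
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, complex):
        return "Complex128"
    if isinstance(value, Trace):
        return "Trace"
    # Assumption: unsupported types (bool, int, str, ...) are rejected.
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def infer_schema(first_row: dict[str, Any]) -> dict[str, str]:
    """Infer a column-name to type-name mapping from the first row."""
    return {name: infer_dataset_type(value) for name, value in first_row.items()}


print(infer_schema({"gain": 1.5, "s21": 0.5 + 0.1j}))
```

Under this assumption, a value such as `1` would need to be written as `1.0` to be accepted as a Float64 column.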
Supported trace variants
Trace data supports three different formats depending on how the x-axis (independent variable) is stored:
- SimpleList: Only y-values are stored, x-values are implicit indices (0, 1, 2, ...)
- FixedStep: Regular spacing with x₀ (starting point) and step size
- VariableStep: Arbitrary x-values stored alongside y-values
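The difference between the three variants comes down to how the x-values are recovered. The following sketch models them as plain dataclasses. The class and field names (`ys`, `x0`, `step`, `x_values`) are chosen for illustration and are not fricon's actual `Trace` API.

```python
from dataclasses import dataclass


@dataclass
class SimpleList:
    """Only y-values are stored; x-values are the implicit indices."""
    ys: list[float]

    def xs(self) -> list[float]:
        return [float(i) for i in range(len(self.ys))]


@dataclass
class FixedStep:
    """Regularly spaced x-values described by a start point and a step."""
    x0: float
    step: float
    ys: list[float]

    def xs(self) -> list[float]:
        return [self.x0 + i * self.step for i in range(len(self.ys))]


@dataclass
class VariableStep:
    """Arbitrary x-values stored alongside the y-values."""
    x_values: list[float]
    ys: list[float]

    def xs(self) -> list[float]:
        return list(self.x_values)


ys = [0.1, 0.4, 0.9]
print(SimpleList(ys).xs())
print(FixedStep(x0=1.0, step=0.5, ys=ys).xs())
print(VariableStep(x_values=[0.0, 0.3, 1.7], ys=ys).xs())
```

SimpleList and FixedStep avoid storing an x-array at all, so they are the compact choice whenever the sampling is uniform. VariableStep costs one extra value per point but handles irregular sampling.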
Future extensions
Additional data types (bool, int, str, timestamps) will be supported in future versions. The current focus on float, complex, and trace types keeps the implementation simple while covering the most common scientific use cases.