Transforms are the mechanism by which raw tabular data is shaped into more useful forms. Instead of altering the original files or datasets, transforms provide a structured way to filter, join, aggregate, or enrich data while preserving lineage and reproducibility. This ensures that collections remain authoritative sources of truth, while still enabling flexibility for analysis and presentation.
Transforms can be thought of as “recipes” that describe how data should be processed. They can be applied to individual items, combined across collections, and reused as building blocks for further work.
Principles
A good transformation system should:
- Preserve original data – transformations are non-destructive and leave source files and metadata untouched.
- Be composable – multiple steps can be chained together, with each step building on the output of the previous one.
- Remain transparent – each stage is visible to users, with a clear understanding of what operations were performed.
- Carry meaning forward – column definitions and data elements propagate through the pipeline, so context is not lost when data is reshaped.
- Support error handling – when operations fail or produce conflicts, the system surfaces them in a way that helps users correct problems without guesswork.
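The principles above can be illustrated with a minimal Python sketch. It is hypothetical and not Delfini's implementation. `Table`, `Step`, `StepError`, and `run` are invented names. A table is assumed to be a tuple of row dictionaries plus a mapping from each column name to its definition, and the sample rows are made up. In the sketch, each step returns a new table rather than editing its input, and column definitions travel with the rows. Every step is logged, and a failure is reported together with the step that caused it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Table:
    """Rows plus column definitions. Frozen, so a step cannot reassign either field."""
    rows: tuple[dict[str, Any], ...]
    columns: dict[str, str]  # column name -> human-readable definition


@dataclass(frozen=True)
class Step:
    name: str  # shown to users so each stage stays visible
    fn: Callable[[Table], Table]


class StepError(Exception):
    """Names the failing step so the user knows where to look."""

    def __init__(self, index: int, step: Step, cause: Exception) -> None:
        super().__init__(f"step {index} ({step.name!r}) failed: {cause!r}")
        self.index = index
        self.step = step


def run(source: Table, steps: list[Step]) -> tuple[Table, list[str]]:
    """Apply steps in order, returning the final table and a log of what ran."""
    table = source
    log: list[str] = []
    for i, step in enumerate(steps, start=1):
        try:
            table = step.fn(table)  # each step returns a new Table
        except Exception as exc:
            raise StepError(i, step, exc) from exc
        log.append(f"{i}. {step.name}: {len(table.rows)} rows, columns {list(table.columns)}")
    return table, log


def drop_zero_counts(t: Table) -> Table:
    # Filtering keeps the column definitions unchanged.
    return Table(tuple(r for r in t.rows if r["count"] > 0), t.columns)


def keep_site_and_count(t: Table) -> Table:
    # Selecting keeps only the chosen columns and carries their definitions forward.
    keep = ["site", "count"]
    return Table(tuple({c: r[c] for c in keep} for r in t.rows),
                 {c: t.columns[c] for c in keep})


# Made-up sample data.
source = Table(
    rows=({"site": "A", "region": "North", "count": 3},
          {"site": "B", "region": "South", "count": 0}),
    columns={"site": "Sampling site ID", "region": "Region name", "count": "Birds counted"},
)
result, log = run(source, [Step("filter count > 0", drop_zero_counts),
                           Step("select site, count", keep_site_and_count)])
print(result.columns)  # {'site': 'Sampling site ID', 'count': 'Birds counted'}
print(len(source.rows))  # 2: the source table is untouched
```

Freezing `Table` only prevents reassigning its fields. The sketch still relies on each step building new rows instead of mutating the dictionaries it receives.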
How Transforms Work
At a high level, transforms operate on tabular data sources using a set of well-defined steps. Common operations include:
- Filtering – limiting rows based on conditions.
- Selecting – narrowing focus to relevant columns.
- Joining – combining multiple datasets on shared keys.
- Aggregating – summarizing rows into totals, averages, or grouped values.
- Enriching – adding new values from reference functions or libraries.
- Reshaping – renaming, re-ordering, or pivoting data for easier use downstream.
Because each step produces new output instead of modifying its input, any step can be adjusted or removed at any time. Users can therefore experiment with different views without risk of breaking the original data.
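Each operation in the list can be sketched as a pure function over rows. The Python below is an illustrative assumption, not Delfini's API. Rows are plain dictionaries, and joining is shown as an inner join only. Aggregation is shown as a grouped sum, and enrichment as a column computed from each row; lookups against a reference library are not modelled. Reshaping is shown only as renaming. All function names and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable

Row = dict[str, Any]


def filter_rows(rows: list[Row], pred: Callable[[Row], bool]) -> list[Row]:
    """Filtering: keep rows matching a condition (copied, so the input is untouched)."""
    return [dict(r) for r in rows if pred(r)]


def select(rows: list[Row], cols: list[str]) -> list[Row]:
    """Selecting: keep only the named columns."""
    return [{c: r[c] for c in cols} for r in rows]


def join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Joining: inner join on a shared key; right-hand values win on name clashes."""
    index: defaultdict[Any, list[Row]] = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def aggregate(rows: list[Row], by: str, col: str) -> list[Row]:
    """Aggregating: group by one column and sum another."""
    totals: defaultdict[Any, float] = defaultdict(int)
    for r in rows:
        totals[r[by]] += r[col]
    return [{by: k, f"total_{col}": v} for k, v in totals.items()]


def enrich(rows: list[Row], new_col: str, fn: Callable[[Row], Any]) -> list[Row]:
    """Enriching: add a column computed from each row."""
    return [{**r, new_col: fn(r)} for r in rows]


def rename(rows: list[Row], mapping: dict[str, str]) -> list[Row]:
    """Reshaping (renaming only): map old column names to new ones."""
    return [{mapping.get(c, c): v for c, v in r.items()} for r in rows]


# Made-up sample data.
sites = [{"site": "A", "region": "North"}, {"site": "B", "region": "South"}]
counts = [{"site": "A", "count": 3}, {"site": "A", "count": 2}, {"site": "B", "count": 4}]

joined = join(counts, sites, key="site")
north = select(filter_rows(joined, lambda r: r["region"] == "North"), ["site", "count"])
summary = aggregate(joined, by="region", col="count")
labelled = rename(enrich(summary, "large", lambda r: r["total_count"] > 4),
                  {"total_count": "birds"})
print(north)     # [{'site': 'A', 'count': 3}, {'site': 'A', 'count': 2}]
print(labelled)  # [{'region': 'North', 'birds': 5, 'large': True}, {'region': 'South', 'birds': 4, 'large': False}]
```

Every function returns new rows and leaves its arguments alone. The calls can therefore be chained in different combinations to produce different views, and `counts` is unchanged after the example runs.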
The Dataview
Delfini implements data transforms through the concept of a dataview. A dataview represents a transformed version of one or more datasets and is defined by a sequence of transformation steps. Dataviews are data items in Delfini, just like uploaded or linked data, and they update automatically when their definition or source data changes.
By combining sources and applying transformations, dataviews allow users to isolate the exact slice of data they need, while still keeping the full provenance of how it was created. In this way, transforms act as both a workflow and a record of intent: they show not only the final output but also the path taken to get there.
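The dataview behaviour described above can be sketched in Python: a stored definition that refreshes when its inputs change and records how it was produced. This is a hypothetical illustration, not Delfini's implementation or API. `Dataset`, `Dataview`, `replace`, `add_step`, and `provenance` are invented names. Each source is assumed to carry a version number that increases whenever its data is replaced, and the sample rows are made up. The view caches its result and recomputes only when its steps or a source version change.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Row = dict[str, Any]
# Hypothetical step signature: (current table, all named sources) -> new table.
StepFn = Callable[[list[Row], dict[str, list[Row]]], list[Row]]


@dataclass
class Dataset:
    """Stand-in for an uploaded or linked data item with a version counter."""
    rows: list[Row]
    version: int = 1

    def replace(self, rows: list[Row]) -> None:
        self.rows = rows
        self.version += 1  # lets dependent dataviews notice the change


@dataclass
class Dataview:
    """A derived item: named sources plus an ordered list of described steps."""
    sources: dict[str, Dataset]  # the first entry is the starting table
    steps: list[tuple[str, StepFn]] = field(default_factory=list)
    _revision: int = field(default=0, init=False)
    _cache_key: Optional[tuple] = field(default=None, init=False)
    _cache: list[Row] = field(default_factory=list, init=False)

    def add_step(self, description: str, fn: StepFn) -> None:
        self.steps.append((description, fn))
        self._revision += 1  # a definition change invalidates the cache

    def _key(self) -> tuple:
        return (self._revision, tuple((n, d.version) for n, d in self.sources.items()))

    def rows(self) -> list[Row]:
        """Return the result, recomputing only if the definition or a source changed."""
        if self._key() != self._cache_key:
            tables = {n: d.rows for n, d in self.sources.items()}
            current = list(next(iter(tables.values())))
            for _, fn in self.steps:
                current = fn(current, tables)
            self._cache, self._cache_key = current, self._key()
        return self._cache

    def provenance(self) -> list[str]:
        """Which source versions were used and which steps were applied, in order."""
        used = [f"source {n} (version {d.version})" for n, d in self.sources.items()]
        return used + [f"step {i}: {desc}" for i, (desc, _) in enumerate(self.steps, 1)]


# Made-up sample data.
counts = Dataset([{"site": "A", "count": 3}, {"site": "B", "count": 0}])
sites = Dataset([{"site": "A", "region": "North"}, {"site": "B", "region": "South"}])

view = Dataview({"counts": counts, "sites": sites})
view.add_step("join sites on site", lambda rows, src: [
    {**r, **s} for r in rows for s in src["sites"] if s["site"] == r["site"]])
view.add_step("filter count > 0", lambda rows, src: [r for r in rows if r["count"] > 0])

print(view.rows())  # [{'site': 'A', 'count': 3, 'region': 'North'}]
counts.replace([{"site": "B", "count": 5}])
print(view.rows())  # [{'site': 'B', 'count': 5, 'region': 'South'}]
print(view.provenance())
```

Comparing version numbers keeps the freshness check cheap, because the view never has to compare the data itself. The sketch assumes every change goes through `replace` and `add_step`. Editing `rows` or `steps` directly would bypass the version and revision counters.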
Why It Matters
Consistent, transparent transformations make it easier to:
- Reproduce analyses across teams and projects.
- Maintain confidence in the validity of shared results.
- Explore new perspectives on the same underlying data.
Transforms provide the connective tissue between raw information and meaningful insight — without sacrificing accuracy, traceability, or collaboration.