Manage, organize, and share data in Collections
Collections are the top-level unit of organization in Delfini. Collections hold data items and manage metadata. Collections can be private to individual users, shared amongst teams of collaborators, or made public to all users of a given Delfini instance.
Collections can hold data of any type as files. Data can be uploaded or downloaded using the web client, CLI, or API. Individual data items can be organized into folders and assigned metadata. The collection as a whole can also be assigned a description, metadata, and tags.
Links
Delfini collections can include links to data. These links are treated as equivalent to uploaded data items, and Delfini automatically manages retrieving and caching the data from the target. Links can point to items in other collections, on the local instance or a remote Delfini instance, or to data hosted by public repositories.
Tabular Data
Delfini has native support for tabular data items. Uploaded or linked data can be parsed from a variety of built-in data formats, including CSV, Parquet, and XLSX; parsers can also be added via plugin, with existing support for SAS, SPSS, and Stata.
Tabular data is easily previewed through the web interface and simple visualizations can be added. Data is also accessible via API, with support for GA4GH Data Connect. The pydelfini Python client also provides easy access to tabular data as Pandas dataframes for use in Jupyter notebooks or scripts.
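Once a table is available as a Pandas dataframe, ordinary Pandas operations apply. The sketch below is only an illustration of that last step: it does not call pydelfini (its functions are not described here), and an in-memory CSV with made-up columns stands in for a Delfini data item.

```python
import io

import pandas as pd

# Stand-in for a tabular data item. In practice the dataframe would come
# from the pydelfini client; those calls are intentionally not shown.
csv_text = "subject_id,age,site\n1,34,A\n2,51,B\n3,47,A\n"
df = pd.read_csv(io.StringIO(csv_text))

# Typical notebook-style summary: mean age per site.
mean_age_by_site = df.groupby("site")["age"].mean()
print(mean_age_by_site)
```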
Column-level Metadata and Data Elements
Delfini tracks tabular column names, labels, and data types. For advanced metadata management, Delfini leverages data elements to capture data descriptions, permissible values, and concepts. Data elements can be defined by users as data dictionaries within collections, registered site-wide as common data elements (CDEs), or pulled from external repositories via plugin. Delfini offers helpful tools to assist and encourage users to assign data elements to their data.
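As a rough illustration of what a data element captures, the sketch below models one as a small Python class with a description, a concept, and a set of permissible values, then checks a column against it. The `DataElement` class, its field names, and the concept identifier are hypothetical and are not Delfini's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """Hypothetical model of a data element; not Delfini's actual schema."""

    name: str
    description: str
    concept: Optional[str] = None
    permissible_values: frozenset = frozenset()

    def invalid_values(self, values):
        """Return the values that fall outside the permissible set."""
        if not self.permissible_values:
            return []  # no restriction declared, so nothing is invalid
        return [v for v in values if v not in self.permissible_values]


smoking_status = DataElement(
    name="smoking_status",
    description="Self-reported smoking status",
    concept="example:smoking-status",  # placeholder concept identifier
    permissible_values=frozenset({"never", "former", "current"}),
)

column = ["never", "current", "sometimes", "former"]
bad = smoking_status.invalid_values(column)
print(bad)  # ['sometimes']
```

Assigning an element like this to a column makes it possible to flag values such as `sometimes` that fall outside the declared set.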
Data Transformation and Dataviews
Delfini supports data transformation through the concept of a dataview. For those familiar with relational database concepts, a dataview is very similar to a normal database view, but with several key distinctions:
- Dataviews can access tabular data from other data items within the same collection; links can be used to bring in data from other collections.
- Dataviews can be created using the web client’s built-in dataview builder, with a drag-and-drop interface and live previews.
- Data elements are automatically propagated through dataviews, and dataviews can also be used to add or transform data elements.
Dataviews are powered by PRQL, an open-source data pipeline language that compiles to SQL. The resulting SQL query is then run by Delfini’s internal SQL execution engine, which is sandboxed for security.
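The PRQL-to-SQL flow can be sketched as follows. This is a minimal illustration under several assumptions: the `measurements` table and its columns are made up, the SQL is written by hand to express the same pipeline rather than produced by a PRQL compiler, and an in-memory SQLite database stands in for Delfini's execution engine, which is not identified here.

```python
import sqlite3

# PRQL pipeline, shown for reference only; it is not compiled in this sketch.
PRQL = """
from measurements
filter value > 10
derive doubled = value * 2
select {id, doubled}
sort id
"""

# Hand-written SQL expressing the same pipeline (an approximation of what
# a PRQL compiler might emit, not its literal output).
SQL = """
SELECT id, value * 2 AS doubled
FROM measurements
WHERE value > 10
ORDER BY id
"""

# In-memory SQLite is a stand-in execution engine for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1, 5.0), (2, 12.5), (3, 20.0)],
)
rows = conn.execute(SQL).fetchall()
conn.close()

print(rows)  # [(2, 25.0), (3, 40.0)]
```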
Advanced users can create and upload dataviews as native PRQL code via the API. Delfini also includes support for authoring and sharing custom transformation functions expressed as PRQL functions.
Accounts and Spaces
Delfini includes support for shared data management and community building through accounts. Accounts can hold collections and users, giving users a shared place to manage their data. Accounts also have spaces, editable pages suitable for holding notes, documentation, and other helpful information about the group’s data. Space pages can include dynamic widgets that can be used for displaying data visualizations, timelines, and more.
Browsing and Searching for Data
Delfini allows users to browse collections and filter by specific metadata fields or assigned data elements. Users can also run text searches across collections, items, data elements, and accounts.
Infrastructure and Administration
Delfini was designed to scale from a single user’s desktop up to a multi-site collaboration. It is available packaged as a single Docker container, and Terraform infrastructure templates are available for deploying on AWS and GCP.
Delfini supports login via username and password, and natively supports Google, GitHub, and other OAuth2-compatible identity providers. Audit logging and user moderation features are available.
Delfini stores uploaded data on object storage (AWS S3 and/or Google Cloud Storage) or on the local filesystem, depending on configuration. It uses PostgreSQL (multi-user) or SQLite (single-user) for metadata management.
Customization via Plugins
Delfini supports plugins on both the backend and the frontend user interface. Plugins can be used to:
- Change the styling and content of the web interface
- Add support for other data formats or data sources
- Add data elements from external registries
- Perform security scans on uploaded data
- Add new widgets to account space pages
- Manage data sharing and publication
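To make the idea of a format plugin concrete, the sketch below shows one common way such an extension point can be structured: a registry that maps file extensions to parser functions. It is not Delfini's plugin API; `register_parser`, `parse`, and the TSV parser are hypothetical names used only for illustration.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical registry mapping a file extension to a parser function.
Parser = Callable[[bytes], List[dict]]
_PARSERS: Dict[str, Parser] = {}


def register_parser(extension: str):
    """Decorator that registers a parser for the given file extension."""

    def decorator(func: Parser) -> Parser:
        _PARSERS[extension.lower()] = func
        return func

    return decorator


@register_parser(".tsv")
def parse_tsv(raw: bytes) -> List[dict]:
    """Parse tab-separated bytes into a list of row dictionaries."""
    text = raw.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def parse(filename: str, raw: bytes) -> List[dict]:
    """Dispatch to the parser registered for the file's extension."""
    for extension, parser in _PARSERS.items():
        if filename.lower().endswith(extension):
            return parser(raw)
    raise ValueError(f"no parser registered for {filename!r}")


records = parse("sample.tsv", b"id\tname\n1\talpha\n2\tbeta\n")
print(records)
```

A registry of this kind lets new formats be added without changing the dispatching code, which is the general property a plugin mechanism aims for.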