Apache Iceberg is an open-source table format, originally developed at Netflix, for managing very large analytic datasets. Many companies use Iceberg tables to run large-scale transformations on their data.
However, many of the tools teams use for this work on top of Iceberg tables were built for a previous era: they are not cloud-native and offer little or no collaboration. This is where Hex comes in.
Teams often choose Apache Iceberg when they want to manage their own data rather than hand it over to a proprietary managed solution, and when they want more control over security and partitioning. They may also want a table format that multiple query engines, such as Amazon Athena, Google BigQuery, or Snowflake, can read with consistent performance.
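With Iceberg, tables can evolve flexibly: schema changes such as adding, renaming, or dropping a column are metadata operations, so they do not require complex, time-consuming migrations that rewrite the data. Iceberg also keeps a history of table snapshots, which enables versioning and time travel (querying a table as it existed at an earlier point), features that many use cases depend on.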
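To make snapshots and metadata-only schema changes more concrete, here is a deliberately simplified, in-memory toy model in Python. It is not Iceberg's implementation or API: the `ToyTable` and `Snapshot` classes are hypothetical, data files are collapsed into plain rows, and columns are matched by name, whereas real Iceberg tracks columns by field ID and records snapshots in metadata and manifest files. The idea it illustrates is only that every commit produces a new snapshot while older ones stay readable.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """One table version (hypothetical stand-in for an Iceberg snapshot)."""
    snapshot_id: int
    timestamp_ms: int
    schema: tuple      # column names visible in this version
    rows: tuple        # data files collapsed into plain dict rows for brevity


@dataclass
class ToyTable:
    """Hypothetical in-memory table: every change appends a new snapshot."""
    snapshots: list = field(default_factory=list)

    def current(self) -> Optional[Snapshot]:
        return self.snapshots[-1] if self.snapshots else None

    def commit(self, timestamp_ms, schema=None, new_rows=()):
        """Append a snapshot; earlier snapshots are never modified."""
        cur = self.current()
        if schema is None:
            schema = cur.schema if cur else ()
        rows = (cur.rows if cur else ()) + tuple(new_rows)
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, tuple(schema), rows)
        self.snapshots.append(snap)
        return snap.snapshot_id

    def add_column(self, timestamp_ms, name):
        """Schema evolution as a metadata-only change: existing rows are not rewritten."""
        return self.commit(timestamp_ms, schema=self.current().schema + (name,))

    def scan(self, snapshot_id=None, as_of_ms=None):
        """Read the current snapshot, a specific snapshot, or the latest one at or before a time."""
        if snapshot_id is not None:
            snap = next(s for s in self.snapshots if s.snapshot_id == snapshot_id)
        elif as_of_ms is not None:
            older = [s for s in self.snapshots if s.timestamp_ms <= as_of_ms]
            if not older:
                raise ValueError("no snapshot exists at or before that time")
            snap = older[-1]
        else:
            snap = self.current()
        # Columns a row was written without (e.g. added later) are read back as None.
        return [{col: row.get(col) for col in snap.schema} for row in snap.rows]


table = ToyTable()
table.commit(1000, schema=["id", "city"], new_rows=[{"id": 1, "city": "Oslo"}])
table.add_column(2000, "country")
table.commit(3000, new_rows=[{"id": 2, "city": "Lima", "country": "PE"}])

print(table.scan(as_of_ms=1500))  # [{'id': 1, 'city': 'Oslo'}]
print(table.scan())
# [{'id': 1, 'city': 'Oslo', 'country': None}, {'id': 2, 'city': 'Lima', 'country': 'PE'}]
```

Because each commit appends a snapshot instead of overwriting data, reading "as of" an earlier time is just a matter of picking an older snapshot. That is the intuition behind Iceberg's time travel, even though the real format tracks it through metadata files rather than Python objects.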
Many engines can run on top of Iceberg tables, and Hex can query them directly. In this example we use Dremio and query the Iceberg table like any other table.
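Because Dremio exposes the Iceberg table through standard SQL, the query you write in Hex looks like any other. The Python sketch below runs that kind of query against an in-memory SQLite table purely as a stand-in so it can execute anywhere: SQLite is not Dremio and does not read Iceberg, and the `orders` table and its columns are hypothetical. Against Dremio, the `FROM` clause would instead name the table by its path in Dremio's catalog.

```python
import sqlite3

# Stand-in only: an in-memory SQLite table plays the role of the Iceberg table
# that Dremio would expose. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "AMER", 80.0), (3, "EMEA", 30.0), (4, "APAC", 55.5)],
)

# The query itself is ordinary SQL: nothing in it refers to Iceberg.
QUERY = """
SELECT region, COUNT(*) AS order_count, SUM(amount) AS revenue
FROM orders
GROUP BY region
ORDER BY revenue DESC
"""

rows = conn.execute(QUERY).fetchall()
for region, order_count, revenue in rows:
    print(f"{region:<5} {order_count:>3} {revenue:>8.2f}")
conn.close()
```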
In Hex, you can add a chart cell or use Python to visualize the results of your query on the Iceberg table. These visualizations can then be composed into an interactive report and shared with the whole team, while still respecting access controls on the underlying data.
Apache Iceberg gives users the control they want over their data, and Hex gives them a powerful interface for working with it. Together, they let data teams collaborate easily on data engineering, data science, and machine learning work built on Iceberg tables. You don't have to choose between open-source storage and managed notebooks for analytics & data science.