In the world of analytics and data science, Jupyter Notebooks have been a popular tool for years. However, they have many limitations which have been well documented, particularly when it comes to interpretability, reproducibility, and performance.
These limitations stem from the fact that traditional notebooks are built around imperative programming, a paradigm where a list of sequential statements are run in order and update a shared mutable state. The shared mutable state refers to variables or data structures whose values can change over time as the notebook is executed.
When working on a complicated notebook, it can be challenging to understand the current state of the notebook from just reading it because of this shared mutable state. The state of variables may depend on the order and or the frequency of cell executions. Since cells can be run in any order there’s no guarantee what the last thing that was written to state was. This can lead to inconsistencies and reproducibility problems, making it difficult to share and reproduce results.
To address the shared mutable state issues, a new approach based on reactive programming has been introduced. This approach uses a directed acyclic graph (DAG) to manage dependencies between cells, allowing for better interpretability, reproducibility, and performance.
With this approach, cells are treated as nodes in the graph, and the edges represent the dependencies between them. This allows for a more precise understanding of how cells depend on each other, making it easier to track changes and reproduce results.
To implement this approach in Hex, code is parsed directly, generating a semantic representation of what the code is supposed to do. From there, a DAG is created based on the cells in the notebook, allowing you to automatically re-run dependent cells and keep the notebook state in sync with memory. Usability improvements were also made, such as hiding import edges by default and merging duplicate edges to reduce clutter in the graph UI.
This DAG-based approach to notebooks makes it easier to work with data and produce reliable, reproducible results by visualizing and respecting a consistent order of execution for the notebook.