These days I’m working almost completely inside the Python ecosystem, but I still miss the tidyverse, particularly ggplot2, and I haven’t found a proper replacement in Python. Plotnine is a valiant effort, but not quite the same. I’ve played around with various ways to run Python and R in the same notebook, but in the end they all seem quite brittle and involve lots of installing libraries, ipython extensions, and so on. Basically too much of a pain in the ass, especially when working across multiple environments and projects. One reason I like Databricks is how Databricks notebooks combine SQL, Python and R so seamlessly, but alas, Databricks is way too expensive for my personal use, so it’s not an option.
Recently I had an aha moment: if I could just pass dataframes between R and Python, the two could run happily in separate processes, perhaps even with RStudio for the R side (I’m still on the fence about whether the R integration in VS Code is superior). I was thinking of starting with simple csv files, or perhaps sqlite, but then ran into a perfect solution: DuckDB! It seems I’m not the only person with this problem. And of course DuckDB offers a ton more.

Furthermore, while going through one fantastic Dagster + DuckDB tutorial on YouTube https://www.youtube.com/watch?v=33sxkrt6eYk, I was introduced to yet another amazing-seeming tool: localstack! So it might be possible to use DuckDB as the interface between R and Python (and also use it in Dagster), and have DuckDB handle persistent storage very elegantly in S3 buckets I host myself for free. Next I’m going to check whether I can configure localstack to use my Synology NAS for the buckets, which would be even better. Then I’d have my own mini AWS cloud to play around with fully free, and DuckDB as the interface for passing data between Python, R, and whatever else I end up working with (Julia, I’m coming for you one day, when I gather up the courage!)
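For the localstack part, the idea would be to point DuckDB’s httpfs extension at the local endpoint instead of real AWS. A rough sketch, assuming localstack is already running on its default port 4566 and a bucket (here the hypothetical `my-bucket`) has been created in it:

```sql
INSTALL httpfs;
LOAD httpfs;

-- Point DuckDB at localstack instead of real AWS.
-- localhost:4566 is localstack's default edge port; the
-- credentials are dummy values localstack accepts.
SET s3_endpoint = 'localhost:4566';
SET s3_access_key_id = 'test';
SET s3_secret_access_key = 'test';
SET s3_use_ssl = false;
SET s3_url_style = 'path';

-- Round-trip a toy result through the self-hosted "S3".
COPY (SELECT 42 AS answer) TO 's3://my-bucket/demo.parquet';
SELECT * FROM 's3://my-bucket/demo.parquet';
```

Since these are plain DuckDB settings, the same configuration should work identically from the Python and R clients, which is exactly what makes it attractive as the shared storage layer.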