Rancher struggles

For whatever reason, the Traefik pod on k3d is only barely running (resulting in some odd modprobe errors that I don't even want to try to fix). So today I installed K3s itself (instead of k3d, which runs it in Docker). It seems to have no problems with Traefik, so I'm going to try to install Rancher on it, but now the Rancher website is down, so I'm waiting until I can get to read the logs.

(NOTE: Got a call from SUSE today, and a few days ago I got TWO calls from Databricks, simply because I'm browsing their tutorials. Please, stop the damn direct marketing calls, it's insane!!!)

Sidetracks

I thought it would be nice to have personal persistent storage, so I was thinking of buying a NAS for that. It seems it's going to be a bit more complicated, and I might have to set up a CSI (Container Storage Interface) driver for the NAS, storage classes, etc., so that's probably going to be one deep rabbit hole again.

Sidetrack 3 : Venvs

As I was planning to start yet another Python project, I realized that I have way too many venvs scattered everywhere (I usually create a 'dedicated' venv in the project dir). So now I'm looking into virtualenvwrapper to store the venvs in a sensible place, and also to reuse the same venv across similar projects so I don't have to install PyTorch etc. everywhere, costing my precious SSD space.
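Before consolidating anything, it helps to see how bad the sprawl actually is. Here is a minimal sketch for that, assuming a directory counts as a venv when it contains a `pyvenv.cfg` file (the standard `venv` module writes one, and so do recent virtualenv versions). The function names `find_venvs` and `dir_size_bytes` are my own and not part of any tool.

```python
from pathlib import Path
from typing import List


def find_venvs(root: Path) -> List[Path]:
    """Return directories under root that look like virtual environments.

    Assumption: a directory is a venv if it contains a pyvenv.cfg file.
    """
    return sorted(cfg.parent for cfg in root.rglob("pyvenv.cfg"))


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files below path, skipping symlinks."""
    return sum(
        f.stat().st_size
        for f in path.rglob("*")
        if f.is_file() and not f.is_symlink()
    )
```

Calling `find_venvs(Path.home())` and printing `dir_size_bytes(v)` for each hit should show which project venvs are eating the most SSD space and are worth merging first.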

Ranch is ready!

Finally, the local Rancher is running.

Back to DBT studies

Models

Naming Conventions

In working on this project, we established some conventions for naming our models.

Sources (src) refer to the raw table data that have been built in the warehouse through a loading process. (We will cover configuring Sources in the Sources module)

Staging (stg) refers to models that are built directly on top of sources. These have a one-to-one relationship with sources tables. These are used for very light transformations that shape the data into what you want it to be. These models are used to clean and standardize the data before transforming data downstream. Note: These are typically materialized as views.

Intermediate (int) refers to any models that exist between final fact and dimension tables. These should be built on staging models rather than directly on sources to leverage the data cleaning that was done in staging.

Fact (fct) refers to any data that represents something that occurred or is occurring. Examples include sessions, transactions, orders, stories, votes. These are typically skinny, long tables.

Dimension (dim) refers to data that represents a person, place or thing. Examples include customers, products, candidates, buildings, employees.

Note: The Fact and Dimension convention is based on previous normalized modeling.
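To make the convention concrete for myself, here is a small sketch that maps a model name to its layer. It assumes the abbreviations above are used as underscore-separated prefixes (for example `stg_customers`), which is my assumption and not something the notes spell out. The helper `model_layer` is hypothetical and not part of dbt.

```python
from typing import Optional

# Abbreviations taken from the naming conventions above.
LAYER_PREFIXES = {
    "src": "source",
    "stg": "staging",
    "int": "intermediate",
    "fct": "fact",
    "dim": "dimension",
}


def model_layer(model_name: str) -> Optional[str]:
    """Return the layer implied by a model name's prefix, or None.

    Assumption: names look like '<prefix>_<entity>', e.g. 'fct_orders'.
    """
    prefix, sep, rest = model_name.partition("_")
    if not sep or not rest:
        # No underscore, or nothing after it: not a prefixed model name.
        return None
    return LAYER_PREFIXES.get(prefix)
```

Something like this could be used as a quick lint over the file names in a models directory to spot anything that does not follow the convention.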

More on sources

Redshift + S3 for time-series data

https://github.com/garystafford/kinesis-redshift-streaming-demo