small knugget of knowledge in case your Rancher bootstrap password is not wokring:
kubectl -n cattle-system exec $(kubectl -n cattle-system get pods -l app=demo-rancher | grep ‘1/1’ | head -1 | awk ‘{ print $1 }’) – reset-password
More obscure fix on Rancher not working on Ubuntu 22.04:
GRUB_CMDLINE_LINUX=”cgroup_memory=1 cgroup_enable=memory swapaccount=1 systemd.unified_cgroup_hierarchy=0”
sudo update-grub
sudo reboot
Finishing the DBT Fundamentals course
Testing
- Testing is used in software engineering to make sure that the code does what we expect it to.
- In Analytics Engineering, testing allows us to make sure that the SQL transformations we write produce a model that meets our assertions.
- In dbt, tests are written as select statements. These select statements are run against your materialized models to ensure they meet your assertions.
Tests in dbt
- In dbt, there are two types of tests - schema tests and data tests:
-
Generic tests are written in YAML and return the number of records that do not meet your assertions. These are run on specific columns in a model.
-
Singular tests are specific queries that you run against your models. These are run on the entire model.
-
dbt ships with four built in tests: unique, not null, accepted values, relationships.
- Unique tests to see if every value in a column is unique
- Not_null tests to see if every value in a column is not null
- Accepted_values tests to make sure every value in a column is equal to a value in a provided list
- Relationships tests to ensure that every value in a column exists in a column in another model (see: referential integrity)
-
Generic tests are configured in a YAML file, whereas singular tests are stored as select statements in the tests folder.
- Tests can be run against your current project using a range of commands:
- dbt test runs all tests in the dbt project
- dbt test –select test_type:generic
- dbt test –select test_type:singular
- dbt test –select one_specific_model
Documentation
Documentation
Documentation is essential for an analytics team to work effectively and efficiently. Strong documentation empowers users to self-service questions about data and enables new team members to on-board quickly.
Documentation often lags behind the code it is meant to describe. This can happen because documentation is a separate process from the coding itself that lives in another tool.
Therefore, documentation should be as automated as possible and happen as close as possible to the coding.
In dbt, models are built in SQL files. These models are documented in YML files that live in the same folder as the models.
Writing documentation and doc blocks
Documentation of models occurs in the YML files (where generic tests also live) inside the models directory. It is helpful to store the YML file in the same subfolder as the models you are documenting.
For models, descriptions can happen at the model, source, or column level.
If a longer form, more styled version of text would provide a strong description, doc blocks can be used to render markdown in the generated documentation.
Generating and viewing documentation
In the command line section, an updated version of documentation can be generated through the command dbt docs generate. This will refresh the view docs
link in the top left corner of the Cloud IDE.
The generated documentation includes the following:
Lineage Graph
Model, source, and column descriptions
Generic tests added to a column
The underlying SQL code for each model
and more…
Deployment
Development vs. Deployment
- Development in dbt is the process of building, refactoring, and organizing different files in your dbt project. This is done in a development environment using a development schema (dbt_jsmith) and typically on a non-default branch (i.e. feature/customers-model, fix/date-spine-issue). After making the appropriate changes, the development branch is merged to main/master so that those changes can be used in deployment.
- Deployment in dbt (or running dbt in production) is the process of running dbt on a schedule in a deployment environment. The deployment environment will typically run from the default branch (i.e., main, master) and use a dedicated deployment schema (e.g., dbt_prod). The models built in deployment are then used to power dashboards, reporting, and other key business decision-making processes.
- The use of development environments and branches makes it possible to continue to build your dbt project without affecting the models, tests, and documentation that are running in production.
Creating your Deployment Environment
Scheduling a job in dbt Cloud
- Scheduling of future jobs can be configured in dbt Cloud on the Jobs page.
- You can select the deployment environment that you created before or a different environment if needed.
- Commands: A single job can run multiple dbt commands. For example, you can run dbt run and dbt test back to back on a schedule. You don’t need to configure these as separate jobs.
- Triggers: This section is where the schedule can be set for the particular job.
- After a job has been created, you can manually start the job by selecting Run Now
Reviewing Cloud Jobs
- The results of a particular job run can be reviewed as the job completes and over time.
- The logs for each command can be reviewed.
- If documentation was generated, this can be viewed.
- If dbt source freshness was run, the results can also be viewed at the end of a job.