Installing Python and pip
This tutorial assumes that you have some familiarity with Python, but you should be able to follow along even if you're coming from a different programming language. To check whether Python and pip (Python's package manager) are already installed in your environment, or to install them, follow the instructions here.
Dagster supports Python 3.7+.
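As a quick sanity check, the version requirement above can be tested from Python itself. This is a minimal sketch; the helper name python_is_supported and the MIN_VERSION constant are illustrative and not part of Dagster.

```python
import sys

# Minimum (major, minor) version, taken from "Dagster supports Python 3.7+".
MIN_VERSION = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given version is at least the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


# Report on the interpreter that is running this script.
current = ".".join(str(part) for part in sys.version_info[:3])
if python_is_supported():
    print(f"Python {current} meets the 3.7+ requirement.")
else:
    print(f"Python {current} is too old; install Python 3.7 or newer.")
```

Run it with the same interpreter you plan to use for the tutorial, since a machine can have several Python installations.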
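To install Dagster, Dagit, and the packages you'll be using during this tutorial, run the following command from your terminal:
pip install dagster dagit requests pandas matplotlib wordcloud dagster_duckdb dagster_duckdb_pandas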
dagster is the command line interface (CLI) tool to run Dagster. For more information, refer to the Dagster installation guide.
dagit is Dagster's web-based UI. It includes tools for operating Dagster jobs, a library of your assets, a type-aware config editor, and a live execution interface.
The install command also pulls in packages that aren't necessary for every Dagster project but are used in this tutorial. You don't have to read up on them, but if you're curious:
requests will be used to download data from the internet
pandas is a popular library for working with tabular data
matplotlib makes it easy to make charts in Python
wordcloud has utilities for text processing and making word clouds
dagster_duckdb manages how Dagster reads from and writes to DuckDB, an in-process data warehouse similar to SQLite, which you'll use for this tutorial
dagster_duckdb_pandas allows loading data from DuckDB into pandas DataFrames
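To confirm that the packages listed above actually landed in your environment, a short script along these lines can help. It is a sketch under a few assumptions: it relies on importlib.metadata, which is in the standard library from Python 3.8 onward, the distribution names use the hyphenated spellings (dagster-duckdb, dagster-duckdb-pandas), and the helper installed_versions is a made-up name for illustration.

```python
from importlib import metadata

# Distribution names assumed from the package list in this tutorial.
TUTORIAL_PACKAGES = [
    "dagster",
    "dagit",
    "requests",
    "pandas",
    "matplotlib",
    "wordcloud",
    "dagster-duckdb",
    "dagster-duckdb-pandas",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print a simple report for the tutorial's packages.
for name, version in installed_versions(TUTORIAL_PACKAGES).items():
    status = version if version is not None else "NOT INSTALLED"
    print(f"{name}: {status}")
```

Any package reported as missing can be installed by rerunning the install command for that package alone.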
To verify that your Dagster installation was successful, create your first Dagster project. The Dagster scaffolding command generates an empty Dagster project that you can run locally.
To create the project, run:
dagster project scaffold --name tutorial-project
To verify that it worked and that you can run Dagster locally, run:
cd tutorial-project
dagster dev
Navigate to localhost:3000 in your browser. You should see the Dagster UI. The dagster dev command keeps Dagster running until you stop it. To stop the long-running process, press Control+C in the terminal where the process is running.
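If you would rather check from a script than from a browser, the following sketch tests whether something is answering at the UI's address. The function name ui_is_reachable and the default timeout are illustrative choices; the only detail taken from this tutorial is the address localhost:3000. It confirms that an HTTP server responds there, not that the server is specifically Dagster.

```python
import urllib.request


def ui_is_reachable(url="http://localhost:3000", timeout=2.0):
    """Return True if an HTTP GET to the URL succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    if ui_is_reachable():
        print("A server is responding at localhost:3000.")
    else:
        print("Nothing is responding at localhost:3000; is dagster dev running?")
```

Run this from a second terminal while dagster dev is still running in the first.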