The Python Programming Language

Python is a versatile, general-purpose programming language increasingly used in data science and machine learning. Its clear and readable syntax makes Python particularly beginner-friendly. Thanks to a vast ecosystem of libraries, Python covers a wide range of applications – from data analysis and visualization to statistical modelling and deep learning.

In our courses, Python is mainly used in Data Science 3 (DS3) for machine learning.

Installation

For working with Python in data science, we recommend installing one of the following distributions:

  • Miniforge – Lightweight, open-source distribution with conda-forge as the default channel. Recommended for experienced users.
  • Anaconda – Comprehensive distribution with many pre-installed data science packages. Ideal for beginners.

Both distributions include the conda package manager, which simplifies the installation and management of packages and environments.

Tip: Recommendation

We recommend Miniforge for a lean installation where you only install the packages you actually need. Alternatively, Anaconda provides a convenient solution with many pre-installed packages.

After installing Miniforge or Anaconda, you can install Python libraries with conda:

conda install pandas numpy matplotlib seaborn scikit-learn scipy statsmodels
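To check that the installation worked, you can ask Python which of these packages it finds in the active environment. The following is a minimal sketch, not part of the official setup: the helper name installed_versions is hypothetical, and it assumes the packages register standard metadata under the same names used in the conda command.

```python
from importlib import metadata

# Package names as passed to `conda install`.
PACKAGES = [
    "pandas", "numpy", "matplotlib", "seaborn",
    "scikit-learn", "scipy", "statsmodels",
]


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:<14}{version or 'not installed'}")
```

If a package is reported as not installed, make sure the right conda environment is active before running the install command again.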

The Graphical User Interface (GUI)

Python itself does not have a graphical user interface – it is typically used via the command line or within an integrated development environment (IDE). For working with Python, we recommend one of the following IDEs:

  • Positron – New IDE from Posit with native support for R and Python
  • Visual Studio Code – Versatile code editor with Python extension
  • JupyterLab – Interactive notebook environment, especially popular in data science

Note

JupyterLab is already included in Anaconda and can also be installed via Miniforge:

conda install jupyterlab

Key Libraries

Library        Description
pandas         Data manipulation and analysis
numpy          Numerical computations and array operations
matplotlib     Basic data visualization
seaborn        Statistical visualization (built on matplotlib)
scikit-learn   Machine learning (classification, regression, clustering)
scipy          Scientific computing and statistical tests
statsmodels    Statistical models and econometric analysis
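As a rough illustration of how some of these libraries work together, the sketch below generates synthetic data with numpy, stores it in a pandas DataFrame, and fits a linear regression with scikit-learn. The data, the function name fit_synthetic_line, and all parameter values are invented for this example and do not come from the course material.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fit_synthetic_line(n=200, slope=2.0, intercept=1.0, seed=0):
    """Fit a linear regression to synthetic data (illustrative values only)."""
    rng = np.random.default_rng(seed)

    # numpy: draw a predictor and a noisy linear response.
    x = rng.uniform(0, 10, size=n)
    y = intercept + slope * x + rng.normal(scale=0.5, size=n)

    # pandas: collect both columns in a labelled table.
    df = pd.DataFrame({"x": x, "y": y})

    # scikit-learn: fit the model; X must be two-dimensional, hence df[["x"]].
    model = LinearRegression().fit(df[["x"]], df["y"])
    return df, model


df, model = fit_synthetic_line()
print(df.describe())
print("estimated slope:", model.coef_[0])
print("estimated intercept:", model.intercept_)
```

The estimated slope and intercept should land close to the values used to generate the data. Fixing the seed keeps the run reproducible.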

Further Resources