Data Science

A crash course on data science.

OS-level virtualization systems

OS-level virtualization is a key tool in data science. The idea is to have a contained environment that provides functionality, security, and replicability. On a single machine, one of the most widely used implementations of OS-level virtualization is Docker.

Last updated on 2021-05-04

Version Control

Version control is essential for data science and software development. A simple way to incorporate version control is to use Git together with a remote hosting platform. Git is a distributed version control system for tracking changes in source code during software development, but it can also be used for research, written content, and many other kinds of files.

Last updated on 2021-05-04

Relational

A Relational Database Management System (RDBMS) is software that implements a digital database based on the relational model of data, as proposed by Codd (1970). This system allows for efficient storage and operations on tabular data and commonly implements the Structured Query Language (SQL).
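
As a rough illustration of storing tabular data and querying it with SQL, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The table name `measurements`, its columns, and the helper `average_score_by_group` are hypothetical and chosen only for this example. SQLite is used because it needs no server, not because it is the system covered in the course.

```python
import sqlite3


def average_score_by_group(rows):
    """Load (group, score) rows into an in-memory table and average per group."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing written to disk
    try:
        # Hypothetical schema for illustration only.
        conn.execute(
            "CREATE TABLE measurements (grp TEXT NOT NULL, score REAL NOT NULL)"
        )
        # Parameterized inserts keep data separate from the SQL text.
        conn.executemany(
            "INSERT INTO measurements (grp, score) VALUES (?, ?)", rows
        )
        cursor = conn.execute(
            "SELECT grp, AVG(score) FROM measurements GROUP BY grp ORDER BY grp"
        )
        return cursor.fetchall()
    finally:
        conn.close()


sample_rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
result = average_score_by_group(sample_rows)
print(result)
```

The aggregation happens inside the database engine rather than in Python. That is the usual reason to reach for an RDBMS once tabular data grows.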

Last updated on 2021-05-04

High Performance Computing

A High Performance Computing (HPC) system is a supercomputer (i.e., a cluster of many computers, or nodes). If you are working on an HPC system, you can likely submit jobs through the Slurm Workload Manager, an open-source job scheduler for Linux and Unix-like kernels that is used by many of the world’s supercomputers and computer clusters.

Last updated on 2021-05-04

A Short Tutorial Using the Julia Programming Language

Regular expressions allow us to define a search pattern to match against a corpus. For example, we might want to find whether a sequence of characters occurs in some text.
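
The idea of checking whether a sequence of characters occurs in some text can be sketched as follows. This example uses Python's standard `re` module rather than Julia, and the function names and sample corpus are hypothetical, chosen only for illustration.

```python
import re


def contains_sequence(text, sequence):
    """Return True if the literal character sequence occurs anywhere in text."""
    # re.escape makes characters such as "." or "*" match literally.
    return re.search(re.escape(sequence), text) is not None


def find_words_starting_with(text, prefix):
    """Return every word in text that begins with prefix (case-sensitive)."""
    # \b anchors the match at a word boundary; \w* consumes the rest of the word.
    pattern = r"\b" + re.escape(prefix) + r"\w*"
    return re.findall(pattern, text)


corpus = "Data science uses databases, dates, and updates."
has_science = contains_sequence(corpus, "science")
dat_words = find_words_starting_with(corpus, "dat")
print(has_science, dat_words)
```

The match is case-sensitive, so "Data" is not returned for the prefix "dat". Passing `re.IGNORECASE` as the `flags` argument of `re.findall` would relax that.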

Last updated on 2021-05-04