Migrating from C++ to Python

Malte Schwerhoff, D-INFK, ETHZ

Chapter 1:
About This Tutorial

Welcome to the tutorial Migrating from C++ to Python! This tutorial is intended for students who took an introduction to computer science and programming course that used C++, but now need to solve exercises and perform tasks using Python.

Attention

This tutorial is work in progress, and not yet complete. To report mistakes, make suggestion or generally give feedback, contact Malte Schwerhoff, or contribute to the tutorial on GitLab.

Note

This tutorial comes with various code examples in C++ and Python that you can run and experiment with on Code Expert. Happy coding!

Target Audience

The tutorial was written for students who took one of the following Computer Science courses in ETH’s Basisjahr: 252-0832-00L, 252-0848-00L, 252-0847-00L and 252-0856-00L. Consequently, this migration tutorial roughly follows the order in which different topics were presented in these courses, and it is assumed that the reader is familiar with them. Moreover, the tutorial regularly contrasts equivalent C++ and Python programs, in order to help understanding the latter.

Goals and Non-Goals

This tutorial was written under the assumption that the reader (1) has a specific programming task to solve, (2) remembers from their Basisjahr course which general concepts and/or C++-specific language features might help solving the task, but (3) does not yet know a suitable Python counterpart. The tutorial assumes Python 3.6 and higher, but typically will not use features that have been added very recently (i.e. at the time of writing, from Python 3.9).

This tutorial does not try to provide a general, self-contained introduction to Python. It also does not try to enumerate all possible alternatives, e.g. for formatting a string, which matches our policy from the C++ Basisjahr courses. Moreover, in order to keep the programs closer to those you know from your C++ lecture(s), the Python programs presented here may not always be the most idiomatic (pythonic) way of solving a problem in Python. Lastly, the tutorial does not enumerate or suggest Python libraries, nor does it explain how to use popular libraries.

As always, it is strongly recommended to play with the code snippets, change them, test when things break, etc. This is crucial for learning any programming language — just as it is crucial to speak, and not just listen, in order to learn a human language.

Further Resources

A plethora of resources for learning and mastering Python exist, and likewise for applying particular Python libraries (e.g. Panda or NumPy) in particular problem domains (e.g. data analysis or scientific computing). The following list enumerates a few resources suitable for beginners, with no claim of exhaustiveness:

Table of Content

  1. About This Tutorial
  2. Comparing C++ and Python
  3. A First Python Program
  4. Integers
  5. Truth Values
  6. Conditionals, Blocks and Scopes
  7. Loops
  8. Floating-point Numbers
  9. Functions
  10. Libraries
  11. Containers and Iterators
  12. References, Pointers, Aliasing, and Argument Passing
  13. Strings, Characters, and Streams
  14. Recursion
  15. Classes

Chapter 2:
Comparing C++ and Python

Philosophy

C++

The philosophy that guides the design and development of the C++ programming language is “zero-overhead abstractions” in order to ensure resource efficiency (memory usage) and performance (execution speed). In a nutshell, this means that a C++ program that, for convenience, uses an abstraction such as a vector and its iterator, should execute as fast as if the programmer had written machine code directly.

Consequently, C++ offers programmers a large variety of language features, probably more than any other language does. This enables – but also burdens – users to always make the right choice, e.g. to choose between signed and unsigned int, const and non-const, or between value and pointer types. Moreover, it burdens (or enables) users to explicitly manage their memory: remember new, delete, destructors, etc.? This duality – enabler or burden – also affects teaching: C++ allows teaching a wide range of concepts and features, which can be instructive and insightful, but also overwhelming.

Python

Python’s design philosophy is “There should be one – and preferably only one – obvious way to do it”. Due to this, Python is rather easy to learn and to apply, which makes it very popular in areas outside of “pure” computer science, such as mathematics, engineering or biology, but also in computer science itself, and in schools.

Python’s philosophy and ease of use are not without disadvantages, though: potential performance problems are the most obvious one; more debatable are the risk of not properly understanding the concepts and foundations of programming, because the language successfully hides them, or makes it seemingly unnecessary to study them.

Program Execution

C++

C++ is a statically typed language, and its type checker can therefore report many (potential) problems before the program is even executed. A C++ program is compiled to machine code, which enables fast execution. Possible disadvantages of this approach are (the feeling of) having to deal with compiler errors before getting to the fun part of actually executing a program, and having to compile a program anew for different operating systems and hardware platforms.

Python

Python is a dynamically typed language (more details later), which means that only very few properties are checked before the program is executed, and most errors are reported during execution. This makes it easier to “get going”, but means that testing becomes even more relevant for uncovering bugs.

Python is an interpreted language: instead of being compiled to machine code, the interpreter executes every statement on-the-fly, similar to how you execute a C++ program on paper or in your head. This indirectly enables one of Python’s major advantage over C++: its huge and well-maintained repository of libraries and frameworks – shipped as packages – which are easily installed and ready to use. This is one of the reasons for Python’s success; a corresponding repository for C++ does (yet) exist.

Being dynamically typed and interpreted also enables Python programs to inspect and modify themselves during execution. Code inspection (called introspection or reflection) can be useful when neither documentation nor googling reveals how to use a library in a certain situation. Modifying a running program enables certain important use-cases, but can also horribly break things. These advanced topics won’t be discussed in this tutorial, but plenty of corresponding online resources exists, such as these three here for inspection.

Programs written in basic Python are usually quite a bit slower than a corresponding C++ program, in particular when computations over large datasets are performed. However, Python is very popular in the data science community – how can that be, if Python is so slow? The “trick” is that data science libraries/frameworks are often actually written in C++, and then made available to Python programmers via an appropriate interface. This advanced topic will not be discussed in this tutorial, but plenty of corresponding online resources exists, such as this article or this one.

Chapter 3:
A First Python Program

Getting Started

To get started, download and install Python for your operating system from python.org or from your operating system’s app store, or use TODO the playground project of the Code Expert course corresponding to this tutorial.

A Familiar Program

C++
// This program inputs an integer a from the user,
// computes a^8, and outputs the result
#include <iostream> // Library for in- and output

int main() {
  // Output an informing prompt
  std::cout << "Compute a^8 for a = "; 

  int a; // Declare and input a value for variable a
  std::cin >> a; 

  // Declare variable b, and and initialise with a*a
  int b = a*a; // b = a^2

  // Remaining computation to get a^8
  b = b*b;     // b = a^4
  b = b*b;     // b = a^8

  // Output result in a readable form
  std::cout << a << "^8 = " << b << '\n';

  return 0;
}
Python
# This program inputs an integer a from the user,
# computes a^8, and outputs the result

# Output an informing prompt, and read a value 
# for variable a
a = input("Compute a^8 for a = ")

# Convert the input string into an integer
# (or fail with an error)
a = int(a)

# Initialise b with a*a
b = a*a # b = a^2

# Remaining computation to get a^8
b = b*b # b = a^4
b = b*b # b = a^8

# Output result in a readable form
print("{}^8 = {}".format(a, b))

First Observations

Python uses # to start a line comment, and " to begin and end string literals. Unlike C++, Python does not require a semicolon (;) to end a statement, line breaks are used instead. If you want to split a statement across multiple lines, e.g. for readability, Python offers multiple ways to do this.

The above program is called a Python script; basically, because it does not have a main function. Python has something similar to the main function you know from C++, as discussed in TODO: some later chapter.

Python’s functions for inputting a value from the user (input) and outputting it back to the user (print) don’t need to be imported (that’s what includes are called in Python), but Python supports libraries and imports, as discussed in TODO: some later chapter.

The most important difference is that Python 1. does not require explicit variable declaration and 2. variables are dynamically typed. Consider the following two lines:

Python
a = input("Compute a^8 for a = ")
a = int(a)

Here, three things happen that differ significantly from how a C++ program behaves:

  1. The first time variable a is written to, it is implicitly declared and brought into scope.
  2. The input function always returns a string, and a thus is a variable of type string.
  3. The int function takes the string value and converts it into an integer, which is then stored in a — variable a thus dynamically changes its type to integer.

This type change can even be observed, by using the type function:

Python
a = input("Compute a^8 for a = ")
print(type(a)) # prints "str"

a = int(a)
print(type(a)) # prints "int"

Implicit Variable Declarations

In C++, you have to declare a variable before you can use it, in Python, it is declared implicitly, upon first use. This is convenient, but not without risks, as the following program illustrates:

Python
onomatopoeia = 1
onomotopoeia = onomatopoeia + 1
print(onomatopoeia) # prints 1, but we wanted it to print 2

The program contains a small typo, which would have resulted in a compiler error in C++. In Python, however, the program executes without an error: instead of complaining about the misspelled variable on line 2 not being declared, the assignment implicitly declares a new variable.

Dynamically Typed Variables

It is important to understand that variables in Python are still typed, even if the types don’t explicitly show up in the program. Consider the following program:

Python
num = input("A number, please: ")
num = num + 1 # error
print(num)

This program will abort – at runtime – with the error TypeError: can only concatenate str (not "int") to str. In C++, this program would have been rejected by the compiler.

One way to fix the program is to change the type of value num holds, and thereby the type of num itself:

Python
num = input("A number, please: ")
num = int(num) + 1 # ok
print(num)

Changing a variable’s type, however, can make the program more complicated to understand because developers must keep the types in their head when reasoning about the program. It is therefore advised to rarely change a variable’s type. Type changes can often be avoided completely, as illustrated by the next two code snippets:

Python
num = int(input("A number, please: "))
num = num + 1 # could be merged with line 1
print(num)
Python
num_string = input("A number, please: ")
num = int(num_string) + 1
print(num)

Python supports optional type hints, which are not directly used by Python itself, but can be used by other tools, e.g. static analysis tools such as linters.

R- and L-values

The concepts of r-values – do not represent a memory location and thus cannot be assigned to – and l-values – represent a memory location and can be assigned to – apply to Python as well. Typical l-values in Python are variables, e.g. num, and container accessors, e.g. my_list[0] (TODO: containers are discussed in some later chapter). L-values tend to be simpler in Python, since it does not support, e.g. i++ and ++i (do you remember which one is the l-value in C++?), or functions returning references (return type &T in C++). Typical r-values in Python are literals such as "hello" and 1.

Final Words

Both C++ and Python support exponentiation out of the box: C++ has function std::pow, Python has operator ** and function pow.

Chapter 4:
Integers

A Familiar Program

C++
#include <iostream>

int main() {
  // Input (try 28)
  std::cout << "Temperature in Celsius =? ";
  int celsius;
  std::cin >> celsius; 

  // Computation and output (28°C --> 82°F)
  std::cout << celsius 
            << " Celsius are "
            << 9 * celsius / 5 + 32 
            << " Fahrenheit.\n";

  return 0;
}
Python
# Input (try 28)
celsius = input("Temperature in Celsius =? ")

# Computation and output (28°C --> 82.4°F)
print(celsius 
      + " Celsius are "
      + str(9 * int(celsius) / 5 + 32)
      + " Fahrenheit")

First Observations

The example illustrates that Python strings can be concatenated with +, just like C++ strings – and that function str turns a non-string (here, the computed degrees Fahrenheit) into a string, so that concatenation can be performed.

More interestingly, though: while the C++ expression celsius / 5 evaluates to an integer (because both operands are of integer type), the Python expression int(celsius) / 5 evaluates to a float! To obtain an integer result, you can use the integer division operator //, or apply the int function:

Python
celsius = 28
result = celsius / 5
print(type(result)) # float
print(result) # 5.6
Python
celsius = 28
result = celsius // 5
print(type(result)) # int
print(result) # 5
Python
celsius = 28
result = int(celsius / 5)
print(type(result)) # int
print(result) # 5

Integer Operators

Python’s numerical operators include the usual suspects: +, -, *, float division /, integer division //, integer modulo %, and the power operator **. As in C++, a binary minus (x - y) and a unary minus (-x) is supported. Examples: -2 - 4 evaluates to -6, and -(2 - 4) evaluates to 2.

The last two expressions illustrate that, as in C++, unary minus takes precedence over (binds stronger than) binary minus. Python’s operator precedences generally follow those of C++, as can be seen in this table.

Side remark: Python always evaluates operands from left to right (and likewise for function arguments, etc.). In C++, the evaluation order was not defined, which could give rise to undefined behaviour. The latter is generally not a problem in Python, since every operation either succeeds or results in an error – there is no such thing as undefined behaviour in Python.

Unbounded Integers

// we assume 32 bit integers, and initialise
// a with the largest possible unsigned integer
unsigned int a = 4'294'967'296 - 1;

// prints 4294967295
std::cout << "a = " << a << "\n";

// prints 0
std::cout << "a + 1 = " << (a + 1) << "\n";
a = 4_294_967_296 - 1

# prints 4294967295
print("a = {}".format(a))

# prints 4294967296
print("a + 1 = {}".format(a + 1))

# prints 18446744065119617025
print("a * a = {}".format(a * a))

C++ integers are of bounded size, typically 32 bits wide, and integer operations may over- or underflow, as demonstrated by the program on the left. In contrast, Python’s built-in integers are unbounded, just like mathematical integers, as demonstrated by the program on the right.

Remember the initial discussion about the different philosophies behind C++ and Python? This is an instance of it: C++ chose bounded integers, which are supported by hardware (CPUs) and thus fast, but under-/overflow; Python chose unbounded integers, which is how (non-programmers) expect integers to behave.

To ensure performance when it matters, scientific computation libraries for Python typically introduce their own integer data types (for example, NumPy), which are bounded, hardware-supported integers – typically exactly those that C++ offers.

Unsigned Integers?

Unlike C++, Python does not have unsigned integers built-in, but scientific computation libraries often define their own unsigned integer data types (for example, NumPy).

Chapter 5:
Truth Values

A Familiar Program

C++
#include <iostream>
#include <cassert>

// Input dividend and divisor
std::cout << "Dividend a = ";
int a;
std::cin >> a;

std::cout << "Divisor b = ";
int b;
std::cin >> b;
assert (b != 0);

// Compute results
int div = a / b;
int mod = a % b;

// Check div-mod identity
assert(div * b + mod == a);
Python
# Input dividend and divisor
a = int(input("Dividend a = "))
b = int(input("Divisor b = "))

assert b != 0, "Divisor must not be zero"

# Compute results
div = a // b
mod = a % b

# Check div-mod identity
assert div * b + mod == a, "Identity violated"

First Observations

Just like C++, Python also supports boolean values, and has an assert statement that can be used to check correctness properties of, e.g. user input or computed results. Python’s built-in assert statement even takes an optional string, which is used as the error message if the assertion fails.

Boolean Literals and Operators

Python’s boolean literals are upper-cased – True and False – in contrast to their lower-cased C++ analogues. Python also supports the usual boolean operators (see this table for precedences), but they are spelled out rather than written with symbols:

C++
// DeMorgan's law - let x, y be booleans
assert (!(x && y) == (!x || !y))
Python
# DeMorgan's law - let x, y be booleans
assert (not x and y) == (not x or not y)

Python’s conjunction and and disjunction or are also short-circuiting: e.g. in the expression d != 0 and 1 < n/d, the second conjunct 1 < n/d is only evaluated if d != 0 evaluated to True.

Relational Operators

Python also supports the usual relational operators (see this table for precedences) for comparisons, e.g. <= for “less than or equal”. They generally behave as you would expect, but three aspects are worth a closer look.

The first is that Python’s relational operators can be chained, as commonly done in mathematics:

C++
// A hypothetical index check
assert (0 <= i && i < length)

 // Valid C++, but most likely not what you wanted
assert (0 <= i < length)
Python
# A hypothetical index check
assert 0 <= i < length)

The second is that Python has two special relational operators, is and is not, which do not directly show up in C++, but implicitly exist there as well. The two operators will be explained in TODO: some later chapter.

Lastly, be aware of the following: relational operators can also be applied to other types: e.g. when comparing strings, or lists (vectors), lexicographical order is used. The observed behaviour can be intuitive, e.g. for strings, but may not always be, e.g. for lists:

Python
# Comparing strings: rather intuitive
"Ant" < "Bee" # True
"Gemma" < "Gem" # False
"3" < "2" # False
"3" < "22" # False
Python
# Comparing lists: not so intuitive
[1] < [1] # False
[1] < [1,1] # True
[1] < [1,0] # True
[1] < [0,0] # False

As usual in programming: carefully consider if the use of a potentially unintuitive feature is worth it, or if there is a cleaner alternative that would improve readability and maintainability of the code.

Conversions and Truthiness

Booleans can be converted to numerical values, with the same interpretation as in C++:

Python
print(int(True))  # 1
print(int(False)) # 0
Python
print(True + 2)  # 3
print(False + 2) # 2

Conversions to boolean values are also supported by Python, including the familiar conversion from numerical values to booleans: 0 is interpreted as False, all other numerical values as True.

Python supports many additional conversions to boolean, though: None (similar to C++’s nullptr), [] (an empty list), and many other values are falsy because they are interpreted as False; the remaining values are truthy because they are interpreted as True. For more details, see this Stackoverflow question or the official Python documentation.

Boolean Operators and Truthiness

Truthy/falsy values can be used in combination with the boolean operators and, or, and not, but the behaviour might be surprising when coming from a different programming language. Consider the following examples:

Python
print([]  or "Empty :-(") # prints Empty :-(
print([1] or "Empty :-(") # prints [1]
Python
print([]  and "Non-empty") # []
type( []  and "Non-empty") # list
print([1] and "Non-empty") # Non-empty
type( [1] and "Non-empty") # string
Python
print(not "C++") # False
type( not "C++") # boolean

Such concise expressions can be handy, as illustrated by the example on the left. However, they can also make it more difficult to understand the code and its behaviour, compared to a less concise but more explicit alternatives.

Chapter 6:
Conditionals, Blocks and Scopes

A Familiar Program

C++
#include <iostream>

// Input an integer n
int n;
std::cout << "Integer n = ";
std::cin >> n;

if (n < 0) {
  std::cout << n << " is negative\n";
} else if (0 < n) {
  std::cout << n << " is positive\n";
} else {
  std::cout << n << " is zero\n";
}
Python
# Input an integer n
n = int(input("Integer n = "))

if n < 0:
  print("{} is negative".format(n))
elif 0 < n:
  print("{} is positive".format(n))
else:
  print("{} is zero".format(n))

First Observations

Python’s syntax for conditional statements differs slightly from that of C++, but most differences – elif instead of else if, no parentheses around conditions, trailing colon : – are not worth discussing.

Significant Whitespace

Much more important, and different from C++, is how bodies of conditional branches – the code blocks nested under if, else if/elsif and else – are defined: not by braces ({ ... }), but by indentation!

C++
# braces delimit body
if (...) {
  stmt1;
  stmt2;
};
Python
# indentation delimits body
if ...:
  stmt1
  stmt2

C++
# braces optional for single-statement body
if (...)
  stmt1;
Python
# indentation never optional
if ...:
  stmt1

The code snippets above illustrates how nested blocks look in C++ and in Python: delimited by braces in C++, and by leading whitespace in Python. In a sense, the language designer of Python embraced the recommendation to indent nested blocks in order to improve code readability, and simply made indentations mandatory.

The next snippets illustrate that an incorrect indentation can lead to parser errors (left), as well as bugs (right):

C++
// formatting not helpful, but otherwise OK
stmt1;
  if (...)
    stmt2;
Python
# won't parse
stmt1
  if ...: # indentation too deep
    stmt1

C++
// formatting not helpful, but otherwise OK
if (...) {
  stmt1; // in body
stmt2;   // in body
}
Python
# will parse
if ...:
  stmt1 # in body
stmt2   # not in body

Important: In Python, all bodies – of conditionals, loops, functions, classes, … – must be correctly indented. Getting this right should be easy if you already developed the habit of formatting your code (as was recommended in the C++ course), but otherwise may take some time.

Variable Scopes

In C++, every block denotes a scope, and nested (inner) scopes have access to the variables declared in their surrounding (outer) scopes. This is also true in Python, but conditionals and loops do not form nested scopes, whereas functions (discussed in [TODO: some later chapter]) do.

C++
bool b = ...;
int v = 0;

if (b) {
  v = -1;
  bool took_if = true;
} else {
  int v = 1; // shadows outer v
}

std::cout << v; // -1 or 0

if (!b)
  std::cout << took_if; // compiler error
Python
b = ...
v = 0

if b:
  v = -1
  took_if = True
else:
  v = 1

print(v) # -1 or 1

if not b:
  print(took_if) # no error

The example above illustrates some scoping differences, more details can be found in this dedicated tutorial.

Empty Blocks

It can sometimes be helpful to leave blocks initially empty, or to use a comment as a placeholder for actual code (remember the lecture on stepwise refinement?) – but statements such as if unfortunately require a body. In C++, one can simply provide an empty block as the body (or the empty statement, i.e. a semicolon). Python instead provides a dedicated statement called pass, which does nothing.

C++
if (cache.contains(input)) {
  result = cache.at(input)
} else {
  // TODO: compute result, and save in 
  //       cache for later reuse
}
Python
if input in cache:
  result = cache[input]
else:
  # TODO: compute result, and save in 
  #       cache for later reuse        
  pass

The switch-Statement

Python does not provide a switch statement, but it can be partly simulated, as e.g. discussed in this Stackoverflow post.

Chapter 7:
Loops

A Familiar Program

C++
#include <iostream>
#include <cassert>

// Input a positive integer n
unsigned int n;
std::cout << "Compute the sum 1+...+n for n = ";
std::cin >> n;
assert(1 <= n);

// Compute sum 1 + 2 + ... + n-1 + n
unsigned int s = 0;
for (unsigned int i = 1; i <= n; ++i)
  s += i;

// Output final result
std::cout << s << "\n";
Python
# Input a positive integer n
n = int(input("Compute the sum 1+...+n for n = "));
assert 1 <= n, "{} is not positive".format(n)

# Compute sum 1 + 2 + ... + n-1 + n
s = 0
for i in range(1, n + 1):
  s += i

# Output final result
print("{}".format(s))

First Observations

Even without properly understand what the Python for loop does: just reading “for i in some range/interval” (Python) vs. “start with 1 and increment by 1 until n is reached” seems to suggest that the Python loop is less technical/more intuitive than the C++ version. Which is true, and in line with the different language philosophies.

The for-Loop

A classical C++ for loop – for (init; cond; iter) body; – has four components: an initialisation statement, a continuation condition, an iteration expression and a loop body. This allows arbitrary iterations, of which two are particularly common: iterating over a range of numbers (also known as a “counting loop”), by inc-/decrementing a numerical variable, and iterating over the elements of a container (via an iterator). For the latter use case, the alternative form for (elem : container) body; can be used.

Python decided to support only the latter form (but this does not reduce expressiveness, as we will see shortly), i.e. an iteration over all elements of a container:

C++
std::vector<std::string> names = 
  {"Ada", "Julia", "Ruby"};

for (std::string name : names)
  std::cout << name << "\n";
Python
names = ["Ada", "Julia", "Ruby"] # a list of names

for name in names:
  print(name)

Iterating over all elements also allows iterating over a range of numbers, as illustrated by the loop from this chapter’s initial example:

C++
# sum up first n numbers
unsigned int s = 0;
for (unsigned int i = 1; i <= n; ++i)
  s += i;
Python
# sum up first n numbers
s = 0
for i in range(1, n + 1):
  s += i

Python’s range function (which is actually a constructor for a container equipped with an iterator) takes up to three arguments: a start value (here 1), a stop value (here n + 1) and a step value (omitted here, defaults to 1).

Python also supports break and continue, just like C++. Python’s for loop even supports an else-clause. For more examples, see this part of Python’s documentation.

Side remark: summing up a range of numbers is even easier in Python: s = sum(range(1, n + 1)). You can do the same in C++, but sum and range are not part of C++’s standard library, so you would have to implement them yourself.

The while-Loop

Python also offers a while loop, but does not support the do-while loop C++ offers:

C++
unsigned int s = ...;


// Collatz sequence
unsigned int c = s;
while (c != 1) {
  if (c % 2 == 0) c = c / 2;
  else c = 3*c + 1;
}

std::cout << "Collatz terminated for c = " << s << "\n";


// Print at least one star, then half s until
// zero is reached
do {
  std::cout << "*";
  s = s / 2;
} while (s != 0);
Python
s = ... # some positive integer


# Collatz sequence
c = s;
while c != 1:
  if c % 2 == 0: c = c // 2
  else: c = 3*c + 1

print("Collatz terminated for c = {}".format(s))


# Print at least one star, then half s until
# zero is reached
print("*", end="")
s = s // 2
while s != 0:
  print("*", end="")
  s = s // 2

A remark on Python’s print function: to suppress the newline (\n) that is automatically appended to each printed string, you can pass the named argument end="". If omitted – as we have always done so far – it defaults to end="\n". More details on named arguments are provided in [TODO: some later chapter].

Comprehensions and Functional Programming

Consider the following pseudo code: given a container with input data, apply a function f to each element, to obtain a container of output data.

// this is pseudo code
in_data = [e0, e1, e2, ...]
out_data = []
for every e in data do
  append f(e) to out_data

This is a very common scenario in general programming, but in particular in data analysis and processing. Here, each input element is mapped (by function f) to an output element; a common variant of this pattern is to filter (remove) some elements during the process. Both patterns are also common in mathematics, and often expressed via set comprehensions, e.g. {f(e) | e ∈ S} (mapping all e in a set S to f(e)) or {f(x) | 0 < x ∈ S} (mapping and filtering).

In computer science, the functional programming paradigm popularised this “mathematical” style of programming, and many languages (Scala, Python, JavaScript, Java, …) integrated it in recent years. Newer versions of C++ also support this style, but it is not yet as commonly used in C++ as it is in Python (and Python offers much nicer syntax).

In this tutorial, we will only show small examples illustrating comprehensions/functional programming in Python. As usual, many free online resource for functional programming in Python exist, such as this official article, or this blog post.

Mapping

Our mapping example is the computation of the square of each number in a given container. The following three snippets contrast a loop-based solution in C++ (left) and Python (middle) with a comprehension-based Python solution (right). The latter is the preferred way of implementing such a mapping task in Python.

C++
// Square each element, using 
// a loop and append
std::vector<int> elems = 
  {0, 1, -1, 2, -2};
std::vector<int> squares = {};

for (int e : elems)
  squares.push_back(e*e);

// squares is {0, 1, 1, 4, 4}
Python
# Square each element, using 
# a loop and append
elems = [0, 1, -1, 2, -2]
squares = []

for e in  elems:
  squares = squares + [e*e]
Python
# Square each element, using a 
# comprehension
elems = [0, 1, -1, 2, -2]
squares = [e*e for e in elems]
Filtering

Filtering is similar to mapping (and can be combined with it), but only considers those elements of a container that match a certain criterion (satisfy a predicate): in our example, we filter (keep) only non-negative numbers. The following three snippets contrast a loop-based solution in C++ (left) and Python (middle) with a comprehension-based Python solution (right). The latter is the preferred way of implementing such a filtering task in Python.

C++
// Filter non-negative elements,
// using a loop and append
std::vector<int> elems = 
  {0, 1, -1, 2, -2};        
std::vector<int> nonneg = {};

for (int e : elems)
  if (0 <= e) nonneg.push_back(e);

// nonneg is {0, 1, 2}
Python
# Filter non-negative elements,
# using a loop and append 
elems = [0, 1, -1, 2, -2]
nonneg = []

for e in elems:
  if 0 <= e:
    nonneg = nonneg + [e]
Python
# Filter non-negative elements,
# using a comprehension
elems = [0, 1, -1, 2, -2]
nonneg = [e for e in elems if 0 <= e]

Filtering and mapping can be combined, e.g. we can only square non-negative numbers:

Python
# Square only non-negative elements, using a comprehension
elems = [0, 1, -1, 2, -2]
nonneg_squares = [e*e for e in elems if 0 <= e]
print(nonneg_squares) # [0, 1, 4]

Reducing and Lambda Functions

Another often-occurring task is to reduce (fold) a container into a single value, e.g. to sum up all elements. In mathematics, operators such as Σ (sum) and Π (product), but also (integral), can be seen as such reductions. In Python, you can use the reduce function for this:

Python
elems = [3, 4, 5]
sum = 0
for e in elems:
  sum = sum + e
print(sum) # 12
Python
elems = [3, 4, 5]
sum = reduce(lambda x,y: x + y, elems)
print(sum) # 12

In the snippet on the right, reduce takes a lambda function (basically a function literal, or an anonymous function) that determines how the reduction is performed: here, by successively summing up (i.e. reducing) two elements of the list, which effectively computes the expression (3 + 4) + 5.

The effective computation can also be observed by inserting appropriate print statements, as illustrated next. To do this, we pass a regular function (the function itself, not its result) instead of a lambda function.

Python
def add(x, y):
  print("{}+{}".format(x, y))
  return x + y

# Computes 12, as before
# Prints 3+4 and 7+5 in the process
sum = reduce(add, [3, 4, 5])

Further Information

Since lambda functions and functional programming weren’t covered in our C++ course, we also won’t go into details here. As usual, many free online resources for Python (but also C++) exist that cover these topics. For Python, examples are this article, this article and this article; for C++, give this article a try if you’re interested.

Chapter 8:
Floating-point Numbers

A Familiar Program

C++
std::cout << 1.1 - 1.0 - 0.1 << "\n"; 
// Outputs 8.32667e-17
Python
print(1.1 - 1.0 - 0.1)
# Outputs 8.326672684688674e-17

Python Implements IEEE Floating-point Numbers

Floating-point numbers in Python are equivalent to C++’s double type (consult the Python documentation for details): they only approximate the real numbers — as demonstrated by the code snippets above — but nevertheless suffice for most scientific computations.

Two differences are worth mentioning: the first is that computations in Python are quite a bit slower. Remember the harmonic sum approximation from the C++ course?

C++
const unsigned int n = 100000000;

double harmonic = 0;
for (unsigned int i = n; i >= 1; --i)
  harmonic += 1.0 / i;        
Python
n = 100000000

harmonic = 0.0
i = n
while 1 <= i:
  harmonic += 1.0 / i
  i -= 1

On my computer, the C++ program executes in less than 1 second, whereas the Python program takes about 20 seconds to execute. Both programs compute the same result, though.

The second difference between Python and C++ with respect to floating-point numbers is that Python only provides the double-precision type – called double in C++, but float in Python – but not the smaller, single-precision type (float in C++). This is typically not a problem, though.

Mixed Expressions

Recall from C++ that, when combining integer and floating-point values in an expression, the result is of floating-point type as well, since that is the considered to be the larger type. This is the same in Python, as demonstrated below:

C++
std::cout << 8 + 0.123 << "\n"; // 8.123
Python
print(8 + 0.123) # 8.123
print(type(8 + 0.123)) # float

Chapter 9:
Functions

A Familiar Program

C++
// Euklid's algorithm (with modulo)
int gcd(int x, int y) {
  assert(0 < x && 0 < y);

  while (y > 0) {
    int t = x % y;
    x = y;
    y = t;
  }

  return x;
}

...
std::cout << gcd(6, 9); // 3
Python
# Euklid's algorithm (with modulo)
def gcd(x, y):
  assert 0 < x and 0 < y, \
         "Arguments must be positive"

  while y > 0:
    t = x % y
    x = y
    y = t

  return x

print(gcd(6, 9)) # 3

First Observations

A Python function declaration has the shape

def NAME(X1, ..., XN):
  BODY

where NAME is the function’s name, X1 to XN are parameter names, and BODY is the function body. Python functions thus look similar to C++ functions, but with types removed, def added, and an indented body instead of a code block in curly braces.

Function calls in Python also resemble those you know from C++: R = NAME(E1, ..., EN) calls function NAME, passes the results of evaluating expressions E1 to EN as arguments, and stores the call’s return value in R.

Forward Declarations (Not Needed)

C++
// Forward declaration of is_odd
bool is_odd(unsigned int x);

bool is_even(unsigned int x) {
  return x == 0 || is_odd(x - 1);
}

bool is_odd(unsigned int x) {
  return x != 0 && is_even(x - 1);
}
Python
def is_even(x):
  assert 0 <= x
  return x == 0 or is_odd(x - 1)

def is_odd(x):
  assert 0 <= x
  return x != 0 and is_even(x - 1)

Consider the above programs, which (very inefficiently) determine whether a positive number is even or odd. The C++ program requires a forward declaration of is_odd; otherwise, the compiler will complain that is_even cannot call is_odd because the latter is not yet known. Technically, this is because the C++ compiler only passes over a program once, from top to bottom, whereas compilers for most other languages pass twice over a program. Python also does the latter, and thus doesn’t require forward declarations.

Function Overloading

In C++ and other statically typed languages, function overloading allows you to implement different functions with the same name, but different arguments. This by itself would result in ambiguity at call-site – which function to call? – but the ambiguity is resolved by matching formal against actual arguments, i.e. by comparing the types of the arguments at call site with the types of the function declaration.

As an example, consider a function that outputs different types of values in different ways – for simplicity, integers and reals with a corresponding output prefix:

C++
void show(int i) {
  std::cout << "integer value = " << i << '\n';
}

void show(double d) {
  std::cout << "real value= " << d << '\n';
}

// Calls to show(int)
show(25);  // integer value = 25
show(-12); // integer value = -12

// Calls to show(double)
show(0.46); // real value= 0.46
show(-1.2); // real value= -1.2

Function overloading is not possible in Python, due to the lack of static types. However, function overloading is also not an essential programming language feature (i.e. it does not increase a language’s expressiveness) because it can be “simulated” by having multiple functions with different names, e.g. show_int and show_double, combined with checking the type of values/objects at runtime. In Python, the latter is done via isinstance(o, T), which is true if and only if object o is of dynamic type T.

Python (broken)
def show(i):
  print("integer value =", i)

# ATTENTION: Since Python does not have static 
# types, the next show function simply shadows
# (replaces) the first one!
def show(f):
  print("real value =", f)

# All calls go to show(f)
show(25);   # real value = 25
show(-12);  # real value = -12
show(0.46); # real value = 0.46
show(-1.2); # real value = -1.2
Python (works)
def show_int(i):
  print("integer value =", i)

def show_float(f):
  print("real value =", f)

def show(v):
  if (isinstance(v, int)): 
    show_int(v)
  elif (isinstance(v, float)): 
    show_float(v)
  else:
    assert false, \
           "Unexpected type {}".format(type(v))

# Ultimately calls show_int
show(25);  # integer value = 25
show(-12); # integer value = -12

# Ultimately calls show_float
show(0.46); # real value = 0.46
show(-1.2); # real value = -1.2 

Operator Overloading

As just seen, function overloading (or its simulation) enables having one function name, e.g. show, with different implementations and thus behaviours, depending on the type of the arguments. This idea naturally extends to operators – which are essentially functions that we merely call differently: e.g. addition on integers (1 + 2 == 3) is typically given a different implementation than addition on strings ("H" + "i" == "Hi"). We will discuss operator overloading in the context of classes, in TODO: some later chapter.

Default Arguments

Assume you are asked to implement a function for converting a decimal integer into its binary representation, i.e. to base 2. A suitable Python function signature would then be to_base_2(n) for a non-negative integer n. However, you remember that the algorithm for converting into binary representation also works for other bases, e.g. 3 and 8. Consequently, you implement a more general function to_base(n, b) where 2 <= b <= 10 is the base to convert to. You’re happy with the result – but your colleagues aren’t: in their code, they nearly always need the binary representation, and they are annoyed that they always have to provide 2 as the argument to your function.

That’s where default arguments come in (also in C++, although we haven’t officially introduced them in the C++ course):

C++
std::string to_base(unsigned int n,
                    unsigned int base = 2) {
  assert(2 <= base && base <= 10);

  std::string result = "";

  while (n != 0) {
    result = std::to_string(n % base) + result;
    n /= base;
  }

  return result;
}

...
std::cout << to_base(77) << "\n";    // 1001101
std::cout << to_base(77, 2) << "\n"; // 1001101
std::cout << to_base(77, 5) << "\n"; // 302
Python
def to_base(n, base = 2):
  assert 0 <= n
  assert 2 <= base and base <= 10

  result = ""

  while not n == 0:
    result = str(n % base) + result
    n = n // base

  return result


print(to_base(77))    # 1001101
print(to_base(77, 2)) # 1001101
print(to_base(77, 5)) # 302

Here, if no value is provided for parameter base when calling to_base, the default value 2 will be used. Default arguments thus enable implementing functions that are generic/flexible (many parameters), but still convenient to call in many default situations, without having to provide the same, common arguments over and over again.

Default arguments – often combined with keyword arguments, introduced next – are very common in Python. E.g. the print function specifies four default arguments, one of which (parameter end = "\n") determines how to end the output.

Keyword Arguments

Imagine that you are tasked with implementing a function that outputs a sequence of numbers. You brainstorm a bit, and conclude that it would be nice to also output some context information, to be able to specify the character that separates the numbers, and the final character. You thus implement the following Python function (implementation omitted):

Python
def show(numbers,
         context = "",
         separator = " ",
         suffix = "\n"):
  ...

# Some usage examples
show([1,2,3]) # 1 2 3
show([1,2,3], "vec: ") # vec: 1 2 3
show([1,2,3], "vec: ", ",") # vec: 1,2,3
show([1,2,3], "", " ", " (end)") # 1 2 3 (end)

You’re happy with the result – until you notice something annoying with the last usage example above: in order to provide the fourth parameter (here, " (end)"), you need to provide all earlier parameters, even if you don’t want to change their default values (here, "" and " ").

Keyword arguments to the rescue:

Python
# def show(...) as before
...
show([1,2,3], suffix=" (done)") # 1 2 3 (done)

Python’s keyword arguments feature allows you to use the parameter name at call-site to specify which parameter (think: assign to) you want to provide, regardless of the position of the argument at call site. Consequently, “normal” arguments are called positional arguments, to differentiate them from keyword arguments. In contrast to Python, C++ (currently) only supports positional arguments, but not keyword arguments.

Arbitrary Argument Lists

Let’s take another look at our show function from above: it is intended to print arbitrarily many numbers, and therefore expects the first argument to be a sequence (technically, an iterable, TODO: see some later chapter) of values. Due to this, we also have to create and pass a sequence (iterable) at call site, e.g. as in the call show([1,2,3]). In a sense, this requires a step too much: instead of just passing the individual elements, callers need to wrap them in a sequence in order to form a single value (of sequence type) that is then passed to the function.

To avoid the extra step, Python supports arbitrary argument lists, also known as variadic parameters/arguments/functions, as illustrated by the next version of the show function (for simplicity, with fewer parameters):

Python
def show(*numbers, suffix = "\n"):
  ...

show(1) # 1
show(1, 2) # 1 2
show(1, 2, 3) # 1 2 3
show(1, 2, 3, suffix="and so on\n") # 1 2 3 and so on

Such variadic parameters are denoted by a leading *, as in *numbers (Python does not use * for anything pointer-related), and they greedily consume as many arguments as possible. It is thus reasonable to have the variadic parameter last (if any), or to have it followed by keyword arguments only. See the Python documentation for additional details, e.g. argument list unpacking.

Python’s print function is a typical specimen of such a variadic function. Its signature is:

Python
def print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)

In case you’re interested: C++ provides a couple of language features that come close to Python’s arbitrary argument lists, but they are not as conveniently usable. If you’re interested, start reading on cppreference (1, 2, 3, 4, 5) or search on StackOverflow.

Arbitrary Keyword Argument Lists

The concept of arbitrary argument lists can be transferred to keyword arguments as well, which can be handy in (at least) two situations: for functions that can handle arbitrarily many (optional) arguments, and for functions that need to forward many such arguments.

To illustrate the first situation – which often arises in the context of printing/logging/recording arbitrary additional information – let’s consider the following function:

Python
def myfun(mandatory, **optionals):
  # ... actual function implementation omitted ...
  print("mandatory =", mandatory)

  # Iterate over all keywords (parameter names) in optionals, and access
  # the corresponding values with optionals[keyword]
  for keyword in optionals:
    print(keyword, "=", optionals[keyword]))

myfun("M", name="temp", weight=1.2, is_empty=False)
  # Produces the following output:
  #   mandatory = M
  #   name = temp
  #   weight = 1.2
  #   is_empty = False

To illustrate the second situation, assume that you are given a function magic that takes many optional arguments, and that you need to call from your own function myfun1. To allow callers of your function to pass these optional arguments to the underlying function, you can “collect” them in an arbitrary keyword argument list:

Python
def magic(mandatory, opt1="", opt2=1, opt3=True):
  # ... actual function implementation omitted ...
  print("mandatory =", mandatory)
  print("opt1 =", opt1)
  print("opt2 =", opt2)
  print("opt3 =", opt3)

def myfun1(arg1, arg2, **optionals):
  # ...
  # To convert from an arbitrary argument list (optionals) to
  # individual arguments (opt1=..., opt2=..., etc.), the list must be
  # unpacked (**optionals). 
  # See also https://docs.python.org/3/tutorial/controlflow.html#unpacking-argument-lists.
  magic(arg1 + arg2, **optionals)
  # ...

myfun1(2, 2, opt2=22)
  # Produces the following output:
  #   mandatory = 4
  #   opt1 = 
  #   opt2 = 22
  #   opt3 = True

An alternative would be to literally copy the optional arguments from the underlying function to the user-exposed function:

Python
# def magic(...) as before

def myfun2(arg1, arg2, opt1="", opt2=1, opt3=True):
  # ...
  magic(arg1 + arg2, opt1, opt2, opt3)
  # ...

myfun2(2, 2, opt2=22) # Same output as before (with myfun1)

Caution: myfun1 may seem preferable (no need to copy options), but be aware that myfun2 is actually safer: with myfun1, users can indeed pass arbitrary arguments to magic, including too many arguments, and arguments with unexpected types. This may result in errors – raised by magic, whose call users can’t directly see and may be confused about – or, even worse, unexpected and hard to explain erroneous results.

In case you’re interested: C++ does not support arbitrary keyword argument lists directly, but container std::map (or container std::unordered_map) can be used to simulate them, at least to some extent. Major differences will be (1) types: C++ maps have a single value type, which complicates storing values of different types; and (2) convenience: must explicitly create a map of arguments, which requires additional code.

Multiple Return Values via Tuples

Recall from TODO: some earlier chapter that Python supports n-ary tuples (pairs, triples, …), which can be unpacked into separate variables. This makes it straightforward to implement functions that return multiple values, as illustrated below. The same is possible in C++, but the resulting code is less concise.

C++
#include <algorithm> // for min_element/max_element
#include <tuple> // for tuple, tie

std::tuple<int, int> minmax(
    const std::vector<int>& elems) 
{
  int a = 
    *std::min_element(elems.begin(), elems.end());

  int z = 
    *std::max_element(elems.begin(), elems.end());

  return {a, z};
  // More explicit version of previous line:
  //   return std::tuple<int, int>{a, z};
}


int least, greatest;
std::tie(least, greatest) = minmax(std::vector<int>{7, 40, 11});

std::cout << "least = " << least
          << ", greatest = " << greatest;
Python
def minmax(elems):
  a = min(elems)
  z = max(elems)

  return (a, z)


least, greatest = minmax([7, 40, 11])

print("least = {}, greatest = {}"
      .format(least, greatest))

Generic Functions

Consider a min function that returns the lesser of two elements: e.g. min(1,2) returns 1, and min("Zoo", "Ale") returns "Ale". Implementing such a generic function – one that works for different types of arguments, e.g. integers and strings – is straightforward in Python, since Python is not statically typed. In C++, the same can be achieved with templates, which were only briefly introduced in the C++ course.

C++
// A template essentially allows functions to take
// types as arguments
template<typename T>
T min(T x, T y) {
  if (x < y) return x;
  else return y;
}

int main() {
  // The compiler automatically derives the template
  // parameter T from the provided arguments
  int m1 = min(3, 7); // T is int
  std::string m2 = min("Zoo", "Ale"); // T is string

  std::cout << "m1 = " << m1
            << ", m2 = " << m2 << '\n';

  // Compiler error: T cannot be int and string
  //   min(3, "Ale");
}
Python
def min(x, y):
  if x < y: return x
  else: return y


m1 = min(3, 7)
m2 = min("Zoo", "Ale")

print("m1 = {}, m2 = {}".format(m1, m2))

# Runtime error: comparison x < y in function min 
# fails at runtime because integer 3 and string 
# "Zeus" cannot be compared.
#   min(3, "Zeus")

We will get back to generic functions in the context of classes, see TODO: this later chapter.

Pre- and Postconditions

In the C++ course, we used function pre- and postconditions in two ways: as comments in the code, meant for users as documentation, and with assert statements, to help us uncover bugs more quickly. Both use cases apply to Python as well.

Chapter 10:
Libraries

Libraries (and namespaces, packages, etc.) are a way of (1) structuring code, (2) combining functions and classes into “groups” that share common concerns, and (3) fostering code reuse. Imagine, for example, a simple Tic-Tac-Toe web application, whose code could be structured into several groups: code that deals with the user interface forms one group, code that implements the game logic forms another group, code that generally helps with building a game AI goes into a third group, etc. The first two groups are purely internal, i.e. not used elsewhere, but the third group is of general interest, and thus made available to other users.

This idea of “grouping functionality that belongs together” and potentially making it available to others is so important in programming that every (mainstream) programming language supports some instance of this general idea. Technical solutions and names may differ (namespaces, packages, modules, libraries, frameworks, …), but the underlying idea is the same.

A (Somewhat) Familiar Program

C++
// ------ file numbers.cpp ------
namespace num {  
  std::string to_base(...) {
    // ... as before ...
  }

  // ... there could be more functions 
  //     (and classes) here ...
}


// ------ file main.cpp ------
#include <iostream> // Include a standard library
#include "numbers.cpp" // Include a local file

int main() {
  // Needs fully-qualified name num::to_base
  // to call the function
  std::cout << num::to_base(256, 8) << '\n';

  // Include (only) num::to_base into the 
  // current namespace → function can now
  // be called without namespace prefix
  using num::to_base;

  std::cout << to_base(256, 8); // 400

  // Includes everything from num into the 
  // current namespace. This may result in
  // unexpected name clashes and is thus to be
  // used with care.
  using namespace num;

  std::cout << sign(-313); // -1

  return 0;
}
Python
# ------ file numbers.py ------
def to_base(...):
  # ... as before ...

def sign(n):
  return 1 if 0 <= n else -1

# ... there could be more functions 
#     (and classes) here ...


# ------ file main.py ------
import io # Import a standard library 
          # (which we don't use here)
import numbers # Import a local file

# Needs fully-qualified name numbers.to_base
# to call the function
print(numbers.to_base(256, 8))

# Imports (only) numbers.to_base into the 
# current namespace → function can now
# be called without namespace prefix
from numbers import to_base

print(to_base(256, 8)) # 400

# Imports everything from numbers into the 
# current namespace. This may result in 
# unexpected name clashes and is thus to be
# used with care.
from numbers import *

print(sign(-313)) # -1

First Observations

C++ and Python both allow loading existing functionality into a program: via statement #include in C++, and import in Python. In C++, the namespace construct can be used to group code on an additional level (files and namespaces); in Python, a file induces a homonymous namespace (file numbers → namespace numbers). C++ differentiates between including a standard/global library (#include <...>) and including a local library/file (#include "..."); Python uses the same syntax (import ...) for both.

Afterwards, functions from the library can be accessed in a fully-qualified manner, i.e. prefixed with the namespace: here, num::to_base in C++ and numbers.to_base in Python.

Namespace prefixes can clutter code, and both languages thus provide means of avoiding the need for fully-qualified access: here, via using num::to_base in C++, and from numbers import to_base in Python. In both cases, the function can afterwards be invoked as to_base(...), i.e. without the namespace prefix.

Since namespaces may contain many functions, both languages also provide means of avoiding the need for fully-qualified access for all functions from a particular namespace: via using namespace num in C++, and from numbers import * in Python. Since this can result in name clashes and subtle problems, it is often recommended to avoid this, and to import individual functions instead.

Header Files (Not Needed)

C++’s separation of declaration and implementation into header (.h) and code (.cpp) file, respectively, does not exist in Python. We’re not going into the technical details here, just … enjoy the reduced complexity :-).

PyPI: Python Package Index

We only briefly touched the idea of libraries in C++, and we’re not going to dig deeper in this tutorial and for Python, either. If you’re interested in how to write and publish your own libraries, see, e.g. this article, and this article.

However, as mentioned earlier, one of Python’s success factors is its huge, well-maintained and easy-to-use universe of libraries: the Python Package Index (PyPI). Whenever you need to develop a piece of Python software, make sure to check PyPI for useful libraries – chances are high that you’ll find high-quality building stones for your software there.

Chapter 11:
Containers and Iterators

As you probably remember, C++ offers many different containers (collections), e.g. vectors and (un)ordered sets, because 1. nearly every application has to manage statically unknown amounts of data and 2. different applications have different (efficiency) requirements regarding the data to manage. The situation is the same in Python, but in this tutorial, we’ll only address the most commonly used container data structures in Python: lists, dictionaries and tuples. For additional containers and details, see the Python documentation.

A Familiar Program: The Sieve of Eratosthenes

C++
unsigned int n = 0;

std::cout << "n = ";
std::cin >> n;

// Initialise crossed_out to contain n+1 times false
std::vector<bool> crossed_out(n + 1, false);

for (unsigned int i = 2; i <= n; ++i) {
  if (!crossed_out[i]) { // i is prime
    std::cout << i << " ";

    // cross out all proper multiples of i
    for (unsigned int m = 2 * i; m <= n; m += i)
      crossed_out[m] = true;
  }
}
Python
n = int(input("n = "))

# Initialise crossed_out to contain n+1 times false
crossed_out = [False] * (n + 1)

for i in range(2, n + 1):
  if not crossed_out[i]: # i is prime
    print(i, end=" ")

    # cross out all proper multiples of i
    for m in range(2 * i, n + 1, i):
      crossed_out[m] = True

First Observations

Python’s lists are (maybe modulo performance) the natural analogue of C++’s vector container: a data structure that can hold arbitrarily many values, is accessed by index, can grow and shrink (not used in the Sieve example above), can be initialised in different ways, and so on.

Working With Lists

We’ll briefly summarise the most common operations on lists next. Consult the Python documentation on lists for additional operations and details. In particular, see these non-mutating operations, and these mutating operations.

In contrast to C++ vectors, Python lists can be heterogeneous, i.e. they can contain elements of mixed types: e.g. the list ["Blanka", 100, 1.2] could represent a computer game player with a name, an energy percentage value, and a defence multiplier. Use cases for this are rare, however, and it is usually better to use a class (TODO: See here) or a dictionary (TODO: See here) instead of a heterogeneous list.

Dictionaries

Dictionaries (also known as maps, lookup tables and associative arrays) are another data structure that is very popular in Python (and also other languages, including C++). It can be understood as a generalisation of a list: essentially, a list maps indices to elements, whereas a dictionary maps keys – of arbitrary type – to values. Here is a small example:

Python
addressbook = {
  "Ida": "+15550111",
  "Marcos": "spam@marc.os",
  "Yoko": "+15550135"
}

print(addressbook["Ida"]) # Prints "+15550111"

addressbook["Yoko"] = "yoko@script.py" # Replaces entry for "Yoko"

del addressbook["Ida"] # Deletes key "Ida" (and associated value)

addressbook["Gaurav"] = "+15550101" # Add a new key-value pair

print("Address book has", len(addressbook), "entries:") # Prints "Address book has 3 entries:"

for key in addressbook: # Prints all key-value pairs
  print("  ", key, "-->", addressbook[key])

Dictionaries are the data structure of choice in situations where values (bits of information) are associated with – and accessed through – a unique key per value, as illustrated by the address book example above. See the Python documentation on dictionaries for additional operations and details.

Dictionaries can be heterogeneous in keys and values, as illustrated next. Using values of different types can be useful, but a dedicated class with just the right fields/member variables is usually the more robust long-term solution. The use of heterogeneous keys is even less rarely useful.

Python
player = {
  "name": "Blanka",
  "energy": 100,
  "multiplier": 1.2
}
Python
wild_mix = {
  3: "three",
  "four": 4.0,
  True: False
}

Tuples

It can sometimes be useful to group a fixed number of elements (pairs, triples, etc.), but not arbitrarily many, i.e. not a list of elements. For this, Python provides tuples – which are essentially lists of constant size. The most common operations on tuples are summarised next:

Tuples can be packed and unpacked: the former happens when the parentheses are omitted: e.g. for the assignment pair = 1,2, the right-hand side of the assignment is packed into a tuple. Unpacking is the opposite: e.g. e1, e2 = pair unpacks the pair and assigns the first pair element to e1 and the second to e2. Runtime errors are raised if the arity of the two sides doesn’t match: e.g. e1, e2, e3 = pair will abort with an error.

Tuples, and tuple unpacking, is often used when a function needs to return more than one value, as illustrated next:

Python
from statistics import mean, median

def analyse(series):
  avg = mean(series)
  med = median(series)

  # Return two results as a pair
  return (avg, med)

series = [3, 13, 11, 8, 15, 6, 99, 100]

avg, med = analyse(series) # unpack returned tuple

print("avg =", avg, "\nmed =", med)

A word of caution: if you find yourself using and passing around lots of tuples, consider using dedicated classes (or named tuples) – with suitable fields/member variables – instead, since it makes your code more robust in the long run.

Consult the Python documentation for more information about tuples: in particular, here, here and here.

Other Containers

Python provides several additional containers, e.g. an implementation of a mathematical set, as also used in the C++ course. Other containers are variants of lists or dictionaries that have been optimised for more specialised use cases. See the Python documentation on containers and on data structures for details.

Chapter 12:
References, Pointers, Aliasing, and Argument Passing

Attention

Most mainstream programming languages, Python included, use C++-like pointers to access objects, but call them references. In order to explain similarities and differences, we will first recap the C++ side, and then move on to Python.

C++: Recapitulation of Pointers vs. References

In C++, references (type T&) and pointers (type T*) are two different types and concepts, although both are, in some way, concerned with the idea of aliasing. Most mainstream languages, Python included, only offer C++-like pointers, but not C++-like references.

Note: In the remainder of this section, we will therefore focus on C++ pointers, and even use them in situations where a C++ reference would be easier/more reasonable. We’ll also ignore the aspect of static vs. dynamic (new) memory allocation in this section of the tutorial.

Consider the illustration below, of two pointers p and q aliasing in memory: each pointer identifies a separate memory location (their addresses differ, i.e. &p != &q), but the value in those locations – the address of string "hi", to which they both point – is the same (i.e. p == q). Thus, when dereferencing the pointers, the same object is reached (i.e. *p == *q), and we therefore say that p and q are aliases.

Illustration of two pointers aliasing in memory

In Python, all of this — memory addresses, a pointer’s value vs. the object it points to, etc. — is hidden from you and handled behind the scenes, as we will see soon. This simplifies the language quite a bit (great!), but is not completely without disadvantages (oh well …).

C++: Recapitulation of Data Passing, Copying and Aliasing

When passing data, e.g. to a function, but also during an assignment, we have two options: to copy the data, or to share/alias it (in C++: via a reference or a pointer). With large data sets (e.g. a matrix), aliasing is efficient, but it can cause problems if aliased data is modified. Put simply: copying is save, aliasing is efficient.

As an example, consider the next two (simplified) C++ snippets: two versions of a function that prints the contents of a vector in increasing order; to achieve the latter, the data is sorted first. The version with copying (print_copy) does not affect the call-site vector, in contrast to the version with sharing (print_alias). Such call-site effects are often not desirable because they can introduce subtle bugs, and make it harder to reason about program correctness.

C++ (copy)
void print_copy(std::vector data) {
  std::sort(data);
  std::cout << data;
}

std::vector data = {0, -2, 8, 3};
print_copy(data); // prints -2,0,3,8
assert(data == {0, -2, 8, 3}); // holds
C++ (alias)
void print_alias(std::vector* data) {
  std::sort(*data);
  std::cout << *data;
}

std::vector data = {0, -2, 8, 3};
print_alias(&data); // prints -2,0,3,8
assert(data == {0, -2, 8, 3}); // fails        

In order to account for the tension between safety (→ copying) and efficiency (→ aliasing), you were given the following rule of thumb for passing data: small data (e.g. a number) should be copied; large data (e.g. a vector) should be aliased, ideally immutably (→ const).

Python: Data Passing, Copying and Aliasing

C++ enables programmers to explicitly choose between copying and aliasing, by using the appropriate types (e.g. std::vector<int> vs. std::vector<int>*). Most newer languages, including Python and Java, differ from C++ in this respect: they don’t enable programmers to make this choice, and they make the above rule of thumb the default instead.

Important: In Python, objects of small/primitive types, such as integers and booleans, are always copied1, whereas objects of larger types are always aliased. This is illustrated by the next code snippets.

Python (copy)
# Assuming that n is an integer, then the function 
# argument is copied at call site. The increment 
# therefore *does not* affect the call site.
def num_print_copy(n):
  n += 1
  print("n = ", n)

n = 1
num_print_copy(n) # prints 2
assert n == 1 # holds
Python (alias)
# Python does not enable users to specify if they 
# want to copy or alias. Hence, a function 
# num_print_alias — where the increment *would*
# affect the call site, cannot be defined in Python.
Python (copy)
import copy # For function copy.deepcopy

# Assuming that data is a list, then the function
# argument is aliased at call site. 
# Function list_print_copy thus cannot directly 
# be defined in Python, and explicit copying is 
# needed instead, to avoid affecting the call site.
def list_print_copy(data):
  # Must copy the data ourself!
  copied_data = copy.deepcopy(data)
  copied_data.sort()          
  print("data =", copied_data)

data = [0, -2, 8, 3]
list_print_copy(data) # prints -2 0 3 8
assert data == [0, -2, 8, 3] # holds
Python (alias)
# Assuming that data is a list, then the function 
# argument is aliased at call site. The sorting 
# therefore *does* affect the call site.
def list_print_alias(data):
  data.sort()
  print("data =", data)

data = [0, -2, 8, 3]
list_print_alias(data) # prints -2 0 3 8
assert data == [0, -2, 8, 3] # fails

Let’s compare the solutions: in both C++ and Python, a copy-version (print_copy vs. list_print_copy) and an alias-version (print_alias vs. list_print_alias) exist — that’s good. However: in C++, the function signature tells you whether or not a copy is made, whereas in Python, you need to look at the function implementation itself to see this. Moreover, if the deepcopy call is accidentally removed from list_print_copy, no warning or error is generated.

Python: Dereferencing (Not Needed)

Since Python does not make aliasing (think “C++ pointers and memory addresses”) explicit — compare the function signatures list_print_copy(data) and list_print_alias(data) — it also does not require users to dereference a variable in order to get to the object it points to. Consider the call data.sort() inside list_print_alias: if this were C++ (and sort a member function of a vector), you would have to dereference data first, and write either (*data).sort() (or equivalently, data->sort()). No need for this in Python, it is implicitly done for you.

Python: Pointer Arithmetic (Not Needed)

C++ focuses on efficiency, and provides programmers with “low-level” ways of interacting with the computer, including allocating “raw” chunks of memory (i.e. arrays). Pointer arithmetic (e.g. *(p+7)) can then used to access such memory. Python, on the other, focuses on simplicity: it does not support such low-level memory operations, and therefore also does not support (or need) pointer arithmetic.

Python: Summary

Keep the following in mind:

  1. Python kind of has pointers, but they are implicit, less complex, and called references
  2. Python always passes small/primitive values (integers, doubles etc.) by copying
  3. Python always passes large values (lists, instances of classes, etc.) by aliasing
  4. Watch out for call-site effects when mutating aliased objects

TODO: If you wonder what you can’t do in Python due to it “hiding” pointers from you, have a look at CX -> Chapter 12 -> Explicit Pointers


Recapitulation of C++ Pointers

Consider the illustration below, of two pointers p and q aliasing in memory: each pointer identifies a separate memory location (their addresses differ, i.e. &p != &q), but the value in those locations – the address of string "hi", to which they both point – is the same (i.e. p == q). Thus, when dereferencing the pointers, the same object is reached (i.e. *p == *q), and we therefore say that p and q are aliases.

Illustration of two pointers aliasing in memory

Two pointers – p and q – aliasing in memory: each pointer identifies a separate memory location (i.e. &p != &q), but when dereferencing the pointers – *p and *q – the same object is reached .


  1. The statement is technically wrong, see e.g. https://stackoverflow.com/questions/6502575, but the effect is ultimately the same: you get value semantics for small types, and reference semantics for larger types. 

Chapter 13:
Strings, Characters, and Streams

Attention

We assume ASCII strings, and mostly ignore Unicode, just as we did in the C++ course. Unicode in Python will be briefly addressed at the end of this section.

Strings and Characters in Python

Strings in Python work similar to C++’s std::string, but Python does not have a dedicated type for characters.

C++
std::string message = "I was here";
char first = message[0];

std::cout << message << "\n"; // I was here
std::cout << first << "\n";   // I
Python
message = "I was here"
first = message[0]

print(message) # I was here
print(type(message)) # str

print(first) # I
print(type(first)) # str

For Python, a single character is a string of length one, and there is no dedicated character type. Hence, both message and first are of dynamic type str in above Python snippet.

Characters and Their Numerical Values

In C++, char is a numerical type, and characters and numerical values can be used interchangeably. That is not the case in Python – after all, a character is just a string – and dedicated functions (ord and chr) must be used to convert from character to numerical value, and vice versa.

C++
char letter1 = 'A';
char letter2 = 65; // Also 'A'
std::cout << letter1; // A
std::cout << letter2; // A
std::cout << (int) letter1; // 65
std::cout << letter1 + 1; // B
Python
letter1 = "A";
letter2 = chr(65) # Also "A"
print(letter1) # A
print(letter2) # A
print(ord(letter1)) # 65
print(chr(ord(letter1) + 1)) # B

A few observations about the snippets above:

A Familiar Program I: Cyclic Character Shifting from Caesar Encoding

C++
// PRE: -32 < s < 32 (to avoid over-/underflows)
// POST: Returns c cyclically shifted by s
//       characters, if c is a printable character
//       (ASCII 32-126). Otherwise, returns c.
char shift(char c, int s) {
  if (32 <= c && c <= 126) // Is c printable?
    c = 32 + (c - 32 + s) % 95;

  return c;
}
Python
# POST: Returns c cyclically shifted by s
#       characters, if c is a printable character
#       (ASCII 32-126). Otherwise, returns c.
def shift(c_str, s):
  assert len(c_str) == 1 # Must be single character
  c = ord(c_str) # Get numeric character value

  if 32 <= c and c <= 126: # Is c printable?
    c = 32 + (c - 32 + s) % 95

  return chr(c)

The above programs are a C++ and a Python version of the cyclic shift function from the Caesar Encoding algorithm. The programs are basically identical, except that (1) for C++, arithmetic under-& overflows must be avoided (undefined behaviour on signed integers, including char), and that (2) for Python, explicit conversions between character (string) and numerical value (integer) are needed.

String Literals and Operations

String Literals

In C++, single quotes (') are used for characters, double quotes (") for strings. In Python, both are used for strings, and, e.g. assert 'equality' == "equality" therefore holds. This can be handy if strings need to contain quotes, e.g. as often necessary when (directly) generating JSON, XML or HTML output: instead of escaping characters, which makes the resulting strings less readable, different outside quotes can be used.

C++
std::cout << "id='007'\n";    // id='007'
std::cout << "id=\"007\"\n";  // id="007"
// std::cout << 'id="007"\n'; // Compiler error
std::cout << R"(id="007")";   // id="007"
Python
print("id='007'")   # id='007'
print("id=\"007\"") # id="007"
print('id="007"')   # id="007"

In the C++ program above, the raw string literal R"(...)" enables the use of double quotes inside strings, but it is arguably less nice than the corresponding Python code.

Both languages also support multi-line strings:

C++
// Note: indentation will be part of the string
std::cout << R"(
First line

Third line
)";
Python
# Note: indentation will be part of the string.
# Both """ and ''' behave identical.
print("""
First line

Third line
""")
String Operations

Python strings support Python’s common sequence operations, see also TODO: earlier chapter on containers, and many additional operations. The latter include, e.g. splitting at a particular character and checking if a string starts with a particular prefix.

A Familiar Program II: Caesar Encoding with Streams

C++
// Includes are omitted, as is namespace prefix std

char shift(char c, int s); // As above

// POST: Read characters from in, shift each
//       character by s, and write the result
//       to out
void caesar(istream& in, ostream& out, int s) {
  in >> noskipws; // Don't skip whitespace

  // En-/decoding loop
  char next;
  while (in >> next && next != '\n') {
    out << shift(next, s);
  }
}

int main() {
  const int shift = 7; // Character shift

  // v1: From string to console
  istringstream input("Top secret message");
  caesar(input, cout, shift);

  // v2: From file to console
  ifstream infile = ifstream("message.txt");
  caesar(infile, cout, shift);

  // Other source and destination streams are
  // possible, e.g. from console to file.
}
Python
# Imports are omitted

def shift(c_str, s) # As above

# POST: Read characters from in_stream, shift each
#       character by s, and write the result to
#       out_stream
def caesar(in_stream, out_stream, s):
  # Whitespaces are included by default

  # En-/decoding loop
  for line in in_stream:
    for char in line:
      print(shift(char, s), end="")


shift_by = 7 # Character shift

# v1: From string to console
message = io.StringIO("Top secret message")
caesar(message, sys.stdout, shift_by)

# v2: From file to console
with open("message.txt", "r") as infile:
  caesar(infile, sys.stdout, shift_by)

# Other source and destination streams are
# possible, e.g. from console to file.

The above programs illustrate that C++ and Python both implement the idea of streams that support reading/writing data from/to different sources (console, string, file, …) in a uniform way. For more information about Python’s streams, start reading here (files) and here (package io).

Chapter 14:
Recursion

Recursion in Python works just as it does in C++. The only noteworthy aspect is that the corresponding program may be slower, e.g. compare the runtime of the C++ and the Python version of the n-queens solver shown below.

A Familiar Program I: Fibonacci

C++
unsigned int fib(unsigned int n) {
  if (n < 2)
    return n;
  else
    return fib(n-2) + fib(n-1);
}
Python
def fib(n):
  if n < 2:
    return n
  else:
    return fib(n-2) + fib(n-1)

A Familiar Program II: N-Queens

C++
// Includes are omitted, as is namespace prefix std

using Queens = vector<unsigned int>;

// POST: Returns if queen in the given row occupies 
//       a valid position, i.e. does not threaten 
//       another queen.
bool valid(const Queens& queens, unsigned int row) {
  unsigned int c = queens.at(row);
  for (unsigned int r = 0; r != row; ++r) {
    if (c == queens.at(r) || // column
        c - row == queens.at(r) - r || // diagonal
        c + row == queens.at(r) + r) // diagonal

      return false;
  }

  return true;
}


// PRE:  queens.size() >= row, and
//       all queens from row 0 to row-1 are valid, 
//       i.e. do not threaten each other
// POST: Returns if there is a valid position for 
//       queens on row .. queens.size(). If so, 
//       vector queens contains a valid 
//       configuration.
bool solve(Queens& queens, unsigned int row) {
  if (row == queens.size())
    return true;

  for (unsigned int col = 0; 
       col != queens.size(); 
       ++col) {

    queens.at(row) = col;

    if (valid(queens, row) && 
        solve(queens, row + 1)) {

      return true;
    }
  }

  return false;
}

int main() {
  int n;
  cout << "Solve n-queens problem for n = ";
  cin >> n;

  Queens queens(n); // Vector with n times zero
  bool solved = solve(queens, 0);

  if (solved) {
    cout << "Found a solution\n";
  } else {
    cout << "No solution\n";
  }

  return 0;
}
Python
# POST: Returns if queen in the given row occupies 
#       a valid position, i.e. does not threaten 
#       another queen.
def valid(queens, row):
  c = queens[row]
  for r in range(0, row):
    if (c == queens[r] or # Column
        c - row == queens[r] - r or # Diagonal
        c + row == queens[r] + r): # Diagonal

      return False

  return True


# PRE:  len(queens) >= row, and
#       all queens from row 0 to row-1 are valid, 
#       i.e. do not threaten each other
# POST: Returns if there is a valid position for 
#       queens on row .. len(queens). If so, 
#       vector queens contains a valid 
#       configuration.
def solve(queens, row):
  if row == len(queens):
    return True

  for col in range(0, len(queens)):
    queens[row] = col

    if (valid(queens, row) and 
        solve(queens, row + 1)):

      return True

  return False


n = int(input("Solve n-queens problem for n = "));

queens = [0] * n # Sequence with n times zero
solved = solve(queens, 0)

if solved:
  print("Found a solution")
else:
  print("No solution")

Chapter 15:
Classes

Attention

  • This chapter ignores the inheritance aspect of classes, and focuses on structuring code and encapsulating data. Inheritance and subtyping are discussed in TODO: some later chapter.
  • Python has classes, but not structs; this does not reduce the language’s expressiveness, though, since C++ structs and classes differ only in their members’ default visibility.

Encapsulation and Information Hiding

Classes (ignoring inheritance and subtyping) serve two purposes: structuring a program by grouping data and operations thereon, and abstracting over implementation details. This has potential benefits for implementers and users of a class: class users don’t need to know how exactly a class is implemented, as long as they know which operations are possible. Class implementers, on the other hand, can change the internals of a class, as long as the operations the user relies on don’t behave differently. Moreover, implementers can protect the class data from direct access, which makes it possible to enforce crucial invariants on the data. In computer science, all these ideas are known as encapsulation and information hiding.

In the C++ course, we have seen several examples of this: simples classes, e.g. for rational and complex numbers, but also complex classes such as an array-based and a linked-list-based vector. Users can operate on a vector (add/remove elements, access elements, …) without having to know how the vector is implemented internally, and implementers can change the latter, as long as all operations remain functional.

A Familiar Program

Let’s begin with a very simple class for rational numbers that allows only object creation and getting a string representation:

C++
class Rational {
  int nom;
  int den; // INV: must not be zero

  public:
  Rational (int n, int d): nom(n), den(d) {
    assert(d != 0);
  }

  std::string to_string() const {
    return 
      std::to_string(this->nom) + 
      "/" + 
      std::to_string(this->den);
  }
};


Rational r = Rational(3,5);
std::cout << r.to_string(); // 3/5

// ERROR: member variable 'den' is private, and
//        thus not accessible from outside 
//        the class
// r.den = 0;
Python
class Rational:
  def __init__(self, n, d):
    assert d != 0

    self._nom = n
    self._den = d # INV: must not be zero

  def to_string(self):
    return "{}/{}".format(self._nom, self._den)


r = Rational(3,5)
print(r.to_string()) # 3/5

# Python does not support private vs. public 
# members, and the following assignment is
# therefore possible
r._den = 0

First Observations

The above programs illustrate several syntactic differences between C++ and Python, but beneath that, also similarities on the conceptual level. Let’s inspect the C++ program first.

Class Rational in C++
  • The class has fours members: two int-typed member variables (nom, den), a constructor (Rational) and a member function that returns a string representation (to_string).

  • Inside member functions, e.g. to_string, the member variables are dereferenced from the this pointer (e.g. this->nom). The explicit dereference is not necessary: e.g. just nom suffices, in which case this->... is implicitly assumed.

  • The constructor takes the parameters n and d, and assigns them (nom(n), den(d)) to the member variables nom and den, respectively. The syntax differs from regular assignments (for technical reasons), but that shouldn’t bother us.

  • The default member visibility for C++ classes is private, the member variables are therefore private and not accessible from the outside. In contrast, constructor and member function to_string are public.

  • Having private member variables, in combination with the assert in the constructor, ensures that den != 0 is indeed invariant, i.e. it always holds and cannot be violated (as attempted at the end of the program).

  • Member function to_string is marked as const, which gives users the guarantee that calling r.to_string() does not change the state of r.

C++ (same code as above)
class Rational {
  int nom;
  int den; // INV: must not be zero

  public:
  Rational (int n, int d): nom(n), den(d) {
    assert(d != 0);
  }

  std::string to_string() const {
    return 
      std::to_string(this->nom) + 
      "/" + 
      std::to_string(this->den);
  }
};


Rational r = Rational(3,5);
std::cout << r.to_string(); // 3/5

// ERROR: member variable 'den' is private, and
//        thus not accessible from outside 
//        the class
// r.den = 0;

Attention

Different programming (language) communities sometimes have different terminology for the same concept (sometimes with slightly different meaning). E.g. member variables are also known as instance variables, fields, or properties. We will continue to use C++ terminology in this tutorial, but you will most likely also encounter other terminology, e.g. when reading other tutorials, watching other videos, etc.

Class Rational in Python
  • The Python constructor also stores the constructor arguments in member variables of the newly created instance (e.g. self._nom = n).

  • Python does not support different visibilities, and all members are always public. By convention, variables starting with an underscore (e.g. _nom) are meant to be private and should not be accessed from the outside. However, Python does not enforce this restriction, as demonstrated by the last assignment in the program.

  • As a consequence, it is ultimately not possible to enforce invariants. However, decent programmers should refrain from accessing “private” member variables (those starting with an underscore) directly, in which case invariants (e.g. denominator not zero) can still be maintained.

  • As mentioned in TODO: some earlier chapter, Python does not support anything similar to const. Implementers of to_string could thus – by accident or because they are up to no good – change the object’s internal state, whereas in C++, the compiler would prevent this.

Python (same code as above)
class Rational:
  def __init__(self, n, d):
    assert d != 0

    self._nom = n
    self._den = d # INV: must not be zero

  def to_string(self):
    return "{}/{}".format(self._nom, self._den)


r = Rational(3,5) # Calls constructor (__init__)
print(r.to_string()) # 3/5

# Everything is public in Python, and the following
# assignment is therefore possible
r._den = 0

Operator Overloading

Recall from the lectures on C++ that operator overloading refers to implementing operators such that, e.g. r1 + r2 sums up two rationals. Without operator overloading, we could still sum up two rationals by calling a suitable function (e.g. add(r1, r2)), but using a corresponding operator allows code that looks more natural and direct. Operator overloading is possible in Python as well, but differs from C++ in two ways: it is only possible in the form of member functions, i.e. in the context of classes, and it uses a slightly less direct syntax.

The next two programs show the Rational classes from before, extended with overloaded operators for multiplying two rationals (*), in-place multiplication (*=), and equality comparison (==):

C++
class Rational {
  // ... as before ...

  Rational operator*(const Rational& other) const {
    int n = nom * other.nom;
    int d = den * other.den;

    return Rational(n, d);
  }

  void operator*=(const Rational& other) {
    nom *= other.nom;
    den *= other.den;
  }

  bool operator==(const Rational& other) {
    // In the C++ course, we used the GCD algorithm
    // to reduce rationals, e.g. from 6/4 to 3/2.
    // Here, we use a less elegant solution.
    return (nom * other.den) == (den * other.nom);
  }          
};


Rational r1 = Rational(3, 6);
Rational r2 = Rational(2, 3);
Rational r3 = Rational(1, 3);

assert(r1 * r2 == r3); // Holds

r2 *= r3;

assert(r2 == Rational(2, 9)); // Holds
Python
class Rational:
  # ... as before ...

  def __mul__(self, other):
    n = self._nom * other._nom;
    d = self._den * other._den;

    return Rational(nom, den)

  def __imul__(self, other):
    self._nom *= other._nom
    self._den *= other._den

    # NOTE: Python's operator *= is expected to 
    #       return the self-object
    return self

  def __eq__(self, other):
    # In the C++ course, we used the GCD algorithm
    # to reduce rationals, e.g. from 6/4 to 3/2.
    # Here, we use a less elegant solution.
    return \
      self._nom * other._den \
        == \
      self._den * other._nom


r1 = Rational(3, 6)
r2 = Rational(2, 3)
r3 = Rational(1, 3)

assert r1 * r2 == r3 # Holds

r2 *= r3

assert r2 == Rational(2, 9) # Holds

The main difference between the two programs is the syntax for operator overloading; another subtle difference is that in-place operators (here: *=) that modify the target (here: r2) are expected to return self in Python. To find out which operators can be overloaded, see cppreference.com for C++, and python.org for Python.

Producing Textual Representations

In order to output our rationals, we currently need to write std::cout << r.to_string() in C++, and print(r.to_string()) in Python. Instead, we would like to just write std::cout << r and print(r). In C++, this is done by overloading the << operator of output streams; in Python, we implement the special __str__ member function, which is called by print.

C++
class Rational {
  // ... as before, including function to_string ...
};

std::ostream& operator<<(std::ostream& out, 
                         const Rational& rat) 
{
  out << rat.to_string();

  return out;
}


Rational r = Rational(2, 1);
std::cout << r; // 2/1
Python
class Rational:
  def __init__(self, n, d):
    assert d != 0

    self._nom = n
    self._den = d # INV: must not be zero

  def __str__(self): # Previously named to_string
    return "{}/{}".format(self._nom, self._den)


r = Rational(2, 1)
print(r) # 2/1

Generic Functions Reloaded

Recall the generic min function from TODO: this earlier chapter: a single function that was applicable to different types, e.g. could compute min(1,2) as well as min("Hi", "Bye"). If we want to pass instances of our Rational class to min, we need to equip our rationals with a less-than function – i.e. we need operator overloading.

C++
class Rational {
  // ... as before ...

  bool operator<(const Rational& other) const {
    double this_frac = double(nom)/den;
    double other_frac = double(other.nom)/other.den;

    return this_frac < other_frac;
  }
};


template<typename T>
T min(T x, T y) {
  if (x < y) return x;
  else return y;
}


Rational r1 = Rational(3, 4);
Rational r2 = Rational(1, 4);

std::cout << "min = " << min(r1, r2); // 1/4
Python
class Rational:
  # ...as before ...

  def __lt__(self, other):
    self_frac = self._nom / self._den
    other_frac = other._nom / other._den

    return self_frac < other_frac


def min(x, y):
  if x < y: return x
  else: return y


r1 = Rational(3, 4);
r2 = Rational(1, 4);

print("min =", min(r1, r2)) # 1/4

In both programs, we overload the less-than operator < for rationals: in C++, as the member function operator<; in Python, as the member function __lt__. In both cases, we compare rationals by computing the real numbers (floating-point numbers) they correspond to. In the C++ program, this requires a conversion (cast) to double, to avoid precision loss due to integer division. This is not necessary in Python, whose division operator (/) performs floating-point division.

If the operators were not overloaded, then “the usual” happens: in C++, the compiler would report an error, whereas in Python, you would get a runtime error.

Declaring vs. Implementing Member Functions (Not Needed)

In the C++ course, you briefly learned that classes should be declared in header files (rational.h), but implemented in a separate C++ file (rational.cpp), as illustrated by the simplified example below. Fortunately, this separation is not necessary in Python.

C++ (rational.h)
class Rational {
  int nom;
  int den; // INV: must not be zero

  public:
  Rational (int n, int d);
  // ... more functions ...
};
C++ (rational.cpp)
Rational::Rational (int n, int d): nom(n), den(d) {
  assert(d != 0);
}

Further Readings