1.1. Python tutorial#

As of Q2 2023, Python ranks among the most popular programming languages. Python code is often described as almost like pseudocode, because it lets you express powerful ideas in very few lines while staying readable. As an example, here is an implementation of the classic quicksort algorithm in Python:

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]                    # set the middle as pivot
    left = [x for x in arr if x < pivot]          # find all x smaller than pivot in arr
    middle = [x for x in arr if x == pivot]       # find all x equal to pivot in arr
    right = [x for x in arr if x > pivot]         # find all x larger than pivot in arr
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3, 6, 8, 10, 1, 2, 1]))
[1, 1, 2, 3, 6, 8, 10]

If you are already familiar with Python, you may skip this tutorial. If not, it serves as a quick crash course on Python programming and as the basis for the subsequent tutorials on data analytics and visualization. In this tutorial, we will cover:

  • Coding environment of Python

  • Basic data types and Containers

  • Control flows and Functions

  • Classes and Module imports

  • How to look for help

1.1.1. Jupyter Notebooks#

Before we dive into Python, we’d like to briefly talk about notebooks. A Jupyter notebook lets you execute Python code in your web browser and write your own documentation alongside it. The basic unit of a Jupyter notebook is the cell, which comes in two types:

  1. Code cells, where we write and execute Python code, and

  2. Markdown cells, where we write our thoughts as text in Markdown format.

This makes Jupyter an excellent place to test code piece by piece and record ideas at the same time; for this reason, it is widely used in data analytics. What’s more? 🤔 The Binder service allows us to run Python code entirely in the cloud. Binder is basically a Jupyter notebook on steroids: it’s free, requires no setup, comes preinstalled with many packages, and is easy to share with the world.

We will use Jupyter notebooks throughout this module and all the tutorials:

  • Run Tutorials in Binder (recommended). Just click the rocket logo 🚀 and then Binder at the very top of each tutorial.

  • Run Tutorials in Jupyter Notebook. If you wish to run the notebooks locally, we recommend installing Anaconda to manage your Python environment. After installation, open Anaconda Navigator and launch Jupyter Notebook from the ‘Home’ page; you will get the same interface as in Binder. On the ‘Environments’ page, you can install many third-party packages.

1.1.2. Basic data types#

1.1.2.1. Numbers#

Integers and floats work as you would expect from other languages. When a variable is initialized, Python infers its data type from the assigned value. The built-in function type() returns the type of a value.

x = 3  # x is a variable assigned the integer value 3
y = 1.0

print(x, type(x))  # print() is a built-in function for printing
print(y, type(y))
3 <class 'int'>
1.0 <class 'float'>

Python also supports the common arithmetic operators for numbers, as well as augmented assignment operators such as += and *=.

print(x + 1)  # Addition
print(x * 2)  # Multiplication
print(x ** 2) # Exponentiation
print(x // 2) # Floor division
4
6
9
1
print(y)
y += 1  # Same as y = y + 1
print(y)
y *= 2  # Same as y = y * 2
print(y)
1.0
2.0
4.0
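
Floor division is easy to confuse with ordinary division, so here is a minimal sketch that contrasts `/`, `//`, and `%` on the same operands. The variable names `a` and `b` are illustrative and do not appear elsewhere in this tutorial.

```python
# Sketch: true division, floor division, and modulo on the same operands
a, b = 7, 2

true_div = a / b     # True division always returns a float
floor_div = a // b   # Floor division rounds down to the nearest integer
remainder = a % b    # Modulo returns the remainder of the division

print(true_div, floor_div, remainder)  # 3.5 3 1
print(-7 // 2)                         # -4, because floor division rounds toward negative infinity
```

Note that `//` rounds down rather than toward zero, which matters for negative numbers.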

1.1.2.2. Booleans#

In Python, the two Boolean constants are written as True and False.

t, f = True, False  # Python can do multiple assignments in one line
print(type(t), type(f))
<class 'bool'> <class 'bool'>

Now let’s look at the logical operators for Booleans: and, or and not.

print(t and f) # Logical AND;
print(t or f)  # Logical OR;
print(not t)   # Logical NOT;
False
True
False

Comparison operators applied to a pair of numbers produce Boolean results.

x, y, z = 3, 1.0, 3.0
print(x < y)   # Return True if x is LESS than y
print(x == z)  # Return True if x is EQUAL to z
False
True

1.1.2.3. Strings#

h = 'hello'       # String literals can use single quotes
w = "world"       # or double quotes
print(h, len(h))  # Built-in function len() returns the number of characters in a string
hello 5

When applied to strings, the + operator concatenates them; it cannot join a string with a value of another data type.

hw = h + ' ' + w  # String concatenation
print(hw)
hello world
hw1 = h + ' ' + w + 2023  # Cannot concatenate with numbers
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [10], in <cell line: 1>()
----> 1 hw1 = h + ' ' + w + 2023

TypeError: can only concatenate str (not "int") to str

However, we can format strings based on other variables. A good practice is to use {} as placeholders in a string and then call the string’s format() method to insert the variables. Python converts numeric values to strings before inserting them.

The format() method also allows us to insert variables in a different order or to use a variable multiple times.

hw1 = '{} {}! {}'.format(h, w, 2023)             # String formatting by sequence
print(hw1)
hw2 = '{1} {0}! {2} {2:.2f}'.format(h, w, 2023)  # String formatting by specifying orders and formats
print(hw2)
hello world! 2023
world hello! 2023 2023.00
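
As an alternative that this tutorial does not otherwise use, Python 3.6 and later also support formatted string literals (f-strings), which place the variables directly inside the placeholders. The sketch below reproduces the two strings above; the variable `year` is introduced here only for illustration.

```python
# Sketch: the same formatting written with f-strings (Python 3.6+)
h, w, year = 'hello', 'world', 2023

hw1 = f'{h} {w}! {year}'             # Variables are named inside the braces
hw2 = f'{w} {h}! {year} {year:.2f}'  # A format spec such as :.2f works the same way
print(hw1)  # hello world! 2023
print(hw2)  # world hello! 2023 2023.00
```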

A string in Python is also an object, which comes with many useful methods; for example:

print(h)
print(h.upper())       # Convert a string to uppercase; prints "HELLO"
print(h.replace('l', '(ell)'))  # Replace all instances of one substring with another
hello
HELLO
he(ell)(ell)o

You can find more information about Python’s basic data types in the official documentation, such as the list of all string methods.

1.1.3. Containers#

It would be cumbersome to manage every single piece of data with a separate variable. Python includes four built-in container types to store collections of data: lists, dictionaries, tuples, and sets.
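
As a quick orientation, the four container types can be sketched side by side. The data and variable names below are made up for illustration. Sets are not covered further in this tutorial, so the sketch only shows that a set drops duplicate items.

```python
# Sketch with hypothetical example data; the names are illustrative only
shopping_list = ['apple', 'banana', 'apple']  # list: ordered, mutable, allows duplicates
prices = {'apple': 1.2, 'banana': 0.5}        # dictionary: maps keys to values
point = (3, 4)                                # tuple: ordered, immutable
unique_items = set(shopping_list)             # set: unordered, no duplicates

for container in (shopping_list, prices, point, unique_items):
    print(type(container).__name__, len(container))
```

The list keeps both copies of 'apple' (length 3), while the set built from it keeps only one (length 2).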

1.1.3.1. Lists#

Lists are used to store multiple items in a single variable. In Python syntax, a list is enclosed in square brackets [] with items separated by commas. Note that the elements of a Python list can have different data types.

ls = [3, 1, 'foo']  # This list contains three elements with different types
print(ls, len(ls))
[3, 1, 'foo'] 3

After creation, we can use the append method to add an element to the end of a list, and the pop method to remove an element (the last one by default, or the one at a given index). Other methods of list objects are described in the official Python documentation.

ls.append('bar') # Add a new element to the end of the list
print(ls)
ls.pop()         # Remove and return the last element of the list
print(ls)
[3, 1, 'foo', 'bar']
[3, 1, 'foo']
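
Beyond the default behaviour shown above, a few other list methods can target a specific position or value. The following is a minimal sketch using the standard `pop(index)`, `insert`, and `remove` methods; the list `items` and the value `'baz'` are illustrative.

```python
# Sketch: removing and inserting at specific positions
items = [3, 1, 'foo', 'bar']

removed = items.pop(0)   # Remove and return the element at index 0
print(removed, items)    # 3 [1, 'foo', 'bar']

items.insert(1, 'baz')   # Insert 'baz' before index 1
print(items)             # [1, 'baz', 'foo', 'bar']

items.remove('foo')      # Remove the first element equal to 'foo'
print(items)             # [1, 'baz', 'bar']
```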

There are two ways to retrieve value(s) from lists:

  1. Index one item: Put the index number within square brackets []. Note that in Python, indexing starts from 0. Indexing can also run in reverse order using negative values, as follows.

[Figure: Python list indexing (python_index.png)]
print(ls[2])     # Indexing 3rd element; list indexing starts from 0
print(ls[-1])    # Negative indices count from the end of the list
foo
foo
  2. Slice a part: Slicing is done by giving the index of the first element (a) and the index just past the last element (b) in the form parentlist[a:b]. Note that the element at index b is not included in the resulting slice. If a (or b) is omitted, the slice starts from the first element (or runs through the last).

nums = [0, 1, 2, 3, 4, 5, 6]
print(nums)
print(nums[2:4])    # Get a slice from index 2 to 4 (exclusive)
print(nums[2:])     # Get a slice from index 2 to the end
print(nums[:-1])    # Slice indices can also be negative
nums[2:4] = [8, 9]  # Assign a new sublist to a slice
print(nums)
[0, 1, 2, 3, 4, 5, 6]
[2, 3]
[2, 3, 4, 5, 6]
[0, 1, 2, 3, 4, 5]
[0, 1, 8, 9, 4, 5, 6]

You can also slice with a fixed step length (c): [a:b:c].

print(nums[:-1:2])  # Get a slice from index 0 to -1 (exclusive) in a step length of 2
print(nums[::-1])   # Get a slice of whole list in reverse order
[0, 8, 4]
[6, 5, 4, 9, 8, 1, 0]

We will meet slicing again in the NumPy tutorial.

1.1.3.2. Dictionaries#

A dictionary stores key-value pairs, written inside braces as {key: value}. Dictionaries work somewhat like a small database: instead of indexing by position, you look up a value by a key you define, such as a string.

d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get a value from a dictionary
cute
print('fish' in d)  # `in` is the membership operator to check the presence
d['fish'] = 'wet'   # Set a new entry in a dictionary
print('fish' in d)
False
True

One useful built-in method of dictionaries is get, which returns the value for a key, or a default that you supply when the key does not exist.

print(d.get('monkey', 'N/A'))  # Get a value with a default
print(d.get('fish', 'N/A'))    # Get a value with a default
N/A
wet
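
To see why get is convenient, it helps to compare it with plain indexing, which raises a KeyError for a missing key. The sketch below assumes the same kind of dictionary as above; the key 'monkey' is deliberately absent.

```python
# Sketch: indexing a missing key raises KeyError, while get() returns a fallback
d = {'cat': 'cute', 'dog': 'furry'}

try:
    value = d['monkey']        # Plain indexing fails for a missing key
except KeyError:
    value = 'N/A'              # Handle the error ourselves
print(value)                   # N/A

print(d.get('monkey'))         # None: the default when no fallback is given
print(d.get('monkey', 'N/A'))  # N/A
```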

1.1.3.3. Tuples#

A tuple is an immutable, ordered sequence, similar to a list but written with parentheses ().

t1 = (5, 6)  # Create a tuple

print(t1, type(t1))
(5, 6) <class 'tuple'>
t1[0] = 1  # A tuple is immutable after initialization
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [22], in <cell line: 1>()
----> 1 t1[0] = 1

TypeError: 'tuple' object does not support item assignment
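
Immutability is what makes tuples usable in places where lists are not. As a small sketch, a tuple can be unpacked into separate variables and can serve as a dictionary key. The dictionary `grid` and its labels are hypothetical.

```python
# Sketch: unpacking a tuple and using tuples as dictionary keys
t1 = (5, 6)

a, b = t1     # Unpack the two elements into two variables
print(a, b)   # 5 6

grid = {(0, 0): 'origin', (5, 6): 'target'}  # Tuples are hashable, so they can be keys
print(grid[t1])                              # target
```

Trying the same with a list as a key, for example `{[0, 0]: 'origin'}`, raises a TypeError because lists are mutable and therefore unhashable.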

1.1.4. Control Flow#

1.1.4.1. Conditions: if-elif-else#

Conditional control flow runs different blocks of code under different conditions. The following is an example. Note that each block is indented, by convention with four spaces.

x, y = 10, 12

if x > y:
    print("x>y")  # Indented by four spaces
elif x < y:
    print("x<y")  # Indented by four spaces
else:
    print("x=y")  # Indented by four spaces
x<y

1.1.4.2. Loops:#

Loops repeat a block of code for each element in a container or for as long as a specific condition holds.

  • for loops across an iterable object

A list is a typical iterable object. Here is an example that iterates over a list’s elements. The built-in function range(a, b) also returns an iterable sequence from a to b (b not included) in increments of 1 (by default), which is very common in for loops.

list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

for list1 in list_of_lists: # Iterate over elements in list_of_lists
    print(list1)            # Indented by four spaces: part of the loop body
print('Bye')                # Not indented, so this runs once after the loop
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
Bye
for i in range(0, 2):
    print(list_of_lists[i])
[1, 2, 3]
[4, 5, 6]
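
The default increment of range can be changed with a third argument, and the built-in enumerate offers a way to get the index and the element together without indexing by hand. The following is a minimal sketch; the variable names `stepped`, `countdown`, `index`, and `row` are illustrative.

```python
# Sketch: range with a custom step, and enumerate for index/element pairs
stepped = list(range(0, 10, 3))    # Start at 0, stop before 10, step by 3
countdown = list(range(5, 0, -2))  # A negative step counts downwards
print(stepped)                     # [0, 3, 6, 9]
print(countdown)                   # [5, 3, 1]

list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for index, row in enumerate(list_of_lists):  # Yields (0, first row), (1, second row), ...
    print(index, row)
```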

For a dictionary, the items() method returns an iterable view of its key-value pairs, which can also be used in for loops. As long as we have an iterable object, we can loop over it with for.

d = {'person': 2, 'cat': 4, 'spider': 8}

for animal, legs in d.items():
    print('A {} has {} legs'.format(animal, legs))
A person has 2 legs
A cat has 4 legs
A spider has 8 legs
  • while loops under a specific condition

i = 1
while i < 3:    # Repeat while i is smaller than 3
    print(i**2) # Each line of the loop body is indented by four spaces
    i += 1      # Also part of the loop body
1
4

1.1.4.3. List comprehension and dictionary comprehension#

As a special feature, comprehensions offer a shorter, one-line loop syntax. We can loop over an existing list, optionally with a condition, to initialize a new list.

nums = [0, 1, 2, 3, 4]
squares = [x**2 for x in nums]
print(squares)

even_squares = [x**2 for x in nums if x % 2 == 0]
print(even_squares)
[0, 1, 4, 9, 16]
[0, 4, 16]

Similarly, for dictionaries, we could use dictionary comprehension to create a new dictionary based on an existing list.

even_num_to_square = {x: x**2 for x in nums if x % 2 == 0}
print(even_num_to_square)
{0: 0, 2: 4, 4: 16}
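
To make the shorthand concrete, here is a sketch that writes the same filtering logic once as an explicit for loop and once as a comprehension. The name `even_squares_loop` is introduced only for this comparison.

```python
# Sketch: a list comprehension and its equivalent explicit loop
nums = [0, 1, 2, 3, 4]

even_squares_loop = []
for x in nums:                       # Loop over every element
    if x % 2 == 0:                   # Keep only the even numbers
        even_squares_loop.append(x**2)

even_squares = [x**2 for x in nums if x % 2 == 0]  # Same logic in one line

print(even_squares_loop)                  # [0, 4, 16]
print(even_squares == even_squares_loop)  # True
```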

1.1.5. Functions#

Python functions are defined using the def keyword. Here is an example function sign(x), which returns the sign of x. To call a function, write the function name followed by parentheses containing the arguments.

def sign(x):   # Define a function with one argument x
    '''Determine the sign of a single value'''
    if x > 0:  # The function body is indented by four spaces
        return 'positive'  # Another four spaces inside the `if` block
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]: 
    print("{} is {}".format(x, sign(x)))  # Use parentheses to call a function
-1 is negative
0 is zero
1 is positive

The function above can be read as follows: a function named sign is defined, which accepts one argument x. Its body is an if-elif-else chain that returns the string 'positive' if x > 0, 'negative' if x < 0, or 'zero' when x == 0.

We often define functions to take optional keyword arguments, like this:

def hello(name, loud=False):
    if loud:
        print('HELLO, {}'.format(name.upper()))
    else:
        print('Hello, {}!'.format(name))

hello('Bob')  # Without specifying the second argument, the function uses its default value
hello('Fred', loud=True)
Hello, Bob!
HELLO, FRED

For a function whose body is a single simple expression, Python offers the short lambda syntax lambda arguments: expression. The following example calculates \(y=x^3+x^2+x\).

y = lambda x: x**3 + x**2 + x  # define a simple lambda function
y(-1)                          # call it
-1
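
In practice, lambda functions are often passed directly as arguments to other functions rather than assigned to a name. The sketch below shows two common cases using the built-ins sorted and map; the word list is made up for illustration.

```python
# Sketch: passing lambda functions as arguments
words = ['banana', 'kiwi', 'apple']

by_length = sorted(words, key=lambda word: len(word))  # Sort by word length
print(by_length)                                       # ['kiwi', 'apple', 'banana']

cubes = list(map(lambda x: x**3, [1, 2, 3]))           # Apply the lambda to each element
print(cubes)                                           # [1, 8, 27]
```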

1.1.6. Classes and Objects#

Using the above knowledge, we can already code the algorithms we want. However, the real magic and power of Python lie in its numerous packages, supported by evolving and active communities. Before diving into them, let’s take a very brief look at the object-oriented programming paradigm in Python, as almost all packages use this paradigm to organize their code. In fact, almost everything in Python is an object.

Object-oriented programming first defines a class, a “blueprint” for creating objects; we then create an instance (object), which carries its own properties and methods as defined by the class. Properties are the variables that reflect the state of an instance, and methods are the functions that operate on an instance. The syntax for defining classes in Python is straightforward. Note that . is used to access and assign properties, and that __init__ is the initialization method Python calls when an instance is created; it is optional, but most classes define it.

Let’s try the following Car 🚗 class example:

class Car():
    
    def __init__(self, company, model, year):  
        """initialize the properties of a car"""
        self.company = company  # set one property of the object
        self.model = model
        self.year = year
        self.odometer = 0
        
    def get_info(self):  # functions defined in a class are the methods of its objects
        """return car information in a string"""
        car_info = "{} {} {}".format(self.year, self.company, self.model)
        return car_info
        
    def run(self, distance):
        """run car for a distance"""
        self.odometer += distance
    
    def read_odometer(self):
        """return the distances the car has run through"""
        odo_info = "This car has run {} km.".format(self.odometer)
        return odo_info
[Figure: Python class illustration (python_class.png)]

Let’s say I now buy a new Car and name it my_lovely_car 🚘:

my_lovely_car = Car("Tesla", "Model 3", 2022)  # calling the class name triggers __init__ to create an instance
print(my_lovely_car.company)     # retrieve a property
print(my_lovely_car.get_info())  # call a method
Tesla
2022 Tesla Model 3

And today I drive it for 5.5 km. Then I wish to check the distance it has covered so far.

print(my_lovely_car.read_odometer())
my_lovely_car.run(5.5)
print(my_lovely_car.read_odometer())
This car has run 0 km.
This car has run 5.5 km.
Car.run(my_lovely_car, 3)  # methods can also be called via the class name, passing the instance explicitly
print(Car.read_odometer(my_lovely_car))
This car has run 8.5 km.
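
One consequence of the class/instance split is that every instance keeps its own state. The sketch below uses a deliberately reduced, hypothetical version of the Car class (only a model and an odometer) to show that running one car does not affect another.

```python
# Sketch: two instances of the same class hold independent properties
class Car:
    """Reduced, hypothetical variant of the Car class above."""

    def __init__(self, model):
        self.model = model
        self.odometer = 0          # Every new car starts at 0 km

    def run(self, distance):
        self.odometer += distance  # Changes only this instance


car_a = Car('Model 3')
car_b = Car('Model Y')
car_a.run(5.5)

print(car_a.odometer, car_b.odometer)  # 5.5 0
```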

1.1.7. Import modules#

It’s time to leverage the numerous Python packages to empower our code 💪. We import a module with an import statement. To access one of its classes or functions, we write the name of the module and the name of the class or function, joined by a dot (.).

import numpy  # import third-party NumPy module

print(numpy)
print(numpy.arange(1, 5))  # arange function generates an array with evenly spaced values
<module 'numpy' from '/Users/baymin/opt/anaconda3/lib/python3.9/site-packages/numpy/__init__.py'>
[1 2 3 4]

Sometimes, to make scripting easier, we assign a short alias to the module name; we may also directly import specific functions or submodules so that we can use them without the module name.

# Assign a short alias to make it easier for us to use it
import numpy as np
print(np.arange(1, 4))

# Import a submodule from a module
from numpy import random  # random is a submodule of numpy for random sampling
print(random.random())    # random.random() function generates a random value
[1 2 3]
0.7186135151017801
# Try this!!!
import antigravity
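
The same import forms work for modules in Python's standard library, which need no installation. As a minimal sketch, the following imports specific names from the built-in math module and uses them without the module prefix; the variable `area` is illustrative.

```python
# Sketch: importing specific names from the standard-library math module
import math
from math import sqrt, pi  # Import two names directly

print(math.sqrt(16))       # 4.0, accessed through the module name
print(sqrt(16))            # 4.0, accessed directly

area = pi * 2**2           # Area of a circle with radius 2
print(round(area, 2))      # 12.57
```

Importing names directly saves typing, while the module prefix makes it clearer where a function comes from.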

1.1.8. One more thing: Look for help#

Coding is also a journey of debugging 🐞. For programmers, it is important to learn how to solve problems independently. Here are some suggestions for when you feel stuck or confused.
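
One quick option that needs no internet connection is to read the documentation strings (docstrings) that ship with Python objects. The sketch below is an assumption about a useful first step rather than part of the original list of suggestions. It uses the standard inspect.getdoc and dir functions; in an interactive session, calling help(str.upper) displays the same text.

```python
# Sketch: inspecting built-in documentation from within Python
import inspect

print(inspect.getdoc(str.upper))  # Print the docstring of the str.upper method

# List the public methods available on strings
string_methods = [name for name in dir(str) if not name.startswith('_')]
print(string_methods[:5])
```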

1.1.8.2. Read online official documentation#

When learning a new package, it’s always good to briefly read its official documentation. A typical well-documented package offers a User Guide (introducing the framework for working with the package), an API reference (listing the details of each entry in the package), and a gallery (showing off good examples).

Try browsing the official website of the machine learning package scikit-learn! Its documentation combines theory, code, and visualization to deliver ideas.

1.1.8.3. Search in community: Stack Overflow#

Python could also be called an internet-based programming language, not only because so many open-source third-party packages are available online, but also because of its actively engaged communities. Stack Overflow is a great Q&A website for finding questions similar to yours that community members have already asked. The answers rated highest by the community are typically shown first, right after the question.

1.1.8.4. Ask in ChatGPT#

ChatGPT is an AI-powered language model developed by OpenAI. Given well-phrased questions, ChatGPT can suggest code snippets that fit your needs.

1.1.9. References#