Basics of Python Programming Language

What is Python?

  • Python is a dynamic, interpreted, general-purpose programming language created by Guido van Rossum and first released in 1991.
  • Supports several popular programming paradigms:
    • procedural
    • object-oriented
    • functional
  • Python is widely used in bioinformatics and scientific computing, as well as many other fields and in industry.
  • Specifically designed to make programs readable and easy to develop.
  • Versatile and easy-to-use language.
  • Python is available on all popular operating systems.

Why learn Python?

  • R and Python are the two most popular programming languages used by data analysts and data scientists. Both are free and open source.
  • Python is a general-purpose programming language, while R is a statistical programming language.
  • Google Trend Search Index for R (blue) versus Python (red) over the last 10 years:

Python Programming Language

  • Standard library provides built-in support for several common tasks:
    • numerical & mathematical functions
    • interacting with files and the operating system etc.
  • Has a rich library:
    • Pandas - Data Manipulation and Analysis
    • BioPython - For Bioinformatics
    • NumPy - Multi-dimensional arrays/matrices along with high-level mathematical functions
    • Matplotlib - For Plots
    • TensorFlow - Machine Learning and AI

How to use Python?

Interactive Mode

  • First invoke the Python interpreter and then work with it interactively.
  • Give the interpreter Python commands, one at a time.
  • To start the Python interpreter in interactive mode, type the command python on the command-line (shell prompt), as shown below.

Scripting Mode

  • Scripting mode is also called the normal mode (programming mode)
  • Non-interactive
  • Provide the Python interpreter a text file containing a Python program (script) as input, on the command-line, as follows:
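
As a minimal sketch of scripting mode, the lines below could be saved in a text file and passed to the interpreter. The file name hello.py and the variable name message are hypothetical, chosen only for illustration; on some systems the command may be python3 rather than python.

```python
# Contents of a hypothetical script file named hello.py.
# From a shell prompt it would be run non-interactively with:
#     python hello.py
message = "Hello from a Python script!"
print(message)               # prints the message
print("2 + 3 =", 2 + 3)      # prints: 2 + 3 = 5
```

The interpreter executes the statements from top to bottom and then exits, unlike interactive mode, which waits for the next command.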

Jupyter Notebook

  • A web application that allows creating and sharing documents that contain live code, equations, visualizations and explanatory text.
  • Provides a rich architecture for interactive data science and scientific computing with:
    • Over 40 programming languages such as Python, R, Julia and Scala.
    • A browser-based notebook with support for code, rich text, math expressions, plots and other rich media.
    • Support for interactive data visualization.
    • Easy to use tools for parallel computing.

Tip

In this course, we will use Jupyter Notebook (or JupyterLab, the more powerful and extensible evolution of Jupyter Notebook) for all Python coding exercises. Feel free to choose either one based on your preference, as both are widely supported and offer similar functionality for the purposes of this course.

Any IDE of your choice

Google search top 10 results: Integrated development environment Software / python

A look around Jupyter Notebook

  • Open Jupyter Notebook and create a new notebook as shown below.

  • Jupyter has two modes: edit mode (blue cell border) and command mode (grey cell border).

    • To enter edit mode, press Enter or click into a cell. In edit mode, most of the keyboard is dedicated to typing into the cell’s editor. Thus, in edit mode there are relatively few shortcuts.

    • To enter command mode, press Esc. In command mode, the entire keyboard is available for shortcuts, so there are many more.

  • To enter different types of content, such as Markdown, raw text, or Python code, you need to select the appropriate cell type. You can do this by using the drop-down menu in the toolbar at the top of the Jupyter Notebook interface.

    • Code cells allow you to write and execute Python code.

    • Markdown cells are used to format text with headings, lists, links, and other rich text features.

    • Raw cells allow you to input unformatted text that will not be executed or rendered.

  • Shortcuts to execute cells in both modes:

    • Shift + Enter run the current cell
    • Ctrl + Enter (Mac: Cmd + Enter) run selected cells
    • Alt + Enter (Mac: Option + Enter) run the current cell, insert below
    • Ctrl + S (Mac: Cmd + S) save and checkpoint
  • Some useful shortcuts, in command mode:

    • Up select cell above
    • Down select cell below
    • Shift + Up extend selected cells above
    • Shift + Down extend selected cells below
    • A insert cell above
    • B insert cell below
    • D + D (press the key twice) delete selected cells
  • The Help->Keyboard Shortcuts dialog lists the available shortcuts (or type h in Command mode).

To get started, open Jupyter Notebook and navigate to the IntroPython folder, which was shared with you. Once inside, open the notebook titled IntroPython-Day1.ipynb. This notebook contains the materials and exercises for Day 1 of the course.

Comments

When writing code it is very handy to make notes to yourself about what the code is doing. In Python, any text that follows the hash symbol '#' on a line (outside of a string) is called a 'comment'. The Python interpreter ignores this text and will not try to run it as commands. Comments are useful for reminding your future self what you were aiming to do with a particular line of code, and what was or wasn't working.

# This is a comment
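
A short sketch of the comment forms described above follows. The variable names codon and label are made up for this example.

```python
# A full-line comment: the interpreter skips this line entirely.
codon = "ATG"  # an inline comment: everything after the hash is ignored

# A hash inside quotation marks is part of the string, not a comment.
label = "sample #1"

print(codon)   # ATG
print(label)   # sample #1
```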

Help

The Python help() function invokes the interactive built-in help system. If the argument is a string, then the string is treated as the name of a module, function, class, keyword, or documentation topic, and a help page is printed on the console. If the argument is any other kind of object, a help page on the object is displayed.

It is worth trying help() in your interpreter whenever you need help writing a Python program or using Python modules.

The following displays the help on the builtin print function.

help('print')
Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.

The following displays the help page on the math module (or library).

help('math')
Help on module math:

NAME
    math

MODULE REFERENCE
    https://docs.python.org/3.9/library/math
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module provides access to the mathematical functions
    defined by the C standard.

FUNCTIONS
    acos(x, /)
        Return the arc cosine (measured in radians) of x.
        
        The result is between 0 and pi.
    
    acosh(x, /)
        Return the inverse hyperbolic cosine of x.
    
    asin(x, /)
        Return the arc sine (measured in radians) of x.
        
        The result is between -pi/2 and pi/2.
    
    asinh(x, /)
        Return the inverse hyperbolic sine of x.
    
    atan(x, /)
        Return the arc tangent (measured in radians) of x.
        
        The result is between -pi/2 and pi/2.
    
    atan2(y, x, /)
        Return the arc tangent (measured in radians) of y/x.
        
        Unlike atan(y/x), the signs of both x and y are considered.
    
    atanh(x, /)
        Return the inverse hyperbolic tangent of x.
    
    ceil(x, /)
        Return the ceiling of x as an Integral.
        
        This is the smallest integer >= x.
    
    comb(n, k, /)
        Number of ways to choose k items from n items without repetition and without order.
        
        Evaluates to n! / (k! * (n - k)!) when k <= n and evaluates
        to zero when k > n.
        
        Also called the binomial coefficient because it is equivalent
        to the coefficient of k-th term in polynomial expansion of the
        expression (1 + x)**n.
        
        Raises TypeError if either of the arguments are not integers.
        Raises ValueError if either of the arguments are negative.
    
    copysign(x, y, /)
        Return a float with the magnitude (absolute value) of x but the sign of y.
        
        On platforms that support signed zeros, copysign(1.0, -0.0)
        returns -1.0.
    
    cos(x, /)
        Return the cosine of x (measured in radians).
    
    cosh(x, /)
        Return the hyperbolic cosine of x.
    
    degrees(x, /)
        Convert angle x from radians to degrees.
    
    dist(p, q, /)
        Return the Euclidean distance between two points p and q.
        
        The points should be specified as sequences (or iterables) of
        coordinates.  Both inputs must have the same dimension.
        
        Roughly equivalent to:
            sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))
    
    erf(x, /)
        Error function at x.
    
    erfc(x, /)
        Complementary error function at x.
    
    exp(x, /)
        Return e raised to the power of x.
    
    expm1(x, /)
        Return exp(x)-1.
        
        This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.
    
    fabs(x, /)
        Return the absolute value of the float x.
    
    factorial(x, /)
        Find x!.
        
        Raise a ValueError if x is negative or non-integral.
    
    floor(x, /)
        Return the floor of x as an Integral.
        
        This is the largest integer <= x.
    
    fmod(x, y, /)
        Return fmod(x, y), according to platform C.
        
        x % y may differ.
    
    frexp(x, /)
        Return the mantissa and exponent of x, as pair (m, e).
        
        m is a float and e is an int, such that x = m * 2.**e.
        If x is 0, m and e are both 0.  Else 0.5 <= abs(m) < 1.0.
    
    fsum(seq, /)
        Return an accurate floating point sum of values in the iterable seq.
        
        Assumes IEEE-754 floating point arithmetic.
    
    gamma(x, /)
        Gamma function at x.
    
    gcd(*integers)
        Greatest Common Divisor.
    
    hypot(...)
        hypot(*coordinates) -> value
        
        Multidimensional Euclidean distance from the origin to a point.
        
        Roughly equivalent to:
            sqrt(sum(x**2 for x in coordinates))
        
        For a two dimensional point (x, y), gives the hypotenuse
        using the Pythagorean theorem:  sqrt(x*x + y*y).
        
        For example, the hypotenuse of a 3/4/5 right triangle is:
        
            >>> hypot(3.0, 4.0)
            5.0
    
    isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0)
        Determine whether two floating point numbers are close in value.
        
          rel_tol
            maximum difference for being considered "close", relative to the
            magnitude of the input values
          abs_tol
            maximum difference for being considered "close", regardless of the
            magnitude of the input values
        
        Return True if a is close in value to b, and False otherwise.
        
        For the values to be considered close, the difference between them
        must be smaller than at least one of the tolerances.
        
        -inf, inf and NaN behave similarly to the IEEE 754 Standard.  That
        is, NaN is not close to anything, even itself.  inf and -inf are
        only close to themselves.
    
    isfinite(x, /)
        Return True if x is neither an infinity nor a NaN, and False otherwise.
    
    isinf(x, /)
        Return True if x is a positive or negative infinity, and False otherwise.
    
    isnan(x, /)
        Return True if x is a NaN (not a number), and False otherwise.
    
    isqrt(n, /)
        Return the integer part of the square root of the input.
    
    lcm(*integers)
        Least Common Multiple.
    
    ldexp(x, i, /)
        Return x * (2**i).
        
        This is essentially the inverse of frexp().
    
    lgamma(x, /)
        Natural logarithm of absolute value of Gamma function at x.
    
    log(...)
        log(x, [base=math.e])
        Return the logarithm of x to the given base.
        
        If the base not specified, returns the natural logarithm (base e) of x.
    
    log10(x, /)
        Return the base 10 logarithm of x.
    
    log1p(x, /)
        Return the natural logarithm of 1+x (base e).
        
        The result is computed in a way which is accurate for x near zero.
    
    log2(x, /)
        Return the base 2 logarithm of x.
    
    modf(x, /)
        Return the fractional and integer parts of x.
        
        Both results carry the sign of x and are floats.
    
    nextafter(x, y, /)
        Return the next floating-point value after x towards y.
    
    perm(n, k=None, /)
        Number of ways to choose k items from n items without repetition and with order.
        
        Evaluates to n! / (n - k)! when k <= n and evaluates
        to zero when k > n.
        
        If k is not specified or is None, then k defaults to n
        and the function returns n!.
        
        Raises TypeError if either of the arguments are not integers.
        Raises ValueError if either of the arguments are negative.
    
    pow(x, y, /)
        Return x**y (x to the power of y).
    
    prod(iterable, /, *, start=1)
        Calculate the product of all the elements in the input iterable.
        
        The default start value for the product is 1.
        
        When the iterable is empty, return the start value.  This function is
        intended specifically for use with numeric values and may reject
        non-numeric types.
    
    radians(x, /)
        Convert angle x from degrees to radians.
    
    remainder(x, y, /)
        Difference between x and the closest integer multiple of y.
        
        Return x - n*y where n*y is the closest integer multiple of y.
        In the case where x is exactly halfway between two multiples of
        y, the nearest even value of n is used. The result is always exact.
    
    sin(x, /)
        Return the sine of x (measured in radians).
    
    sinh(x, /)
        Return the hyperbolic sine of x.
    
    sqrt(x, /)
        Return the square root of x.
    
    tan(x, /)
        Return the tangent of x (measured in radians).
    
    tanh(x, /)
        Return the hyperbolic tangent of x.
    
    trunc(x, /)
        Truncates the Real x to the nearest Integral toward 0.
        
        Uses the __trunc__ magic method.
    
    ulp(x, /)
        Return the value of the least significant bit of the float x.

DATA
    e = 2.718281828459045
    inf = inf
    nan = nan
    pi = 3.141592653589793
    tau = 6.283185307179586

FILE
    /Users/sanduniprasadi/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.9/lib-dynload/math.cpython-39-darwin.so

The help function can also be used on built-in or user-defined classes.

help('int')
Help on class int in module builtins:

class int(object)
 |  int([x]) -> integer
 |  int(x, base=10) -> integer
 |  
 |  Convert a number or string to an integer, or return 0 if no arguments
 |  are given.  If x is a number, return x.__int__().  For floating point
 |  numbers, this truncates towards zero.
 |  
 |  If x is not a number or if base is given, then x must be a string,
 |  bytes, or bytearray instance representing an integer literal in the
 |  given base.  The literal can be preceded by '+' or '-' and be surrounded
 |  by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
 |  Base 0 means to interpret the base from the string as an integer literal.
 |  >>> int('0b100', base=0)
 |  4
 |  
 |  Built-in subclasses:
 |      bool
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __bool__(self, /)
 |      True if self else False
 |  
 |  __ceil__(...)
 |      Ceiling of an Integral returns itself.
 |  
 |  __divmod__(self, value, /)
 |      Return divmod(self, value).
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __float__(self, /)
 |      float(self)
 |  
 |  __floor__(...)
 |      Flooring an Integral returns itself.
 |  
 |  __floordiv__(self, value, /)
 |      Return self//value.
 |  
 |  __format__(self, format_spec, /)
 |      Default object formatter.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getnewargs__(self, /)
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __index__(self, /)
 |      Return self converted to an integer, if self is suitable for use as an index into a list.
 |  
 |  __int__(self, /)
 |      int(self)
 |  
 |  __invert__(self, /)
 |      ~self
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __lshift__(self, value, /)
 |      Return self<<value.
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __mod__(self, value, /)
 |      Return self%value.
 |  
 |  __mul__(self, value, /)
 |      Return self*value.
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __neg__(self, /)
 |      -self
 |  
 |  __or__(self, value, /)
 |      Return self|value.
 |  
 |  __pos__(self, /)
 |      +self
 |  
 |  __pow__(self, value, mod=None, /)
 |      Return pow(self, value, mod).
 |  
 |  __radd__(self, value, /)
 |      Return value+self.
 |  
 |  __rand__(self, value, /)
 |      Return value&self.
 |  
 |  __rdivmod__(self, value, /)
 |      Return divmod(value, self).
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __rfloordiv__(self, value, /)
 |      Return value//self.
 |  
 |  __rlshift__(self, value, /)
 |      Return value<<self.
 |  
 |  __rmod__(self, value, /)
 |      Return value%self.
 |  
 |  __rmul__(self, value, /)
 |      Return value*self.
 |  
 |  __ror__(self, value, /)
 |      Return value|self.
 |  
 |  __round__(...)
 |      Rounding an Integral returns itself.
 |      Rounding with an ndigits argument also returns an integer.
 |  
 |  __rpow__(self, value, mod=None, /)
 |      Return pow(value, self, mod).
 |  
 |  __rrshift__(self, value, /)
 |      Return value>>self.
 |  
 |  __rshift__(self, value, /)
 |      Return self>>value.
 |  
 |  __rsub__(self, value, /)
 |      Return value-self.
 |  
 |  __rtruediv__(self, value, /)
 |      Return value/self.
 |  
 |  __rxor__(self, value, /)
 |      Return value^self.
 |  
 |  __sizeof__(self, /)
 |      Returns size in memory, in bytes.
 |  
 |  __sub__(self, value, /)
 |      Return self-value.
 |  
 |  __truediv__(self, value, /)
 |      Return self/value.
 |  
 |  __trunc__(...)
 |      Truncating an Integral returns itself.
 |  
 |  __xor__(self, value, /)
 |      Return self^value.
 |  
 |  as_integer_ratio(self, /)
 |      Return integer ratio.
 |      
 |      Return a pair of integers, whose ratio is exactly equal to the original int
 |      and with a positive denominator.
 |      
 |      >>> (10).as_integer_ratio()
 |      (10, 1)
 |      >>> (-10).as_integer_ratio()
 |      (-10, 1)
 |      >>> (0).as_integer_ratio()
 |      (0, 1)
 |  
 |  bit_length(self, /)
 |      Number of bits necessary to represent self in binary.
 |      
 |      >>> bin(37)
 |      '0b100101'
 |      >>> (37).bit_length()
 |      6
 |  
 |  conjugate(...)
 |      Returns self, the complex conjugate of any int.
 |  
 |  to_bytes(self, /, length, byteorder, *, signed=False)
 |      Return an array of bytes representing an integer.
 |      
 |      length
 |        Length of bytes object to use.  An OverflowError is raised if the
 |        integer is not representable with the given number of bytes.
 |      byteorder
 |        The byte order used to represent the integer.  If byteorder is 'big',
 |        the most significant byte is at the beginning of the byte array.  If
 |        byteorder is 'little', the most significant byte is at the end of the
 |        byte array.  To request the native byte order of the host system, use
 |        `sys.byteorder' as the byte order value.
 |      signed
 |        Determines whether two's complement is used to represent the integer.
 |        If signed is False and a negative integer is given, an OverflowError
 |        is raised.
 |  
 |  ----------------------------------------------------------------------
 |  Class methods defined here:
 |  
 |  from_bytes(bytes, byteorder, *, signed=False) from builtins.type
 |      Return the integer represented by the given array of bytes.
 |      
 |      bytes
 |        Holds the array of bytes to convert.  The argument must either
 |        support the buffer protocol or be an iterable object producing bytes.
 |        Bytes and bytearray are examples of built-in objects that support the
 |        buffer protocol.
 |      byteorder
 |        The byte order used to represent the integer.  If byteorder is 'big',
 |        the most significant byte is at the beginning of the byte array.  If
 |        byteorder is 'little', the most significant byte is at the end of the
 |        byte array.  To request the native byte order of the host system, use
 |        `sys.byteorder' as the byte order value.
 |      signed
 |        Indicates whether two's complement is used to represent the integer.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  denominator
 |      the denominator of a rational number in lowest terms
 |  
 |  imag
 |      the imaginary part of a complex number
 |  
 |  numerator
 |      the numerator of a rational number in lowest terms
 |  
 |  real
 |      the real part of a complex number
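
The two argument forms described at the start of this section (a string naming a topic versus the object itself) can be compared side by side. This is only a sketch: the built-in abs is an arbitrary choice, and reading the __doc__ attribute is shown as one assumed way to get at the short description without the full help page.

```python
# Passing a string: help() looks the name up and prints its help page.
help('abs')

# Passing the object itself: help() prints the help page for that object.
help(abs)

# The one-line description shown by help() is also stored on the object.
doc = abs.__doc__
print(doc)
```

Both calls print the same page for a built-in function; the string form is mainly useful for keywords and topics that are not objects, or for modules that have not been imported yet.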

Printing values

The print() function displays messages and the values of expressions. Some example uses of print() are shown below.

To print a message or text enclose it within quotation marks.

print("Hello World!")
Hello World!

You can print multiple expressions by separating them with commas. Python will insert a space between each element and a newline at the end of the message.

You can change this behaviour using the following two arguments.

  • sep - takes a string that is inserted between the values being printed (default: a space)
  • end - takes a string that is printed at the end of the output (default: a newline)
print(1, 2, 3)
print(1, 2, 3, sep='|')
print(1, 2, 3, sep=',', end='*')
1 2 3
1|2|3
1,2,3*

Some additional example usages of print command:

  • Python provides multiple ways to format numbers using f-strings as follows.

    {data:[align][width][delimiter].[precision]}
    • Align: < (left) > (right) ^ (center)
    • Width: number of characters
    • Delimiter: 1000s separator (normally , or _)
    • Precision: number of digits shown for decimal numbers, or maximum number of characters shown for strings; for numbers it is followed by a type character:
      • f is fixed decimal places
      • g is significant figures

    Examples:

    # Occupy 10 spaces, align left, show 5 decimal places
    print(f'This is one way: {22/7:<10.5f}.')
    # Occupy 20 spaces, align center, no decimal places
    print(f'This is another way: {22/7:^20.0f}.')
    This is one way: 3.14286   .
    This is another way:          3          .
  • Using string format method:

    print('First name is {} and the last name is "{}!"'.format('john', 'doe'))
    First name is john and the last name is "doe!"
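
The delimiter, the g type and string precision from the format specification above are not covered by the earlier examples, so here is a minimal sketch of them. The value of genome_size is a made-up number used only for illustration.

```python
genome_size = 3_099_734_149      # hypothetical value, for illustration only
ratio = 22 / 7

with_commas = f'{genome_size:,}'         # ',' as the 1000s separator
with_underscores = f'{genome_size:_}'    # '_' as the 1000s separator
fixed = f'{ratio:.3f}'                   # 3 decimal places
significant = f'{ratio:.3g}'             # 3 significant figures
truncated = f'{"Adenine":.3}'            # at most 3 characters of a string
right_aligned = f'[{"ATG":>8}]'          # width 8, aligned right

print(with_commas)        # 3,099,734,149
print(with_underscores)   # 3_099_734_149
print(fixed)              # 3.143
print(significant)        # 3.14
print(truncated)          # Ade
print(right_aligned)      # [     ATG]
```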

Variables and Assignment

In the previous examples we directly used numbers and strings. However, we might want to assign values to variables for later usage or to deal with more complex expressions. We can associate a name to a value/expression and access the value/expression through the associated name.

x = 2
print(x)
2
y = 5 * 3
print(y)
15

We cannot use arbitrary strings as variables. The Python variable naming rules are:

  • Must begin with a letter (a - z, A - Z) or underscore (_).
  • Other characters can be letters, numbers or _ only.
  • Names are case sensitive.
  • Reserved words cannot be used as a variable name.
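
The naming rules above can be checked programmatically. The sketch below uses str.isidentifier() and keyword.iskeyword() from the standard library; the candidate names are invented for the example. Note that isidentifier() also accepts non-ASCII letters, which goes slightly beyond the a - z, A - Z rule listed above.

```python
import keyword

candidates = ["gene_1", "_count", "GeneName", "1gene", "gene-1", "for"]

results = {}
for name in candidates:
    # isidentifier() checks the character rules;
    # iskeyword() checks the reserved-word rule.
    valid = name.isidentifier() and not keyword.iskeyword(name)
    results[name] = valid
    print(f"{name:<10} {'valid' if valid else 'invalid'}")

# Names are case sensitive: these are two different variables.
gene = 1
Gene = 2
print(gene, Gene)   # 1 2
```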

Basic Data Types

There are three basic numeric types in Python:

  • Plain integers with unlimited precision (int)
  • Floating point numbers or numbers with a decimal point (float)
  • Complex numbers (complex)

In addition, Booleans (bool) are a subtype of integers. They represent the truth values True and False used in logical operations.
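
Two of the properties just mentioned, unlimited integer precision and bool being a subtype of int, can be seen in a short sketch. The variable name big is arbitrary.

```python
print(isinstance(True, int))   # True: bool is a subtype of int
print(True + True)             # 2, because True behaves like 1
print(False * 10)              # 0, because False behaves like 0

big = 2 ** 100                 # integers are not limited to 32 or 64 bits
print(big)                     # 1267650600228229401496703205376
print(type(big))               # <class 'int'>
```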

x = 23
y = -9
z = complex(3, 5)
print(x, y, z)
23 -9 (3+5j)
p = 5.67
q = -22/7
r = 2e-3
print(p, q, r, sep='\n')
5.67
-3.142857142857143
0.002

You can check the type of values using the built-in function type() as follows.

type(0)
type(22/7)
type(complex(1, 2))
type(True)
type(False)
<class 'int'>
<class 'float'>
<class 'complex'>
<class 'bool'>
<class 'bool'>

Python converts numbers internally in an expression containing mixed types to a common type for evaluation. But sometimes, we need to coerce a number explicitly from one type to another to satisfy the requirements of an operator or function parameter.

x = "5"
print(int(x))        # convert x to a plain integer
print(float(x))      # convert x to a floating-point number

x = 3
y = 7
# convert x to a complex number with real part x and imaginary part zero
print(complex(x))    
# convert x and y to a complex number with real part x and imaginary part y
print(complex(x, y)) 
5
5.0
(3+0j)
(3+7j)
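
The implicit conversion in mixed-type expressions, and a few edge cases of explicit conversion, might look as follows. This is a sketch with arbitrary values, not an exhaustive list of the conversion rules.

```python
a = 3 + 2.5            # int + float     -> float
b = 2.5 + (1 + 2j)     # float + complex -> complex
print(a, type(a))      # 5.5 <class 'float'>
print(b, type(b))      # (3.5+2j) <class 'complex'>

print(int(5.9))        # 5  (truncates towards zero, does not round)
print(int(-5.9))       # -5
print(float("1e-3"))   # 0.001

try:
    int("5.0")         # a string with a decimal point is not an integer literal
except ValueError as err:
    print("ValueError:", err)
```

When a string holds a decimal number, converting with float() first and then int() avoids the ValueError.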

Sequences

The most basic data structure in Python is the sequence. Sequences are compound data types, and used to group together other values. Each element of a sequence is assigned a number - its position or index. The first index is zero, the second index is one, and so forth.

There are certain things you can do with all sequence types. These operations include indexing, slicing, adding, multiplying, and checking for membership. In addition, Python has many built-in functions to be used with sequence types: e.g., for finding the length of a sequence and for finding its largest and smallest elements.

Python has several built-in sequence types, including strings, bytes, bytearrays, lists, tuples, and range objects; the most commonly used is the list, which we will discuss now.
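
To illustrate that the common operations work the same way across sequence types, the sketch below applies len(), min(), max() and indexing to four different sequences. The example values are arbitrary.

```python
sequences = [
    [3, 1, 2],      # list
    (3, 1, 2),      # tuple
    "cab",          # string
    range(1, 4),    # range object holding 1, 2, 3
]

summary = []
for seq in sequences:
    # type name, length, smallest, largest, first and last element
    row = (type(seq).__name__, len(seq), min(seq), max(seq), seq[0], seq[-1])
    summary.append(row)
    print(*row)
```

For the string, min() and max() compare characters, so the smallest element is 'a' and the largest is 'c'.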

Lists

The list is the most versatile data type available in Python. It is written as comma-separated values (items) between square brackets, and the items need not all have the same type.

list1 = ['ATG', 'TCA', 23, 12]
list2 = [1, 2, 3, 4, 5 ]
list3 = ["a", "b", "c", "d", 'pqr', 12.345]

Accessing values in Lists

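To access values in a list, use square brackets with a single index to obtain the element at that position, or with a slice (start:stop) to obtain a range of elements. A slice includes the element at the start index but excludes the element at the stop index.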

list1 = ['ATG', 'TCA', 23, 12]    # create a list 
print("list1[0] -", list1[0])     # print the first element in the list

list2 = [1, 2, 3, 4, 5 ]          # create a list 
print("list2[1:5] -", list2[1:5]) # print the 2nd to 5th elements (indices 1 to 4)
list1[0] - ATG
list2[1:5] - [2, 3, 4, 5]

A few other examples of indexing and slicing:

list1 = ['Adenine', 'Cytosine', 'Guanine', 'Thymine']
print(list1[2])
print(list1[-3])
print(list1[2:])
print(list1[:-2])
Guanine
Cytosine
['Guanine', 'Thymine']
['Adenine', 'Cytosine']

Updating Lists

You can update single or multiple elements of a list by placing an index or slice on the left-hand side of the assignment operator, which selects the element(s) as described above, and the new value(s) on the right-hand side. When assigning to a slice, supply the new values as a list; giving the same number of new elements as selected elements keeps the list the same length.

list1 = ['ATG', 'TCA', 23, 12]              # create a list
print("Value at index 3 : ", list1[3])      # print the 4th element

list1[3] = 'GGC'                            # update the 4th element
print("New value at index 3 : ", list1[3])  # print the 4th element
Value at index 3 :  12
New value at index 3 :  GGC
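
The example above updates a single element. Updating several elements at once through a slice can be sketched as follows; the list name codons is chosen for this example only.

```python
codons = ['ATG', 'TCA', 23, 12]

codons[2:4] = ['GGC', 'CCG']   # two elements replaced by two new ones
print(codons)                  # ['ATG', 'TCA', 'GGC', 'CCG']

codons[1:3] = ['AAA']          # two elements replaced by one: the list shrinks
print(codons)                  # ['ATG', 'AAA', 'CCG']
```

Python accepts a different number of new elements in a plain slice assignment, but the list length then changes, which is easy to overlook.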

Additionally, you can add an element to the end of a list (even an empty list) with the append() method.

list1.append('CCG')                         # insert element at the end of the list
print(list1)                                # print the list
['ATG', 'TCA', 23, 'GGC', 'CCG']

Deleting List elements

To remove a list element, you can use the del statement if you know the index of the element(s) you are deleting.

print("List of 5 elements =", list1)          # print list1 
del list1[2]                                  # delete element based on its index (3rd element)
print("After deleting 3rd element =", list1)  # print list1 
List of 5 elements = ['ATG', 'TCA', 23, 'GGC', 'CCG']
After deleting 3rd element = ['ATG', 'TCA', 'GGC', 'CCG']

The remove() method of a list object can also be used to delete the element based on the value.

list1.remove('TCA')                           # delete element based on its value ('TCA')
print("After removing TCA element =",list1)   # print list1 
After removing TCA element = ['ATG', 'GGC', 'CCG']

Alternatively, you can use the del statement after using the index() method to find the index of the element based on its value:

indx = list1.index('GGC')                     # get index of element 'GGC'
del list1[indx]                               # delete element based on its index 
print("After deleting GGC element =", list1)  # print list1 
After deleting GGC element = ['ATG', 'CCG']
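
A few details of del and remove() that the examples above do not show: del also accepts a slice, remove() deletes only the first matching element, and remove() raises a ValueError when the value is absent. A sketch with a made-up list named bases:

```python
bases = ['A', 'C', 'G', 'T', 'A', 'N', 'N']

del bases[5:7]                 # delete several elements with a slice
print(bases)                   # ['A', 'C', 'G', 'T', 'A']

bases.remove('A')              # removes only the first 'A'
print(bases)                   # ['C', 'G', 'T', 'A']

try:
    bases.remove('U')          # 'U' is not in the list
except ValueError:
    print("'U' is not in the list")
```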

Other List operations

Lists respond to the + and * operators (much like strings, discussed next), where ‘+’ means concatenation and ‘*’ means repetition, and the result is a new list. In fact, lists respond to all general sequence operations.

list1 = [1, 2, 3]
print("Length of the list =", len(list1))     # length
Length of the list = 3
list2 = [4, 5, 6]
print("Concatenated list =", list1 + list2)   # concatenation
Concatenated list = [1, 2, 3, 4, 5, 6]
print("Repeating list elements =", list1 * 3) # repetition 
Repeating list elements = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print("Is 3 a member of list1?", 3 in list1)  # membership
Is 3 a member of list1? True
for x in list1:                               # iteration (discussed in detail later)
  print(x, end=' ')
1 2 3 

Strings

Strings are amongst the most popular types in Python. We can create them simply by enclosing characters in quotes. Python treats single quotes ('') the same as double quotes (""). That is, 'aaa' and "aaa" are the same. A string can also be triple quoted, either with three single quotes, as '''aaa''', or three double quotes, as """aaa""".

str1 = 'This is a string'
str2 = "This is also a string"
str3 = """This is a string that extends 
over multiple lines"""
print(str1, str2, str3, sep='\n')
This is a string
This is also a string
This is a string that extends 
over multiple lines

Strings can be concatenated (glued together) with the + operator, and repeated with * (similar to lists). This is another way to create new strings.

words = 'This' + 'is' + 'concatenation' # concatenation
print("Concatenation =", words)
print("Repetition =", 'ACG' * 3)        # repetition
print("Length =", len(words))           # length
print("Membership =", "is" in words)    # membership
Concatenation = Thisisconcatenation
Repetition = ACGACGACG
Length = 19
Membership = True
for x in words:      # iteration (discussed in detail later)
  print(x, end='|')
T|h|i|s|i|s|c|o|n|c|a|t|e|n|a|t|i|o|n|

Python does not have a separate character type; a single character is simply a string of length one, and therefore also counts as a substring. Individual characters can be accessed with an index, and substrings can be specified with slice notation: two indices separated by a colon.

Strings can be accessed and manipulated using similar operations we introduced above for lists.

print(words)
print(words[4])
print(words[0:6])
print(words[6:])
print(words[-15:])
Thisisconcatenation
i
Thisis
concatenation
isconcatenation
text = "ATGTCATTTGT"
text[0:2] = "CCC"
TypeError: 'str' object does not support item assignment

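Unlike lists, strings are immutable: their contents cannot be changed in place, which is why the assignment above fails. To change part of a string, use the replace() method, which returns a new string with the substitutions made.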

long_text = """Betty bought some butter. 
But the butter was bitter. 
So, Betty bought more butter to make bitter butter better"""
print("Replaced text = ", long_text.replace("butter", "egg"))
Replaced text =  Betty bought some egg. 
But the egg was bitter. 
So, Betty bought more egg to make bitter egg better
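
As a minimal sketch of working with immutable strings, the following builds new strings from the text variable used in the failed assignment above instead of modifying it in place. The names new_text and swapped are illustrative, and the T-to-U substitution is only an example.

```python
text = "ATGTCATTTGT"

# Build a new string from slices instead of assigning into the old one.
new_text = "CCC" + text[2:]

# replace() also returns a new string; the optional third argument
# limits how many occurrences are replaced (here, only the first two).
swapped = text.replace("T", "U", 2)

print(text)      # ATGTCATTTGT (the original is unchanged)
print(new_text)  # CCCGTCATTTGT
print(swapped)   # AUGUCATTTGT
```

To keep the modified version under the original name, assign the result back, for example text = text.replace("T", "U").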

The in operator lets you check whether a substring is contained within a larger string, but it does not tell you where the substring is located. This is often useful to know, and Python provides the .find() method, which returns the index of the first occurrence of the search string, and the .rfind() method, which returns the index of the last occurrence. If the search string is not found, both methods return -1. The .count() method returns the number of non-overlapping occurrences.

dna = "ATGTCACCGTTTGGC"
print("TCA is at position:", dna.find("TCA"))
print("The last Cytosine is at position:", dna.rfind('C'))
print("Number of Adenines:", dna.count("A"))
TCA is at position: 3
The last Cytosine is at position: 14
Number of Adenines: 2
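
Since .find() reports only the first match, one way to locate every match is to call it repeatedly with its optional start argument. The sketch below assumes a helper named find_all, which is not part of Python and is defined here only for illustration; it uses a while loop, which is covered later in this document.

```python
dna = "ATGTCACCGTTTGGC"

def find_all(sequence, pattern):
    """Return the start index of every (possibly overlapping) match."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume the search one character after the previous hit.
        start = sequence.find(pattern, start + 1)
    return positions

print(find_all(dna, "T"))    # [1, 3, 9, 10, 11]
print(find_all(dna, "GG"))   # [12]
print(find_all(dna, "AAA"))  # []
```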

When we read text from files (which we will see in the next workshop), often there is unwanted whitespace at the start or end of the string. We can remove leading whitespace with the .lstrip() method, trailing whitespace with .rstrip(), and whitespace from both ends with .strip().

All of these methods return a copy of the changed string, so if you want to replace the original you can assign the result of the method call to the original variable.

string = "           This is a string with leading and trailing spaces             "
print('|', string, '|')
print('|', string.lstrip(), '|')
print('|', string.rstrip(), '|')
print('|', string.strip(), '|')
|            This is a string with leading and trailing spaces              |
| This is a string with leading and trailing spaces              |
|            This is a string with leading and trailing spaces |
| This is a string with leading and trailing spaces |

You can split a string into a list of substrings using the .split() method, supplying the delimiter as an argument to the method. If you don't supply a delimiter, the method splits the string on runs of whitespace by default (which is very often what you want!).

seq = "ATG TCA CCG GGC"
codons = seq.split(" ")
print(codons)
['ATG', 'TCA', 'CCG', 'GGC']
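
The difference between passing " " and passing no delimiter shows up when the input contains repeated spaces, tabs or a trailing newline. The string messy below is a made-up example.

```python
messy = "ATG  TCA\tCCG \n"

by_space = messy.split(" ")   # splits at every single space character
by_whitespace = messy.split() # splits on runs of any whitespace

print(by_space)       # ['ATG', '', 'TCA\tCCG', '\n']
print(by_whitespace)  # ['ATG', 'TCA', 'CCG']
```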

To split a string into its component characters, you can simply convert the string to a list with list(). Note that spaces are characters too, so they appear in the result:

bases = list(seq)
print(bases)
['A', 'T', 'G', ' ', 'T', 'C', 'A', ' ', 'C', 'C', 'G', ' ', 'G', 'G', 'C']

The counterpart of .split() is the .join() method, which joins the elements of a list into a single string. It is called on the separator string and works only if all the elements of the list are strings.

print(codons)
print("|".join(codons))
['ATG', 'TCA', 'CCG', 'GGC']
ATG|TCA|CCG|GGC
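
If the list holds values that are not strings, .join() raises a TypeError. A common workaround, sketched here with a made-up list named counts, is to convert each element with str() while joining.

```python
counts = [12, 7, 9]

# "-".join(counts) would raise a TypeError because the items are integers,
# so each item is converted to a string first.
joined = "-".join(str(n) for n in counts)

print(joined)  # 12-7-9
```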

We also saw earlier that the + operator lets you concatenate strings into a larger string. Note that both operands must be strings: trying to add a string and an integer raises a TypeError. If you want to concatenate a string with an integer (or a value of some other type), first convert that value to a string with the str() function.

s = "chr"
chrom_number = 2
print(s + str(chrom_number))
chr2

Dictionary

Sometimes we want to access data by some useful name rather than an index. For example, as a result of some experiment we may have a set of genes and corresponding expression values. We could put the expression values in a list, but then we'd have to remember which index in the list corresponds to which gene, and this would quickly get complicated. For these situations a dictionary is a very useful data structure.

Dictionaries contain a mapping of keys to values (like a word and its corresponding definition in a dictionary). The keys of a dictionary are unique (i.e., they cannot repeat). Values are looked up by key rather than by position; since Python 3.7, dictionaries also preserve the order in which keys were inserted.

dna = {"A": "Adenine", "C": "Cytosine", "G": "Guanine", "T": "Thymine"}
print(dna)
{'A': 'Adenine', 'C': 'Cytosine', 'G': 'Guanine', 'T': 'Thymine'}

You can access values in a dictionary using the key inside square brackets.

print("A represents", dna["A"])
print("G represents", dna["G"])
A represents Adenine
G represents Guanine

An error is triggered if a key is absent from the dictionary.

print("N represents", dna["N"])
KeyError: 'N'

You can access values safely with the get() method, which returns None if the key is absent. You can also supply a default value to be returned instead.

print("N represents", dna.get("N"))
print("N represents (with a default value)", dna.get("N", "unknown"))
N represents None
N represents (with a default value) unknown

Examples of some operators used with dictionaries.

dna = {"A": "Adenine", "C": "Cytosine", "G": "Guanine", "T": "Thymine"}

# check if a key is in/not in a dictionary
print("G" in dna)
print("Y" not in dna)
True
True
# length of a dictionary
print(len(dna))
4
print(dna)
# assign new values to a dictionary
dna['Y'] = 'Pyrimidine'
print(dna)
{'A': 'Adenine', 'C': 'Cytosine', 'G': 'Guanine', 'T': 'Thymine'}
{'A': 'Adenine', 'C': 'Cytosine', 'G': 'Guanine', 'T': 'Thymine', 'Y': 'Pyrimidine'}
# change value of an existing key
dna['Y'] = 'Cytosine or Thymine'
print(dna)
{'A': 'Adenine', 'C': 'Cytosine', 'G': 'Guanine', 'T': 'Thymine', 'Y': 'Cytosine or Thymine'}
# list all the keys
print(list(dna.keys()))
# list all values
print(list(dna.values()))
# list all key value pairs
print(list(dna.items()))
['A', 'C', 'G', 'T', 'Y']
['Adenine', 'Cytosine', 'Guanine', 'Thymine', 'Cytosine or Thymine']
[('A', 'Adenine'), ('C', 'Cytosine'), ('G', 'Guanine'), ('T', 'Thymine'), ('Y', 'Cytosine or Thymine')]
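
A typical use of get() with a default value is counting. The following sketch tallies the bases in a sequence and then loops over the key-value pairs with items(); the variable names are illustrative.

```python
seq = "ATGTCACCGTTTGGC"

base_counts = {}
for base in seq:
    # get() returns 0 the first time a base is seen, so no KeyError occurs.
    base_counts[base] = base_counts.get(base, 0) + 1

# items() yields (key, value) pairs that can be unpacked in the loop header.
for base, count in base_counts.items():
    print(base, count)
```

After the loop, base_counts is {'A': 2, 'T': 5, 'G': 4, 'C': 4}.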

Operators and Expressions

The Python language supports the following types of operators.

  • Arithmetic operators
  • Comparison (i.e., relational) operators
  • Assignment operators
  • Bitwise operators
  • Logical operators
  • Membership operators
  • Identity operators

Let’s look at some of these types one by one.

Python Arithmetic Operators

Operator Description
+ Addition - Adds values on either side of the operator
- Subtraction - Subtracts right hand operand from left hand operand
* Multiplication - Multiplies values on either side of the operator
/ Division - Divides left hand operand by right hand operand
% Modulus - Divides left hand operand by right hand operand and returns remainder
** Exponent - Raises the left hand operand to the power of the right hand operand
// Floor (or integer) division - Divides and rounds the result down to the nearest whole number (toward negative infinity), so 7 // 2 is 3 and -7 // 2 is -4.
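
The operators in the table can be tried directly. This short sketch uses arbitrary numbers and illustrative variable names; note in particular how // and % behave with a negative operand.

```python
true_div = 7 / 2         # 3.5  (/ always produces a float)
floor_pos = 7 // 2       # 3
floor_neg = -7 // 2      # -4   (rounded down, not toward zero)
remainder = 7 % 2        # 1
neg_remainder = -7 % 2   # 1    (result takes the sign of the right operand)
power = 2 ** 10          # 1024
float_floor = 7.0 // 2   # 3.0  (floor division of a float gives a float)

print(true_div, floor_pos, floor_neg, remainder, neg_remainder, power, float_floor)
```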

Python Comparison Operators

Operator Description
== Checks if the values of two operands are equal; if yes, the condition becomes true.
!= Checks if the values of two operands are not equal; if they are not equal, the condition becomes true.
<> Not-equal operator from Python 2. It was removed in Python 3; use != instead.
> Checks if the value of left operand is greater than the value of right operand.
< Checks if the value of left operand is less than the value of right operand.
>= Checks if the value of left operand is greater than or equal to the value of right operand.
<= Checks if the value of left operand is less than or equal to the value of right operand.

Python Assignment Operators

Operator Description
= Simple assignment operator, assigns values from right side operands to left side operand
+= Add AND assignment operator; it adds the right operand to the left operand and assigns the result to the left operand (e.g., i += 1 is the same as i = i + 1)

Similar descriptions follow for the remaining arithmetic operators (i.e., -=, *=, /=, %=, **=, //=)

Python Logical Operators

Operator Description
and Logical AND operator - If both the operands are true then condition becomes true.
or Logical OR Operator - If any of the two operands is true (non zero) then condition becomes true.
not Logical NOT Operator - Reverses the logical state of its operand. If an expression is true then Logical NOT of that is false.

Python Membership Operators

Python has membership operators, which test for membership in a sequence, such as strings, lists, or tuples. There are two membership operators.

Operator Description
in Evaluates to true if it finds the value in the specified sequence and false otherwise.
not in Evaluates to true if it does not find the value in the specified sequence and false otherwise.

Python Identity Operators

Operator Description
is Evaluates to true if the variables on either side of the operator point to the same object and false otherwise.
is not Evaluates to false if the variables on either side of the operator point to the same object and true otherwise.
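
The difference between equality (==) and identity (is) is easiest to see with two lists that have the same contents. The names a, b and c below are illustrative.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with the same contents
c = a           # a second name for the same list object

print(a == b, a is b)   # True False
print(a == c, a is c)   # True True

# Because a and c refer to one object, a change made through c is visible through a.
c.append(4)
print(a)                # [1, 2, 3, 4]
```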

Operator Precedence in Python

The following table lists the operators discussed in this chapter, from highest precedence to lowest. Operators on the same row have the same precedence.

Operator Description
** Exponentiation
~, +, - Bitwise complement, unary plus and unary minus
*, /, %, // Multiplication, division, modulo and floor division
+, - Addition and subtraction
>>, << Right and left bitwise shift
& Bitwise AND
^ Bitwise exclusive OR (XOR)
| Bitwise OR
<, <=, >, >=, ==, !=, is, is not, in, not in Comparison, identity and membership operators
not Logical NOT
and Logical AND
or Logical OR

Assignment (=, +=, -=, *=, /=, //=, %=, **=) is a statement rather than an expression operator: the expression on the right-hand side is fully evaluated before the assignment takes place.
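
A few small expressions show precedence at work. The values are arbitrary and the names r1 to r5 are illustrative; the comments show how Python groups each expression.

```python
r1 = 2 + 3 * 4 ** 2         # 2 + (3 * (4 ** 2))      -> 50
r2 = -2 ** 2                # -(2 ** 2)               -> -4
r3 = (2 + 3) * 4            # parentheses come first  -> 20
r4 = not True or True       # (not True) or True      -> True
r5 = 1 + 2 == 3 and 2 < 3   # ((1 + 2) == 3) and (2 < 3) -> True

print(r1, r2, r3, r4, r5)
```

When in doubt, adding parentheses makes the intended grouping explicit and costs nothing.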

Control Structures in Python

In a program, control flow (or flow of control) refers to the order in which individual statements of the program are executed. Similarly, control flow in an algorithm is the order in which individual steps of the algorithm are executed.

So far, we have considered sequential control flow, i.e., statements getting executed from top to bottom, in the order they appear in the program. The sequential flow of control is the default behavior. However, we often need to alter this flow when we write programs, because the problems we can solve with sequential control flow alone are limited to simple (or, as one might say, trivial) problems. In other words, there are many problems that cannot be solved with the sequential control flow alone.

Many problems that we encounter are complex enough that they require programs with enhanced control flows. For this, most programming languages provide at least three control structures for altering the default sequential flow. These control structures are known as selection, loop, and subprogram. Together with the default sequential flow, we have four control structures for specifying the control flow as shown below.

Selection Control Structure

if structure

The if structure in Python is similar to that of other languages. It contains an expression followed by a set of statements to be executed if the expression is evaluated as true.

if expression:
  statement_1
  statement_2
  ...
  statement_n

Note that, in Python, all statements indented by the same number of character spaces after a programming construct are considered to be part of a single block of code. Python uses indentation as its method of grouping statements.

if ... else structure

To implement the selection control structure shown in subfigure (b) above with both blocks A and B specified, the else keyword can be combined with the if keyword. The else keyword is followed by the code that gets executed if the if-body does not get executed (i.e., the conditional expression evaluates to false).

The else part is optional and there could be at most one else part following an if part. Further, an else part cannot exist alone; it must be paired with an if part.

if expression:
  statement(s)
else:
  statement(s)

Multi-way Selection with the elif Keyword

The elif keyword (meaning "else-if") allows us to implement multi-way selection, going beyond the two-way selection of the if-else structure. This means we can select one block of code for execution from among many (more than two). To do this, we specify multiple conditional expressions; they are tested in order, and the block belonging to the first expression that evaluates to true is executed. The remaining conditions are then skipped.

An elif part is optional and there can be an arbitrary number of elif parts following an if part.

if expression_1:
  statement(s)
elif expression_2:
  statement(s)  
elif expression_3:
  statement(s)  
...
else:
  statement(s)

The if...elif structure is a substitute for the “switch-case” structure in some other languages such as C.
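
As a concrete sketch of multi-way selection, the following maps a star rating to a label. The function name describe_rating and the labels are made up for illustration, and functions themselves are covered later in this document.

```python
def describe_rating(stars):
    """Map a star rating to a label (labels are illustrative only)."""
    if stars >= 4:
        label = "high"
    elif stars == 3:
        label = "medium"
    elif stars >= 1:
        label = "low"
    else:
        # Reached only when none of the conditions above is true.
        label = "not rated"
    return label

for stars in [5, 3, 1, 0]:
    print(stars, describe_rating(stars))
```

Only the first matching branch runs, so a rating of 5 yields "high" even though it also satisfies stars >= 1.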

Loop Control Structure

Python provides two loop structures: the for loop and the while loop. We can also have nested loops.

The for loop

The for loop construct is used to repeat a statement or block of statements a specified number of times. The for loop can also iterate over the items of any sequence (such as a list or a string), in the order in which they appear in the sequence.

for iterating_var in sequence:
   statement(s)

The block of statements executed repeatedly is called the loop body. The loop body is indented.

The sequence expression is evaluated once, before the loop starts. Then the first item in the sequence is assigned to the iterating variable iterating_var and the loop body is executed. This concludes one iteration of the loop. Next, the second item is assigned to iterating_var and the loop body is executed again. The loop body is executed repeatedly in this way, with the next item in the sequence assigned to iterating_var in each iteration, until the entire sequence is exhausted.

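The range() function: If we need to iterate over a sequence of numbers, the built-in function range() comes in handy. It produces an arithmetic progression of integers. In Python 3 it returns a range object that generates its values on demand rather than a list, so wrap it in list() if an actual list is needed. It can be called as either range(stop) or range(start, stop[, step]); the stop value itself is never included. Here are four examples.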

for i in range(10):
    print(i, end=' ')
0 1 2 3 4 5 6 7 8 9 
for i in range(5, 10):
    print(i, end=' ')
5 6 7 8 9 
for i in range(0, 10, 3):
    print(i, end=' ')
0 3 6 9 
for i in range(-10, -100, -30):
    print(i, end=' ')
-10 -40 -70 
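
The following sketch shows that range() gives back a range object, which can be converted to a list and supports len() and the in operator. The names r and countdown are illustrative.

```python
r = range(0, 10, 3)

print(r)        # range(0, 10, 3)
print(list(r))  # [0, 3, 6, 9]
print(len(r))   # 4
print(6 in r)   # True

# A negative step counts downwards; the stop value (0) is still excluded.
countdown = list(range(5, 0, -1))
print(countdown)  # [5, 4, 3, 2, 1]
```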

To iterate over the indices of a list or sequence using a for loop, you can combine range() and len() functions as follows:

list_a = ['John', 'had', 'a', 'little', 'puppy']
# using range and len functions
for i in range(len(list_a)):
  print(i, list_a[i])
0 John
1 had
2 a
3 little
4 puppy

Or using the enumerate() function, which yields (index, item) pairs:

# using enumerate function
for elem in enumerate(list_a):
  print(elem)
(0, 'John')
(1, 'had')
(2, 'a')
(3, 'little')
(4, 'puppy')
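
Each pair produced by enumerate() can be unpacked into two loop variables, and the optional start argument changes the first index. This sketch reuses list_a; the names position, word and numbered are illustrative.

```python
list_a = ['John', 'had', 'a', 'little', 'puppy']

numbered = []
# Unpack each (index, item) pair; start=1 makes the numbering begin at 1.
for position, word in enumerate(list_a, start=1):
    numbered.append(f"{position}: {word}")

print(numbered)
```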

The while loop

A while loop in Python repeatedly executes the loop body as long as a given condition is true. The condition is specified by an expression.

while expression:
  statement(s)

The block of statements executed repeatedly is the loop body, which is indented, as in the for loop.

The condition is considered true if the expression evaluates to True or to any other truthy value, such as a non-zero number or a non-empty string or list. The loop iterates while the condition is true. When the condition becomes false, program control passes to the line immediately following the loop body.

Note that the while loop might not ever run. When the condition is tested and the result is false, the loop body will be skipped and the first statement after the while loop will be executed.
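
A while loop is a natural fit when the number of iterations is not known in advance. The sketch below, with made-up numbers and illustrative variable names, counts how many doublings a population needs to reach at least 1000, and then shows a loop whose body never runs.

```python
population = 1
generations = 0

# Keep doubling until the population reaches the threshold.
while population < 1000:
    population = population * 2
    generations += 1

print(generations, population)  # 10 1024

# The condition is false on the very first test, so the body is skipped.
skipped_iterations = 0
while population < 0:
    skipped_iterations += 1

print(skipped_iterations)  # 0
```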

The break keyword

The break keyword is used inside a loop to terminate the loop immediately, i.e., to break out of the smallest enclosing for or while loop. Control is transferred to the first statement following the loop body. If break is executed in the inner loop of a nested loop, only that inner loop ends and control returns to the enclosing outer loop. The break statement can be used to terminate an otherwise infinite loop or to force a loop to end before its normal termination.

n = 10
for var in range(0, n):
    print(var)
    if var == 5:
        print("Countdown Aborted")
        break
0
1
2
3
4
5
Countdown Aborted

The continue keyword

The continue keyword inside a loop causes the program to skip the rest of the loop body in the current iteration, causing it to continue with the next iteration of the loop.

for i in range(-2, 3):
  if i == 0:
    continue
  print("5 divided by ", i, " is: ", (5.0/i))
5 divided by  -2  is:  -2.5
5 divided by  -1  is:  -5.0
5 divided by  1  is:  5.0
5 divided by  2  is:  2.5

Functions

A function is a block of organized, reusable code that is used to perform a single task. Functions are the subprogram control structure in Python. Functions provide better modularity for our programs and a high degree of code reuse.

As you already know, Python gives you many built-in functions like print(), etc. But you can also create your own functions which are called user-defined functions.

def function_name(parameters):
  function_suite
  return [expression]

By default, parameters have a positional behavior; thus when invoking (calling) the function you need to list them in the same order that they were defined. Defining a function only gives it a name, specifies the parameters that are to be included in the function and structures the blocks of code. Once the function is defined, you can execute it by calling it from your (main) program, another function or directly from the Python prompt.
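
The effect of positional behavior can be sketched with a small two-parameter function. The function describe_gene, the gene name and the expression value are all made up for illustration.

```python
def describe_gene(name, expression):
    """Build a short description from a gene name and an expression value."""
    return f"{name} has expression {expression}"

# Positional call: arguments are matched to parameters in order.
in_order = describe_gene("BRCA1", 7.5)

# Swapping the positional arguments silently swaps their meaning.
swapped = describe_gene(7.5, "BRCA1")

# Keyword arguments are matched by name, so their order does not matter.
by_keyword = describe_gene(expression=7.5, name="BRCA1")

print(in_order)    # BRCA1 has expression 7.5
print(swapped)     # 7.5 has expression BRCA1
print(by_keyword)  # BRCA1 has expression 7.5
```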

In the following example, we define and call the readDataset() function.

# Function definition to read the CMS hospital patient satisfaction CSV file
# This function does not require any parameters 
def readDataset():
  with open('data/patient_satisfaction/cms_hospital_patient_satisfaction.csv') as f:
    cms = f.read().splitlines()
  return cms

# Now you can call readDataset function
cms = readDataset()
print(cms)
['ID,Facility Name,County,Hospital Type,Star Rating,No of Surveys,Response Rate,Overall Rating', '050424,SCRIPPS GREEN HOSPITAL,SAN DIEGO,Acute Care Hospital,4,3110,41,5', '140103,ST BERNARD HOSPITAL,COOK,Acute Care Hospital,1,264,6,2', '100051,SOUTH LAKE HOSPITAL,LAKE,Acute Care Hospital,2,1382,20,2', '040062,MERCY HOSPITAL FORT SMITH,SEBASTIAN,Acute Care Hospital,3,2506,35,3', '440048,BAPTIST MEMORIAL HOSPITAL,SHELBY,Acute Care Hospital,2,1799,18,2', '450011,ST JOSEPH REGIONAL HEALTH CENTER,BRAZOS,Acute Care Hospital,3,1379,24,3', '151317,GREENE COUNTY GENERAL HOSPITAL,GREENE,Critical Access Hospital,3,114,22,3', '061327,SOUTHWEST MEMORIAL HOSPITAL,MONTEZUMA,Critical Access Hospital,4,247,34,3', '490057,SENTARA GENERAL HOSPITAL,VIRGINIA BEACH,Acute Care Hospital,4,619,32,3', '110215,PIEDMONT FAYETTE HOSPITAL,FAYETTE,Acute Care Hospital,2,1714,21,2', '050704,MISSION COMMUNITY HOSPITAL,LOS ANGELES,Acute Care Hospital,3,241,14,3', '100296,DOCTORS HOSPITAL,MIAMI-DADE,Acute Care Hospital,4,393,24,3', '440003,SUMNER REGIONAL MEDICAL CENTER,SUMNER,Acute Care Hospital,4,680,35,2', '501339,WHIDBEY GENERAL HOSPITAL,ISLAND,Critical Access Hospital,3,389,29,3', '050116,NORTHRIDGE MEDICAL CENTER,LOS ANGELES,Acute Care Hospital,3,1110,20,2']

In the following example, we define two functions, printHead() and printTail(), to print the first five and last five elements of a list. Note that these functions contain no return statement: return is optional, and a function without one simply returns None when its body finishes.

# function definition to print the top 5 elements in a list
def printHead(inp_list):
  for i in range(5):
    print(inp_list[i])

# function definition to print the bottom 5 elements in a list
def printTail(inp_list):
  for i in range(len(inp_list)-5, len(inp_list)):
    print(inp_list[i])
# function call to printHead with the cms dataset as an input parameter to the function
printHead(cms)
ID,Facility Name,County,Hospital Type,Star Rating,No of Surveys,Response Rate,Overall Rating
050424,SCRIPPS GREEN HOSPITAL,SAN DIEGO,Acute Care Hospital,4,3110,41,5
140103,ST BERNARD HOSPITAL,COOK,Acute Care Hospital,1,264,6,2
100051,SOUTH LAKE HOSPITAL,LAKE,Acute Care Hospital,2,1382,20,2
040062,MERCY HOSPITAL FORT SMITH,SEBASTIAN,Acute Care Hospital,3,2506,35,3
# function call to printTail with the cms dataset as an input parameter to the function
printTail(cms)
050704,MISSION COMMUNITY HOSPITAL,LOS ANGELES,Acute Care Hospital,3,241,14,3
100296,DOCTORS HOSPITAL,MIAMI-DADE,Acute Care Hospital,4,393,24,3
440003,SUMNER REGIONAL MEDICAL CENTER,SUMNER,Acute Care Hospital,4,680,35,2
501339,WHIDBEY GENERAL HOSPITAL,ISLAND,Critical Access Hospital,3,389,29,3
050116,NORTHRIDGE MEDICAL CENTER,LOS ANGELES,Acute Care Hospital,3,1110,20,2
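
printHead() assumes the list has at least five elements. One possible variant, sketched below, uses slicing, which copes with shorter lists, and returns the rows instead of printing them. The function names head and tail and the parameter n are assumptions made for this example; the three rows are copied from the output above so that the block runs without the CSV file.

```python
rows = [
    "ID,Facility Name,County,Hospital Type,Star Rating,No of Surveys,Response Rate,Overall Rating",
    "050424,SCRIPPS GREEN HOSPITAL,SAN DIEGO,Acute Care Hospital,4,3110,41,5",
    "140103,ST BERNARD HOSPITAL,COOK,Acute Care Hospital,1,264,6,2",
]

def head(inp_list, n=5):
    """Return the first n elements (fewer if the list is shorter)."""
    return inp_list[:n]

def tail(inp_list, n=5):
    """Return the last n elements (fewer if the list is shorter)."""
    return inp_list[max(len(inp_list) - n, 0):]

print(head(rows, 1))
print(tail(rows, 2))

# Each row is a comma-separated string, so split() recovers the fields.
fields = rows[1].split(",")
facility_name = fields[1]
star_rating = int(fields[4])   # convert the text "4" to the integer 4
print(facility_name, star_rating)
```

Returning a list rather than printing inside the function lets the caller decide what to do with the rows, such as printing, counting or further parsing.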
