Supplements

The time given was not sufficient to cover all important issues.

Here is what is missing from the preceding parts of this course, quickly and in brief:

– Please loosen up this talk with many comments and questions! –

Function definitions with »`def`«

In the console, an empty line is required after the definition. This empty line is not required when the text is written into a text file.

console transcript

def hello_world(): print( "Hello, world!" )

hello_world()

Hello, world!

console transcript

def hello_world(): 
    print( "Hello, world!" )

hello_world()

Hello, world!

Returning values

console transcript

def hello_world():
    return "Hello, world!"

print( hello_world() + hello_world() )

Hello, world!Hello, world!

Parameters

console transcript

def hello( what ):
    print( f"Hello, {what}!" )

hello( "you" )

Hello, you!

console transcript

def successor( n ):  # not called »next«: that name belongs to a built-in function
    return n + 1

print( successor( 7 ))

8

More than one parameter

console transcript

def write( x, y ):
    print( f"{x} {y}" )

write( 2, 3 )

2 3

Default values

console transcript

def write( x, y=0 ):
    print( f"{x} {y}" )

write( 2 )

2 0

The if-else expression

The condition in the middle (»t > 20«) is evaluated first and converted to ›bool‹. If it is true, the left expression is evaluated; if it is false, the right one is. The result of that evaluation becomes the value of the whole conditional expression.

console transcript

def judge( t ):
    print( f"It's {'ok' if t > 20 else 'too cold'}." )

judge( 18 )

It's too cold.

judge( 22 )

It's ok.
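
The conversion to ›bool‹ also happens when the condition is not a comparison. A minimal sketch (the function »describe« is a made-up example) in which an empty list counts as false and a non-empty list as true:

```python
def describe( items ):
    # »items« is converted to bool: [] is false, a non-empty list is true
    return "some items" if items else "no items"

print( describe( [] ))        # no items
print( describe( [ 1, 2 ] ))  # some items
```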

The if statement

console transcript

def judge( t ):
    if t <= 20:
        print( f"It's too cold." )

judge( 18 )

It's too cold.

judge( 22 )

(no output)

console transcript

def judge( t ):
    if t <= 20:
        print( f"It's too cold." )
    elif 20 < t <= 24:
        print( f"It's ok." )

judge( 18 )

It's too cold.

judge( 22 )

It's ok.

judge( 30 )

(no output)

console transcript

def judge( t ):
    if t <= 20:
        print( f"It's too cold." )
    elif 20 < t <= 24:
        print( f"It's ok." )
    else:
        print( f"It's too hot." )

judge( 18 )

It's too cold.

judge( 22 )

It's ok.

judge( 30 )

It's too hot.

The while-loop statement

console transcript

from random import random
def times_for( x ):
    looping = True
    count = 0
    while looping:
        r = random()
        count += 1
        if r >= x:
            looping = False
    return count

print( times_for( 0.5 ))

print( times_for( 0.99999 ))

The break statement

console transcript

from random import random
def times_for( x ):
    count = 0
    while True:
        r = random()
        count += 1
        if r >= x:
            break
    return count

print( times_for( 0.5 ))

print( times_for( 0.99999 ))

Exceptions
The for loop
The comma operator

Tuples can be written using the comma operator. ₍pəˈrɛnθəˌsiz₎ Parentheses are not always required, but in case of doubt, you should add them.

console transcript

()

()

type( _ )

<class 'tuple'>

console transcript

1,

(1,)

type( _ )

<class 'tuple'>

console transcript

1, 2

(1, 2)

type( _ )

<class 'tuple'>

console transcript

sorted( 3, 1, 4, 1 )

TypeError: sorted expected 1 argument, got 4

sorted( ( 3, 1, 4, 1 ))

[1, 1, 3, 4]

Expressing a tuple with four zeros and then two ones.

console transcript

4 *( 0, )+ 2 *( 1, )

(0, 0, 0, 0, 1, 1)

List literals

Lists can be written using square brackets and commas.

console transcript

[]

[]

type( _ )

<class 'list'>

console transcript

[ 1 ]

[1]

type( _ )

<class 'list'>

console transcript

[ 1, 2 ]

[1, 2]

type( _ )

<class 'list'>

Expressing a list with four zeros and then two ones.

console transcript

4 *[ 0 ]+ 2 *[ 1 ]

[0, 0, 0, 0, 1, 1]

Unpacking

Tuple notation can be used on the left or the right of an assignment operator.

console transcript

p = 1, 2
print( p )

(1, 2)

x, y = p
print( x )

1

print( y )

2

console transcript

a, b = 7, 4
print( a )

7

print( b )

4

a, b = b, a
print( a )

4

print( b )

7

console transcript

l = enumerate( ( 'Adam', 'Baker', 'Charlie' ))
for i, s in l: print( i, s )

0 Adam

1 Baker

2 Charlie
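
Unpacking only succeeds when the number of names on the left equals the number of items on the right; otherwise Python raises a ›ValueError‹. A small sketch that catches it (the variable names are arbitrary):

```python
p = 1, 2, 3
try:
    x, y = p  # three items, but only two names
    unpacked = True
except ValueError as error:
    # the exact wording of the message depends on the Python version
    print( "ValueError:", error )
    unpacked = False
```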

Attribute expressions

»a.b« means “the object referred to by the attribute b of the object a”. Examples below.

Using module attributes

From now on, we prefer to use attribute notation for the attributes of a module.

console transcript

import math
print( math.floor( 2.3 ))

2
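
For comparison, a sketch of the »times_for« function from the break-statement section rewritten in attribute notation: the whole module »random« is imported, and its function of the same name is accessed as »random.random«. The random results vary from run to run.

```python
import random

def times_for( x ):
    count = 0
    while True:
        count += 1
        # attribute notation: the function »random« of the module »random«
        if random.random() >= x:
            break
    return count

print( times_for( 0.5 ))
```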

Methods

Functions that are attributes of objects are often called methods.

For example, str objects (strings) have a method »split«. Examples below.
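
Because a method is a function stored as an attribute, a method call can also be written as an ordinary call of the function found in the class, with the object passed as the first argument. A small sketch showing that both spellings give the same result:

```python
text = "Adam, Baker, Charlie"

# method call: the object »text« comes before the dot
a = text.split( ", " )
# the same function looked up in the class str, with »text« as the first argument
b = str.split( text, ", " )

print( a == b )  # True
```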

`str`-Methods

evaluation

help( str )

Help on class str in module builtins:
…

evaluation

tuple( filter( lambda s: '_' not in s, dir( "example" )))

('capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill')

evaluation

"Adam, Baker, Charlie".split( ", " )

['Adam', 'Baker', 'Charlie']

evaluation

", ".join( ['Adam', 'Baker', 'Charlie'] )

'Adam, Baker, Charlie'

evaluation

"Adam, Baker, Charlie".replace( ", ", " - " )

'Adam - Baker - Charlie'

evaluation

"Adam, Baker, Charlie".startswith( "Charlie" )

False

evaluation

"Adam, Baker, Charlie".endswith( "Charlie" )

True

evaluation

"Adam, Baker, Charlie".find( "Baker" )

6

evaluation

"Adam, Baker, Charlie".index( "Baker" )

6

evaluation

"    Adam, Baker, Charlie    ".strip()

'Adam, Baker, Charlie'

evaluation

"Adam, Baker, Charlie".count( "Ch" )

1

»count« counts non-overlapping matches only!

evaluation

"BBBBB".count( "BBB" )

1

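If overlapping matches should be counted instead, one possible sketch repeats »find« with a start position inside a while loop. The function name »count_overlapping« is hypothetical; it is not a method of ›str‹.

```python
def count_overlapping( text, sub ):
    count = 0
    position = text.find( sub )
    while position != -1:  # find returns -1 when nothing more is found
        count += 1
        # continue searching one character after the start of the last match
        position = text.find( sub, position + 1 )
    return count

print( "BBBBB".count( "BBB" ))               # 1
print( count_overlapping( "BBBBB", "BBB" ))  # 3
```
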
`tuple`-Methods

evaluation

help( () )

Help on tuple object:
…

evaluation

help( tuple )

Help on class tuple in module builtins:
…

evaluation

tuple( filter( lambda s: '_' not in s, dir( () )))

('count', 'index')

evaluation

( 0, 2, 4, 6, 8 ).index( 4 )

2

`list`-Methods

evaluation

help( [] )

Help on list object:
…

evaluation

help( list )

Help on class list in module builtins:
…

evaluation

tuple( filter( lambda s: '_' not in s, dir( [] )))

('append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort')

evaluation

[ 0, 2, 4, 6, 8 ].index( 4 )

2

console transcript

l = [ 0, 2, 4, 6, 8 ]
l.append( 10 )
print( l )

[0, 2, 4, 6, 8, 10]

Lists are mutable; tuples are not. This is the key difference between lists and tuples! Above, the list object »l« was modified using »append«.

Many list methods are mutators: they change the list they are called on.

console transcript

print( l.pop() )

10

print( l )

[0, 2, 4, 6, 8]

console transcript

l.reverse()
print( l )

[8, 6, 4, 2, 0]

console transcript

l.sort()
print( l )

[0, 2, 4, 6, 8]
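
By contrast, a tuple cannot be changed after it has been created. A sketch that tries to assign to a tuple item and checks whether tuples have an »append« method:

```python
t =( 0, 2, 4, 6, 8 )
try:
    t[ 0 ]= 1  # item assignment is not supported for tuples
except TypeError as error:
    print( error )  # 'tuple' object does not support item assignment
print( hasattr( t, "append" ))  # False: tuples have no append method
print( t )                      # (0, 2, 4, 6, 8): unchanged
```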

The alias effect

console transcript

l =[ 0, 2, 4, 6, 8 ]
s = l
l.append( 10 )
print( s )

[0, 2, 4, 6, 8, 10]
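
The assignment »s = l« does not copy the list: afterwards both names refer to the same list object, so a change made through »l« is visible through »s«. The operator »is« tests whether two names refer to the very same object, as this sketch shows:

```python
l =[ 0, 2, 4, 6, 8 ]
s = l
print( s is l )  # True: one list object, two names

t = list( l )    # list( l ) creates a new list with the same items
print( t is l )  # False: a different object
print( t == l )  # True: equal contents
```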

Values versus effects

Values are ~~computed~~expressed; no object is ~~harmed~~modified.

console transcript

tuple( reversed( ( 0, 1, 2 )))

(2, 1, 0)

tuple( sorted( ( 2, 0, 1 )))

(0, 1, 2)

list( reversed( [ 0, 1, 2 ]))

[2, 1, 0]

list( sorted( [ 2, 0, 1 ]))

[0, 1, 2]

“Destructive operations”: objects are modified “in place”.

console transcript

l =[ 0, 1, 2 ]
l.reverse()
print( l )

[2, 1, 0]

l.sort()
print( l )

[0, 1, 2]
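
A destructive method such as »sort« changes the list in place and returns ›None‹ rather than the sorted list, which is a common pitfall. A sketch contrasting it with »sorted«:

```python
l =[ 2, 0, 1 ]
result = l.sort()
print( result )  # None: sort changes l, it does not return a new list
print( l )       # [0, 1, 2]

m =[ 2, 0, 1 ]
print( sorted( m ))  # [0, 1, 2]: sorted returns a new list
print( m )           # [2, 0, 1]: m itself is unchanged
```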

Exercises

/ Reversing

Given the following string called “source”, use Python to print the string “result” (as given below), in which the order of the quoted words is reversed. (For example, the quoted word »"PLEASED"« comes first in “source”, but should come last in “result”.)

The Python script to be written should contain the string “source” and then use the means of the language to print the string “result”. The string “result” itself should not be part of the source text of this Python script.

the string “source”

"PLEASED" "NICE" "REACHING" "LINKED" "SMOOTH" "TALKED" "THROWN" "POSSESS" "EATING" "FRIENDLY" "REJECTED" "FAULT" "DENIED" "HABITS" "ROUGH" "SORRY" "DISORDER" "AWARENESS" "WORST" "LIKED" "INTENSE" "AMONGST" "SELDOM" "NOBODY" "DREAMS" "GUESS" "MEANWHILE" "ACTED" "ACCEPTABLE" "SOMEWHERE" "SPEAKS" "CAUSING" "HELPING" "WIDTH" "TINY" "DROVE" "TIRED" "MEAL" "SOLE" "DENY" "THREW" "ALIKE" "MAGIC" "DESK" "DESIRES" "ENDING" "ANYWAY" "WHEREVER" "INTENT" "MADAME" "FLOOD" "APPRECIATE" "HI" "SIZES" "SUBTLE" "TIDE" "DIARY" "NIGHTS" "ANYBODY" "GUEST" "PROSE" "LAUGHTER" "CRUEL" "RHYTHM" "VAGUE" "UTMOST" "FOOL" "IGNORE"

the string “result”

"IGNORE" "FOOL" "UTMOST" "VAGUE" "RHYTHM" "CRUEL" "LAUGHTER" "PROSE" "GUEST" "ANYBODY" "NIGHTS" "DIARY" "TIDE" "SUBTLE" "SIZES" "HI" "APPRECIATE" "FLOOD" "MADAME" "INTENT" "WHEREVER" "ANYWAY" "ENDING" "DESIRES" "DESK" "MAGIC" "ALIKE" "THREW" "DENY" "SOLE" "MEAL" "TIRED" "DROVE" "TINY" "WIDTH" "HELPING" "CAUSING" "SPEAKS" "SOMEWHERE" "ACCEPTABLE" "ACTED" "MEANWHILE" "GUESS" "DREAMS" "NOBODY" "SELDOM" "AMONGST" "INTENSE" "LIKED" "WORST" "AWARENESS" "DISORDER" "SORRY" "ROUGH" "HABITS" "DENIED" "FAULT" "REJECTED" "FRIENDLY" "EATING" "POSSESS" "THROWN" "TALKED" "SMOOTH" "LINKED" "REACHING" "NICE" "PLEASED"

/ Sorting

Given the following string called “source”, use Python to print the string “result” (as given below), in which the quoted words are sorted lexicographically.

the string “source”

"PLEASED" "NICE" "REACHING" "LINKED" "SMOOTH" "TALKED" "THROWN" "POSSESS" "EATING" "FRIENDLY" "REJECTED" "FAULT" "DENIED" "HABITS" "ROUGH" "SORRY" "DISORDER" "AWARENESS" "WORST" "LIKED" "INTENSE" "AMONGST" "SELDOM" "NOBODY" "DREAMS" "GUESS" "MEANWHILE" "ACTED" "ACCEPTABLE" "SOMEWHERE" "SPEAKS" "CAUSING" "HELPING" "WIDTH" "TINY" "DROVE" "TIRED" "MEAL" "SOLE" "DENY" "THREW" "ALIKE" "MAGIC" "DESK" "DESIRES" "ENDING" "ANYWAY" "WHEREVER" "INTENT" "MADAME" "FLOOD" "APPRECIATE" "HI" "SIZES" "SUBTLE" "TIDE" "DIARY" "NIGHTS" "ANYBODY" "GUEST" "PROSE" "LAUGHTER" "CRUEL" "RHYTHM" "VAGUE" "UTMOST" "FOOL" "IGNORE"

the string “result”

"ACCEPTABLE" "ACTED" "ALIKE" "AMONGST" "ANYBODY" "ANYWAY" "APPRECIATE" "AWARENESS" "CAUSING" "CRUEL" "DENIED" "DENY" "DESIRES" "DESK" "DIARY" "DISORDER" "DREAMS" "DROVE" "EATING" "ENDING" "FAULT" "FLOOD" "FOOL" "FRIENDLY" "GUESS" "GUEST" "HABITS" "HELPING" "HI" "IGNORE" "INTENSE" "INTENT" "LAUGHTER" "LIKED" "LINKED" "MADAME" "MAGIC" "MEAL" "MEANWHILE" "NICE" "NIGHTS" "NOBODY" "PLEASED" "POSSESS" "PROSE" "REACHING" "REJECTED" "RHYTHM" "ROUGH" "SELDOM" "SIZES" "SMOOTH" "SOLE" "SOMEWHERE" "SORRY" "SPEAKS" "SUBTLE" "TALKED" "THREW" "THROWN" "TIDE" "TINY" "TIRED" "UTMOST" "VAGUE" "WHEREVER" "WIDTH" "WORST"

Item notation

console transcript

l =[ 0, 2, 4, 6, 8 ]
print( l[ 2 ])

4

Slice notation

console transcript

l =[ 0, 2, 4, 6, 8 ]
print( l[ 2: 4 ])

[4, 6]

print( l[ 2: ])

[4, 6, 8]

print( l[ :2 ])

[0, 2]

print( l[:] )

[0, 2, 4, 6, 8]

Remember to write »2: 4«, not »2, 4«!

console transcript

l =[ 0, 2, 4, 6, 8 ]
print( l[ 2, 4 ])

TypeError: list indices must be integers or slices, not tuple

print( l[ 2: 4 ])

[4, 6]

console transcript

copy

l =[ 0, 2, 4, 6, 8 ]
s = l[:]
l.append( 10 )
print( s )

[0, 2, 4, 6, 8]

Assignments to items and to slices

console transcript

l =[ 0, 2, 4, 6, 8 ]
l[ 2 ]= 7
print( l )

[0, 2, 7, 6, 8]

console transcript

l =[ 0, 2, 4, 6, 8 ]
l[ 2: 4 ]= []
print( l )

[0, 2, 8]

Counters

Counting with lists:

console transcript or script file

l =[ 0 ]* 1000
l[ 100 ] += 1
l[ 100 ] += 1
l[ 150 ] += 1
print( tuple( filter( lambda x: x, l )))                   # the non-zero counts
print( tuple( filter( lambda x: x[ 1 ], enumerate( l ))))  # (index, count) pairs

(2, 1)
((100, 2), (150, 1))

Counting with a special Counter class:

console transcript or script file

import collections
c = collections.Counter()
c[ 100 ] += 1; print( c ) # Counter({100: 1})
c[ 100 ] += 1; print( c ) # Counter({100: 2})
c[ 150 ] += 1; print( c ) # Counter({100: 2, 150: 1})

You do not have to initialize the counter with zeros!
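
For comparison, reading a missing key from a plain dict raises a ›KeyError‹, while a Counter returns 0 for a missing key (without inserting it). A sketch:

```python
import collections

d = dict()
try:
    d[ "a" ] += 1  # reading the missing key "a" fails
except KeyError as error:
    print( "KeyError:", error )  # KeyError: 'a'

c = collections.Counter()
print( c[ "a" ])   # 0
print( "a" in c )  # False: reading did not insert the key
c[ "a" ] += 1
print( c )         # Counter({'a': 1})
```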

You can also count letters or words!

console transcript or script file

import collections
c = collections.Counter()
c[ "a"     ] += 1; print( c ) # Counter({'a': 1})
c[ "a"     ] += 1; print( c ) # Counter({'a': 2})
c[ "Baker" ] += 1; print( c ) # Counter({'a': 2, 'Baker': 1})
print( c[ "a" ])              # 2

There is actually a terser way:

console transcript

import collections
print( collections.Counter( ( "a", "a", "Baker" )))

Counter({'a': 2, 'Baker': 1})

console transcript

import collections

print( collections.Counter( "beispielsweise" ))
Counter({'e': 4, 'i': 3, 's': 3, 'b': 1, 'p': 1, 'l': 1, 'w': 1})

console transcript

import collections
print( collections.Counter( "Adam, Baker, Charlie, Baker, Adam, Adam, Adam".split( ", " )))

Counter({'Adam': 4, 'Baker': 2, 'Charlie': 1})
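
A Counter can also list its entries ordered by frequency with its method »most_common«; a sketch using the names from above:

```python
import collections

c = collections.Counter( "Adam, Baker, Charlie, Baker, Adam, Adam, Adam".split( ", " ))
print( c.most_common( 1 ))  # [('Adam', 4)]
print( c.most_common())     # [('Adam', 4), ('Baker', 2), ('Charlie', 1)]
```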

Dictionaries

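Dictionaries map keys to values. Any hashable object (for example a string, a number or a tuple) can serve as a key; any object can serve as a value.

console transcript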

d = dict()
d[ "Adam" ]= "Baker"
print( d[ "Adam" ])

Baker

console transcript

d = { 'David': [], (): 'Henry' }
d[ 'David' ]

[]

d[ () ]

'Henry'

console transcript

d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
d[ "zero" ]

0

console transcript

d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
tuple( d )

('zero', 'one', 'two', 'three')

console transcript

values

d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
d.values()

dict_values([0, 1, 2, 3])

console transcript

items

d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
d.items()

dict_items([('zero', 0), ('one', 1), ('two', 2), ('three', 3)])

console transcript

d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
max( d.items(), key=lambda p: p[ 1 ])

('three', 3)
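
Three more common operations, sketched with the dictionary from above: »in« tests whether a key is present, »get« returns a default value instead of raising a ›KeyError‹ for a missing key, and unpacking works when iterating over »items«.

```python
d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
print( "two" in d )          # True: »in« tests the keys
print( 2 in d )              # False: the values are not searched
print( d.get( "four", -1 ))  # -1: the default, no KeyError
for key, value in d.items():  # each item is unpacked into two names
    print( key, value )
```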

Reading a text file and the `with`-statement

»with« automatically closes the file when the indented block (the suite) ends, even if an exception is raised inside it.

The script writes a text file and then reads and prints it.

main.py

with open( "gettysburg.txt", "w" )as file:
    file.write( '''
Four score and seven years ago,
our fathers brought forth on this continent
a new nation, 
conceived in liberty
and dedicated to the proposition
that all men are created equal.
'''[ 1: ])
with open( "gettysburg.txt" )as f:
    print( f.read() )

transcript

Four score and seven years ago,
our fathers brought forth on this continent
a new nation, 
conceived in liberty
and dedicated to the proposition
that all men are created equal.
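
A quick way to check this: a file object has an attribute »closed«. In this sketch (the temporary folder and the file name »example.txt« are only for illustration), it is ›False‹ inside the suite and ›True‹ right after it:

```python
import os
import tempfile

# a temporary folder keeps the example from leaving files behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join( folder, "example.txt" )
    with open( path, "w", encoding="utf-8" )as file:
        file.write( "Four score and seven years ago,\n" )
        print( file.closed )  # False: still inside the suite
    print( file.closed )      # True: closed when the suite ended
```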

Iterating a text file

Iterating over a text file yields its individual lines one by one; each line keeps its line break »\n«. (A file opened for reading is an iterator.)

main.py

with open( "gettysburg.txt", "w" )as file:
    file.write( '''
Four score and seven years ago,
our fathers brought forth on this continent
a new nation, 
conceived in liberty
and dedicated to the proposition
that all men are created equal.
'''[ 1: ])
with open( "gettysburg.txt" )as f:
    for line in f:
        print( repr( line ))

transcript

'Four score and seven years ago,\n'
'our fathers brought forth on this continent\n'
'a new nation, \n'
'conceived in liberty\n'
'and dedicated to the proposition\n'
'that all men are created equal.\n'
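
Since a file opened for reading is an iterator, it is used up after one pass: a second pass over the same file object yields no lines. A sketch (the temporary file and its name »lines.txt« are only for illustration):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join( folder, "lines.txt" )
    with open( path, "w", encoding="utf-8" )as file:
        file.write( "first\nsecond\n" )
    with open( path, encoding="utf-8" )as f:
        print( iter( f ) is f )  # True: the file is its own iterator
        first_pass = list( f )   # reads all remaining lines
        second_pass = list( f )  # nothing is left to read
print( first_pass )   # ['first\n', 'second\n']
print( second_pass )  # []
```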

main.py

with open( "code.txt", "w" )as file:
    file.write( '''
            *** CODE TABLE ***
CHARACTER VALUE
        A    65
        B    66
        C    67
'''[ 1: ])
with open( "code.txt" )as f:
    next( f )  # skip the title line
    next( f )  # skip the column headings
    for line in f: print( line.strip() )

transcript

A    65
B    66
C    67

File encodings

When processing text files in linguistics, it is important to use the correct encoding when reading or writing a file.

The three most important encodings are ASCII (outdated, but still in use), ISO-8859-1 (also outdated, but still in use) and UTF-8 (the current standard, recommended).

You can and should state the encoding explicitly when working with text files.
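
To see why the encoding matters, a short sketch that encodes the name »Zoë« (which also appears in the example below) with the three encodings just mentioned: the same character becomes different bytes, ASCII cannot represent it at all, and decoding with the wrong encoding garbles the text.

```python
word = "Zoë"
print( word.encode( "utf-8" ))       # b'Zo\xc3\xab': ë takes two bytes
print( word.encode( "iso-8859-1" ))  # b'Zo\xeb': ë takes one byte
try:
    word.encode( "ascii" )           # ASCII has no ë
except UnicodeEncodeError:
    print( "not encodable in ASCII" )
# decoding UTF-8 bytes as ISO-8859-1 produces garbled text
print( word.encode( "utf-8" ).decode( "iso-8859-1" ))  # ZoÃ«
```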

main.py

from sys import stdout
with open( "tmp20200213192506.utx", "w", encoding="utf-8" )as file:
    file.write( '''
Ooh, look at me, I’m a chic umlaut. 
I make girls’ names look modish, like Zoë and Chloë.
And the extent of all-possible-orbits, I call the etendue. 
It sounds like "Ed Tondue" as in rhymes with "fondue". 
But also with an accent over the first e, so that it's really étendue. 
それをチェックしよう！
'''[ 1: ])
with open( "tmp20200213192506.utx", encoding="utf-8" )as f:
    text = f.read()
    # print( text )
    stdout.buffer.write( text.encode( 'utf-8' ))

transcript

Ooh, look at me, I’m a chic umlaut. 
I make girls’ names look modish, like Zoë and Chloë.
And the extent of all-possible-orbits, I call the etendue. 
It sounds like "Ed Tondue" as in rhymes with "fondue". 
But also with an accent over the first e, so that it's really étendue. 
それをチェックしよう！

Usually, Unicode text can simply be written with »print«. However, I capture the output of Python scripts on Windows with a special system that only works when I use »stdout.buffer.write( text.encode( 'utf-8' ))« to write Unicode text to the console. This forces Python to write the text encoded as UTF-8, whereas »print« uses the encoding of the console stream (»sys.stdout.encoding«), which might not be the encoding needed here.

Adding line numbers

main.py

with open( "gettysburg.txt", "w" )as file:
    file.write( '''
Four score and seven years ago,
our fathers brought forth on this continent
a new nation, 
conceived in liberty
and dedicated to the proposition
that all men are created equal.
'''[ 1: ])
with open( "gettysburg.txt" )as f:
    for number, line in enumerate( f, start=1 ):
        print( number, line, end="" )  # line already ends with a line break

transcript

1 Four score and seven years ago,
2 our fathers brought forth on this continent
3 a new nation, 
4 conceived in liberty
5 and dedicated to the proposition
6 that all men are created equal.

Zipping two files

main.py

with open( "gettysburg.txt", "w" )as file:
    file.write( '''
four 
score 
and 
seven
'''[ 1: ])
with open( "translation.txt", "w" )as file:
    file.write( '''
vier
zwanzig
und
sieben
'''[ 1: ])
with open( "gettysburg.txt" )as f, open( "translation.txt" )as g:
    for english, german in zip( f, g ):
        # [ :-1 ] drops the line break; :10s pads each word to 10 characters
        print( f"{english[ :-1 ]:10s}{german[ :-1 ]:10s}" )

transcript

four      vier      
score     zwanzig   
and       und       
seven     sieben
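
»zip« stops as soon as the shorter of its inputs is exhausted, so surplus lines in one of the files are silently dropped. If that matters, »itertools.zip_longest« fills the gaps with a »fillvalue« instead. A sketch using lists in place of files (the extra word »years« is added only for illustration):

```python
import itertools

english = [ "four", "score", "and", "seven", "years" ]
german  = [ "vier", "zwanzig", "und", "sieben" ]

print( list( zip( english, german )))
# [('four', 'vier'), ('score', 'zwanzig'), ('and', 'und'), ('seven', 'sieben')]
print( list( itertools.zip_longest( english, german, fillvalue="?" ))[ -1 ])
# ('years', '?')
```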

Building a dict

main.py

with open( "gettysburg.txt", "w" )as file:
    file.write( '''
four
score
and
seven
'''[ 1: ])
with open( "translation.txt", "w" )as file:
    file.write( '''
vier
zwanzig
und
sieben
'''[ 1: ])
def strip( s ):
    return s.strip()
with open( "gettysburg.txt" )as f:
    with open( "translation.txt" )as g:
        d = dict( zip( map( strip, f ), map( strip, g )))
print( d[ 'and' ])

transcript

und

Regular Expressions

[ðəˈɑɚˈiˈmɑʤul] the re module

[ˈɹɛˌɡɛks] regex, abbreviation of “regular expression”

[ˈfaɪndˈɪṭɚ] »finditer«

import re
for match in re.finditer( pattern, string ):
    # executed once for each regex match
    print( match.group() )
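
A concrete, runnable sketch; the sample text and the pattern »\d+« (one or more digits) are made up for illustration. Besides the matched text (»group«), each match object also knows where the match starts (»start«):

```python
import re

string = "A 65, B 66, C 67"
pattern = r"\d+"  # one or more digits
for match in re.finditer( pattern, string ):
    # the loop body runs once for each regex match
    print( match.start(), match.group() )
# 2 65
# 8 66
# 14 67
```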
