Supplements
The time available was not sufficient to cover all important topics.
What is still missing from the preceding parts of this course follows here, quickly and in brief:
– Please loosen up this talk with many comments and questions! –
Function definitions with »def«
In the console, an empty line is required after the definition. This empty line is not required when the text is written into a text file.
- console transcript
def hello_world(): print( "Hello, world!" )
hello_world()
Hello, world!
- console transcript
def hello_world():
    print( "Hello, world!" )
hello_world()
Hello, world!
Returning values
- console transcript
def hello_world():
    return "Hello, world!"
print( hello_world() + hello_world() )
Hello, world!Hello, world!
Parameters
- console transcript
def hello( what ):
    print( f"Hello, {what}!" )
hello( "you" )
Hello, you!
- console transcript
def next( n ):
    return n + 1
print( next( 7 ))
8
More than one parameter
- console transcript
def write( x, y ):
    print( f"{x} {y}" )
write( 2, 3 )
2 3
Default values
- console transcript
def write( x, y=0 ):
    print( f"{x} {y}" )
write( 2 )
2 0
The if-else expression
The central expression (»t > 20« in the example below) is evaluated and converted to »bool«. If the result is true, the left expression is evaluated; if it is false, the right expression is evaluated. The result of that evaluation becomes the value of the whole if-else expression.
- console transcript
def judge( t ):
    print( f"It's {'ok' if t > 20 else 'too cold'}." )
judge( 18 )
It's too cold.
judge( 22 )
It's ok.
The if statement
- console transcript
def judge( t ):
    if t <= 20:
        print( f"It's too cold." )
judge( 18 )
It's too cold.
judge( 22 )
(no output)
- console transcript
def judge( t ):
    if t <= 20:
        print( f"It's too cold." )
    elif 20 < t <= 24:
        print( f"It's ok." )
judge( 18 )
It's too cold.
judge( 22 )
It's ok.
judge( 30 )
(no output)
- console transcript
def judge( t ):
    if t <= 20:
        print( f"It's too cold." )
    elif 20 < t <= 24:
        print( f"It's ok." )
    else:
        print( f"It's too hot." )
judge( 18 )
It's too cold.
judge( 22 )
It's ok.
judge( 30 )
It's too hot.
The while-loop statement
- console transcript
from random import random
def times_for( x ):
    looping = True
    count = 0
    while looping:
        r = random()
        count += 1
        if r >= x:
            looping = False
    return count
print( times_for( 0.5 ))
1
print( times_for( 0.99999 ))
7982
The break statement
- console transcript
from random import random
def times_for( x ):
    count = 0
    while True:
        r = random()
        count += 1
        if r >= x:
            break
    return count
print( times_for( 0.5 ))
2
print( times_for( 0.99999 ))
43030
Exceptions
The for loop
The comma operator
Tuples can be written using the comma operator. ₍pəˈrɛnθəˌsiz₎ Parentheses are not always required, but in case of doubt, you should add them.
- console transcript
()
()
type( _ )
<class 'tuple'>
- console transcript
1,
(1,)
type( _ )
<class 'tuple'>
- console transcript
1, 2
(1, 2)
type( _ )
<class 'tuple'>
- console transcript
sorted( 3, 1, 4, 1 )
TypeError: sorted expected 1 argument, got 4
sorted( ( 3, 1, 4, 1 ))
[1, 1, 3, 4]
Expressing a tuple with four zeros and then two ones.
- console transcript
4 *( 0, )+ 2 *( 1, )
(0, 0, 0, 0, 1, 1)
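Why parentheses are advisable "in case of doubt" can be seen in a small sketch. The names »a« and »b« are only illustrative and not part of the course material. The comma binds more weakly than a comparison, so an unparenthesized tuple inside a larger expression is easily split up.

```python
# Without parentheses, the comparison binds more tightly than the commas.
a = 1, 2 == 1, 2        # read as ( 1, ( 2 == 1 ), 2 )
b = ( 1, 2 )==( 1, 2 )  # compares two tuples
print( a )              # (1, False, 2)
print( b )              # True
```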
List literals
Lists can be written using brackets and commas.
- console transcript
[]
[]
type( _ )
<class 'list'>
- console transcript
[ 1 ]
[1]
type( _ )
<class 'list'>
- console transcript
[ 1, 2 ]
[1, 2]
type( _ )
<class 'list'>
Expressing a list with four zeros and then two ones.
- console transcript
4 *[ 0 ]+ 2 *[ 1 ]
[0, 0, 0, 0, 1, 1]
Unpacking
Tuple notation can be used on the left or the right of an assignment operator.
- console transcript
p = 1, 2
print( p )
(1, 2)
x, y = p
print( x )
1
print( y )
2
- console transcript
a, b = 7, 4
print( a )
7
print( b )
4
a, b = b, a
print( a )
4
print( b )
7
- console transcript
l = enumerate( ( 'Adam', 'Baker', 'Charlie' ))
for i, s in l: print( i, s )
0 Adam
1 Baker
2 Charlie
Attribute expressions
»a.b« means “the attribute b of the object a”. Examples below.
Using module attributes
From now on, we prefer to use attribute notation for the attributes of a module.
- console transcript
import math
print( math.floor( 2.3 ))
2
Methods
Functions that are attributes of objects are often called methods.
For example, strings (objects of the type »str«) have a method »split«. Examples below.
str-Methods
- evaluation
help( str )
Help on class str in module builtins:
…
- evaluation
tuple( filter( lambda s: '_' not in s, dir( "example" )))
('capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill')
- evaluation
"Adam, Baker, Charlie".split( ", " )
['Adam', 'Baker', 'Charlie']
- evaluation
", ".join( ['Adam', 'Baker', 'Charlie'] )
'Adam, Baker, Charlie'
- evaluation
"Adam, Baker, Charlie".replace( ", ", " - " )
'Adam - Baker - Charlie'
- evaluation
"Adam, Baker, Charlie".startswith( "Charlie" )
False
- evaluation
"Adam, Baker, Charlie".endswith( "Charlie" )
True
- evaluation
"Adam, Baker, Charlie".find( "Baker" )
6
- evaluation
"Adam, Baker, Charlie".index( "Baker" )
6
- evaluation
" Adam, Baker, Charlie ".strip()
'Adam, Baker, Charlie'
- evaluation
"Adam, Baker, Charlie".count( "Ch" )
1
Counts non-overlapping matches!
- evaluation
"BBBBB".count( "BBB" )
1
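If overlapping matches are wanted instead, »count« is not enough. The following is only a sketch of one possible approach; the function name »count_overlapping« is made up for this illustration. It repeatedly calls the str method »find« with a start position one character after the previous match.

```python
def count_overlapping( text, part ):
    """Count occurrences of part in text, allowing overlaps."""
    count = 0
    position = text.find( part )
    while position != -1:
        count += 1
        # continue the search one character after the start of the last match
        position = text.find( part, position + 1 )
    return count

print( "BBBBB".count( "BBB" ))              # 1
print( count_overlapping( "BBBBB", "BBB" )) # 3
```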
tuple-Methods
- evaluation
help( () )
Help on tuple object:
…
- evaluation
help( tuple )
Help on class tuple in module builtins:
…
- evaluation
tuple( filter( lambda s: '_' not in s, dir( () )))
('count', 'index')
- evaluation
( 0, 2, 4, 6, 8 ).index( 4 )
2
list-Methods
- evaluation
help( [] )
Help on list object:
…
- evaluation
help( list )
Help on class list in module builtins:
…
- evaluation
tuple( filter( lambda s: '_' not in s, dir( [] )))
('append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort')
- evaluation
[ 0, 2, 4, 6, 8 ].index( 4 )
2
- console transcript
l = [ 0, 2, 4, 6, 8 ]
l.append( 10 )
print( l )
[0, 2, 4, 6, 8, 10]
Lists are mutable; tuples are not. This is the difference between lists and tuples! Above, the list object »l« was modified using »append«.
Many list methods are mutators: they change their list.
- console transcript
print( l.pop() )
10
print( l )
[0, 2, 4, 6, 8]
- console transcript
l.reverse()
print( l )
[8, 6, 4, 2, 0]
- console transcript
l.sort()
print( l )
[0, 2, 4, 6, 8]
The alias effect
- console transcript
l =[ 0, 2, 4, 6, 8 ]
s = l
l.append( 10 )
print( s )
[0, 2, 4, 6, 8, 10]
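To make the alias effect visible, one can compare an alias with a copy. This is a minimal sketch; it uses the list method »copy« from the method list shown earlier, and the name »c« is chosen only for illustration.

```python
l = [ 0, 2, 4, 6, 8 ]
s = l           # alias: a second name for the same list object
c = l.copy()    # copy: a new list object with the same items
l.append( 10 )
print( s is l ) # True
print( c is l ) # False
print( s )      # [0, 2, 4, 6, 8, 10]
print( c )      # [0, 2, 4, 6, 8]
```

The operator »is« tells whether two names refer to the same object.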
Values versus effects
Values are expressed; no object is modified.
- console transcript
tuple( reversed( ( 0, 1, 2 )))
(2, 1, 0)
tuple( sorted( ( 2, 0, 1 )))
(0, 1, 2)
list( reversed( [ 0, 1, 2 ]))
[2, 1, 0]
list( sorted( [ 2, 0, 1 ]))
[0, 1, 2]
Effects: these are “destructive operations”; objects are modified “in place”.
- console transcript
l =[ 0, 1, 2 ]
l.reverse()
print( l )
[2, 1, 0]
l.sort()
print( l )
[0, 1, 2]
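A practical consequence of this distinction, shown as a small sketch (the names »value« and »result« are only illustrative): the functions »sorted« and »reversed« deliver a value, while the mutating methods »sort« and »reverse« return »None«. Writing »l = l.sort()« therefore loses the list.

```python
l = [ 2, 0, 1 ]
value = sorted( l )   # a new list; »l« itself is unchanged
print( value, l )     # [0, 1, 2] [2, 0, 1]
result = l.sort()     # sorts »l« in place and returns None
print( result, l )    # None [0, 1, 2]
```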
Exercises
/ Reversing
Given the following string called “source”, use Python to print the string “result” (as given below) where the order of the quoted words is reversed. (For example, the quoted word »"PLEASED"« was first, but should be last.)
The Python script to be written should contain the string “source” and then use means of the language to print the string “result”. The string “result” should not be a part of the source text of this Python script.
- the string “source”
"PLEASED" "NICE" "REACHING" "LINKED" "SMOOTH" "TALKED" "THROWN" "POSSESS" "EATING" "FRIENDLY" "REJECTED" "FAULT" "DENIED" "HABITS" "ROUGH" "SORRY" "DISORDER" "AWARENESS" "WORST" "LIKED" "INTENSE" "AMONGST" "SELDOM" "NOBODY" "DREAMS" "GUESS" "MEANWHILE" "ACTED" "ACCEPTABLE" "SOMEWHERE" "SPEAKS" "CAUSING" "HELPING" "WIDTH" "TINY" "DROVE" "TIRED" "MEAL" "SOLE" "DENY" "THREW" "ALIKE" "MAGIC" "DESK" "DESIRES" "ENDING" "ANYWAY" "WHEREVER" "INTENT" "MADAME" "FLOOD" "APPRECIATE" "HI" "SIZES" "SUBTLE" "TIDE" "DIARY" "NIGHTS" "ANYBODY" "GUEST" "PROSE" "LAUGHTER" "CRUEL" "RHYTHM" "VAGUE" "UTMOST" "FOOL" "IGNORE"
- the string “result”
"IGNORE" "FOOL" "UTMOST" "VAGUE" "RHYTHM" "CRUEL" "LAUGHTER" "PROSE" "GUEST" "ANYBODY" "NIGHTS" "DIARY" "TIDE" "SUBTLE" "SIZES" "HI" "APPRECIATE" "FLOOD" "MADAME" "INTENT" "WHEREVER" "ANYWAY" "ENDING" "DESIRES" "DESK" "MAGIC" "ALIKE" "THREW" "DENY" "SOLE" "MEAL" "TIRED" "DROVE" "TINY" "WIDTH" "HELPING" "CAUSING" "SPEAKS" "SOMEWHERE" "ACCEPTABLE" "ACTED" "MEANWHILE" "GUESS" "DREAMS" "NOBODY" "SELDOM" "AMONGST" "INTENSE" "LIKED" "WORST" "AWARENESS" "DISORDER" "SORRY" "ROUGH" "HABITS" "DENIED" "FAULT" "REJECTED" "FRIENDLY" "EATING" "POSSESS" "THROWN" "TALKED" "SMOOTH" "LINKED" "REACHING" "NICE" "PLEASED"
/ Sorting
Given the following string called “source”, use Python to print the string “result” (as given below) where the quoted words are sorted lexicographically.
The Python script to be written should contain the string “source” and then use means of the language to print the string “result”. The string “result” should not be a part of the source text of this Python script.
- the string “source”
"PLEASED" "NICE" "REACHING" "LINKED" "SMOOTH" "TALKED" "THROWN" "POSSESS" "EATING" "FRIENDLY" "REJECTED" "FAULT" "DENIED" "HABITS" "ROUGH" "SORRY" "DISORDER" "AWARENESS" "WORST" "LIKED" "INTENSE" "AMONGST" "SELDOM" "NOBODY" "DREAMS" "GUESS" "MEANWHILE" "ACTED" "ACCEPTABLE" "SOMEWHERE" "SPEAKS" "CAUSING" "HELPING" "WIDTH" "TINY" "DROVE" "TIRED" "MEAL" "SOLE" "DENY" "THREW" "ALIKE" "MAGIC" "DESK" "DESIRES" "ENDING" "ANYWAY" "WHEREVER" "INTENT" "MADAME" "FLOOD" "APPRECIATE" "HI" "SIZES" "SUBTLE" "TIDE" "DIARY" "NIGHTS" "ANYBODY" "GUEST" "PROSE" "LAUGHTER" "CRUEL" "RHYTHM" "VAGUE" "UTMOST" "FOOL" "IGNORE"
- the string “result”
"ACCEPTABLE" "ACTED" "ALIKE" "AMONGST" "ANYBODY" "ANYWAY" "APPRECIATE" "AWARENESS" "CAUSING" "CRUEL" "DENIED" "DENY" "DESIRES" "DESK" "DIARY" "DISORDER" "DREAMS" "DROVE" "EATING" "ENDING" "FAULT" "FLOOD" "FOOL" "FRIENDLY" "GUESS" "GUEST" "HABITS" "HELPING" "HI" "IGNORE" "INTENSE" "INTENT" "LAUGHTER" "LIKED" "LINKED" "MADAME" "MAGIC" "MEAL" "MEANWHILE" "NICE" "NIGHTS" "NOBODY" "PLEASED" "POSSESS" "PROSE" "REACHING" "REJECTED" "RHYTHM" "ROUGH" "SELDOM" "SIZES" "SMOOTH" "SOLE" "SOMEWHERE" "SORRY" "SPEAKS" "SUBTLE" "TALKED" "THREW" "THROWN" "TIDE" "TINY" "TIRED" "UTMOST" "VAGUE" "WHEREVER" "WIDTH" "WORST"
Item notation
- console transcript
l =[ 0, 2, 4, 6, 8 ]
print( l[ 2 ])
4
Slice notation
- console transcript
l =[ 0, 2, 4, 6, 8 ]
print( l[ 2: 4 ])
[4, 6]
print( l[ 2: ])
[4, 6, 8]
print( l[ :2 ])
[0, 2]
print( l[:] )
[0, 2, 4, 6, 8]
Remember not to write »2, 4«, but »2: 4«!
- console transcript
l =[ 0, 2, 4, 6, 8 ]
print( l[ 2, 4 ])
TypeError: list indices must be integers or slices, not tuple
print( l[ 2: 4 ])
[4, 6]
A slice is a copy.
- console transcript
l =[ 0, 2, 4, 6, 8 ]
s = l[:]
l.append( 10 )
print( s )
[0, 2, 4, 6, 8]
Assignments to items and to slices
- console transcript
l =[ 0, 2, 4, 6, 8 ]
l[ 2 ]= 7
print( l )
[0, 2, 7, 6, 8]
- console transcript
l =[ 0, 2, 4, 6, 8 ]
l[ 2: 4 ]= []
print( l )
[0, 2, 8]
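An assignment to a slice may also change the length of the list in other ways. The following sketch, with made-up values, replaces two items by three, inserts an item through an empty slice, and deletes items by assigning an empty list.

```python
l = [ 0, 2, 4, 6, 8 ]
l[ 1: 3 ]= [ 'a', 'b', 'c' ]   # replace two items by three
print( l )                      # [0, 'a', 'b', 'c', 6, 8]
l[ 2: 2 ]= [ 'x' ]              # empty slice: insert before index 2
print( l )                      # [0, 'a', 'x', 'b', 'c', 6, 8]
l[ :2 ]= []                     # delete the first two items
print( l )                      # ['x', 'b', 'c', 6, 8]
```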
Counters
Counting with lists:
- console transcript or script file
l =[ 0 ]* 1000
l[ 100 ] += 1
l[ 100 ] += 1
l[ 150 ] += 1
print( tuple( filter( lambda x: x, l )))
print( tuple( filter( lambda x: x[ 1 ], enumerate( l ))))
(2, 1)
((100, 2), (150, 1))
Counting with a special Counter class:
- console transcript or script file
import collections
c = collections.Counter()
c[ 100 ] += 1; print( c ) # Counter({100: 1})
c[ 100 ] += 1; print( c ) # Counter({100: 2})
c[ 150 ] += 1; print( c ) # Counter({100: 2, 150: 1})
You do not have to initialize the counter with zeros!
You can also count letters or words!
- console transcript or script file
import collections
c = collections.Counter()
c[ "a" ] += 1; print( c ) # Counter({'a': 1})
c[ "a" ] += 1; print( c ) # Counter({'a': 2})
c[ "Baker" ] += 1; print( c ) # Counter({'a': 2, 'Baker': 1})
print( c[ "a" ]) # 2
There actually is a more terse way.
- console transcript
import collections
print( collections.Counter( ( "a", "a", "Baker" )))
Counter({'a': 2, 'Baker': 1})
- console transcript
import collections
print( collections.Counter( "beispielsweise" ))
Counter({'e': 4, 'i': 3, 's': 3, 'b': 1, 'p': 1, 'l': 1, 'w': 1})
- console transcript
import collections
print( collections.Counter( "Adam, Baker, Charlie, Baker, Adam, Adam, Adam".split( ", " )))
Counter({'Adam': 4, 'Baker': 2, 'Charlie': 1})
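Once the counting is done, the counter can be queried. A brief sketch, assuming the same word list as above; »most_common« is a method of »collections.Counter«, and the name »words« is only illustrative.

```python
import collections

words = "Adam, Baker, Charlie, Baker, Adam, Adam, Adam".split( ", " )
c = collections.Counter( words )
print( c.most_common( 2 ))  # [('Adam', 4), ('Baker', 2)]
print( c[ "Zoe" ])          # 0: a missing key counts as zero
print( sum( c.values() ))   # 7: the total number of words counted
```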
Dictionaries
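Dictionaries allow arbitrary mappings from names (keys) to values. The names must be hashable, so a tuple can be used as a name, but a list cannot.
- console transcript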
d = dict()
d[ "Adam" ]= "Baker"
print( d[ "Adam" ])
Baker
- console transcript
d = { 'David': [], (): 'Henry' }
d[ 'David' ]
[]
d[ () ]
'Henry'
- console transcript
d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
d[ "zero" ]
0
When dictionaries are used as iterables, they yield their names (keys).
- console transcript
d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
tuple( d )
('zero', 'one', 'two', 'three')
The values can be expressed using the »values« method. The result can be iterated like a tuple, but it cannot be indexed.
- console transcript
d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
d.values()
dict_values([0, 1, 2, 3])
The name-value pairs can be expressed using the »items« method. The result can be iterated like a tuple of pairs.
- console transcript
d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
d.items()
dict_items([('zero', 0), ('one', 1), ('two', 2), ('three', 3)])
- console transcript
d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
max( d.items(), key=lambda p: p[ 1 ])
('three', 3)
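The »items« method combines well with unpacking in a for loop. The sketch below also shows two common ways to deal with names that might be missing: the »in« operator and the dict method »get«. The default value -1 is chosen arbitrarily for this illustration.

```python
d ={ "zero": 0, "one": 1, "two": 2, "three": 3 }
for name, value in d.items():   # each item is unpacked into name and value
    print( name, value )
print( "two" in d )             # True
print( "four" in d )            # False
print( d.get( "four", -1 ))     # -1: default for a missing name
```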
Reading a text file and the with-statement
»with« will automatically close the file after the end of the suite.
The script writes a text file and then reads and prints it.
main.py
with open( "gettysburg.txt", "w" )as file:
    file.write( '''
Four score and seven years ago,
our fathers brought forth on this continent
a new nation,
conceived in liberty
and dedicated to the proposition
that all men are created equal.
'''[ 1: ])
with open( "gettysburg.txt" )as f:
    print( f.read() )
- transcript
Four score and seven years ago,
our fathers brought forth on this continent
a new nation,
conceived in liberty
and dedicated to the proposition
that all men are created equal.
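That »with« really closes the file can be checked using the attribute »closed« of the file object. This is only a sketch: the file name »example.txt« is made up, and a temporary directory is used so that no file is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join( directory, "example.txt" ) # hypothetical file name
    with open( path, "w" )as f:
        f.write( "Four score\n" )
        print( f.closed ) # False: still inside the suite
    print( f.closed )     # True: closed by »with«
    with open( path )as g:
        text = g.read()
print( repr( text ))      # 'Four score\n'
```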
Iterating a text file
Iterating over a text file yields the individual lines. (The file object read from is an iterator.)
main.py
with open( "gettysburg.txt", "w" )as file:
    file.write( '''
Four score and seven years ago,
our fathers brought forth on this continent
a new nation,
conceived in liberty
and dedicated to the proposition
that all men are created equal.
'''[ 1: ])
with open( "gettysburg.txt" )as f:
    for line in f:
        print( repr( line ))
- transcript
'Four score and seven years ago,\n'
'our fathers brought forth on this continent\n'
'a new nation, \n'
'conceived in liberty\n'
'and dedicated to the proposition\n'
'that all men are created equal.\n'
main.py
with open( "code.txt", "w" )as file:
    file.write( '''
*** CODE TABLE ***
CHARACTER VALUE
A 65
B 66
C 67
'''[ 1: ])
with open( "code.txt" )as f:
    next( f ) # skip the title line
    next( f ) # skip the line with the column headings
    for line in f: print( line.strip() )
- transcript
A 65
B 66
C 67
File encodings
When processing text files in linguistics it is important to use the correct encoding to read or write the file.
The three most important encodings are: ASCII (outdated, but still in use), ISO-8859-1 (also outdated, but still in use) and UTF-8 (the current standard, recommended).
You can and should state the encoding explicitly when working with text files.
main.py
from sys import stdout
with open( "tmp20200213192506.utx", "w", encoding="utf-8" )as file:
    file.write( '''
Ooh, look at me, I’m a chic umlaut.
I make girls’ names look modish, like Zoë and Chloë.
And the extent of all-possible-orbits, I call the etendue.
It sounds like "Ed Tondue" as in rhymes with "fondue".
But also with an accent over the first e, so that it's really étendue.
それをチェックしよう！
'''[ 1: ])
with open( "tmp20200213192506.utx", encoding="utf-8" )as f:
    text = f.read()
# print( text )
stdout.buffer.write( text.encode( 'utf-8' ))
- transcript
Ooh, look at me, I’m a chic umlaut.
I make girls’ names look modish, like Zoë and Chloë.
And the extent of all-possible-orbits, I call the etendue.
It sounds like "Ed Tondue" as in rhymes with "fondue".
But also with an accent over the first e, so that it's really étendue.
それをチェックしよう！
Usually Unicode can be written with »print«, but I am using a special system to capture the output of Python scripts on Windows that only works when I use »stdout.buffer.write( text.encode( 'utf-8' ))« to output Unicode text to the console. This effectively forces Python to write the text encoded using UTF-8, while »print« uses an encoding it deems right, but which might not be the encoding needed here.
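Why the encoding matters can be seen from the bytes themselves. The following sketch uses the name »Zoë« from the text above; the variable names are only illustrative. The same string becomes different bytes in UTF-8 and ISO-8859-1, reading the bytes with the wrong encoding yields garbled text (here »ZoÃ««), and ASCII cannot represent the »ë« at all.

```python
word = "Zoë"
utf8 = word.encode( "utf-8" )
latin1 = word.encode( "iso-8859-1" )
print( utf8 )    # b'Zo\xc3\xab' (two bytes for the »ë«)
print( latin1 )  # b'Zo\xeb'     (one byte for the »ë«)

# Decoding UTF-8 bytes as ISO-8859-1 does not fail, but gives wrong text.
garbled = utf8.decode( "iso-8859-1" )
print( garbled == word )           # False
print( len( word ), len( garbled )) # 3 4

try:
    word.encode( "ascii" )
except UnicodeEncodeError:
    print( "not representable in ASCII" )
```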
Adding line numbers
main.py
with open( "gettysburg.txt", "w" )as file:
    file.write( '''
Four score and seven years ago,
our fathers brought forth on this continent
a new nation,
conceived in liberty
and dedicated to the proposition
that all men are created equal.
'''[ 1: ])
with open( "gettysburg.txt" )as f:
    for line in enumerate( f, start=1 ):
        print( line[ 0 ], line[ 1 ], end="" )
- transcript
1 Four score and seven years ago,
2 our fathers brought forth on this continent
3 a new nation,
4 conceived in liberty
5 and dedicated to the proposition
6 that all men are created equal.
Zipping two files
main.py
with open( "gettysburg.txt", "w" )as file:
    file.write( '''
four
score
and
seven
'''[ 1: ])
with open( "translation.txt", "w" )as file:
    file.write( '''
vier
zwanzig
und
sieben
'''[ 1: ])
with open( "gettysburg.txt" )as f:
    with open( "translation.txt" )as g:
        for line in zip( f, g ):
            print( f"{line[ 0 ][ :-1 ]:10s}{line[ 1 ][ :-1 ]:10s}" )
- transcript
four      vier
score     zwanzig
and       und
seven     sieben
Building a dict
main.py
with open( "gettysburg.txt", "w" )as file:
    file.write( '''
four
score
and
seven
'''[ 1: ])
with open( "translation.txt", "w" )as file:
    file.write( '''
vier
zwanzig
und
sieben
'''[ 1: ])
def strip( s ):
    return s.strip()
with open( "gettysburg.txt" )as f:
    with open( "translation.txt" )as g:
        d = dict( zip( map( strip, f ), map( strip, g )))
print( d[ 'and' ])
- transcript
und
Regular Expressions
[ðəˈɑɚˈiˈmɑʤul] the re module
[ˈɹɛˌɡɛks] regex, abbreviation of “regular expression”
[ˈfaɪndˈɪṭɚ] »finditer«
import re
for match in re.finditer( pattern, string ):
    pass # executed once for each regex match
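A complete, runnable instance of this loop might look as follows. It is only a sketch: the text and the pattern (words containing an »s«) are made up for illustration. Each »match« is a match object; its methods »start« and »group« give the position and the matched text.

```python
import re

text = "Four score and seven years ago"
pattern = r"\b\w*s\w*\b"  # hypothetical pattern: words containing an »s«
found = []
for match in re.finditer( pattern, text ):
    # executed once for each regex match, from left to right
    found.append( ( match.start(), match.group() ))
    print( match.start(), match.group() )
```

The loop prints »5 score«, »15 seven« and »21 years«.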
Classes