Lists and indexes

Credits: Institute of Maritime Logistics – Hamburg University of Technology

Lists are among the most commonly used data structures in Python. A list is a sequence of values enclosed in square brackets and separated by commas. Individual elements can be accessed by their index.

Lists are declared as follows:

x = [7, 4.5, 'apple'] # a list with three different types of data
y = [1, 7, 9]         # a list with three elements of the same type
z = []                # an empty list

type(x), type(y), type(z)  # type() returns the type of the object each variable refers to

(list, list, list)

1 Indexing

The individual elements within a list are numbered consecutively, starting at 0, as with strings. This allows you to access each value in a list by its position (called its index). This is called indexing, and it works just like accessing characters in a string:

x[0], x[1], x[2]

(7, 4.5, 'apple')

Thus the number 7 is at index 0 of the list x. The number 4.5 is at index 1 and ‘apple’ at index 2.

Indexing can also be done from the end of the list using negative indices: index -1 refers to the last element, -2 to the second-to-last, and so on. Thus x[-1] is ‘apple’ and x[-2] is 4.5.

x[-1], x[-2], x[-3]

('apple', 4.5, 7)

As you might have already guessed, x[0] == x[-3], x[1] == x[-2] and x[2] == x[-1] in a list with three elements. This extends to lists of any length: for a list with n elements, x[0] == x[-n], and in general x[i] == x[i-n] for every index i from 0 to n-1.
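The relation between positive and negative indices can be checked for any list. The sketch below uses a made-up list `letters` (a hypothetical example, not part of the original material) and confirms that for every valid position `i`, `letters[i]` and `letters[i - n]` refer to the same element.

```python
letters = ['p', 'y', 't', 'h', 'o', 'n']  # hypothetical example list
n = len(letters)                          # n = 6

# For every valid positive index i, i - n is the matching negative index
for i in range(n):
    assert letters[i] == letters[i - n]

print(letters[0], letters[-n])      # first element, two ways: p p
print(letters[n - 1], letters[-1])  # last element, two ways: n n
```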

Lists are not limited to entries of familiar data types such as strings or numbers. A list can also contain another list as an entry:

a = ['apple', 'orange']
b = ['carrot','potato']
c = [a,b]
print("c=", c)

c= [['apple', 'orange'], ['carrot', 'potato']]

The lists a and b each contain two elements of type string. The list c contains two elements, namely the lists a and b. Such a list within another list is called a nested list.

Indexing nested lists can be confusing at first, so let us break it down step by step.

Let us access the element ‘orange’ in the nested list c above. At index 0 of c there is the list ['apple', 'orange'], and at index 1 there is the list ['carrot', 'potato']. Hence c[0] gives us the first list, which contains ‘apple’ and ‘orange’. From this list we take the second element (index 1) to get ‘orange’:

c[0][1]

'orange'

print(c[0], "is the element with index 0 in c")
print(c[0][1], "is the element with index 1 in the list which has index 0 in c")
print(c[1][0], "is the element with index 0 in the list which has index 1 in c")
# The order is important! c[0][1] and c[1][0] are different!

['apple', 'orange'] is the element with index 0 in c
orange is the element with index 1 in the list which has index 0 in c
carrot is the element with index 0 in the list which has index 1 in c
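Indexes can be chained as deeply as the nesting goes, and negative indices work at every level. The following sketch uses a hypothetical 2×3 table `grid` stored as a list of rows; the first index selects the row and the second selects the element within that row.

```python
grid = [[1, 2, 3],
        [4, 5, 6]]       # hypothetical table: 2 rows, 3 columns

row = grid[1]            # the second row: [4, 5, 6]
print(row[2])            # third element of that row: 6
print(grid[1][2])        # the same element in one step: 6
print(grid[-1][-1])      # last element of the last row: 6
print(grid[0][-1])       # last element of the first row: 3
```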

2 Slicing

Indexing accesses only a single element. Slicing, on the other hand, extracts a sequence of elements from the list; in other words, it cuts a “slice” out of the list.

Slicing again works like extracting a substring. The notation [x:y] selects all elements from index x up to and including index y-1. Because the element at index y itself is excluded, the slice [x:y] contains exactly y-x elements, as long as 0 <= x <= y <= the length of the list.

As with substrings, an omitted first value means the slice starts at index 0, and an omitted second value means it runs to the end of the list, including the last element.

ch = ['a','b','c','d','e','f','g','h','i']
ch[0:4]

['a', 'b', 'c', 'd']

ch[4:]

['e', 'f', 'g', 'h', 'i']
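The rules above can be checked directly on `ch`. In this sketch, splitting at the same index with `[:k]` and `[k:]` reassembles the original list, and `len(ch[x:y])` equals `y - x` for bounds inside the list. Negative indices and an out-of-range end value are shown as well; Python clamps slice bounds to the list instead of raising an error.

```python
ch = ['a','b','c','d','e','f','g','h','i']

# Splitting at index k and joining the halves gives back the original list
k = 4
assert ch[:k] + ch[k:] == ch

# The slice [x:y] has exactly y - x elements when 0 <= x <= y <= len(ch)
x, y = 2, 5
print(ch[x:y], len(ch[x:y]))   # ['c', 'd', 'e'] 3

# Negative indices count from the end: the last three elements
print(ch[-3:])                 # ['g', 'h', 'i']

# Slice bounds beyond the list are clamped instead of raising IndexError
print(ch[7:100])               # ['h', 'i']
```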

You can also specify a step size, which is written after a second colon:

ch[0:9:3]  # start at index 0, stop before index 9, take every 3rd element

['a', 'd', 'g']
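The step can be combined with omitted start and end values, and a negative step walks through the list backwards. A short sketch on the same list `ch`:

```python
ch = ['a','b','c','d','e','f','g','h','i']

print(ch[::3])     # every 3rd element from the start: ['a', 'd', 'g']
print(ch[1::2])    # every 2nd element starting at index 1: ['b', 'd', 'f', 'h']
print(ch[::-1])    # negative step: a reversed copy of the whole list
print(ch[6:2:-1])  # from index 6 down to, but excluding, index 2: ['g', 'f', 'e', 'd']
```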

3 Built-in List Functions

len( ) returns the length of a list, i.e. the number of elements it contains.

len(ch)

9

If the list consists only of numbers, min( ) and max( ) return its smallest and largest value, and sum( ) returns the total of all elements.

num = [1,2,3,4,5,6,7,8,9]
print("min =",min(num),"  max =",max(num),"  total =",sum(num))

min = 1   max = 9   total = 45
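These functions combine naturally; for instance, the arithmetic mean is `sum()` divided by `len()`. One caveat: `min()` and `max()` raise a `ValueError` on an empty list unless a `default` value is supplied, while `sum()` of an empty list is simply `0`. A minimal sketch:

```python
num = [1,2,3,4,5,6,7,8,9]
mean = sum(num) / len(num)        # 45 / 9
print("mean =", mean)             # mean = 5.0

empty = []
print(sum(empty))                 # 0
print(max(empty, default=None))   # None instead of an error

try:
    min(empty)                    # no default given
except ValueError:
    print("min() of an empty list raises ValueError")
```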

Lists can be concatenated with the + operator. The resulting list contains all the elements of the lists that were added, in order; it is not a nested list.

[1,2,3] + [5,4,7]

[1, 2, 3, 5, 4, 7]
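Concatenation builds a new list and leaves both operands unchanged. The sketch below contrasts it with placing the two lists inside a new list, which produces a nested list as with c = [a, b] earlier.

```python
first = [1, 2, 3]
second = [5, 4, 7]

joined = first + second      # new flat list with 6 elements
nested = [first, second]     # new list with 2 elements, each of them a list

print(joined, len(joined))   # [1, 2, 3, 5, 4, 7] 6
print(nested, len(nested))   # [[1, 2, 3], [5, 4, 7]] 2
print(first, second)         # operands unchanged: [1, 2, 3] [5, 4, 7]
```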

Sometimes you need to check whether a particular element is present in a list. Consider the following list:

names = ['Earth','Air','Fire','Water']

To check whether ‘Fire’ and ‘Space’ are present in the list names, a conventional approach would be to iterate over the list with a for loop and compare each element in an if condition. In Python, however, you can simply write a in b, which returns True if a is present in b and False otherwise.

'Fire' in names

True

'Space' in names

False
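For comparison, the loop-based check mentioned above might look like the sketch below; `contains` is a hypothetical helper written only for illustration. It returns the same results as the `in` operator, which also has a negated form, `not in`.

```python
names = ['Earth','Air','Fire','Water']

def contains(items, target):
    """Hypothetical helper: loop-based membership test."""
    for item in items:
        if item == target:
            return True      # stop at the first match
    return False             # reached the end without a match

print(contains(names, 'Fire'), 'Fire' in names)     # True True
print(contains(names, 'Space'), 'Space' in names)   # False False
print('Space' not in names)                         # True
```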

min( ) and max( ) also work on lists of strings: min( ) returns the element that comes first in lexicographical order and max( ) the one that comes last.

mlist = ['bzaa', 'acs', 'ac', 'az', 'zg', 'k']
print("max =",max(mlist))
print("min =",min(mlist))

max = zg
min = ac

Strings are compared character by character, starting with the first, using each character's code (its Unicode code point, which equals the ASCII value for ASCII characters). If exactly one element has the lowest / highest code in the first position, min/max returns that element. In the example above, this is the case for the “maximum string” ‘zg’. For the “minimum string”, several elements start with ‘a’, so min compares the second characters. This leaves ‘acs’ and ‘ac’ as candidates for the “minimum string”. If one string is a prefix of the other, i.e. it “stops” earlier, the shorter string is considered smaller, so ‘ac’ is smaller than ‘acs’. For strings made of lowercase letters, this is the same as alphabetical ordering.
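The built-in function `ord()` returns the code of a single character, which makes these comparison rules easy to inspect. The sketch below confirms the examples from the paragraph and shows one consequence worth knowing: every uppercase letter has a smaller code than every lowercase letter, so `'Z'` sorts before `'a'`.

```python
print(ord('a'), ord('z'), ord('A'), ord('Z'))  # 97 122 65 90

print('zg' > 'bzaa')    # True: 'z' (122) > 'b' (98) decides at the first character
print('ac' < 'az')      # True: first characters equal, then 'c' (99) < 'z' (122)
print('ac' < 'acs')     # True: 'ac' is a prefix of 'acs', so it is smaller
print('Z' < 'a')        # True: uppercase codes come before lowercase codes
print(min(['apple', 'Banana']))  # Banana, because 'B' (66) < 'a' (97)
```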

However, if numbers are stored as strings, min and max still compare them character by character. The number of digits, i.e. whether a number is in the tens, hundreds or thousands, is ignored. This explains seemingly strange “errors”, as the following example shows:

nlist = ['5', '10', '93', '94', '1000']
print("max =",max(nlist))
print('min =',min(nlist))

max = 94
min = 10

If you want to find the maximum string based on its length instead, pass a function through the additional parameter key. min( ) and max( ) call this function on each element and compare the results instead of the elements themselves. Hence the longest and shortest strings in mlist can be found using the len function:

print('longest =',max(mlist, key=len))
print('shortest =',min(mlist, key=len))

longest = bzaa
shortest = k

Any other built-in or user-defined function that takes a single argument can be used as the key.
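The key parameter also resolves the numeric-string problem from the nlist example: with `key=int`, each string is converted to an integer for the comparison, but the original string is returned. The second part uses a hypothetical user-defined function `last_char` to pick elements of `mlist` by their final character.

```python
nlist = ['5', '10', '93', '94', '1000']
print('max =', max(nlist, key=int))   # max = 1000
print('min =', min(nlist, key=int))   # min = 5

mlist = ['bzaa', 'acs', 'ac', 'az', 'zg', 'k']

def last_char(s):
    """Hypothetical key function: compare strings by their last character."""
    return s[-1]

print(max(mlist, key=last_char))      # az   (last character 'z')
print(min(mlist, key=last_char))      # bzaa (last character 'a')
```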

A string can be converted into a list with the list() function, which splits it into individual characters, or, more usefully, with the split() method, which breaks the string up at whitespace.

print(list('hello world !'),'Hello   World !!'.split())

['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', ' ', '!'] ['Hello', 'World', '!!']
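`split()` without an argument splits on any run of whitespace, which is why the three spaces in `'Hello   World !!'` did not produce empty strings. A separator can also be passed explicitly; in that case every occurrence counts, so consecutive separators yield empty strings. A brief sketch:

```python
print('a  b\tc\n'.split())      # ['a', 'b', 'c']: any whitespace, runs collapsed
print('2024-01-15'.split('-'))  # ['2024', '01', '15']
print('a,,b'.split(','))        # ['a', '', 'b']: empty string between the commas
```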

append( ) is used to add a single element at the end of the list.

lst = [1,1,4,8,7]
lst.append(1)
lst

[1, 1, 4, 8, 7, 1]

Passing several elements to append( ) as a list would add that list as a single nested element. To add the elements individually instead, use the extend( ) method.

lst.extend([10,11,12])
lst

[1, 1, 4, 8, 7, 1, 10, 11, 12]
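The difference between the two methods shows up when both receive the same list. In this sketch, `append()` adds the list as one nested element, whereas `extend()` adds its elements one by one:

```python
with_append = [1, 2]
with_extend = [1, 2]

with_append.append([3, 4])   # the whole list becomes a single element
with_extend.extend([3, 4])   # each element is added individually

print(with_append, len(with_append))   # [1, 2, [3, 4]] 3
print(with_extend, len(with_extend))   # [1, 2, 3, 4] 4
```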

count( ) returns how many times a particular element occurs in the list.

lst.count(1)

3

index( ) returns the index of a particular element. If the element occurs more than once, the index of its first occurrence is returned.

lst.index(1), lst.index(11)

(0, 7)
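Unlike `count()`, which simply returns `0` for a missing element, `index()` raises a `ValueError` if the element is not in the list. A sketch of two common ways to handle this, checking with `in` first or catching the exception:

```python
lst = [1, 1, 4, 8, 7, 1, 10, 11, 12]

print(lst.count(99))          # 0: a missing element is simply counted as zero

target = 99
if target in lst:             # check first, then look up the position
    print(lst.index(target))
else:
    print(target, "is not in the list")

try:
    lst.index(target)
except ValueError:
    print("index() raised ValueError")
```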

insert(x, y) inserts the element y at index x. append( ) only made it possible to add an element at the end.

lst.insert(5, 'name')
lst

[1, 1, 4, 8, 7, 'name', 1, 10, 11, 12]

insert(x, y) inserts an element but does not replace any. To replace an element, simply assign a new value to that index.

lst[5] = 'Python'
lst

[1, 1, 4, 8, 7, 'Python', 1, 10, 11, 12]
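`insert()` places the new element before the element currently at the given index. This sketch on a small hypothetical list shows two edge cases: a negative index counts from the end as usual, and an index past the end of the list simply appends.

```python
letters = ['a', 'b', 'c']

letters.insert(0, 'start')   # before index 0: at the front
letters.insert(-1, 'x')      # before the current last element 'c'
letters.insert(100, 'end')   # index beyond the length: appended at the end

print(letters)   # ['start', 'a', 'b', 'x', 'c', 'end']
```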

pop( ) removes and returns the last element in the list.

lst.pop()

12

Alternatively, an index can be passed to pop( ) to remove and return the element at that index.

lst.pop(0)

1

pop( ) removes an element based on its index. You can also remove an element by specifying its value with the remove( ) method, which deletes the first occurrence of that value.

lst.remove('Python')
lst

[1, 4, 8, 7, 1, 10, 11]

The del statement removes an element by index, like pop( ), but does not return it.

del lst[1]
lst

[1, 8, 7, 1, 10, 11]
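The three ways of removing elements differ in what they take and what they give back. A side-by-side sketch on separate, hypothetical lists with the same contents:

```python
a = [3, 5, 7, 5]
removed = a.pop(1)     # by index, returns the removed element
print(removed, a)      # 5 [3, 7, 5]

b = [3, 5, 7, 5]
b.remove(5)            # by value, removes only the first 5
print(b)               # [3, 7, 5]

c = [3, 5, 7, 5]
del c[1]               # by index, a statement with no return value
print(c)               # [3, 7, 5]

d = [3, 5, 7, 5]
del d[1:3]             # del also accepts a slice
print(d)               # [3, 5]
```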

reverse() reverses the order of all elements in the list in place.

lst.reverse()
lst

[11, 10, 1, 7, 8, 1]

Note that in a nested list, an element like [5,4,2,8] is treated as a single element of the parent list lst2. Thus the elements inside the nested list are not reversed.

lst2 = [['a','b'], [5,4,2,8], [1], []]
lst2.reverse()
lst2

[[], [1], [5, 4, 2, 8], ['a', 'b']]
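`reverse()` works in place and returns `None`, so writing `lst2 = lst2.reverse()` would lose the list. If the original should stay untouched, the slice `[::-1]` returns a reversed copy instead. To also reverse the inner lists, each one has to be reversed explicitly; the last part of this sketch does that with a list comprehension, a construct not otherwise covered in this section.

```python
lst2 = [['a','b'], [5,4,2,8], [1], []]

result = lst2.reverse()
print(result)                 # None: the list itself was changed in place

data = [1, 2, 3]
backwards = data[::-1]        # reversed copy, original unchanged
print(data, backwards)        # [1, 2, 3] [3, 2, 1]

# Reverse the outer list AND every inner list (as new lists, not in place)
outer = [['a','b'], [5,4,2,8]]
deep = [inner[::-1] for inner in outer[::-1]]
print(deep)                   # [[8, 2, 4, 5], ['b', 'a']]
```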

The list method sort( ) arranges the elements in ascending order, modifying the list in place. Alternatively, the built-in function sorted() returns a new, sorted list and leaves the original list unchanged.

lst.sort()
lst

[1, 1, 7, 8, 10, 11]

a = [3,4,1]
b = sorted(a)
a, b

([3, 4, 1], [1, 3, 4])

For descending order use the parameter “reverse” and set it to “True”.

lst.sort(reverse=True)
lst

[11, 10, 8, 7, 1, 1]

Similarly, for lists of strings, sort( ) orders the elements by their character codes in ascending order, or in descending order with reverse=True.

names.sort()
names

['Air', 'Earth', 'Fire', 'Water']

names.sort(reverse=True)
names

['Water', 'Fire', 'Earth', 'Air']

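To sort by length, specify key=len. Elements with equal keys keep their previous relative order (Python's sort is stable), which is why ‘Water’ stays ahead of ‘Earth’: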

names.sort(key=len)
names

['Air', 'Fire', 'Water', 'Earth']

print(sorted(names,key=len,reverse=True))

['Water', 'Earth', 'Fire', 'Air']
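If names of equal length should be ordered alphabetically instead of keeping their previous order, the key function can return a tuple. Tuples are compared element by element, so the sketch below (using a `lambda`, i.e. a small anonymous function) sorts by length first and alphabetically among equal lengths. The starting list is the order `names` had before the key=len sort.

```python
names = ['Water', 'Fire', 'Earth', 'Air']

# key=len only: 'Water' and 'Earth' (both length 5) keep their input order
print(sorted(names, key=len))                     # ['Air', 'Fire', 'Water', 'Earth']

# Tuple key: compare by length, then alphabetically to break ties
print(sorted(names, key=lambda s: (len(s), s)))   # ['Air', 'Fire', 'Earth', 'Water']
```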

4 Copying a list

Assigning a list to another variable does not copy it; it simply creates a second reference to the same list. Many new Python programmers are caught out by this at first. Consider the following:

lista = [2,1,4,3]
listb = lista
listb

[2, 1, 4, 3]

Here, we have created the list lista = [2,1,4,3] and assigned it to listb. At first glance, listb looks like a copy of lista. Now we perform a few operations on lista:

lista.sort()
lista.pop()
lista.append(9)
print("A =",lista)
print("B =",listb)

A = [1, 2, 3, 9]
B = [1, 2, 3, 9]

listb has changed as well, even though no operation has been performed on it. This is because both names refer to the same list object in memory. So how do we fix this?
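The `is` operator tests whether two names refer to the same object, and `id()` returns an identifier for an object, so both can be used to confirm what happened. A minimal sketch:

```python
lista = [2, 1, 4, 3]
listb = lista                  # a second name for the same list

print(listb is lista)          # True: one object, two names
print(id(listb) == id(lista))  # True for the same reason

listb.append(5)                # changing the list through either name...
print(lista)                   # ...is visible through the other: [2, 1, 4, 3, 5]
```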

Recall from slicing that parentlist[a:b] returns a new list with the elements from index a up to, but not including, index b, and that omitting a and b selects everything from the first to the last element. We use the same idea here: lista[:] creates a new list containing all elements of lista, and listb refers to this new list rather than to lista itself.

lista = [2,1,4,3]
listb = lista[:]           # make a new list by taking a slice from beginning to end of lista
print("Starting with:")
print("A =",lista)
print("B =",listb)
lista.sort()
lista.pop()
lista.append(9)
print("Finished with:")
print("A =",lista)
print("B =",listb)

Starting with:
A = [2, 1, 4, 3]
B = [2, 1, 4, 3]
Finished with:
A = [1, 2, 3, 9]
B = [2, 1, 4, 3]
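Slicing is not the only way to copy a list; `list(lista)` and the method `lista.copy()` produce the same kind of copy. All three are shallow copies: a new outer list is created, but nested lists inside it are shared. The sketch below shows this caveat and uses `copy.deepcopy()` from the standard-library `copy` module to copy the nested lists as well.

```python
import copy

lista = [2, 1, 4, 3]
copies = [lista[:], list(lista), lista.copy()]
print(all(c == lista and c is not lista for c in copies))  # True

nested = [[1, 2], [3, 4]]
shallow = nested[:]            # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list AND new inner lists

nested[0].append(99)           # modify an inner list of the original
print(shallow)   # [[1, 2, 99], [3, 4]]: the inner list is shared
print(deep)      # [[1, 2], [3, 4]]: fully independent
```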
