Reference: Strings
From 6.00 reference wiki
Basic operations
Length of a String
len allows you to count the number of characters in a string:
>>> len("abc")
3
>>> len("")
0
Equality
Two strings are equal if and only if they have exactly the same contents: they are the same length and have the same character at every position. Example:
>>> a = 'hello'; b = 'hello'  # Assign 'hello' to a and b.
>>> print a == b
True
>>> print a == 'hello'
True
>>> print a == "hello"  # (choice of delimiter is unimportant)
True
>>> print a == 'hello '  # (extra space)
False
>>> print a == 'Hello'  # (wrong case)
False
Iterating over a String
A string is a sequence of characters. Using a for loop, we can iterate over the characters in this sequence, in order.
>>> s = "abcdef"
>>> for x in s:
...     print x
...
a
b
c
d
e
f
The above for loop is equivalent to the following while loop:
s = "abcdef"
i = 0
while i < len(s):
    x = s[i]
    print x
    i = i + 1
The for loop is clearer and easier to write down.
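As a small illustration of iterating over characters, the sketch below counts the vowels in a string. The helper name count_vowels is hypothetical and exists only for this example.

```python
def count_vowels(text):
    """Count the vowels in text by visiting each character in order."""
    total = 0
    for ch in text:
        # ch is a one-character string taken from text
        if ch in "aeiouAEIOU":
            total = total + 1
    return total

vowels_in_abcdef = count_vowels("abcdef")  # 'a' and 'e' are the vowels here
```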
Concatenation
You can concatenate two strings together to produce a new string whose value is the first string followed by the second string.
>>> "abc" + "def"
'abcdef'
In fact, you can concatenate an arbitrary number of strings together:
>>> "this" + " " + "is" + " " + "an" + " " + "example"
'this is an example'
You can also concatenate integers and floats to strings, by first converting the numbers to strings using the str function:
>>> counter = 10
>>> "counter : " + str(counter)
'counter : 10'
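The conversion is required, not optional: adding a string and a number directly raises a TypeError. A minimal sketch of both cases, using illustrative variable names:

```python
counter = 10

try:
    label = "counter : " + counter       # str + int is not allowed
except TypeError:
    label = "counter : " + str(counter)  # convert first, then concatenate

ratio = "ratio : " + str(0.5)            # floats are converted the same way
```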
Multiplication
'Multiplying' a string by an integer value n returns a new string whose value is the original string repeated n times.
>>> 'a' * 5
'aaaaa'
>>> 'abc' * 5
'abcabcabcabcabc'
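One common use of string multiplication is building a separator whose width depends on other text. The sketch below also shows what happens for zero and negative counts; the variable names are illustrative only.

```python
title = "Report"
underline = "-" * len(title)   # one dash per character in the title

empty = "abc" * 0              # zero repetitions give the empty string
negative = "abc" * -2          # a negative count also gives the empty string
```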
Indexing and Slicing
Indexing
Each character in a string can be accessed by an integer giving its position in the string. Python uses zero-based indices: the first character in string s is s[0], and the nth character is s[n-1].
>>> s = "Xanadu"
>>> s[1]
'a'
Python also indexes strings backwards, using negative numbers. The last character has index -1, the second to last character has index -2, and so on.
>>> s[-4]
'n'
For example, the string "Xanadu" is indexed like this:
Element:         'X'  'a'  'n'  'a'  'd'  'u'
Index:            0    1    2    3    4    5
Negative index:  -6   -5   -4   -3   -2   -1
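The two rows of indices are related by the length of the string: for a string of length n, position i and position i - n name the same character. A short sketch that checks this for "Xanadu":

```python
s = "Xanadu"
n = len(s)

# For every valid position i, the negative index i - n names the same character.
pairs_match = True
for i in range(n):
    if s[i] != s[i - n]:
        pairs_match = False

last = s[-1]    # same character as s[n - 1]
first = s[-n]   # same character as s[0]
```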
Range errors:
If you provide a positive index greater than or equal to the length of the string, you will get an index-out-of-range error.
>>> s = "Xanadu"
>>> len(s)
6
>>> s[5]
'u'
>>> s[6]
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in -toplevel-
    s[6]
IndexError: string index out of range
Similarly, if you provide a negative integer whose magnitude is greater than the length of the string, you will get an index-out-of-range error.
>>> s[-6]
'X'
>>> s[-7]
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in -toplevel-
    s[-7]
IndexError: string index out of range
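When an index might be out of range, the IndexError can be caught rather than allowed to stop the program. The helper below is a hypothetical sketch (char_at is not a built-in) that returns a default value instead of raising.

```python
def char_at(text, i, default=None):
    """Return text[i], or default when i is outside the valid index range."""
    try:
        return text[i]
    except IndexError:
        return default

s = "Xanadu"
in_range = char_at(s, 5)            # valid index: the last character
too_large = char_at(s, 6)           # len(s) is already out of range
too_small = char_at(s, -7, "?")     # magnitude greater than len(s)
```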
Slicing
We can also use slices to access a substring of s. s[a:b] will give us a string starting with s[a] and ending with s[b-1]. Notice that s[b] is not returned.
>>> s = "Xanadu"
>>> s[1:4]
'ana'
>>> s[0:len(s)]
'Xanadu'
Another feature of slices is that either bound can be omitted: a missing start defaults to 0 (the beginning of the string), and a missing end defaults to len(s) (the end of the string):
>>> s[2:]
'nadu'
>>> s[:3]
'Xan'
>>> s[:]
'Xanadu'
You can also use negative numbers in slices:
>>> s[-2:]
'du'
Range errors:
You do not get index-out-of-range errors with slices. Bounds that fall outside the string are clipped to its ends.
>>> s[-77:88]
'Xanadu'
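A few more slices on the same string may help make the rules concrete. This is a sketch with illustrative variable names; it also checks that splitting a string at any position k and joining the two halves gives back the original, even when k is out of range.

```python
s = "Xanadu"

# s[:k] and s[k:] always fit back together, whatever k is.
round_trip_ok = True
for k in range(-10, 11):
    if s[:k] + s[k:] != s:
        round_trip_ok = False

middle = s[1:-1]     # drop the first and last characters
clipped = s[4:100]   # the end bound is clipped to len(s)
nothing = s[10:20]   # a slice entirely out of range is just empty
```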
Immutability
Since strings are immutable, we cannot assign to indexes or slices.
>>> s[0] = 'J'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object does not support item assignment
>>> s[1:3] = "up"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object does not support slice assignment
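The usual workaround is to build a new string from slices of the old one. A minimal sketch of the two assignments attempted above, expressed as concatenations (the variable names are illustrative):

```python
s = "Xanadu"

# Build new strings from pieces of the old one instead of assigning in place.
j_version = "J" + s[1:]          # what s[0] = 'J' was trying to achieve
patched = s[:1] + "up" + s[3:]   # what s[1:3] = "up" was trying to achieve
```

The original string s is left untouched; each expression produces a separate string.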
Searching Strings
Containment
The in operator returns True if the first operand is contained in the second. When x and y are strings, the expression x in y is True if and only if the value of x is a substring of the value of y; otherwise, the expression is False.
>>> x = 'hello'
>>> y = 'll'
>>> x in y
False
>>> y in x
True
>>> x in x
True
>>> z = 'hex'
>>> z in x
False
>>> z = "Hel"  # (uppercase 'H')
>>> z in x
False
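Because containment is case-sensitive, a test that should ignore case has to normalize both strings first. The sketch below does this with the lower method described later in this reference; contains_ignoring_case is a hypothetical helper name.

```python
def contains_ignoring_case(needle, haystack):
    """True if needle occurs in haystack when letter case is ignored."""
    return needle.lower() in haystack.lower()

x = 'hello'
strict = 'Hel' in x                           # case-sensitive test
relaxed = contains_ignoring_case('Hel', x)    # case-insensitive test
empty_is_contained = '' in x                  # the empty string is in every string
```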
find, index, rfind, rindex
The find and index methods take a string as input, and return the index of the first found occurrence of that string.
>>> s = 'Hello, world'
>>> s.find('l')
2
>>> s.find('He')
0
>>> s.find('Hex')
-1
If the given string is not found, find returns -1 but index raises a ValueError.
rfind and rindex are the same as find and index except that they search through the string from right to left (i.e. they find the last occurrence).
>>> s.rfind('l')
10
Sometimes it is useful to use these functions to slice out substrings:
>>> s[s.index('l'):]
'llo, world'
>>> s[:s.rindex('l')]
'Hello, wor'
>>> s[s.index('l'):s.rindex('l')]
'llo, wor'
In situations like this one, index is probably the better choice. If the substring is missing, find returns -1, and because Python strings accept negative subscripts, -1 is silently treated as the position of the last character, so the slice yields an incorrect value instead of an error. The cost of using index is that the code must be wrapped in a try block to handle the ValueError.
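The difference can be seen by searching for a character that is not in the string. In this sketch, treating "not found" as an empty result is an assumption made for the example; real code would choose whatever fallback suits it.

```python
s = 'Hello, world'

# find returns -1 for a missing substring, and -1 is a valid index,
# so this slice quietly yields the last character.
misleading = s[s.find('z'):]

# index raises ValueError instead, so the missing case must be handled.
try:
    tail = s[s.index('z'):]
except ValueError:
    tail = ''   # assumption for this sketch: "not found" means an empty result
```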
Converting Strings to Lists and back
Lists from Strings (split, splitlines)
The split method returns a list of the words in the string.
>>> s = 'Hello, world'
>>> s.split()
['Hello,', 'world']
split can take a separator argument to use instead of whitespace.
>>> s.split('l')
['He', '', 'o, wor', 'd']
Note that in neither case is the separator included in the resulting strings, but empty strings can appear in the result, as between the two adjacent 'l' characters above.
Splitting the empty string with no separator argument gives us an empty list:
>>> s = ''
>>> s.split()
[]
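Splitting on whitespace by default and splitting on an explicit ' ' separator behave differently when spaces repeat or appear at the ends. A short sketch of the contrast, with illustrative variable names:

```python
s = 'a  b '   # two spaces between a and b, one trailing space

by_whitespace = s.split()       # runs of whitespace count as one separator
by_space = s.split(' ')         # every single space is a separator

empty_default = ''.split()      # no words at all
empty_explicit = ''.split(',')  # one (empty) field
```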
The splitlines method breaks a multiline string into many single line strings. It is analogous to split('\n') (but accepts '\r' and '\r\n' as delimiters as well) except that if the string ends in a newline character, splitlines ignores that final character (see example).
>>> s = """
... One line
... Two lines
... Red lines
... Blue lines
... Green lines
... """
>>> s.split('\n')
['', 'One line', 'Two lines', 'Red lines', 'Blue lines', 'Green lines', '']
>>> s.splitlines()
['', 'One line', 'Two lines', 'Red lines', 'Blue lines', 'Green lines']
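The handling of '\r' and '\r\n' is easiest to see on a string that mixes line endings. The text below is made up for this sketch:

```python
text = 'one\ntwo\r\nthree\rfour\n'

lines = text.splitlines()   # recognizes '\n', '\r\n' and '\r'
pieces = text.split('\n')   # only splits on '\n', so '\r' stays in the pieces
```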
Strings from Lists (join)
The join method concatenates the strings in the given sequence, using the string it is called on as the separator:
>>> seq = ['1', '2', '3', '4', '5']
>>> ' '.join(seq)
'1 2 3 4 5'
>>> '+'.join(seq)
'1+2+3+4+5'
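join accepts only strings, so a list of numbers has to be converted first. The sketch below shows that conversion, the TypeError raised without it, and a split/join round trip; all variable names are illustrative.

```python
numbers = [1, 2, 3, 4, 5]

# Convert each number to a string before joining.
as_strings = []
for n in numbers:
    as_strings.append(str(n))
total_expr = '+'.join(as_strings)

# Joining the numbers directly is rejected.
try:
    ','.join(numbers)
    join_rejected_ints = False
except TypeError:
    join_rejected_ints = True

# split and join undo each other when the same separator is used.
csv_line = 'red,green,blue'
round_trip = ','.join(csv_line.split(','))
```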
Miscellaneous operations
Changing case (uppercase/lowercase/capitalization)
The lower method returns a version of the string with all lowercase letters; the upper method returns a string with all uppercase letters.
>>> s = "Hello World"
>>> s.lower()
'hello world'
>>> s.upper()
'HELLO WORLD'
The title method capitalizes the first letter of each word in the string (and makes the rest lower case). Words are identified as substrings of alphabetic characters that are separated by non-alphabetic characters. This can lead to some unexpected behavior. For example, the string "x1x" will be converted to "X1X" instead of "X1x".
The swapcase method makes all uppercase letters lowercase and vice versa.
The capitalize method is like title except that it considers the entire string to be a single word (i.e. it makes the first character uppercase and the rest lowercase).
Example:
>>> s = 'Hello, wOrLD'
>>> s
'Hello, wOrLD'
>>> s.title()
'Hello, World'
>>> s.upper()
'HELLO, WORLD'
>>> s.lower()
'hello, world'
>>> s.swapcase()
'hELLO, WoRld'
>>> s.capitalize()
'Hello, world'
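The word-splitting rule used by title is worth seeing on inputs that contain non-alphabetic characters. A short sketch contrasting title and capitalize (the sample strings are chosen for illustration):

```python
mixed = "x1x"
titled = mixed.title()             # the digit splits the string into two "words"
capitalized = mixed.capitalize()   # only the very first character is raised

phrase = "they're here"
phrase_titled = phrase.title()     # the apostrophe also starts a new "word"
```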
The is* methods (isalnum, isalpha, isdigit, islower, ...)
- isalnum returns True if the string is entirely composed of alphabetic or numeric characters (i.e. no punctuation).
- isalpha and isdigit work similarly for alphabetic characters or numeric characters only.
- islower, isupper, and istitle return True if the string is in lowercase, uppercase, or titlecase respectively (titlecase means that the first character of each word is uppercase and the rest are lowercase).
- isspace returns True if the string is composed entirely of whitespace.
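A few sample calls may make the list above concrete. The strings are chosen for illustration; note that all of these methods return False for the empty string.

```python
alnum_ok = "abc123".isalnum()        # letters and digits only
alnum_space = "abc 123".isalnum()    # the space is neither
alpha_ok = "abc".isalpha()
digit_ok = "123".isdigit()
digit_point = "12.5".isdigit()       # '.' is not a digit

lower_ok = "hello world".islower()   # non-letters are ignored
upper_ok = "HELLO, WORLD".isupper()
title_ok = "Hello World".istitle()
space_ok = " \t\n".isspace()

empty_alnum = "".isalnum()           # nothing to check, so False
```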
Counting repeated substrings
The count method returns the number of non-overlapping occurrences of the specified substring in the string. For example:
>>> s = 'Hello, world'
>>> s.count('l')  # the number of 'l's in 'Hello, world' (3)
3
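Some edge cases of count, sketched with illustrative variable names: a missing substring counts as zero, matching is case-sensitive, and matches are not allowed to overlap.

```python
s = 'Hello, world'

l_count = s.count('l')
missing = s.count('z')            # absent substrings count as 0
case_sensitive = s.count('h')     # 'H' is not 'h'
overlapping = 'aaaa'.count('aa')  # matches do not overlap: 'aa' + 'aa'
```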
Replacing substrings
replace returns a copy of the string with all occurrences of the first parameter replaced with the second parameter.
>>> s = 'Hello, world'
>>> new_s = s.replace('o', 'X')
>>> print s
Hello, world
>>> print new_s
HellX, wXrld
Or, without variables:
>>> 'Hello, world'.replace('o', 'X')
'HellX, wXrld'
String Immutability: Notice that the original variable (s) remains unchanged after the call to replace.
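replace also accepts an optional third argument that limits how many occurrences are replaced. A brief sketch, with illustrative variable names:

```python
s = 'Hello, world'

all_replaced = s.replace('o', 'X')
first_only = s.replace('o', 'X', 1)   # optional third argument limits the count
no_match = s.replace('z', 'X')        # nothing to replace: same contents come back
```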
(these notes have been adapted/expanded from the programming:python wikibook)