
In the previous lecture, we learned about string indexes. In this lecture, we will learn how to use indexes to perform slicing (range specification).
String slicing (retrieving a substring)
In the previous lecture, we learned how to retrieve individual characters. Here, we will learn how to use slices to retrieve substrings.
Basics of string slicing
To specify a range of characters, use the following syntax.
s[n:m]: Get the characters of the string from index [n] (inclusive) up to index [m] (NOT inclusive).
When slicing strings, it is important to note that the element at index [m] is not included. A more precise way of stating the rule is:
“Extract characters in order, starting from the element at index [n]. However, only the elements to the left of index [m] are included.”
This may be difficult for beginners to grasp immediately, but this is how the rule works.
Let's try slicing the range of “Nico” using the following code and running it.
# Slice the range of “Nico”
s = "I'm Nico, 10."
nico = s[4:8]
print(nico)
When you run the above code, “Nico” will be output.
Let's check how this slice behaves using the table below. First, the start index of the range is [4]. Looking up [4] in the table, we find the character N. Python extracts characters from N onward, moving to the right, but only those to the left of index [8].
As a result, the characters “Nico” at indexes [4][5][6][7] are extracted and assigned to the variable “nico.”
The extracted characters are copied and assigned to the new variable, so the contents of the variable “s” remain unchanged.
[0] | [1] | [2] | [3] | [4] | [5] | [6] | [7] | [8] | [9] | [10] | [11] | [12] |
I | ' | m | (space) | N | i | c | o | , | (space) | 1 | 0 | . |
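As a quick check of the half-open rule, the sketch below uses the same string s. It confirms that s[4:8] holds m - n = 4 characters and that slicing leaves s itself unchanged.

```python
s = "I'm Nico, 10."
nico = s[4:8]

# A slice s[n:m] holds m - n characters (when 0 <= n <= m <= len(s))
print(nico, len(nico))   # Nico 4
print(8 - 4)             # 4

# Slicing creates a new string; the original is not modified
print(s)                 # I'm Nico, 10.
```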
Consider a range such as s[4:12] for the string “I'm Nico, 10.”, whose maximum index is [12]. The slice does not include the element at [12], so the final period (.) is not extracted. In other words, the s[n:m] form cannot include the last element if you give that element's own index as m.
We will explain how to deal with this issue in the latter half of this lecture, in the section titled “Syntax that omits index specification.”
At this stage, make sure you understand that a slice s[n:m] does not include the element at index [m].
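To see this limitation directly, the following sketch slices with m = 12 (the last index) and then prints the element at [12] on its own. That element is the period the slice left out.

```python
s = "I'm Nico, 10."
part = s[4:12]

print(part)    # Nico, 10   (the element at index [12] is excluded)
print(s[12])   # .          (the last element, which the slice did not include)
```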
Using negative values as slice indexes
You can also specify slice indexes using negative values. To extract the “Nico” part from the previous example, you can use any of the following notations.
# Slice the range of “Nico”
s = "I'm Nico, 10."
print(s[-9:8])
print(s[4:-5])
print(s[-9:-5])
All three slices in the above code are valid, and each prints “Nico”. Positive and negative indexes can be mixed freely. s[n:m] extracts characters as long as n is the starting index and m refers to a position to the right of n within the string.
[0] | [1] | [2] | [3] | [4] | [5] | [6] | [7] | [8] | [9] | [10] | [11] | [12] |
[-13] | [-12] | [-11] | [-10] | [-9] | [-8] | [-7] | [-6] | [-5] | [-4] | [-3] | [-2] | [-1] |
I | ' | m | (space) | N | i | c | o | , | (space) | 1 | 0 | . |
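As the table shows, a negative index -k refers to the same position as the positive index len(s) - k. The sketch below checks this for the string above, where len(s) is 13. That correspondence is why the three slices all select the range [4:8].

```python
s = "I'm Nico, 10."
n = len(s)   # 13

# A negative index -k points to the same position as n - k
print(-9 + n, -5 + n)   # 4 8

# So all three slices select the same range [4:8]
print(s[-9:8], s[4:-5], s[-9:-5])   # Nico Nico Nico
```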
When specifying n and m in s[n:m], make sure that the element specified by m is to the right of the element specified by n within the string.
If n and m point to the same position, or if n specifies an element to the right of m, an empty string is returned.
# Example of empty string returned
s = "I'm Nico, 10."
print(s[4:4]) # Specifying the same position
print(s[-9:4]) # Specifying the same position
print(s[8:4]) # The positions are in reversed order
print(s[-5:-9]) # The positions are in reversed order (negative indexes)
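Because print() of an empty string outputs only a blank line, it can be hard to tell that the result is empty. One way to make it visible is to print repr() of each slice, which displays an empty string as ''. The sketch below does this for the same four slices.

```python
s = "I'm Nico, 10."

print(repr(s[4:4]))    # ''  (same position)
print(repr(s[-9:4]))   # ''  ([-9] and [4] are the same position)
print(repr(s[8:4]))    # ''  (reversed order)
print(repr(s[-5:-9]))  # ''  (reversed order, negative indexes)
print(len(s[8:4]))     # 0
```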
Range of indexes that can be specified in a slice
In the previous lecture, we explained what happens when you extract a single character by index (the s[n] format, not a slice). If the index is outside the string's range, an error occurs. With slices, however, an out-of-range value does not cause an error. The slice simply extracts the characters that exist within the range.
Let's execute the following code and confirm this.
s = "I'm Nico, 10."
print(s[4:100])
print(s[-100:3])
The above code contains extreme values in the index, but no error occurs when the program is executed.
The second line of the above code shows what happens when m in s[n:m] is greater than the maximum index of the string: characters are extracted up to the last element. In the case of s[4:100], the characters from index [4] to the last index [12] are extracted (“Nico, 10.”). This is because, if we imagine that elements up to [100] existed, [12] would be located to the left of [100].
The third line of the above code, s[-100:3], extracts the characters from the first index [0] to [2] (“I'm”). Index [3] is not included. [-100] specifies a position to the left of the first character of the string. Since no characters exist at that position, extraction starts from the first character ([0]).
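The difference from plain indexing can be checked side by side. In this sketch, s[100] raises IndexError, while the slices s[4:100] and s[-100:3] simply stop at the edges of the string.

```python
s = "I'm Nico, 10."

try:
    print(s[100])                # plain index outside the string
except IndexError as e:
    print("IndexError:", e)      # indexing out of range raises an error

print(s[4:100])    # Nico, 10.  (slice stops at the last element)
print(s[-100:3])   # I'm        (slice starts at the first element)
```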
When you want to slice a string up to and including the last element, you will typically use the format explained in the section “Syntax that omits index specification” below. (This is written as s[4:].)
As in the previous example, there are two other ways to extract the string up to the last element. You can give an extremely large index as m in s[n:m], or you can give the length of the string as m, as in s[4:len(s)]. However, neither is a common way of writing this in Python.
If you know when writing the code that the slice should run to the last element, simply write s[4:].
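The sketch below confirms that the three ways of reaching the last element give the same result. s[4:] is simply the shortest and most readable of them.

```python
s = "I'm Nico, 10."

a = s[4:]          # idiomatic: omit m
b = s[4:len(s)]    # works, but more verbose
c = s[4:100]       # works, but relies on an arbitrary large number

print(a)             # Nico, 10.
print(a == b == c)   # True
```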
Syntax that omits index specification
You can omit the values of n and m in the slice format (s[n:m]). In that case, the characters are extracted as follows.
- When n is omitted: Extraction starts from index [0].
- When m is omitted: Extraction includes the last element.
The important point here is that when m is omitted, the last element is included in the extraction. Note the difference in behavior: specifying m excludes the element at [m], whereas omitting m includes the last element.
Now, let's execute the following code and check the result of the slice.
s = "I'm Nico, 10."
print(s[:5]) # "I'm N" (From index [0] to one before index [5])
print(s[5:]) # "ico, 10." (From index [5] to the last element)
print(s[:]) # "I'm Nico, 10." (From index [0] to the last element)
print(s[-3:]) # "10." (From the third element from the back ([-3]) to the last element)
The output results of the above code and explanations of the extracted ranges are provided in the comments for each line. Please check the table below to confirm the index ranges from which the strings are extracted.
[0] | [1] | [2] | [3] | [4] | [5] | [6] | [7] | [8] | [9] | [10] | [11] | [12] |
[-13] | [-12] | [-11] | [-10] | [-9] | [-8] | [-7] | [-6] | [-5] | [-4] | [-3] | [-2] | [-1] |
I | ' | m | (space) | N | i | c | o | , | (space) | 1 | 0 | . |
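Because s[:k] stops just before index [k] and s[k:] starts exactly at [k], the two pieces never overlap and never leave a gap. This sketch checks that s[:k] + s[k:] rebuilds the original string for every split position k.

```python
s = "I'm Nico, 10."

for k in range(len(s) + 1):
    left, right = s[:k], s[k:]
    # The two pieces share no character and miss no character
    assert left + right == s

print(s[:5] + s[5:])   # I'm Nico, 10.
```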
As a supplement to the above code, writing s[:] extracts the characters from the beginning to the end of the original string. If you only want to print the string, print(s) produces the same result. There is no practical advantage to using [:] with strings. This example simply shows that the notation is possible.
(Strictly speaking, s[:] is a slice operation rather than a plain reference to s. However, strings cannot be modified, so the result always has the same contents as s, and CPython simply returns the original string object instead of copying it. This is why there is no benefit to using [:] with strings. In future lectures on “data structures,” such as lists, however, we will explain the important role that [:] plays. The specific scenarios and roles for using [:] will be explained in detail in a dedicated lecture titled “How to Manage Data and Variables in Python.”)
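For strings, the result of s[:] always has the same contents as s. The sketch below shows this with ==. It also uses the is operator to show that CPython returns the very same object. Treat that identity as a CPython implementation detail, not a language guarantee.

```python
s = "I'm Nico, 10."
t = s[:]

print(t == s)   # True: same contents
print(t is s)   # True in CPython: the same string object is returned, no copy is made
```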
“Step” in string slices
You can specify a “step” for string slices. Using steps, you can perform operations such as “retrieve every other character” and “reverse a string.”
s[n:m:step]: Extract characters in the range from index [n] (inclusive) to index [m] (exclusive), in increments of step.
- When using step, the values of n and m can be omitted.
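One way to picture s[n:m:step] is as picking s[i] for each i in range(n, m, step). The sketch below checks this for a few explicit, in-range indexes, using a hypothetical helper named slice_by_range. It only illustrates the rule. It is not how Python implements slicing, and it does not cover omitted or negative n and m.

```python
s = '0123456789'

def slice_by_range(text, n, m, step):
    # Hypothetical helper: collect text[i] for each i in range(n, m, step).
    # Only valid here for explicit, in-range, non-negative n and m.
    return "".join(text[i] for i in range(n, m, step))

print(s[1:8:3])                      # 147    (indexes 1, 4, 7)
print(slice_by_range(s, 1, 8, 3))    # 147
print(slice_by_range(s, 6, 0, -1))   # 654321 (same as s[6:0:-1])
```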
【* Note: when the value of step is negative *】
You can also specify a negative value for step, in which case the characters are extracted in reverse order (from right to left). In this case, m must specify a position to the left of n. Otherwise, an empty string is returned.
Example: s[5:0:-1]
(When step is a positive integer, n and m are specified in the same way as before.)
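This sketch uses the digit string from the next example. It contrasts the note's example s[5:0:-1] with the same bounds in the wrong order for a negative step.

```python
s = '0123456789'

print(s[5:0:-1])         # 54321  (m = 0 is to the left of n = 5)
print(repr(s[0:5:-1]))   # ''     (m = 5 is to the right of n = 0, so nothing is extracted)
```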
Now, let's run the following code and check the results.
s = '0123456789'
print(s[::2]) # Get even-numbered characters
print(s[1::2]) # Get odd-numbered characters
print(s[::-1]) # Get all characters in reverse order
print(s[6:0:-1]) # Get in reverse order (since [m] is not included, it will be in reverse order from 6 to 1)
This time, we will use a string of numbers from 0 to 9 for clarity.
s[::2] retrieves the even-numbered characters. Since n and m are not specified, every second character is extracted, from the first character to the last. The output is "02468".
s[1::2] retrieves the odd-numbered characters. Every second character is extracted, starting from the element at index [1]. Since m is not specified, the last character is included. The output is "13579".
s[::-1] extracts characters with a step of -1. Since n and m are not specified and the step is negative, extraction starts from the last character and moves left one character at a time until it reaches the first character. In other words, the string is reversed. The output is "9876543210".
s[6:0:-1] specifies the values of n and m and extracts characters in reverse order within that range. When the step is negative, m must specify a position to the left of n. Since n is index [6], m must be one of the positions to its left, that is, an index from [0] to [5]. Note that the slice range still follows the rule of “extracting characters from index [n] (inclusive) to [m] (exclusive).” Characters are therefore extracted one by one in reverse order from index [6] (inclusive) to [0] (exclusive), and the output is “654321.”
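Because m is excluded, s[6:0:-1] can never reach index [0]. This sketch shows that omitting m solves this. It also shows why m = -1 does not work: -1 means the last character ([9]), which lies to the right of [6].

```python
s = '0123456789'

print(s[6:0:-1])          # 654321   ([0] is excluded)
print(s[6::-1])           # 6543210  (m omitted: continues through index [0])
print(repr(s[6:-1:-1]))   # ''       (-1 is index [9], to the right of [6])
```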