
Notes 2

Chapter 3

Section 3.1.2

In Table 3.1, you will see that some metacharacters are preceded by a backslash. This is application-dependent. In some applications, you need to precede metacharacters such as { and } with a backslash in order to give them their special meaning. In other applications, they have their special meaning without the backslash. Moral of the story: know which application you are using, and when in doubt, check the man page.
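To see the difference for yourself, here is a small sketch that runs the same repeat count through grep in its basic mode and again with the -E option. The three sample words are made up for the demonstration, and the behavior assumes a POSIX-style grep; other applications may differ, so the man page still has the final word.

```shell
# Sample words, invented for this demonstration.
words=$(printf 'bok\nbook\nboook\n')

# Basic regular expressions (plain grep): the braces need a backslash
# before they act as a repeat count.
basic=$(printf '%s\n' "$words" | grep 'bo\{2\}k')

# Extended regular expressions (grep -E): the braces are special
# without the backslash.
extended=$(printf '%s\n' "$words" | grep -E 'bo{2}k')

echo "basic:    $basic"
echo "extended: $extended"
```

Both commands should pick out only book, the one word with exactly two o's between the b and the k.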

In the examples for vi, you do not need the closing slash on a regular expression. To find the first occurrence of the word computer, you would just type /computer. If you are using regular expressions in a language like Perl, then you do need the closing slash.

The small footnote in the middle of page 73 should be set in large type. In the shell, * is a wildcard. In regular expressions, * is a quantifier that tells how many of the preceding item you want (in this case, zero or more).
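Here is a quick way to compare the two meanings side by side. This sketch tests four made-up words against ab*c twice: once as a regular expression through grep, and once as a shell wildcard through a case statement. It assumes a Bourne-style shell such as sh or bash.

```shell
# Four words, invented for this demonstration.
words='ac abc abbbc abxc'
regex_hits=''
glob_hits=''

for w in $words; do
  # Regular expression: an a, then zero or more b's, then a c.
  if printf '%s\n' "$w" | grep -q '^ab*c$'; then
    regex_hits="$regex_hits $w"
  fi
  # Shell wildcard: ab, then any run of characters, then c.
  case $w in
    ab*c) glob_hits="$glob_hits $w" ;;
  esac
done

echo "regex:$regex_hits"
echo "glob: $glob_hits"
```

The regular expression accepts ac (zero b's) but rejects abxc, while the shell wildcard does the opposite.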

Section 3.2

Make sure you understand all of the expressions in Example 3.9. You may wonder how the last expression matches the word Dan, especially if you know that * matches as many characters as possible.

^[A-Za-z]*[^,][A-Za-z]*$

The following explanation goes into far more detail than you will see in the book, but without some explanation, that last pattern is a total mystery. If you don’t understand this explanation, don’t worry. I will not give you anything nearly as complex as this on a test! If you wish, I will go into detail about regular expression matching at our next group discussion.

What happens is this: the first time the pattern match engine goes through the expression, the [A-Za-z]* does indeed match the entire string. Doing so, however, doesn't leave anything for the rest of the expression to match. The pattern matcher then “backs off” one character at a time and retries until it can find a match. When all is said and done, the first [A-Za-z]* ends up matching the letters Da and [^,] ends up matching the letter n. The last [A-Za-z]* ends up matching zero characters, which is fine, because * means “zero or more.”
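If you would like to check this division of labor yourself, one way is to wrap each of the three parts in \( and \) and let sed print what each one captured. This is only a sketch: sed is used here purely because it can show the pieces, and the square brackets in the replacement are just markers added for readability.

```shell
# Wrap each part of the pattern in \( \) so sed remembers what it matched,
# then print the three pieces between square brackets.
parts=$(echo 'Dan' | sed 's/^\([A-Za-z]*\)\([^,]\)\([A-Za-z]*\)$/[\1][\2][\3]/')

echo "$parts"   # prints [Da][n][]
```

The empty third pair of brackets is the last [A-Za-z]* matching zero characters.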

Chapter 4

Section 4.1.1

This is a nice piece of history, but I would never ask about this sort of thing on a test.

Section 4.2

In Example 4.11, leave off the backslash before the dot, and you will see that you match more lines. Why does the line starting with eastern match? Because the pattern now is calling for a 5 followed by any two characters, and there is a 5 followed by two blanks in that line.
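The same effect can be reproduced without the book's datafile. The sketch below uses three made-up lines (they are not taken from the book) and counts how many of them match with and without the backslash.

```shell
# Three made-up lines; the second has a 5 followed by two blanks.
sample=$(printf '%s\n' 'rate 5.3 ok' 'code 5  blank' 'total 45')

# With the backslash, the first dot must be a literal period.
escaped=$(printf '%s\n' "$sample" | grep -c '5\..')

# Without it, each dot stands for any single character, blanks included.
unescaped=$(printf '%s\n' "$sample" | grep -c '5..')

echo "with backslash:    $escaped line(s)"
echo "without backslash: $unescaped line(s)"
```

Only the first line has a 5 followed by a real period, so the escaped pattern counts one line. The unescaped pattern also picks up the line with the two blanks. The last line never matches because nothing follows its 5.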

Do not misread Example 4.13. This will find all lines beginning with either of the letters w or e, not both letters one after the other. Remember, a series of letters inside square brackets matches exactly one character.
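To make the distinction concrete, this sketch compares a bracketed pattern with the two-letter sequence it is often mistaken for. It assumes the pattern in Example 4.13 is ^[we], as the description above suggests, and the four sample lines are invented.

```shell
# Made-up lines for the demonstration.
sample=$(printf '%s\n' 'western' 'eastern' 'northwest' 'weekend')

# ^[we] : the first character is w OR e (exactly one character).
either=$(printf '%s\n' "$sample" | grep '^[we]')

# ^we : the line starts with w followed by e (two characters).
both=$(printf '%s\n' "$sample" | grep '^we')

echo "^[we] finds:"; echo "$either"
echo "^we finds:";   echo "$both"
```

The bracketed pattern finds western, eastern, and weekend. The plain ^we finds only western and weekend.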

Example 4.19 is a terrible example. Try typing:

grep '\<east' datafile

This pattern will find only the line with eastern, but not northeast or southeast, because in those two words east is not at the beginning of the word.
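If you do not have the datafile handy, the following sketch shows the same behavior on three made-up lines. It assumes your grep supports the \< word-boundary anchor, which GNU grep and most modern versions do.

```shell
# Three made-up lines, one word each.
sample=$(printf '%s\n' 'eastern' 'northeast' 'southeast')

# Without the anchor, east is found anywhere in the line.
anywhere=$(printf '%s\n' "$sample" | grep -c 'east')

# With \< the match must start at the beginning of a word.
word_start=$(printf '%s\n' "$sample" | grep '\<east')

echo "east matches $anywhere lines"
echo "\\<east matches: $word_start"
```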

In Example 4.21, the sentence “Watch the .* symbol. It means any character, including whitespace.” should be in huge, bold letters. So, in fact, in the first line shown, the pattern actually matched from the first character in the line (lowercase, on a word boundary) all the way to the n in Main, because .* matches as many characters as possible, of any kind. The moral of the story is: if you use .* you may end up matching much more than you really wanted to.

If you wanted a pattern that matched all lowercase words ending in n (a much more reasonable thing to do), this would be the pattern:

grep '\<[a-z][a-z]*n\>' datafile
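To see exactly how much text each pattern grabs, you can ask grep to print only the matched part of the line with the -o option (available in GNU and BSD grep, though not required by POSIX). This sketch assumes the pattern in Example 4.21 is \<[a-z].*n\> and uses a made-up line rather than one from the book's datafile. LC_ALL=C is set so that [a-z] means only the lowercase ASCII letters.

```shell
# Use the C locale so [a-z] covers exactly the lowercase ASCII letters.
export LC_ALL=C

# A made-up line for the demonstration.
line='western WE Sharon Gray 5.3'

# .* runs on through blanks and other words until the last word-final n.
greedy=$(echo "$line" | grep -o '\<[a-z].*n\>')

# [a-z][a-z]* stays inside a single lowercase word.
tight=$(echo "$line" | grep -o '\<[a-z][a-z]*n\>')

echo "with .*        : $greedy"
echo "with [a-z][a-z]*: $tight"
```

The first pattern swallows western WE Sharon in one match. The second matches only the word western.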

Quiz Yourself

Given the following lines, which line or lines will each pattern find? Some patterns may not match any of the lines.

  1. I found a bug in the tenth line of my script.
  2. A fairly big southern state is abbreviated as TN.
  3. The teenager's baggage is in your room.
  4. This song's for you.
  5. Arms, legs, and hands.

Here are the patterns:

\([a-z]\)\1

f[^aei]r

h.l

te*n

ro*m

b[aeiou]g\>

s[aeiou]*[s-z]

[ATGI]\{2,3\}

Here are the answers