Regular Expressions > Unconstrained Repetitions - Pg. 250

If, for example, the input is supposed to consist of a label, followed by a single space, followed by an equals sign, followed by a single space, followed by a value...don't bet on it. Most users nowadays will--quite reasonably--assume that whitespace is negotiable; nothing more than an elastic formatting medium. So, in a configuration file, you're just as likely to get something like:

    name       = Yossarian, J
    rank       = Captain
    serial_num = 3192304

The whitespaces in that data might be single tabs, multiple tabs, multiple spaces, single spaces, or any combination thereof. So matching that data with a pattern that insists on exactly one space character at the relevant points is unlikely to be uniformly successful:

    $config_line =~ m{ ($IDENT) [\N{SPACE}] = [\N{SPACE}] (.*) }xms

Worse still, it's also unlikely to be uniformly unsuccessful. For instance, in the example data, it might only match the serial number. And that kind of intermittent success will make your program much harder to debug. It might also make it difficult to realize that any debugging is required.

Unless you're specifically vetting data to verify that it conforms to a required fixed format, it's much better to be very liberal in what you accept when it comes to whitespace. Use \s+ for any required whitespace and \s* for any optional whitespace. For example, it would be far more robust to match the example data against:
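A minimal sketch of such a pattern follows. It is an illustration of the \s+ / \s* advice rather than the author's own code. Two things here are assumptions: the definition of $IDENT, which this excerpt never shows, and the choice to treat the whitespace on both sides of the equals sign as optional (\s*). The helper name parse_config_line is likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical identifier pattern (an assumption; the excerpt does not define $IDENT):
# a letter or underscore, followed by any number of word characters.
my $IDENT = qr{ [^\W\d] \w* }xms;

# Parse one "label = value" line, accepting any amount of whitespace
# (spaces, tabs, or none at all) around the equals sign.
# Returns (label, value) on success, or an empty list if the line doesn't match.
sub parse_config_line {
    my ($config_line) = @_;

    my ($label, $value)
        = $config_line =~ m{ \A \s* ($IDENT) \s* = \s* (.*?) \s* \z }xms
            or return;

    return ($label, $value);
}

# The same three fields, each with different whitespace around the "=".
for my $line ("name = Yossarian, J", "rank\t=\t\tCaptain", "serial_num   =3192304") {
    my ($label, $value) = parse_config_line($line) or next;
    print "$label => '$value'\n";
}
```

All three lines should parse, giving name => 'Yossarian, J', rank => 'Captain', and serial_num => '3192304'. If the whitespace around the equals sign were mandatory in your format, replacing each \s* next to the = with \s+ would enforce that while still accepting tabs and runs of spaces.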