Mastering Python Regular Expressions
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Credits
Authors: Félix López, Víctor Romero
Reviewers: Mohit Goenka, Jing (Dave) Tian
Acquisition Editors: James Jones, Mary Jasmine Nadar
Content Development Editor: Rikshith Shetty
Technical Editors: Akashdeep Kundu, Faisal Siddiqui
Copy Editors: Roshni Banerjee, Sarang Chari
Project Coordinator: Sageer Parkar
Proofreader: Linda Morris
Indexer: Priya Subramani
Graphics: Ronak Dhruv, Abhinash Sahu
Production Coordinator: Nitesh Thakur
Cover Work: Nitesh Thakur
About the Authors
Félix López started his career in web development before moving to software in the currency exchange market, where there were a lot of new security challenges. Later, he spent four years creating an IDE to develop games for hundreds of different mobile device OS variations, in addition to creating more than 50 games. Before joining ShuttleCloud, he spent two years working on applications with sensor networks, Arduino, ZigBee, and custom hardware. One example is an application that detects the need for streetlight utilities in major cities based on existing atmospheric brightness. His first experience with Python was seven years ago, when he used it for small scripts, web scraping, and so on. Since then, he has used Python for almost all his projects: websites, standalone applications, and so on. Nowadays, he uses Python along with RabbitMQ in order to integrate services.
About the Reviewers
Mohit Goenka graduated from the University of Southern California (USC) with an M.Sc. in computer science. His thesis focused on game theory and human behavior concepts as applied to real-world security games. He also received an award for academic excellence from the Office of International Services at USC. He has worked in various areas of computing, including artificial intelligence, machine learning, path planning, multiagent systems, neural networks, computer vision, computer networks, and operating systems.
Table of Contents
Preface
Chapter 1: Introducing Regular Expressions
    History, relevance, and purpose
    The regular expression syntax
    Literals
    Character classes
    Predefined character classes
    Alternation
    Quantifiers
    Greedy and reluctant quantifiers
    Boundary Matchers
    Summary
Chapter 2: Regular Expressions with Python
    A brief introduction
    Backslash in string literals
    String Python 2.x
    Building blocks for Python regex
    RegexObject
    Searching
    Modifying a string
    MatchObject
    group([group1, …])
    groups([default])
    groupdict([default])
    start([group])
    end([group])
    span([group])
    expand(template)
Preface
Text processing has been one of the most relevant topics since computer science took its very first baby steps. After a few decades of investigation, we now have one of the most versatile and pervasive tools that exist: regular expressions. Validation, search, extraction, and replacement of text are operations that have been simplified thanks to regular expressions.
Introducing Regular Expressions
Regular expressions are text patterns that define the form a text string should have. Among other uses, they make it possible to do the following:
• Check whether an input matches a given pattern; for example, we can check whether a value entered in an HTML form is a valid e-mail address
• Look for occurrences of a pattern in a piece of text; for example, check whether either the word "color" or the word "colour" appears in a document with just one scan
• Extract specific portions of a text; for example, extract the postal code of an address
• Replace portions of text; for example, change any occurrence of "color" or "colour" to "red"
• Split a larger text into smaller pieces; for example, splitting a text at any occurrence of the dot, comma, or newline characters
In this chapter, we are going to learn the basics of regular expressions from a language-agnostic point of view. At the end of the chapter, we will understand how regular expressions work, but we won't yet be able to execute a regular expression in Python. This is going to be covered in the next chapter. For this reason, the examples in this chapter will be approached from a theoretical point of view rather than being executed in Python.
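As a brief preview of what the next chapter covers, the five activities listed above could be sketched roughly as follows with Python's standard re module. This is only an illustration: the sample strings, the deliberately simplified e-mail and postal code patterns, and the helper name looks_like_email are assumptions made for this sketch and do not come from the book. Real e-mail validation in particular needs a much stricter approach.

```python
import re

# Hypothetical sample data, invented for this sketch.
text = "The color of the sky; the colour of the sea."
address = "1600 Pennsylvania Avenue NW, Washington, DC 20500"

# 1. Check: does the whole input have the expected shape?
#    Simplified pattern: something, an "@", something, a dot, something.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(value):
    """Return True if value roughly resembles an e-mail address."""
    return EMAIL.fullmatch(value) is not None

# 2. Search: find "color" or "colour" in a single scan.
#    The "?" makes the preceding "u" optional.
found = re.search(r"colou?r", text)

# 3. Extract: pull out a five-digit postal code (US style assumed here).
zip_match = re.search(r"\b\d{5}\b", address)
zip_code = zip_match.group() if zip_match else None

# 4. Replace: change every "color" or "colour" to "red".
replaced = re.sub(r"colou?r", "red", text)

# 5. Split: break a text at any dot, comma, or newline character.
pieces = re.split(r"[.,\n]", "one.two,three\nfour")

print(looks_like_email("user@example.com"))
print(found.group())
print(zip_code)
print(replaced)
print(pieces)
```

The same pattern, colou?r, serves both the search and the replace step, which hints at why a single compact notation for text patterns is so convenient.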