|
|
- SeeAlso: RegularExpressions. At the moment, this page is a sample of refining a regex until it returns the desired results - all the second lines of a set of files which are something other than plaintext. (i.e., contain HTML or weird control characters.)
-
- bbearnes@wendigo:~/scrape/bankofamerica$ grep -rinL "[a-z.,]*" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ grep -rin "[a-z.,]*" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "[a-z.,]*" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z.,]*" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z.,]*$" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]*$" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]+$" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]?$" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]\+$" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ grep -rin "^[a-z., ]\+$" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ egrep -ri "^[a-z., ]+$" ./*.dir |less
- bbearnes@wendigo:~/scrape/bankofamerica$ egrep -ri "^[a-z., ]+$" ./*.dir |less
-
- ...
-
- egrep -vrin "^[0-9a-z.,'()\"#:$ ]+$" ./*.dir |grep ":2:" |less
- egrep -vrin "^[-0-9a-z.,'()\"#:$;%& ]+$" ./*.dir |grep ":2:"
- egrep -vrin "^[-0-9a-z.,'()\"#:$;%&/? ]+$" ./*.dir |grep ":2:"
-
- At this point it gives ''nearly'' every line which is the 2nd in the file and isn't a PlainText headline. This is not quite good enough.
|