|
SeeAlso: RegularExpressions. At the moment, this page is a sample of refining a regex until it returns the desired results: the second line of every file in a set where that line is something other than plaintext (i.e., it contains HTML or weird control characters).
|
|
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ grep -rinL "[a-z.,]*" ./*.dir |less
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ grep -rin "[a-z.,]*" ./*.dir |less
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "[a-z.,]*" ./*.dir |less
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z.,]*" ./*.dir |less
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z.,]*$" ./*.dir |less
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]*$" ./*.dir |less
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]+$" ./*.dir |less
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]?$" ./*.dir |less
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]\+$" ./*.dir |less
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ grep -rin "^[a-z., ]\+$" ./*.dir |less
|
|
bbearnes@wendigo:~/scrape/bankofamerica$ egrep -ri "^[a-z., ]+$" ./*.dir |less
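The move from `+` to `\+` and then to egrep reflects how grep's two regex dialects treat quantifiers. In basic regular expressions (plain grep) an unescaped `+` is an ordinary character, GNU grep accepts `\+` as an extension, and extended regular expressions (egrep, or `grep -E`) read a bare `+` as "one or more". A small sketch of the difference, assuming GNU grep and using made-up sample lines rather than the scraped files:

```shell
#!/bin/sh
# Hypothetical sample input, not taken from the scraped files:
# one plain line, one empty line, one line with markup.
tmp=$(mktemp)
printf 'plain headline, here.\n\n<b>markup</b>\n' > "$tmp"

# grep exits non-zero when nothing matches; "|| true" keeps that harmless.

# BRE: the unescaped + is a literal plus sign, so no line matches.
bre_plain=$(grep -ci '^[a-z., ]+$' "$tmp" || true)

# BRE with the GNU \+ extension: one or more characters from the class.
bre_escaped=$(grep -ci '^[a-z., ]\+$' "$tmp" || true)

# ERE (what egrep uses): a bare + already means "one or more".
ere=$(grep -Eci '^[a-z., ]+$' "$tmp" || true)

# With * instead of +, the empty line matches as well.
star=$(grep -ci '^[a-z., ]*$' "$tmp" || true)

echo "$bre_plain $bre_escaped $ere $star"   # prints: 0 1 1 2
rm -f "$tmp"
```

The last count also shows why the early `*` patterns were too permissive: `*` allows zero characters, so empty lines count as matches.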
|
|
|
|
...
|
|
|
|
egrep -vrin "^[0-9a-z.,'()\"#:$ ]+$" ./*.dir |grep ":2:" |less
|
|
egrep -vrin "^[-0-9a-z.,'()\"#:$;%& ]+$" ./*.dir |grep ":2:"
|
|
egrep -vrin "^[-0-9a-z.,'()\"#:$;%&/? ]+$" ./*.dir |grep ":2:"
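The three commands above grow a whitelist of characters that count as plain text, and `-v` then reports every line that does not consist solely of those characters. A minimal sketch of how that class behaves, assuming `grep -E` is equivalent to egrep here; the helper name `is_plain` and the sample strings are hypothetical:

```shell
#!/bin/sh
# The character class from the last command above. The leading - is a literal
# hyphen because it comes first in the brackets, and $ is literal inside
# brackets. \$ is escaped here only to keep the shell from expanding it.
allowed="^[-0-9a-z.,'()\"#:\$;%&/? ]+\$"

# Hypothetical helper: succeeds when the whole line is made of allowed characters.
is_plain() {
    printf '%s\n' "$1" | grep -Eiq "$allowed"
}

is_plain 'rate: 5% - $20 (approx.)' && echo "plain"
is_plain '<p>rate</p>' || echo "not plain"
```

Any character missing from the class, such as `<` or `=`, makes the line show up in the `-v` output, which is why the class had to be widened step by step.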
|
|
|
|
At this point the command returns ''nearly'' every line that is the second in its file and isn't a PlainText headline. This is not quite good enough.
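One variation worth trying, offered here as an assumption rather than as the fix for the remaining cases, is to run grep on one file at a time. With a single file argument and `-n`, grep prints `LINENO:text` without a file name, so the line number can be anchored with `^2:` instead of searching for `:2:` anywhere in the output line (which also matches a line whose own text contains `:2:`). The function name `odd_second_lines` is hypothetical; the character class is the one from the last command above.

```shell
#!/bin/sh
# Same whitelist as the last egrep command; \$ is escaped for the shell only.
allowed="^[-0-9a-z.,'()\"#:\$;%&/? ]+\$"

# Hypothetical helper: for every regular file under the given paths, print
# "FILE:2:text" when line 2 contains characters outside the whitelist.
odd_second_lines() {
    find "$@" -type f | while IFS= read -r f; do
        # grep -v -n on one file yields "LINENO:text" for each non-plain line;
        # the second grep keeps only the entry for line 2, if there is one.
        if line=$(grep -Evin "$allowed" "$f" | grep '^2:'); then
            printf '%s:%s\n' "$f" "$line"
        fi
    done
}
```

It would be called as `odd_second_lines ./*.dir`. File names containing newlines would confuse the `read` loop, so this is only a sketch for ordinary file names.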
|