

								SeeAlso: RegularExpressions. At the moment, this page is a sample of refining a regex until it returns the desired results: every second line in a set of files that is something other than plain text (i.e., contains HTML or weird control characters).


								 bbearnes@wendigo:~/scrape/bankofamerica$ grep -rinL "[a-z.,]*" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ grep -rin "[a-z.,]*" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "[a-z.,]*" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z.,]*" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z.,]*$" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]*$" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]+$" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]?$" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]\+$" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ grep -rin "^[a-z., ]\+$" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ egrep -ri "^[a-z., ]+$" ./*.dir |less

								 bbearnes@wendigo:~/scrape/bankofamerica$ egrep -ri "^[a-z., ]+$" ./*.dir |less
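The switch from `grep` to `egrep` in the commands above turns on how `+` is read. As a rough illustration (the sample line and variable names are made up for this sketch, and the `\+` form is a GNU grep extension rather than something every grep guarantees), the three spellings can be compared on one line of text:

```shell
# Made-up sample line that should count as plain text.
sample='plain words, nothing odd.'

# Basic syntax: an unescaped + is an ordinary character, so this
# pattern only matches a line ending in a literal plus sign.
basic_plain=$(printf '%s\n' "$sample" | grep -ci '^[a-z., ]+$' || true)

# Basic syntax with \+ : "one or more" in GNU grep (an extension).
basic_escaped=$(printf '%s\n' "$sample" | grep -ci '^[a-z., ]\+$' || true)

# Extended syntax (grep -E, which is what egrep runs): + is a quantifier.
extended=$(printf '%s\n' "$sample" | grep -Eci '^[a-z., ]+$' || true)

# Each value is the number of matching lines reported by grep -c.
printf 'basic +: %s, basic \\+: %s, extended +: %s\n' \
    "$basic_plain" "$basic_escaped" "$extended"
```

The `|| true` is only there so a zero count (which makes grep exit non-zero) does not abort a script running under `set -e`.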


								...


								 egrep -vrin "^[0-9a-z.,'()\"#:$ ]+$" ./*.dir |grep ":2:" |less

								 egrep -vrin "^[-0-9a-z.,'()\"#:$;%& ]+$" ./*.dir |grep ":2:"

								 egrep -vrin "^[-0-9a-z.,'()\"#:$;%&/? ]+$" ./*.dir |grep ":2:"


								At this point the command returns ''nearly'' every line that is the 2nd in its file and isn't a PlainText headline.  This is not quite good enough.
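One possible tightening, not taken from the original session: `grep ":2:"` keeps any output line that contains `:2:` anywhere, including inside the matched text, so it may let stray lines through. The sketch below instead pulls out line 2 of each file with `sed` and tests only that line. The function name `nonplain_second_lines` and the variable `pattern` are hypothetical; the character class is copied from the last command above. It assumes file names without embedded newlines.

```shell
# Same character class as the last egrep command above.
pattern="^[-0-9a-z.,'()\"#:\$;%&/? ]+\$"

# Hypothetical helper: for every regular file under the given
# directories, report line 2 if it is not made up entirely of
# characters from the class.
nonplain_second_lines() {
    find "$@" -type f | while IFS= read -r file; do
        # sed prints only line 2; grep -v -q succeeds when that line
        # does NOT match. A file with fewer than two lines gives grep
        # no input, so it is skipped.
        if sed -n '2p' "$file" | LC_ALL=C grep -Eivq "$pattern"; then
            printf '%s:2:%s\n' "$file" "$(sed -n '2p' "$file")"
        fi
    done
}
```

Called as `nonplain_second_lines ./*.dir | less`, it prints output in the same `file:2:text` shape as the pipeline above, which makes the two easy to compare.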