WalaWiki content from p1k3.com

SeeAlso: RegularExpressions. At the moment, this page is a sample of refining a regex until it returns the desired results: the second line of each file in a set, wherever that line is something other than plaintext (i.e., it contains HTML or weird control characters).
bbearnes@wendigo:~/scrape/bankofamerica$ grep -rinL "[a-z.,]*" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -rin "[a-z.,]*" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "[a-z.,]*" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z.,]*" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z.,]*$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]*$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]+$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]?$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]\+$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -rin "^[a-z., ]\+$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ egrep -ri "^[a-z., ]+$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ egrep -ri "^[a-z., ]+$" ./*.dir |less
...
egrep -vrin "^[0-9a-z.,'()\"#:$ ]+$" ./*.dir |grep ":2:" |less
egrep -vrin "^[-0-9a-z.,'()\"#:$;%& ]+$" ./*.dir |grep ":2:"
egrep -vrin "^[-0-9a-z.,'()\"#:$;%&/? ]+$" ./*.dir |grep ":2:"
At this point it gives ''nearly'' every line which is the second in its file and isn't a PlainText headline. This is not quite good enough.
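
One possible source of noise in the commands above: grep ":2:" matches any output line that contains ":2:" anywhere, including inside the matched text or a file name, not only lines numbered 2. A minimal sketch of an alternative pulls line 2 out of each file first and only then tests it against the character class. The function name second_line_not_plain and the variable plain are hypothetical, introduced here for illustration; the character class is copied from the last egrep attempt, and the -a flag assumes GNU or BSD grep.

```shell
#!/bin/sh
# Hypothetical helper (not from the original page): report files whose
# second line is not made up solely of "plain" characters.
# Output mimics grep -n: path:2:line

# Allowed characters, copied from the last egrep attempt above.
plain="^[-0-9a-z.,'()\"#:\$;%&/? ]+\$"

second_line_not_plain() {
    # "$@" is one or more directories (or files) to search.
    find "$@" -type f | while IFS= read -r file; do
        # sed prints only line 2 and then quits, so files shorter than
        # two lines produce nothing.
        # grep -v keeps that line only if it is NOT entirely plain;
        # -a (GNU/BSD grep) treats binary-looking input as text;
        # -i ignores case, as in the original commands.
        sed -n '2{p;q;}' "$file" |
            LC_ALL=C grep -aEiv -- "$plain" |
            while IFS= read -r line; do
                printf '%s:2:%s\n' "$file" "$line"
            done
    done
}
```

Under those assumptions it could be run as: second_line_not_plain ./*.dir | less. An empty second line is still reported, as with the egrep -v version, because + requires at least one character.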