WalaWiki content from p1k3.com

SeeAlso: RegularExpressions. For now, this page is a worked example of refining a regex until it returns the desired results: the second line of each file in a set, wherever that line is something other than plaintext (i.e., contains HTML or odd control characters).
bbearnes@wendigo:~/scrape/bankofamerica$ grep -rinL "[a-z.,]*" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -rin "[a-z.,]*" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "[a-z.,]*" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z.,]*" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z.,]*$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]*$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]+$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]?$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -ri "^[a-z., ]\+$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ grep -rin "^[a-z., ]\+$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ egrep -ri "^[a-z., ]+$" ./*.dir |less
bbearnes@wendigo:~/scrape/bankofamerica$ egrep -ri "^[a-z., ]+$" ./*.dir |less
...
egrep -vrin "^[0-9a-z.,'()\"#:$ ]+$" ./*.dir |grep ":2:" |less
egrep -vrin "^[-0-9a-z.,'()\"#:$;%& ]+$" ./*.dir |grep ":2:"
egrep -vrin "^[-0-9a-z.,'()\"#:$;%&/? ]+$" ./*.dir |grep ":2:"
At this point it returns ''nearly'' every line that is the second in its file and isn't a PlainText headline. This is not quite good enough.
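
One possible reason the result is only ''nearly'' right: the trailing grep ":2:" matches that string anywhere in the output, so a later line whose text happens to contain ":2:" (or a file name that does) gets through as well. What follows is a minimal sketch of one way around that, not something from the original session: it tests the regex against line 2 of each file directly in awk instead of filtering grep's line-number output. The names PATTERN and second_line_suspects are made up for illustration; the character class is copied from the last egrep above.

```shell
# Character class copied from the last egrep above; PATTERN is a made-up name.
PATTERN="^[-0-9a-z.,'()\"#:\$;%&/? ]+\$"
export PATTERN

# second_line_suspects DIR...  (hypothetical helper)
# Prints file:2:text for every file whose second line falls outside PATTERN.
second_line_suspects() {
    find "$@" -type f -exec awk '
        # FNR restarts at 1 for each file, so this only ever looks at line 2.
        # tolower() stands in for the -i flag.
        FNR == 2 && tolower($0) !~ ENVIRON["PATTERN"] {
            print FILENAME ":" FNR ":" $0
        }
    ' {} +
}
```

It would be run much like the commands above, e.g. second_line_suspects ./*.dir |less. Files containing NUL bytes may be handled differently from one awk implementation to another, so this is a starting point rather than a finished answer.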