fragments/most-instances: find line w/most instances of substring

For earthtopus, who asks:

> say I have a big text file and I want to find the line(s) with the most
> instances of a substring on it.  Say the most instances of the letter R.
>
> What's a good (and by good I mean lazy) way to do that?

There's probably a switch to grep or something for this, but whatever.
main
Brennen Bearnes 1 year ago
parent
commit
59aee8b73b
2 changed files with 38 additions and 0 deletions
  1. home/fragments/most-instances/example.txt (+8, -0)
  2. home/fragments/most-instances/most-substrings.pl (+30, -0)

home/fragments/most-instances/example.txt

@@ -0,0 +1,8 @@
123R46R
RRR213R
RRRabcR
R123R
R
RR321
123
abcde

home/fragments/most-instances/most-substrings.pl

@@ -0,0 +1,30 @@
#!/usr/bin/env perl

use warnings;
use strict;

# Usage: perl ./most-substrings.pl "R" ./example.txt
my $substr = shift @ARGV;
die "Usage: $0 SUBSTRING [FILE...]\n"
  unless defined $substr && length $substr;

my %counts;
while (my $line = <>) {
  chomp $line;

  # Count literal matches; \Q...\E escapes regex metacharacters in $substr:
  my @matches = $line =~ m/\Q$substr\E/g;
  my $count = scalar @matches;
  $counts{$line} = $count;
}

# Find the highest count of substring:
my $max = 0;
foreach my $count (values %counts) {
  $max = $count if $count > $max;
}

# For every distinct line, check if its count is the highest we hit:
foreach my $line (keys %counts) {
  print "$line\n" if $counts{$line} == $max;
}
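
The script above keys a hash on the line text, so duplicate lines collapse into one entry and ties come out in hash order. If that matters, here is a rough single-pass sketch that keeps input order and duplicates. It is not part of this commit: `lines_with_most` is a hypothetical helper name, and it assumes the substring should be matched literally (hence `\Q...\E`) and that lines with zero matches are never worth printing.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not in the commit): return the lines with the most
# literal, non-overlapping occurrences of $substr, in input order.
sub lines_with_most {
    my ($substr, @lines) = @_;
    my $max = 0;
    my @best;
    for my $line (@lines) {
        # Assigning to an empty list forces list context, so $count ends up
        # as the number of matches:
        my $count = () = $line =~ /\Q$substr\E/g;
        next if $count == 0 || $count < $max;
        @best = () if $count > $max;    # new high count: drop earlier ties
        $max = $count;
        push @best, $line;
    }
    return @best;
}

# Only act as a script when given arguments, using the same command-line
# shape as most-substrings.pl:
if (@ARGV) {
    my $substr = shift @ARGV;
    die "Usage: $0 SUBSTRING [FILE...]\n" unless length $substr;
    chomp(my @lines = <>);
    print "$_\n" for lines_with_most($substr, @lines);
}
```

Run against example.txt with "R", this should print RRR213R and then RRRabcR (four matches each), in that order.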