Clogstats tells you statistics about your WeeChat channels by reading your chat logs. It can currently tell you the most active IRC channels and nicks across a given duration (the last 24 hours by default).
The motivation for writing clogstats was to overcome the limitations of its predecessor, chattiest-channels; chattiest-channels served mostly as a proof-of-concept and had very messy date/time handling.
Clogstats requires at least Python 3.6.1.
Without any extras, it supports both CPython and PyPy 7.3.1+.
On Python 3.7+, the only 3rd-party dependency is Pandas.
Python 3.6 users also need a backport of Python 3.7's dataclasses library.
For advanced time-series manipulation and forecasting, clogstats can optionally use darts. darts has many large 3rd-party dependencies of its own, most of which do not support PyPy.
Install with pip:
python3 -m pip install --user git+https://git.sr.ht/~seirdy/clogstats
Install clogstats with support for advanced time-series forecasting (no PyPy support):
python3 -m pip install --user git+https://git.sr.ht/~seirdy/clogstats#egg=clogstats[forecasting]
I recommend trying out pipx to auto-create virtual environments for Python executables and add them to your $PATH:
python3 -m pip install --user pipx
# use --system-site-packages if your distro offers packages for pandas/numpy so you don't have to build them yourself
pipx install --system-site-packages git+https://git.sr.ht/~seirdy/clogstats
Mitigating the effects of spam and flooding remains an ongoing challenge. Flood mitigation measures include:
BOT_BLACKLISTS in clogstats/gather_stats.py
Planned areas of improvement for flood mitigation primarily involve filtering out messages by user-configurable per-network regular expressions and nick blacklists. I might automate generating nick blacklists with a WeeChat script that runs /msg botserv botlist on a list of IRC server buffers and saves the output to a file.
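To make the planned filtering more concrete, here is a minimal sketch of per-network nick blacklists combined with per-network regular expressions. It is not how clogstats works today: the `Message` record, the `filter_messages` function, and the sample configuration are all hypothetical names invented for illustration, and lowercasing nicks is a simplification of IRC's case-mapping rules.

```python
import re
from typing import Dict, Iterable, Iterator, NamedTuple, Pattern, Set


class Message(NamedTuple):
    """Hypothetical message record; not clogstats' actual data model."""

    network: str
    nick: str
    text: str


def filter_messages(
    messages: Iterable[Message],
    nick_blacklists: Dict[str, Set[str]],
    spam_patterns: Dict[str, Pattern[str]],
) -> Iterator[Message]:
    """Yield only the messages that pass the per-network filters."""
    for msg in messages:
        # Simplification: treat nicks as case-insensitive by lowercasing them.
        if msg.nick.lower() in nick_blacklists.get(msg.network, set()):
            continue
        # Drop messages whose text matches the network's spam pattern, if any.
        pattern = spam_patterns.get(msg.network)
        if pattern is not None and pattern.search(msg.text):
            continue
        yield msg


# Made-up configuration and messages, for illustration only.
nick_blacklists = {"freenode": {"chanserv", "examplebot"}}
spam_patterns = {"freenode": re.compile(r"free bitcoin", re.IGNORECASE)}

messages = [
    Message("freenode", "alice", "hello"),
    Message("freenode", "ExampleBot", "Title: some page"),
    Message("freenode", "mallory", "FREE BITCOIN at example.invalid"),
    Message("rizon", "examplebot", "not blacklisted on this network"),
]
kept = list(filter_messages(messages, nick_blacklists, spam_patterns))
print([m.nick for m in kept])
```

Keying both filters by network means a nick or pattern that is spammy on one network does not suppress legitimate messages on another.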
Clogstats can sort channels by their number of messages or number of non-lurkers (i.e., the number of nicks that actually sent a message). It can also display the most active nicks for each channel.
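The two sort keys can be illustrated with a small standard-library sketch. Clogstats itself relies on Pandas, so this is not its implementation; `rank_channels` and the sample data are hypothetical, and the input is assumed to be already reduced to (channel, nick) pairs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_channels(
    messages: Iterable[Tuple[str, str]],
    sort_by: str = "msgs",
    max_topwords: int = 3,
) -> List[dict]:
    """Rank channels by message count ("msgs") or non-lurker count ("nicks")."""
    per_channel: Dict[str, Counter] = defaultdict(Counter)
    for channel, nick in messages:
        per_channel[channel][nick] += 1

    rows = []
    for channel, nick_counts in per_channel.items():
        rows.append(
            {
                "channel": channel,
                "msgs": sum(nick_counts.values()),  # total messages
                "nicks": len(nick_counts),  # nicks that sent a message
                "topwords": nick_counts.most_common(max_topwords),
            }
        )
    rows.sort(key=lambda row: row[sort_by], reverse=True)
    return rows


# Made-up sample data: (channel, nick) pairs.
sample = [
    ("freenode.#python", "alice"),
    ("freenode.#python", "alice"),
    ("freenode.#python", "bob"),
    ("rizon.#chat", "carol"),
    ("rizon.#chat", "dave"),
    ("rizon.#chat", "erin"),
    ("freenode.#python", "alice"),
]
by_msgs = rank_channels(sample, sort_by="msgs")
by_nicks = rank_channels(sample, sort_by="nicks")
print(by_msgs[0]["channel"], by_nicks[0]["channel"])
```

In this sample the two keys disagree: one channel has more messages, while the other has more distinct participants.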
Time-series modelling and forecasting require installing clogstats with the "forecasting" extra. Forecasts are a work in progress; they currently require a lot of tuning to be accurate.
Charting channel activity in Matplotlib, comparing two different forecasts with the actual output:
usage: clogstats [-h] [-d DURATION] [-n NUM] [--min-activity MIN_ACTIVITY] [--min-nicks MIN_NICKS] [--max-topwords MAX_TOPWORDS] [-s {msgs,nicks}]
                 [--include-channels [INCLUDE_CHANNELS [INCLUDE_CHANNELS ...]]] [--exclude-channels [EXCLUDE_CHANNELS [EXCLUDE_CHANNELS ...]]]
                 [--disable-bot-filters]

Gather statistics from WeeChat log files.

optional arguments:
  -h, --help            show this help message and exit
  -d DURATION, --duration DURATION
                        start analyzing messages from DURATION hours ago
  -n NUM, --num NUM     limit output to the top NUM channels
  --min-activity MIN_ACTIVITY
                        limit output to channels with at least MIN_ACTIVITY messages.
  --min-nicks MIN_NICKS
                        limit output to channels with at least MIN_NICKS nicks
  --max-topwords MAX_TOPWORDS
                        show the nicks and message counts for the MAX_TOPWORDS most active nicks
  -s {msgs,nicks}, --sort-by {msgs,nicks}
                        key to sort channels by
  --include-channels [INCLUDE_CHANNELS [INCLUDE_CHANNELS ...]]
                        only analyze these channels. format: "network.#channel"
  --exclude-channels [EXCLUDE_CHANNELS [EXCLUDE_CHANNELS ...]]
                        list of channels to exclude. format: "network.#channel"
  --disable-bot-filters
                        disable filtering of some known bots
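As a rough illustration of what the DURATION option implies, the sketch below keeps only log lines whose timestamp falls within the last DURATION hours. It assumes WeeChat's logger writes tab-separated lines of the form "timestamp, prefix, message" with the timestamp format `%Y-%m-%d %H:%M:%S`; if your logger settings differ, the format string must change. The function names are hypothetical and this is not clogstats' actual parsing code.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional, Tuple

# Assumption: the logger's timestamp format is left at this value.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

ParsedLine = Tuple[datetime, str, str]


def parse_line(line: str) -> Optional[ParsedLine]:
    """Split a log line into (timestamp, prefix, message), or return None."""
    parts = line.rstrip("\n").split("\t", 2)
    if len(parts) != 3:
        return None
    try:
        timestamp = datetime.strptime(parts[0], LOG_TIME_FORMAT)
    except ValueError:
        return None
    # The prefix is usually a nick, but can also be a join/part marker.
    return timestamp, parts[1], parts[2]


def messages_in_window(
    lines: Iterable[str], duration_hours: float, now: datetime
) -> Iterator[ParsedLine]:
    """Yield parsed lines timestamped within the last duration_hours."""
    start = now - timedelta(hours=duration_hours)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and start <= parsed[0] <= now:
            yield parsed


# Made-up log lines for illustration.
lines = [
    "2020-05-19 15:00:00\talice\thello\n",
    "2020-05-18 12:00:00\tbob\ttoo old\n",
    "not a valid log line\n",
]
now = datetime(2020, 5, 19, 15, 58, 14)
last_day = list(messages_in_window(lines, 24, now))
last_half_hour = list(messages_in_window(lines, 0.5, now))
print(len(last_day), len(last_half_hour))
```

Passing `now` explicitly rather than calling `datetime.now()` inside the function keeps the window reproducible, which also makes fractional durations such as 0.5 easy to check.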
Print the 10 most active IRC channels from the past 24 hours that have at least 40 chatters, along with the 4 most active nicks per channel:
clogstats -n 10 --sort-by msgs -d 24 --min-nicks 40 --max-topwords 4
Output:
Analyzing logs from 2020-05-18 15:58:14.626763 till 2020-05-19 15:58:14.626763
total messages: 33146
RANK  CHANNEL                   MSGS  NICKS  TOPWORDS
1.    tilde_chat.#meta          2897  63     kumquat: 417, jan6: 410, brendo: 207, ben: 130
2.    snoonet.#gnulag           2838  50     browndawg: 592, ldlework: 172, mrneon: 140, iamidly: 134
3.    freenode.##linux          2753  147    floridaman: 346, phogg: 245, rascul: 189, lukey: 85
4.    freenode.#python          2491  145    corvus-corax: 214, snoopjedi: 191, teut: 167, _habnabit: 145
5.    efnet.#lrh                2024  81     rondito: 349, jupedbird: 236, butth0le: 80, \\\\\: 77
6.    freenode.#anime           1931  53     luke-jr: 198, amigojapan: 191, emmeka: 140, butternoodle: 129
7.    ircnet.#worldchat         1233  62     kanasta: 118, flowergirl42: 104, miri: 97, klywilen: 97
8.    quakenet.#quarantine      1149  51     olli: 149, redzain_: 142, chenko: 135, `sun357: 105
9.    darkscience.#darkscience  992   44     workinggoose: 145, sun-light: 108, dijit: 76, exusser: 68
10.   rizon.#chat               953   51     dorkmund: 113, dfx: 94, irish666: 84, piba: 70
Another example: say I finished an anime episode that just came out and want to talk about it. Clogstats can filter my anime channels to just those that were active in the past 30 minutes:
clogstats -d 0.5 --sort-by msgs --min-activity 1 --include-channels \
"freenode.##anime" "freenode.#anime" "freenode.#reddit-anime" \
"quakenet.#anime" "rizon.#anime" "tilde_chat.#anime"
Output:
Analyzing logs from 2020-05-18 17:08:07.076732 till 2020-05-18 17:38:07.076732
total messages: 66
RANK  CHANNEL          MSGS  NICKS  TOPWORDS
1.    freenode.#anime  65    11     MootPoot: 16, emmeka: 13, ButterNoodle: 13
2.    quakenet.#anime  1     1      Fanen: 1
Looks like the #anime channels on Freenode and QuakeNet are the only ones with recent activity.
A: Yes.
Copyright (C) 2020 Rohan Kumar
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.