From 57d6c02da7a7aa74b7789d7871e86533b43106fa Mon Sep 17 00:00:00 2001
From: gennyble <gen@nyble.dev>
Date: Sun, 2 Mar 2025 04:04:29 -0600
Subject: add 'statistics on linux' to words.html

---
 served/words/statistic-gifs.html | 197 ---------------------------------------
 1 file changed, 197 deletions(-)
 delete mode 100644 served/words/statistic-gifs.html

(limited to 'served/words/statistic-gifs.html')
diff --git a/served/words/statistic-gifs.html b/served/words/statistic-gifs.html
deleted file mode 100644
index 583974b..0000000
--- a/served/words/statistic-gifs.html
+++ /dev/null
@@ -1,197 +0,0 @@
----
-template=post
-title=Statistics on Linux with /proc
-style=/styles/post.css
-style=writing.css
-
-published=2025-03-02 4:00am CST
-
-description=I want to tell you how my statistic gifs are made :)
----
-
-<style>
-	.manlink {
-		margin-top: -1rem;
-	}
-</style>
-
-I've been wanting to make a little page for the statistics of my
-webserver <i>(the system not the program)</i>. When I started to
-research the APIs that I'd need, just on a whim one day with no
-intention to start, I got grabbed by it and knew I had to start.
-
-Check it out: <a href="/starlight.html">starlight.html</a>
-
-<h2>a <code>/proc</code> foreword</h2>
-The <code>/proc</code> filesystem, on Linux, is a sort of window into
-the kernel. It lets you view some pretty detailed information by simply
-reading some files (thanks everything-is-a-file linux).
-
-There's a lot of information about it in the man pages.
-They might all be in one big one at <code>man proc</code> but,
-like how they are on my server, they could be broken into separate pages
-for distinct sections.
-
-I have linked the relevant pages at the top of their section. It's a link
-to man7.org, which seems to be <i>the</i> source for Linux Kernel man pages
-on the internet. man7 is linked from kernel.org which lends it
-credibility at least.
-
-<h2>Memory</h2>
-
-<p class="manlink"><a href="https://man7.org/linux/man-pages/man5/proc_meminfo.5.html">man7.org/proc_meminfo</a></p>
-
-This one isn't too hard. I open the file <code>/proc/meminfo</code> and
-look for the lines starting with <code>MemTotal</code> and <code>MemAvailable</code>
-which are the total memory and currently available memory, respectively. They
-are very well named :). For usage, I just subtract available from total.
-
-<h2>Network</h2>
-
-<p class="manlink"><a href="https://man7.org/linux/man-pages/man5/proc_net.5.html">man7.org/proc_net</a></p>
-
-If you <code>cat /proc/net/dev</code> you can see some stats about
-your networking interfaces. This is what I parse, with some pain.
-
-I read the bytes columns from the receive and transmit sections.
-These are total counts of bytes received since boot, so you'll
-have to take two samples and subtract to get the number of bytes
-in some time-span.
-
-Looking at it in the terminal, you might assume that the separator
-between the columns was a tab character. I sure did! It is not a tab,
-but many spaces.
-
-Because of spaces-and-not-tabs
-<i>(not the usual tabs vs. spaces debate, but with similarities)</i>, it proved
-to be a bit annoying to parse. It made me finally
-pull in a regex crate because I didn't feel like dealing with it
-at the time. Eventually&trade; I want to write a skip-arbitrarily-many-whitespace
-iterator, but for now <code>regex-lite</code> lives in my <code>Cargo.toml</code>.
-
-<h2>CPU</h2>
-
-<p class="manlink"><a href="https://man7.org/linux/man-pages/man5/proc_stat.5.html">man7.org/proc_stat</a></p>
-
-<code>/proc/stat</code> is the least obvious of the triplet. It has more than
-just the CPU's information, but the cpu is what we're after. You'll notice many
-CPU lines probably! I'm using the one that starts with just "cpu", with no number
-(cpu0, cpu1, etc.) because I only have the 1 core. If I had more than one core
-it'd work similarly, the just-cpu line sums the other ones, but then it could
-show >100% usage 'cause it's per-core usage just added together.
-
-First things uh, second? To summarize from the man page:<br />
-The units of these values are <i>ticks</i>. There are <code>USER_HZ</code>
-ticks per second. On most platforms it's 100 but you can
-check the value for your system with <code>sysconf(_SC_CLK_TCK)</code>.
-
-<details>
-	<summary style="font-style: italic;">small C program to check _SC_CLK_TCK :)</summary>
-	<pre><code>#include &lt;stdio.h&gt;
-#include &lt;unistd.h&gt;
-int main() {
-	printf("USER_HZ is %ld\n", sysconf(_SC_CLK_TCK));
-}</code></pre>
-</details>
-
-But what columns of data do we use? From <a href="https://stackoverflow.com/a/3017438">this stackoverflow answer</a>
-it seems that summing the user, nice, and system columns gets you the total ticks.
-The user and system make sense to me, time spent in user and system mode,
-but what on earth is nice? I sure hope it is.
-
-The Internet tells me to check <code>man nice</code>
-(<a href="https://man7.org/linux/man-pages/man1/nice.1.html">man7.org/nice</a>).
-That page says that the
-niceness of a process can be adjusted to change how the kernel schedules
-that process. Making it less nice (down to -20) increases its priority, and
-increasing its niceness (up to 19) lowers it. I guess that makes sense. Lowering
-the niceness makes the process greedier and in want of more attention
-from the scheduler? I'm unsure how well that personification tracks to reality, but
-it helped me think about it.
-
-The nice column, then, seems to be the time spent in processes that
-would go in the user column, but they have a different priority and
-I guess differentiating that is important.
-
-Oh, but there might be more columns we want!
-There's <a href="https://stackoverflow.com/a/10794088">another S.O. answer</a>
-that I found while writing this that says the sixth and seventh columns should be used
-as well. These are irq/softirq and are time spent servicing
-interrupts. I think it makes sense we'd want that, too.
-
-So you have all these columns&mdash;user, nice, system, irq,
-and softirq&mdash;that add together to give you the total number
-of ticks spent Doing Things since boot, and you have the number
-of ticks in a second. Can you see where I'm going with this?
-
-Yup, take two samples some time span apart, subtract the earlier
-from the later, and then you have how much time the processor spent
-Doing Things. You can use that and the number of ticks in your time
-span to calculate utilization. Or you just have how much actual time
-The Computer spent Doing Work which is also pretty neat. Maybe you
-can pay it an hourly wage. Is that just AWS?
-
-Something to watch out for:<br />
-apparently the numbers in <code>/proc/stat</code> can overflow and
-wrap back to zero. I don't know what size integers they are so I'm
-unsure how real of a risk that is, but it seemed worth mentioning here.
-
-<h2>So you've parsed the stats, now to graphs!</h2>
-
-My main trouble here was selecting a range that makes sense for
-the data it's representing.
-
-Again, memory was easy. There is a
-total, normally-unchanging amount of RAM, so I just use that as
-the max. Perhaps there's something to be said about zooming further
-in to see the megabyte-by-megabyte variance, but I am much more
-interested in a "how close am I to the ceiling" kind of graph. Like,
-would I hit my head if I jumped? That kind of thing.
-
-The CPU graph, though, that's very variable and a bit spiky.
-I don't <i>really</i> care what the max value was if it's a spike,
-it can go off the top for all I care, what I want to see is the
-typical usage.
-
-If I just ranged to the max then I'd have what I call The Linode
-Problem. I call it that, rather predictably, because that's what
-Linode's graphs do and it makes them kind of useless? Great, I love
-to see that spike up to 100%, but that's <i>all</i> that I can see now.
-
-So instead of max-grabbing, I sort the data and take the value that's
-<i>almost</i> max. My series are 256 samples long, so what this looked
-like was taking the 240th value in the array, getting the closest-highest
-percent, and using that as the top of the range.
-
-This <i>does</i> mean if it's <i>very</i> spiky, I get The Linode Problem
-again, but in that case I'm kind of okay with it. I sample every minute,
-so my 256 pixel long graphs are roughly 4 hours long. If it spikes more
-than 16 times in that period, perhaps that's worth looking into.
-
-Okay, CPU done. Network time! It's the same, pretty much. Where there was
-one line, there are now two. And lots more spikes! I combine the receive
-and transmit series into one <code>vec</code>, sort it, and take the 32nd
-highest value.
-
-I draw the area under the line, too, because it was nigh impossible to see
-the line when it was so.. discontinuous? We get another problem with that,
-though, where the second-drawn line-and-underfill will obscure the one
-drawn first. So then, to not overdraw an entire measurement, I try to draw
-the average-larger one first. Which is to say, I take the average of both
-series separately and draw the one with the bigger average first. That way
-the smaller one will hopefully nestle under the larger, like a baby bird
-hiding from the rain under their parent's wing.
-
-<hr class="asterism-dash" />
-
-That's how the range selection works, anyway.
-
-The graphs themselves are drawn on a 256x160 gif because I like gif, 256 is
-a good number, and they seem to compress better than png in this use case.
-
-One day I'd love to try and generate alternative text to describe
-the general look of the graph. "The memory usage is steady at 300MB",
-or something like "The network usage is variable, but averages 15.4kbps".
-
-That's it!<br />
-bye :)
\ No newline at end of file
-- 
cgit 1.4.1-3-g733a5