I have a network issue that I can't solve, and in fact, I don't even know what the problem is.
The only way I became aware of it was through my graphing of our spam.
At one point in time, we got a lot of spam. I implemented blackhole list checking, and a lot of it went away. I wanted something tangible to show people, so I graphed the spam loss in MRTG. Here is today's graph.

Yesterday is a normal graph. Steadily grows toward midnight, and drops off the next day. No worries. Today's is a bit different.
At 12:00, on the dot, we started receiving much less spam. The graph grows slower, which indicates a much lower flow. I saw this, and it made me curious. Usually SE Asian botnets don't just "disappear", but if there is a large quake that knocks out the power, it's been known to happen. I checked, and there's nothing above a 5.8 in the last day, so that isn't causing the problem.
I got more curious, and decided to check the total number of connections to our mail server, thinking maybe it was broken.
I ran this command line, which prints the number of connections in the left hand column, and the time of day (the hour) in the right. Here's what I got for total number of connections per hour:
root@twohearted:/var/log# cat maillog | grep "Feb 9" | grep -v "9 0.:" | grep " connect" | awk '{print $3}' | awk -F: '{print $1}' | uniq -c
699 10
687 11
334 12
324 13
316 14
295 15
347 16
143 17
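The same per-hour count can be done in a single awk pass, which avoids stacking several greps. This is only a sketch: `count_per_hour` is a made-up helper name, it assumes classic syslog timestamps ("Feb  9 12:34:56 host ...") on stdin, and it matches the pattern as a literal string rather than as a regular expression the way grep does.

```shell
# Hypothetical helper (not from the original post): count matching log
# lines per hour in one awk pass.
# Assumes syslog-style lines: "Mon D HH:MM:SS host ..." on stdin.
# usage: count_per_hour PATTERN MONTH DAY [MIN_HOUR] < logfile
count_per_hour() {
    awk -v pat="$1" -v mon="$2" -v day="$3" -v min="${4:-0}" '
        # keep lines from the requested day that contain the literal pattern
        $1 == mon && $2 == day && index($0, pat) {
            split($3, t, ":")              # t[1] is the two-digit hour
            if (t[1] + 0 >= min) n[t[1]]++
        }
        END {
            for (h = 0; h < 24; h++) {     # print in clock order
                key = sprintf("%02d", h)
                if (key in n) print n[key], key
            }
        }'
}
```

Something like `count_per_hour ' connect' Feb 9 10 < maillog` should give the same count and hour columns as the pipeline above, minus the leading padding that `uniq -c` adds.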
Alright, the number of mail connections gets cut in half at noon. That's strange. I checked the number of spam catches we got:
root@twohearted:/var/log# cat maillog | grep "Feb 9" | grep -v "9 0.:" | grep "cbl.abuseat.org" | awk '{print $3}' | awk -F: '{print $1}' | uniq -c
188 10
210 11
40 12
57 13
30 14
36 15
58 16
51 17
I'm mystified. The percentage of mail that is spam is usually around 1/2, as you can see in the first table. After the cut, the number of spam messages is 1/4, which is exactly the opposite of what you would expect if half of your email is legitimate and came from people leaving early on Friday.
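To put an actual number on the spam share for each hour, the two tables can be joined on the hour column. A rough sketch, assuming each table has been saved to a file in the same "count hour" layout shown above; `spam_share` and both file arguments are hypothetical names, not something from the original setup.

```shell
# Hypothetical helper: per-hour share of connections that hit the blocklist.
# Both input files hold "count hour" lines, as in the tables above.
# usage: spam_share CONNECTIONS_FILE SPAM_FILE
spam_share() {
    awk '
        NR == FNR { total[$2] = $1; next }   # first file: connections per hour
        ($2 in total) && total[$2] > 0 {     # second file: blocklist hits per hour
            printf "%s %.1f%%\n", $2, 100 * $1 / total[$2]
        }' "$1" "$2"
}
```

Hours that appear in only one of the two files are skipped, so a partial hour at either end does not produce a division by zero.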
I began to wonder if there was something wrong at a lower level, so I wanted to check our DNS server, thinking maybe the mail server is broken or slow or something.
This is a table showing the number of MX lookups per hour. If our mail server was the problem, then there would be no difference at noon:
41 05
465 06
526 07
496 08
498 09
494 10
522 11
111 12
81 13
66 14
65 15
78 16
24 17
That's about 1/5th as many MX requests after noon as before. Something much more serious is happening. I looked at the total number of requests for the system.
root@smuttynose:/etc/tinydns/log/main# cat \@4000000045cc* current...
181 05
1863 06
2028 07
1968 08
1972 09
1866 10
1948 11
355 12
386 13
310 14
296 15
380 16
118 17
And at this, I'm speechless. At noon, we lost almost all of our NS requests.
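To quantify the drop, the hourly counts can be averaged on each side of noon. This is a sketch under one assumption: hours 05 and 17 look like partial hours in the table above, so the example leaves them out. `drop_ratio` is a hypothetical helper name.

```shell
# Hypothetical helper: average "count hour" lines before and after a cutoff.
# usage: drop_ratio FIRST_HOUR CUTOFF_HOUR LAST_HOUR < table
#   hours FIRST_HOUR..CUTOFF_HOUR-1 count as "before",
#   hours CUTOFF_HOUR..LAST_HOUR count as "after".
drop_ratio() {
    awk -v lo="$1" -v cut="$2" -v hi="$3" '
        { h = $2 + 0 }                       # hour as a number ("06" -> 6)
        h >= lo  && h <  cut { before += $1; nb++ }
        h >= cut && h <= hi  { after  += $1; na++ }
        END {
            if (nb && na && before > 0)
                printf "before=%.1f after=%.1f ratio=%.2f\n",
                       before / nb, after / na, (after / na) / (before / nb)
        }'
}
```

Feeding it the total-request table with `drop_ratio 6 12 16` gives an after/before ratio of about 0.18, so roughly four out of five requests vanished.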
Does anyone have any idea of what this might be?
February 10 2007, 02:03:57 UTC 5 years ago
February 10 2007, 03:33:39 UTC 5 years ago
:-) I'm right there with you.
February 10 2007, 03:08:04 UTC 5 years ago
Sunspots.
Anonymous
February 14 2007, 04:36:48 UTC 5 years ago
Issue
I spoke with you the other day about this. Here is what the responses were of my more mail server sensitive friends were: "Could be a server problem, not just a mailserver problem. Has he looked at the uptime on the server itself? What about other system logs? Do other parts of the system reflect the dip? Have any of the interfaces been reset? Can he look at the packet I/O over time? Could be the mailserver or NS software was sighupped. Did his upstream provider do any re-routing?
On this last point, our old ISP once did a routing change that effectively disabled our internet access. It took them a day or so to realize that no, the line wasn't down, they had just accidentally dead ended us in a routing table."
and
"Questions:
Are his DNS and MX's on the same subnet?
Are his primary & secondary DNS on different subnets?
Are other systems on the same subnets as above, or on a separate segment?
Is onward/upstream connectivity for this/these other segment(s) affected at all?
More to the point, could he actually provide some detail to the problem, or should we just guess what's in his logs? A hint of what OS and equipment is having problems might be useful too.. y'know, just a thought.
FWIW, on the basis of sac-all squared info, it doesn't feel like a problem with the machines, rather more "something upstream".."
That's about all I have for you. Sorry I couldn't help out more with your problem.
Anonymous
February 14 2007, 04:39:29 UTC 5 years ago
Re: Issue
Forgive the first sentence, it should say, "I spoke with you the other day about this issue. Here are the responses of some people that work more frequently with mail servers." I need sleep.