Bandman (bandman) wrote,
@ 2007-02-09 17:22:00
Very Strange Network Problem
OK, I'm appealing to powers greater than myself.

I have a network issue that I can't solve, and in fact, I don't even know what the problem is.


The only way I became aware of it was through my graphing of our spam.

At one point in time, we got a lot of spam. I implemented blackhole list checking, and a lot of it went away. I wanted something tangible to show people, so I graphed the spam loss in MRTG. Here is today's graph.



Yesterday's is a normal graph: it grows steadily toward midnight and drops off the next day. No worries. Today's is a bit different.

At 12:00, on the dot, we started receiving much less spam. The graph grows more slowly, which indicates a much lower flow. I saw this, and it made me curious. Usually SE Asian botnets don't just "disappear", but it's been known to happen when a large quake knocks out the power. I checked, and there's been nothing above a 5.8 in the last day, so that isn't causing the problem.

I got more curious, and decided to check the total number of connections to our mail server, thinking maybe it was broken.

I ran this command line, which prints the number of connections in the left-hand column and the hour of the day in the right. Here's what I got for the total number of connections per hour:
root@twohearted:/var/log# cat maillog | grep "Feb  9"  | grep -v "9 0.:" | grep " connect" | awk '{print $3}' | awk -F: '{print $1}' | uniq -c

    699 10
    687 11
    334 12
    324 13
    316 14
    295 15
    347 16
    143 17


Alright, the number of mail connections gets cut in half at noon. That's strange. I checked the number of spam catches we got:

root@twohearted:/var/log# cat maillog | grep "Feb  9"  | grep -v "9 0.:" | grep "cbl.abuseat.org" | awk '{print $3}' | awk -F: '{print $1}' | uniq -c
    188 10
    210 11
     40 12
     57 13
     30 14
     36 15
     58 16
     51 17
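Both pipelines above do the same job: pick the lines from one day that contain a given string, then count them by hour. Here is a minimal sketch of that as a reusable function, assuming the usual syslog layout where the first three fields are month, day and HH:MM:SS. The name count_per_hour and its arguments are made up for illustration, and unlike the original commands it does not drop the 00 to 09 hours.

```shell
#!/bin/sh
# count_per_hour PATTERN MONTH DAY < logfile
# Reads syslog-style lines ("Feb  9 10:15:01 host ...") on stdin and prints
# "count hour" for lines from MONTH DAY that contain the literal PATTERN.
# (Hypothetical helper, not part of any mail or logging package.)
count_per_hour() {
    awk -v pat="$1" -v mon="$2" -v day="$3" '
        $1 == mon && $2 == day && index($0, pat) {
            split($3, t, ":")          # $3 is HH:MM:SS, t[1] is the hour
            n[t[1]]++
        }
        END {
            # Walk the hours in order so no external sort is needed.
            for (i = 0; i < 24; i++) {
                h = sprintf("%02d", i)
                if (h in n) printf "%7d %s\n", n[h], h
            }
        }
    '
}
```

Run as `count_per_hour " connect" Feb 9 < /var/log/maillog`, it should give the same kind of table as the first command, and swapping the pattern for "cbl.abuseat.org" should give the second.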


I'm mystified. The percentage of mail that is spam is usually around 1/2, as you can see in the first table. After the cut, the number of spam messages is about 1/4 of what it was, which is exactly the opposite of what you would expect if half of your email is legitimate and the drop came from people leaving early on Friday.
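To put the two tables side by side, here is a rough sketch that divides the cbl.abuseat.org matches by the connection count for each hour. The numbers are copied from the tables above; the function name spam_share is made up. It only counts cbl.abuseat.org matches, so if other blocklists or filters are also catching spam, the real spam share would be higher than what this prints.

```shell
#!/bin/sh
# spam_share: reads "hour connections cbl_hits" rows on stdin and prints the
# hits as a percentage of connections for each hour.
# (Hypothetical helper; it only sees cbl.abuseat.org matches.)
spam_share() {
    awk '$2 > 0 { printf "%s %5.1f%%\n", $1, 100 * $3 / $2 }'
}

# Rows assembled from the two tables above (hour, connections, cbl hits).
spam_share <<'EOF'
10 699 188
11 687 210
12 334 40
13 324 57
14 316 30
15 295 36
16 347 58
17 143 51
EOF
```

The 17 row covers only part of an hour, since the logs were read a little after 17:00, so it is probably best not to read much into it.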

I began to wonder if there was something wrong at a lower level, so I wanted to check our DNS server, thinking maybe the mail server is broken or slow or something.

This is a table showing the number of MX lookups per hour. If our mail server were the problem, there would be no difference here at noon:

     41 05
    465 06
    526 07
    496 08
    498 09
    494 10
    522 11
    111 12
     81 13
     66 14
     65 15
     78 16
     24 17


After noon there are about 1/5th as many MX requests as before. Something much more serious is happening. I looked at the total number of requests for the system.

root@smuttynose:/etc/tinydns/log/main# cat \@4000000045cc* current...
    181 05
   1863 06
   2028 07
   1968 08
   1972 09
   1866 10
   1948 11
    355 12
    386 13
    310 14
    296 15
    380 16
    118 17


And at this, I'm speechless. At noon, we lost almost all of our NS requests.
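To put a number on the drop, here is a small sketch that averages the hourly counts before and after a cutoff hour and prints the ratio. It reads the same "count hour" rows as the tables above. The name drop_ratio and its arguments are made up, and leaving out the first and last rows assumes that 05 and 17 are partial hours.

```shell
#!/bin/sh
# drop_ratio CUTOFF FIRST LAST < "count hour" rows
# Averages the counts for hours FIRST..CUTOFF-1 and CUTOFF..LAST and prints
# both averages plus after/before. Hours outside FIRST..LAST are ignored.
# (Hypothetical helper for illustration.)
drop_ratio() {
    awk -v cut="$1" -v first="$2" -v last="$3" '
        $2 + 0 >= first && $2 + 0 < cut   { before += $1; nb++ }
        $2 + 0 >= cut   && $2 + 0 <= last { after  += $1; na++ }
        END {
            if (nb == 0 || na == 0) exit 1   # nothing to compare
            printf "before %.1f/h, after %.1f/h, ratio %.2f\n",
                   before / nb, after / na, (after / na) / (before / nb)
        }
    '
}
```

Feeding it the total-requests table with `drop_ratio 12 6 16` gives a ratio of about 0.18, and the MX table comes out at about 0.16, so both are a bit under the 1/5th eyeballed above.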

Does anyone have any idea of what this might be?



(9 comments)


kevinbelt
2007-02-10 02:03 am UTC
i don't really know what some of the technical stuff means, but just from looking at the numbers it's intriguing.



bandman
2007-02-10 03:33 am UTC
"I don't know the answer but I admire the problem"

:-) I'm right there with you.



devin_x
2007-02-10 03:08 am UTC
Noon, you say? ... *flips through calendar*

Sunspots.



bandman
2007-02-10 03:34 am UTC
That's got to be it! I'll carry my disks across campus in aluminum foil. Problem Solved!



dethknyte
2007-02-11 04:50 am UTC
is it possible that some sort of packet filtering was turned on upstream from you? and the asian botnets could still be a little screwed up. i remember reading something about how it would take months to repair all the fiber that connects to Asia after the last quake or whatever the hell happened that wiped out their connection.



bandman
2007-02-11 02:40 pm UTC
I originally thought it would be our pipe, but I got on totalfark, and people all over the world were seeing it too. I suspect another attack on the top level root servers.



nuhir
2007-02-11 09:14 pm UTC
I was about to suggest that. Or could it quite possibly be that Asian countries are starting to crack down on botnets in their domains? ..... .... .. pfft... LOL!! Yea, I couldn't keep a straight face either.


Issue
(Anonymous)
2007-02-14 04:36 am UTC
I spoke with you the other day about this. Here is what the responses were of my more mail server sensitive friends were:

"Could be a server problem, not just a mailserver problem. Has he looked at the uptime on the server itself? What about other system logs? Do other parts of the system reflect the dip? Have any of the interfaces been reset? Can he look at the packet I/O over time? Could be the mailserver or NS software was sighupped. Did his upstream provider do any re-routing?

On this last point, our old ISP once did a routing change that effectively disabled our internet access. It took them a day or so to realize that no, the line wasn't down, they had just accidentally dead-ended us in a routing table."

and

"Questions:

Are his DNS and MX's on the same subnet?

Are his primary & secondary DNS on different subnets?

Are other systems on the same subnets as above, or on a separate segment?

Is onward/upstream connectivity for this/these other segment(s) affected at all?

More to the point, could he actually provide some detail to the problem, or should we just guess what's in his logs? A hint of what OS and equipment is having problems might be useful too.. y'know, just a thought.

FWIW, on the basis of sac-all squared info, it doesn't feel like a problem with the machines, rather more "something upstream".."

That's about all I have for you. Sorry I couldn't help out more with your problem.



Re: Issue
(Anonymous)
2007-02-14 04:39 am UTC
Forgive the first sentence, it should say, "I spoke with you the other day about this issue. Here are the responses of some people that work more frequently with mail servers."

I need sleep.


