Troubleshooting connection loss (continued)

  #31 (permalink)  
Old 11-12-2007
Bit Twister
 

On Mon, 12 Nov 2007 04:46:20 GMT, Allen Weiner wrote:
> Bit Twister wrote:
>
>> Are the majority of the disconnects happening "approximately 2 hours"
>> after the modem is powered up?

>
> It seems that way. But I haven't kept a log book.


My guess is that there might be a loose connection inside the modem.
You power up, and about 2 hours later heat causes the problem. A little while
later more heat makes the connection go back together.
Imagine a loose solder connection on a pin. Sorry for the bad graphics.

cold connection (* ) works
warm connection ( * ) breaks
warmer connection ( *) working again

connection in this context is physical connection.
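
Since no log book has been kept, one way to check the "about 2 hours after power-up" pattern is to record probe results with timestamps and compare them against the time the modem was switched on. The following is only a minimal sketch: the names log_line and probe, the TARGET address and the LOGFILE path are assumptions for illustration, not part of the ck_connection script.

```bash
#!/bin/bash
# Sketch: append one line per probe so outages can be lined up
# against the modem's power-on time.
# TARGET and LOGFILE below are assumed values; adjust to your setup.

# Format one log line: timestamp, target and up/down status.
log_line() {
    printf '%s %s %s\n' "$1" "$2" "$3"
}

# Ping the target once; print "up" or "down".
probe() {
    if ping -c 1 -w 3 "$1" > /dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Loop forever only when explicitly asked, e.g.:  ./outage_log run
if [ "${1:-}" = "run" ]; then
    TARGET=${TARGET:-192.168.1.1}
    LOGFILE=${LOGFILE:-$HOME/outage.log}
    while true; do
        log_line "$(date '+%Y-%m-%d %H:%M:%S')" "$TARGET" "$(probe "$TARGET")" >> "$LOGFILE"
        sleep 60
    done
fi
```

Started with the argument run, it writes one line per minute. Searching the log for "down" then shows whether the failures really cluster around the same elapsed time.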

> So, I doubt that that strange first line with the leading semicolon is
> causing a problem.


Well I am happy, you have learned all you need to know.
Guess we are done.

Here is a present to play with.
--------------- script starts below this line ----------------
#!/bin/bash
#*****************************************************************
#*
#* ck_connection - Check internet connection.
#*
#*
#* Install procedure:
#* Save into a file named ck_connection
#* actual location should be somewhere in $PATH
#* chmod +x ck_connection
#*
#*
#* Code walks through the png array to test each point
#* in the path to/through the internet. DNS servers are also tested.
#*
#* You will need to modify the script to use system's gateway
#* and insert the ISP's gateway value.
#*
#* You may have to get into the modem's web page to find
#* the modem's gateway (ISP's gateway) for the modem.
#*
#* Depending on your distribution, the $(hostname -s) and
#* $(hostname) may need changing.
#*
#* On Mandriva linux hostname returns the FQDN and
#* hostname -s returns the short name for the node.
#*
#*****************************************************************

function net_info {
cat <<EOF
There are settings which define where and in what order DNS searches happen.
In the following, I'll give commands, results and maybe comments.
The command line starts with a $ so you can tell it from results and
my comments. You do not use the leading $ when you run the command.

You can get more help about the command with
man first_word_here
Example: you would do a man grep to get grep command manual.

The commands and example values follow:

$ grep hosts: /etc/nsswitch.conf
hosts: files dns nis

For speed, mine has
hosts: files dns

$ grep -v '^#' /etc/host.conf
order hosts,bind
multi on
nospoof on
spoofalert on

$ grep -v '^#' /etc/resolv.conf
nameserver 192.168.0.0
nameserver 0.238.0.12
nameserver 0.203.0.86

For speed improvements, I always remove any search or domain lines.
Do not use the above numbers on your system. They are examples only.
If a nameserver fails to return anything, the next server is tried.
Because of that, I like the last server to be my ISP's public DNS.

For routing check, there is
$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.1.0 0.0.0.0 255.255.255.0 U 10 0 0 eth0
0.0.0.0 192.168.1.1 0.0.0.0 UG 10 0 0 eth0

In the above, UG in the Flags column indicates that the line will be used
as the default gateway route for ip addresses that cannot be routed
via the lines above it.

The ip address in the Gateway column is where that traffic is sent.
If you can ping that address, you know that device is alive and
packets are leaving your node.

$ ifconfig
will allow you to see the ip address assigned to your nic and allow you
to check if you are getting unreasonable counts for errors, dropped,
overruns, frame, carrier and collisions.

If you want to check internet speeds to somewhere, Example:
$ traceroute -n yahoo.com

Some nodes drop those trace packets, so you may want to use
$ traceroute -In yahoo.com

For dns testing there is something like
$ dig google.com @isp_name_server1

You will get information about how isp_name_server1 performed
when looking up google.com.

EOF
} # end net_info


#********************************************
#*
#* The following are not actual checks.
#* Each comment box describes which check/verification
#* the ping value will be used for.
#*
#* You will need to make changes to match your setup.
#* If you want to skip a test you either put
#* 127.0.0.1 in the png[x] entry for that test,
#*
#* or you delete the png[] and msg[] lines
#* and renumber the rest to keep the numbers continuous
#* through the png[12]="done" line.
#*
#* NOTE:
#* The png[12]="done" line has to remain and
#* must be the last one in the png array.
#*
#* When renumbering, check the msg[] text to verify
#* if there is a png[] value used in the text.
#*
#* You will also have to fix the code which
#* uses png[9].
#*
#********************************************


#********************************************
#* check ping works on the node
#********************************************

png[1]="127.0.0.1"
msg[1]="$(hostname -s) problem,
No idea where to look, I never had the problem
"
#********************************************
#* check dns on my node
#********************************************

png[2]="localhost"
msg[2]="Check $(hostname -s) /etc/hosts localhost line.
I assume you have a line like
127.0.0.1 localhost.localdomain localhost
man hosts for more info"

#********************************************
#* check pinging my ip address works
#********************************************

png[3]="192.168.1.130"
msg[3]="Check $(hostname -s) /etc/hosts $(hostname) ip addy.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"

#********************************************
#* check dns reads my /etc/hosts by full name
#********************************************

png[4]="$(hostname)"
msg[4]="Check $(hostname -s) /etc/hosts $(hostname) line.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"

#********************************************
#* check dns reads my /etc/hosts by alias
#********************************************

png[5]="$(hostname -s)"
msg[5]="Check $(hostname -s) /etc/hosts $(hostname) line for an alias.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"

#********************************************
#* check my gateway device is alive
#********************************************

png[6]="192.168.1.1"
msg[6]="Check physical connection to next device to internet (gateway).
run mii-tool -v eth0
or ethtool eth0
You are looking for link ok line
or Link detected: yes depending on which tool used
run route -n to verify you have a UG Flags line
$(net_info)"

#********************************************
#* check my gateway alias in /etc/hosts
#********************************************

png[7]="router"
msg[7]="Check $(hostname -s) /etc/hosts router line
I assume you have a
192.168.1.1 router line
man hosts for more info
$(net_info)"


#********************************************
#* check my ISP's gateway connected to router
#********************************************

png[8]="71.252.137.1"
msg[8]="Check leds on internet device.
poweroff internet device (adsl/cable modem)
wait 30 seconds by watch/clock to let capacitors discharge
and reset device
power up, wait for leds to settle down
run service network restart
Leds not right, check wiring out to telephone pole
call your ISP
$(net_info)"


#********************************************
#* check if DNS server is alive
#********************************************

_dns_ip=9
png[$_dns_ip]="192.168.1.1"
msg[$_dns_ip]="Check $(hostname -s) /etc/resolv.conf nameserver line
You will have to check the device which has the name server running.
Your internet device (adsl/cable modem) may be your dns server.
If none of the above, ${png[$_dns_ip]} is down
Work around, change nameserver ip_here to a public nameserver
in /etc/resolv.conf
man resolv.conf for more info
$(net_info)"


#********************************************
#* check ISP can route to yahoo.com
#********************************************

png[10]="66.94.234.13"
msg[10]="cannot ping yahoo by ip address
yahoo.com is down or ip address changed.
check google.com with ping -c1 72.14.207.99
If that fails, google.com is down or ip address changed
or it is an ISP/internet problem
$(net_info)"


#********************************************
#* check DNS can resolve yahoo.com
#********************************************

png[11]="yahoo.com"
msg[11]="Cannot ping yahoo.com by name
yahoo.com just went down, or dns is broke on your ISP or somewhere else.
$(net_info)"


png[12]="done"
msg[12]="Last array element to tell while loop we are done pinging"

#********************************************
#* Actual testing starts here
#********************************************

#********************************************
#* get the first dns server from /etc/resolv.conf
#********************************************

set -- $(grep nameserver /etc/resolv.conf | grep -v '^#' | head -1)
_ip=$2
if [ -z "$_ip" ] ; then
echo "/etc/resolv.conf does not have a nameserver line.
man resolv.conf
for more information"
exit 1
else
png[$_dns_ip]=$_ip
fi

#********************************************
#* loop through all ip/name tests
#********************************************


i=1
while [ "${png[$i]}" != "done" ] ; do
echo "running ping -c 1 -w 3 ${png[$i]} "
ping -c 1 -w 3 ${png[$i]} > /dev/null
if [ $? -ne 0 ] ; then
/bin/echo -e "\nFailure: ping -c 1 -w 3 ${png[$i]} "
/bin/echo -e "${msg[$i]} "
exit 1
fi
i=$(( i + 1 ))
done

#********************************************
#* loop through all nameservers in /etc/resolv.conf
#********************************************

while read line
do
set -- $line
_ip=$2
if [ "$1" = "nameserver" ] ; then
echo "running ping -c 1 -w 3 $_ip "
ping -c 1 -w 3 $_ip > /dev/null
if [ $? -ne 0 ] ; then
/bin/echo -e "\nDNS nameserver Failure: ping -c 1 -w 3 $_ip "
echo "nameserver $_ip in /etc/resolv.conf is not responding to pings."
echo "$(net_info)"
exit 1
fi
fi

done < /etc/resolv.conf

#********* end ck_connection **********************************
#32 (permalink)
Old 11-12-2007
Floyd L. Davidson

Allen Weiner <alweiner7@hotmail.com> wrote:
>Bit Twister wrote:
>>> ======== grep -v '^#' /etc/resolv.conf ==========
>>> ; generated by /sbin/dhclient-script
>>> search myhome.westell.com
>>> nameserver 192.168.1.1
>>> nameserver 192.168.1.1

>> I really, really, really, really, really, want you to
>> do an echo "nameserver 192.168.1.1" > /etc/resolv.conf
>> Hoping the ; is causing the restart hang and the
>> SUGGESTION will fix
>> your problem.
>> Not doing the SUGGESTION, will force me to place you
>> in my kill file.
>>

>That's your choice. I did a Google search on resolv.conf
>& generated. I saw several examples similar to
>mine. Here's one:


Here's a better one... Download virtually any source code
to libc, and look in the .../resolv/res_init.c file for
this code:

if ((fp = fopen(_PATH_RESCONF, "r")) != NULL) {
/* read the config file */
while (fgets_unlocked(buf, sizeof(buf), fp) != NULL) {
/* skip comments */
if (*buf == ';' || *buf == '#')
continue;
/* read default domain name */
if (MATCH(buf, "domain")) {

What that is doing is reading the /etc/resolv.conf file, and
skipping any line that begins with either ';' or '#'.

Personally, I would fault it for not initially removing all
leading white space, but....
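
The same first-character rule can be sketched in bash for anyone who wants to see it act on the resolv.conf quoted above. This is an illustration only, not a port of res_init.c: the function name list_nameservers is made up here, and it mirrors only the "skip lines starting with ';' or '#'" check plus a simple keyword match.

```bash
#!/bin/bash
# Sketch: print the nameserver addresses from resolv.conf-style input
# on stdin, skipping any line whose FIRST character is ';' or '#'.
# list_nameservers is a hypothetical helper name.
list_nameservers() {
    local line
    while IFS= read -r line; do
        case $line in
            ';'*|'#'*) continue ;;   # comment: first character is ; or #
        esac
        set -- $line                 # split the line into words
        if [ "${1:-}" = "nameserver" ] && [ -n "${2:-}" ]; then
            echo "$2"
        fi
    done
}
```

Running list_nameservers < /etc/resolv.conf on the file quoted above would print 192.168.1.1 twice; the "; generated by /sbin/dhclient-script" line is skipped as a comment.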

--
Floyd L. Davidson <http://www.apaflo.com/floyd_davidson>
Ukpeagvik (Barrow, Alaska) floyd@apaflo.com
#33 (permalink)
Old 11-12-2007
Allen Weiner

Floyd L. Davidson wrote:
> Allen Weiner <alweiner7@hotmail.com> wrote:
>> Bit Twister wrote:
>>>> ======== grep -v '^#' /etc/resolv.conf ==========
>>>> ; generated by /sbin/dhclient-script
>>>> search myhome.westell.com
>>>> nameserver 192.168.1.1
>>>> nameserver 192.168.1.1
>>> I really, really, really, really, really, want you to
>>> do an echo "nameserver 192.168.1.1" > /etc/resolv.conf
>>> Hoping the ; is causing the restart hang and the
>>> SUGGESTION will fix
>>> your problem.
>>> Not doing the SUGGESTION, will force me to place you
>>> in my kill file.
>>>

>> That's your choice. I did a Google search on resolv.conf
>> & generated. I saw several examples similar to
>> mine. Here's one:

>
> Here's a better one... Download virtually any source code
> to libc, and look in the .../resolv/res_init.c file for
> this code:
>
> if ((fp = fopen(_PATH_RESCONF, "r")) != NULL) {
> /* read the config file */
> while (fgets_unlocked(buf, sizeof(buf), fp) != NULL) {
> /* skip comments */
> if (*buf == ';' || *buf == '#')
> continue;
> /* read default domain name */
> if (MATCH(buf, "domain")) {
>
> What that is doing is reading the /etc/resolv.conf file, and
> skipping any line that begins with either ';' or '#'.
>
> Personally, I would fault it for not initially removing all
> leading white space, but....
>

Thanks very much Floyd for your reply. I'm a Linux novice and am a long
way from having the savvy to do what you did.

By the way, for many years I subscribed to comp.dcom.modems. I always
found your posts highly informative. I'm really astounded by how much
more function my small Westell DSL modem/router has than my old USR
dial-up modem.
#34 (permalink)
Old 11-12-2007
Allen Weiner

Bit Twister wrote:
> On Mon, 12 Nov 2007 04:46:20 GMT, Allen Weiner wrote:
>> Bit Twister wrote:
>>


>
> My guess is that there might be a loose connection inside the modem.
> You power up, and about 2 hours later heat causes the problem. A little while
> later more heat makes the connection go back together.
> Imagine a loose solder connection on a pin. Sorry for the bad graphics.
>
> cold connection (* ) works
> warm connection ( * ) breaks
> warmer connection ( *) working again
>
> connection in this context is physical connection.


If that is the problem, the broken connection must be short-lived,
because without fail, the moment I reboot, my Internet connection is
restored.

So let's assume there is a momentary connection loss. The next time it
occurs, what troubleshooting steps can I perform to determine why
"service network restart" hangs?

We're saying the problem is local, so there is no point in trying to
verify DNS, or ping outside servers.
>
>> So, I doubt that that strange first line with the leading semicolon is
>> causing a problem.

>
> Well I am happy, you have learned all you need to know.
> Guess we are done.
>

The post in this thread by Floyd Davidson should close the issue. We
ought to be done pursuing the angle that there is a DHCP problem. What
would be worthwhile to me is a troubleshooting procedure for the
"service network restart" hang that is not predicated on a DHCP problem.


> Here is a present to play with.


Thanks. But that isn't applicable to diagnosing the hang of "service
network restart".
#35 (permalink)
Old 11-12-2007
Bit Twister

On Mon, 12 Nov 2007 18:46:26 GMT, Allen Weiner wrote:
>
> If that is the problem, the broken connection must be short-lived,
> because without fail, the moment I reboot, my Internet connection is
> restored.


Hehe, think about it: the router chip connection opens, the software goes
insane and quits working for your internet, sometime later you notice
the connection drop and start the restart process. Plenty of time for the
metal to keep expanding to the other side of the hole. Those holes are
pretty tight. Not to mention the chips that are just laid on the
board and soldered.

>
> So let's assume there is a momentary connection loss. The next time it
> occurs, what troubleshooting steps can I perform to determine why
> "service network restart" hangs?


You already know how to troubleshoot which component is not working.
You refuse to do the three things I want done to rule out possible causes
and get more information.

It was bad enough to have to work under the hood of your car through
the tail pipe, now that you have tied my hands, I can not help you
with that problem. :-P


> The post in this thread by Floyd Davidson should close the issue.


Saw that and your reply. Had to laugh, you just got your feet wet with
scripting in bash. Floyd's post showed C code, which is
another programming language, if you want to drill that far
down to learn what is going on.


> We ought to be done pursuing the angle that there is a DHCP problem.


I THINK so, but you will not let me rule that out. :-(

> What would be worthwhile to me is a troubleshooting procedure for the
> "service network restart" hang that is not predicated on a DHCP problem.


Make my 3 SUGGESTIONS, and see if the problem goes away while in a
static ip setup.


>> Here is a present to play with.

>
> Thanks. But that isn't applicable to diagnosing the hang of "service
> network restart".


True, it is just a nice script for finding out what is not working the next
time the connection drops.

By the way, here is the latest one, with info on more network troubleshooting
commands; it prints out what is being tested at each point.
Save/run it in your user account. It does not require root privs to run.
Run it as is and I think it should fail on testing ISP gateway to modem.
It will give the number of the array element to modify with your modem's value.

#!/bin/bash
#*****************************************************************
#*
#* ck_connection - Check internet connection.
#*
#* Install procedure:
#* Save into a file named ck_connection
#* actual location should be somewhere in $PATH
#* chmod +x ck_connection
#*
#*
#* Code walks through the png array to test each point
#* in the path to/through the internet. DNS servers are also tested.
#*
#* You will need to modify the script to use node's gateway
#* ip in png[$_gate_loc], Usually your modem's ip.
#* and insert the ISP's gateway value at png[8]
#*
#* You may have to get into the modem's web page to find
#* the modem's gateway (ISP's gateway) for the modem.
#* If you cannot find it, just change png[8] to 127.0.0.1
#*
#* Depending on your distribution, the $(hostname -s) and
#* $(hostname) may need changing.
#*
#* On Mandriva linux hostname returns the FQDN and
#* hostname -s returns the short name for the node.
#*
#*****************************************************************

if [ $# -gt 0 ] ; then
_arg2=$1
fi

function net_info {

if [ -z "$_arg2" ] ; then
echo "$0 hints will give you more research tools/info"
return
fi

cat <<EOF

Note: just because you can ping a server does not mean
it is serving up what it is supposed to be serving. :(

There are settings which define where and in what order DNS searches happen.
In the following, I'll give commands, results and maybe comments. The
command line starts with a $ so you can tell the command line from results
and my comments. You do not use the leading $ when you run the command.

You can get more help about the command with
man first_word_here
Example: you would do a man grep to get grep command manual.

The commands and example values follow:

$ grep hosts: /etc/nsswitch.conf
hosts: files dns nis

For speed, mine has
hosts: files dns

$ grep -v '^#' /etc/host.conf
order hosts,bind
multi on
nospoof on
spoofalert on

$ grep -v '^#' /etc/resolv.conf
nameserver 192.168.0.0
nameserver 0.238.0.12
nameserver 0.203.0.86

For speed improvements, I always remove any search or domain lines.
Do not use the above numbers on your system. They are examples only.
If a nameserver fails to return anything, the next server is tried.
Because of that, I like the last server to be my ISP's public DNS.

For routing check, there is
$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.1.0 0.0.0.0 255.255.255.0 U 10 0 0 eth0
0.0.0.0 192.168.1.1 0.0.0.0 UG 10 0 0 eth0

In the above, UG in the Flags column indicates that the line will be used
as the default gateway route for ip addresses that cannot be routed
via the lines above it.

The ip address in the Gateway column is where that traffic is sent.
If you can ping that address, you know that device is alive and
packets are leaving your node.

$ ifconfig
will allow you to see the ip address assigned to your nic and allow you
to check if you are getting unreasonable counts for errors, dropped,
overruns, frame, carrier and collisions.

If you want to check internet speeds to somewhere, Example:
$ traceroute -n yahoo.com

Some nodes drop those trace packets, so you may want to use
$ traceroute -In yahoo.com

For dns testing there is something like
$ dig google.com @isp_name_server1

You will get information about how isp_name_server1 performed
when looking up google.com.

EOF
} # end net_info


#********************************************
#*
#* You will need to make changes to match your setup.
#* Read script header for details
#* If you want to skip a test you either put
#* 127.0.0.1 in the png[x] entry for that test,
#*
#* or you delete the png[], tst[] and msg[] lines,
#* and renumber them to keep the numbers continuous
#* through the png[12]="done" line.
#*
#* NOTE:
#* The png[12]="done" line has to remain and
#* must be the last one in the png array.
#*
#* When renumbering, check the msg[] text to verify
#* if there is a png[] value used in the text.
#*
#********************************************


png[1]="127.0.0.1"
tst[1]="that ping is working on $(hostname -s) "
msg[1]="$(hostname -s) problem,
No idea where to look, I never had the problem
"

png[2]="localhost"
tst[2]="that resolver reads /etc/hosts "
msg[2]="Check $(hostname -s) /etc/hosts localhost line.
I assume you have a line like
127.0.0.1 localhost.localdomain localhost
man hosts for more info"


png[3]="192.168.1.130"
tst[3]="nic access by ip address"
msg[3]="Check $(hostname -s) /etc/hosts $(hostname) ip addy.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"


png[4]="$(hostname)"
tst[4]="that resolver reads /etc/hosts by full name "
msg[4]="Check $(hostname -s) /etc/hosts $(hostname) line.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"


png[5]="$(hostname -s)"
tst[5]="that resolver reads /etc/hosts by alias "
msg[5]="Check $(hostname -s) /etc/hosts $(hostname) line for an alias.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"

#********************************************
#* Script fills in the real value later.
#********************************************

_gate_loc=6
png[$_gate_loc]="192.168.1.1"
tst[$_gate_loc]="that $(hostname -s) gateway is alive "
msg[$_gate_loc]="Check connection to next device to internet (gateway).
run mii-tool -v eth0
or ethtool eth0
You are looking for link ok
or Link detected: yes
depending on which tool used. run
route -n
to verify you have a UG in the Flags column of the last line
$(net_info)"


png[7]="gateway"
tst[7]="if gateway alias works via /etc/hosts "
msg[7]="Check $(hostname -s) /etc/hosts gateway line
I assume you have added a
192.168.1.1 gateway
line to /etc/hosts

That lets you do a quick test by doing a
ping -c1 gateway
at a terminal
man hosts for more info
$(net_info)"

#********************************************
#* Look in modem's web page or dhcp leases file.
#********************************************

png[8]="71.252.137.1"
tst[8]="modem talks to ISP gateway "
msg[8]="Check leds on internet device.
poweroff internet device (adsl/cable modem)
wait 30 seconds by watch/clock to let capacitors discharge
and reset device
power up, wait for leds to settle down
run service network restart
Leds not right, check wiring out to telephone pole
call your ISP
$(net_info)"


#********************************************
#* Script fills in real value from /etc/resolv.conf
#********************************************

_dns_loc=9
png[$_dns_loc]="127.0.0.1"
tst[$_dns_loc]="if DNS server is alive "
msg[$_dns_loc]="Check $(hostname -s) /etc/resolv.conf nameserver line
You will have to check the device which has the name server running.
Your internet device (adsl/cable modem) may be your dns server.
If none of the above, ${png[$_dns_loc]} is down
Work around, change nameserver ip_here to a public nameserver
in /etc/resolv.conf
man resolv.conf for more info
$(net_info)"


png[10]="66.94.234.13"
tst[10]="that ISP can route to yahoo.com "
msg[10]="cannot ping yahoo by ip address
yahoo.com is down or ip address changed.
check google.com with ping -c1 72.14.207.99
If that fails, google.com is down or ip address changed
or it is an ISP/internet problem
$(net_info)"


png[11]="yahoo.com"
tst[11]="ISP can get a DNS resolve yahoo.com"
msg[11]="Cannot ping yahoo.com by name
yahoo.com just went down, or dns is broke on your ISP or somewhere else.
$(net_info)"


png[12]="done"
tst[12]="We never use this because png done is "
msg[12]="last array element to tell while loop we are done pinging"

#********************************************
#*
#* Actual testing starts here
#*
#********************************************

tput clear
#********************************************
#* get/save the first dns server from /etc/resolv.conf
#********************************************

set -- $(grep nameserver /etc/resolv.conf | grep -v '^#' | head -1)
_ip=$2
if [ -z "$_ip" ] ; then
echo "/etc/resolv.conf does not have a nameserver line.
man resolv.conf
for more information
If using dhcp, resolv.conf is updated by contents of leases file,
depending on which dhcp client is being used.
locate leases | grep var/
should find it.
I assume you have mlocate or slocate installed so you can use the
locate command.

Going to use ${png[$_dns_loc]} to make test run farther to
help find the failure.

Press any key to continue
"
read -n 1
else
png[$_dns_loc]=$_ip
fi

#********************************************
#* get/save the gateway ip address
#********************************************

set -- $(route -n | grep 'UG' | tail -1)
_ip=$2
if [ -z "$_ip" ] ; then
echo "no default gateway line found in
route -n
results. Expected to see last line something like
0.0.0.0 192.168.1.1 0.0.0.0 UG 10 0 0 eth0
that UG line is missing which can be because the network did not
come up correctly. Usually a dhcp access problem.
using ${png[$_gate_loc]}

Press any key to continue
"
read -n 1
else
png[$_gate_loc]=$_ip
fi

#********************************************
#* loop through all ip/name tests
#********************************************


i=1
while [ "${png[$i]}" != "done" ] ; do
echo "$i Test ${tst[$i]}"
ping -c 1 -w 3 ${png[$i]} > /dev/null
if [ $? -ne 0 ] ; then
/bin/echo -e "\nFailure: ping -c 1 -w 3 ${png[$i]} "
/bin/echo -e "${msg[$i]} "
exit 1
fi
i=$(( $i + 1 ))
done

#********************************************
#* loop through all nameservers in /etc/resolv.conf
#********************************************

while read line
do
set -- $line
_ip=$2
if [ "$1" = "nameserver" ] ; then
echo "Test /etc/resolv.conf nameserver $_ip is alive"
ping -c 1 -w 3 $_ip > /dev/null
if [ $? -ne 0 ] ; then
/bin/echo -e "\nDNS nameserver Failure: ping -c 1 -w 3 $_ip "
echo "nameserver $_ip in /etc/resolv.conf is not responding to pings."
echo "$(net_info)"
exit 1
fi
fi

done < /etc/resolv.conf

echo " "
echo "Basic network connectivity is working to yahoo.com"
echo " "

#********* end ck_connection **********************************
#36 (permalink)
Old 11-15-2007
Allen Weiner

Allen Weiner wrote:
> Bit Twister wrote:

<snip>

>
> So let's assume there is a momentary connection loss. The next time it
> occurs, what troubleshooting steps can I perform to determine why
> "service network restart" hangs?
>


The "service network restart" hangs after eth0 is closed down.

It seems to me that an effective troubleshooting approach to isolate the
hang would be to put hooks in the scripts that "service network restart"
invokes. But being a Linux novice, I'd prefer not to play with the
networking scripts (although I could make backups).

Another possible approach to isolating the hang that avoids modifying
networking scripts would be to turn on strace from the terminal before
issuing "service network restart". To cut down on strace output, it
would be even better to turn on strace after eth0 is closed down. I have
no idea how to do this. Suggestions would be appreciated.
#37 (permalink)
Old 11-15-2007
Bit Twister

On Thu, 15 Nov 2007 03:11:31 GMT, Allen Weiner wrote:
>
> The "service network restart" hangs after eth0 is closed down.


Well, WE will not be working that problem, unless you take my
suggestions as to what config files are to look like.


> It seems to me that an effective troubleshooting approach to isolate the
> hang would be to put hooks in the scripts that "service network restart"
> invokes.


Hehe, I spent a day in those one or two years ago.
What I had to do was create 8 desktops; pretty nearly every desktop had 3
or 4 terminals up: one term following the code, another to see config files,
another to hunt down man pages and documents, ...

When a script would call another script, I would open it in another desktop
so I could keep drilling down reading code. When I finally hit the
bottom of the script, I would go back to the desktop which called the script.

> But being a Linux novice, I'd prefer not to play with the
> networking scripts (although I could make backups).


Sounds good in theory, takes a very methodical, conscientious person
to make that work, and you better damn well know your backups are good.

That is why a multi-boot system, with selection to boot a copy of your
"Production Install" is handy for screwing with system scripts that
could hurt you. :-D

> Another possible approach to isolating the hang that avoids modifying
> networking scripts would be to turn on strace from the terminal before
> issuing "service network restart".


Never tried it, but pretty sure trying to do a
strace /etc/init.d/network restart is not going to work. :)

> To cut down on strace output, it
> would be even better to turn on strace after eth0 is closed down. I have
> no idea how to do this. Suggestions would be appreciated.


You would do a service network stop,
enable your tracing, then do the service network start.

Restart is just an easy call to stop/start.
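
As a rough sketch of that stop / trace / start sequence: the function name trace_network_start and the TRACE_FILE path are assumptions, and whether strace output from a chain of init scripts is readable enough to locate the hang has not been tried here. By default the sketch only prints the commands; set DRY_RUN=0 and run it as root to actually execute them.

```bash
#!/bin/bash
# Sketch: stop the network first, then trace only the start half.
# trace_network_start and TRACE_FILE are hypothetical names.
trace_network_start() {
    local trace_file=${TRACE_FILE:-/tmp/network_start.strace}
    local run=echo                       # dry run: just print the commands
    [ "${DRY_RUN:-1}" = "0" ] && run=    # DRY_RUN=0 really executes them
    $run service network stop
    # -f follows child processes, -tt timestamps each call,
    # -o writes the trace to a file instead of the terminal
    $run strace -f -tt -o "$trace_file" service network start
}
```

If the start hangs, the last lines of the trace file show the system call the hung process was waiting in.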

FYI: I assume you are always logged into a user account, not root.
When you need root privs, you click up a terminal and su - root
as a security precaution.

For debugging scripts, I find playing with the set command can help.
I would like you to click up a terminal and add
set -xv
to the first line of .bash_profile, save exit.
Now do the following command

su - $USER

exit
Up Arrow
and change set -xv to set -x, save exit
Up Arrow

exit

Up Arrow
and remove the set line.
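
To see what the set options do without touching .bash_profile, here is a small stand-alone sketch. The function name demo_trace and the eth0 value are made up for illustration. With set -x the shell prints each command to stderr after expansion; adding -v also echoes the script lines as they are read.

```bash
#!/bin/bash
# Sketch: trace a few commands with set -x. The trace lines go to
# stderr, the normal output still goes to stdout.
demo_trace() {
    set -x                   # from here on, print each command before running it
    local iface=eth0
    echo "checking $iface"
    set +x                   # turn tracing back off
}
```

Calling demo_trace in a terminal shows the traced commands prefixed with + alongside the normal "checking eth0" output. The same set -x line can be put temporarily near the top of a copy of a script you want to follow.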

#38 (permalink)
Old 11-15-2007
Allen Weiner

Bit Twister wrote:
> On Thu, 15 Nov 2007 03:11:31 GMT, Allen Weiner wrote:
>> The "service network restart" hangs after eth0 is closed down.

>
> Well, WE will not be working that problem, unless you take my
> suggestions as to what config files are to look like.
>

I did change the hosts file.

My dhclient-eth0.leases has not changed in the past week. Lease expires
on 11/7. DHCP isn't being invoked.

Suppose either the leases file or the resolv.conf was causing the
problem. Should that cause "service network restart" to hang?
>


>
> Hehe, I spent a day in those one or two years ago.
> What I had to do was create 8 desktops, pretty near each desktop had 3
> or 4 terminals up, 1 term following the code, another to see config files,
> another to hunt down man pages and documents, ...
>

It's interesting and discouraging to hear of your experience. It would
be interesting to hear what troubleshooting technique you use for this
situation.
>


>
> Never tried it, but pretty sure trying to do a
> strace /etc/init.d/network restart is not going to work. :)
>

Could you elaborate on why this won't work?


> You would do a service network stop,
> enable your tracing, then do the service network start.
>

Thanks very much for pointing that out.
>

You might find this interesting. My modem/router uses the AR7 ADSL chip.
A leading ISP feels this chip provides unreliable connections.

http://www.theregister.com/2007/10/2...neon_bt_fault/
  #39 (permalink)  
Old 11-15-2007
Bit Twister
 
Posts: n/a
Default Re: Troubleshooting connection loss (continued)

On Thu, 15 Nov 2007 15:51:09 GMT, Allen Weiner wrote:
> Bit Twister wrote:
>> On Thu, 15 Nov 2007 03:11:31 GMT, Allen Weiner wrote:
>>> The "service network restart" hangs after eth0 is closed down.

>>
>> Well, WE will not be working that problem, unless you take my
>> suggestions as to what config files are to look like.
>>


> I did change the hosts file.


And I know this, how?

And would you provide what you did.

> My dhclient-eth0.leases has not changed in the past week. Lease expires
> on 11/7. DHCP isn't being invoked.


Does not matter. I was not troubleshooting a dhclient-eth0.leases file change.
That information is only one aspect of your problem needing checking.
Glad you picked up on that tidbit. Sorry you refused my suggestion on
what it is to contain.

> Suppose either the leases file or the resolv.conf was causing the
> problem. Should that cause "service network restart" to hang?


Told you, "WE will not be working that problem, unless you take my
suggestions as to what config files are to look like"


> It's interesting


Dang, tip on how to follow a complex script gone to waste on the OP. :(

> and discouraging to hear of your experience.


Sorry to hear that. It was not hard, just lots of things to look at, and
man some_cmd_here to get a feel for what each cmd did. It gave me the
experience of what to play with, when, and why, not to mention seeing
tricks and what you can do with the bash scripting language.

> It would be interesting to hear what troubleshooting technique you
> use for this situation.


I have been giving you basic troubleshooting techniques and a smart
questions link to read, and for all my trouble, all I was given was static
about what you believe should not make a difference, and that you were not
going to change the file. Go ahead and killfile me if you want.....

Instead of reading the whole document, this is the section I have in mind.
http://www.catb.org/~esr/faqs/smart-....html#symptoms
for the above paragraph.

>> Never tried it, but pretty sure trying to do a
>> strace /etc/init.d/network restart is not going to work. :)


> Could you elaborate on why this won't work?


You have to use the proper tool for the job at hand.

Do not get me wrong, on the whole, I applaud how well you are doing
and what you have done.

I want you to keep in mind, I try to keep the lurkers in mind when I
post, and teach you how to fish. Not cut the pole, string the line,
catch the fish, fry, cut it up and feed you.

I do try to keep in mind the poster's skill and knowledge when making
my responses.

service is basically a wrapper script which runs what is found in /etc/init.d
If you were to look at the files in /etc/init.d, you would see that
they are for the most part scripts.

Generally speaking, in my mind, you have programs/scripts which do the work.
Scripts are what you can view with the cat command. Programs are
compiled into a binary form.

Easy way to tell, try
less /bin/ls
less ~/.bashrc
and see the difference.
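
The same script/program check can be roughed out in a few lines of shell. This is only a heuristic sketch: the kind_of name and the "contains a NUL byte means binary" rule are assumptions made here, not anything from the distro.

```shell
#!/bin/bash
# kind_of FILE: print "binary" if FILE contains any NUL bytes, else "text".
# Compiled programs are full of NUL bytes; scripts normally have none.
kind_of() {
    total=$(( $(wc -c < "$1") ))
    without_nul=$(( $(tr -d '\000' < "$1" | wc -c) ))
    if [ "$total" -eq "$without_nul" ]; then
        echo "text"
    else
        echo "binary"
    fi
}

# Demo on two throwaway files so nothing on the system is assumed.
script_like=$(mktemp)
program_like=$(mktemp)
printf 'echo hi\n' > "$script_like"
printf 'ab\000cd' > "$program_like"

kind_of "$script_like"
kind_of "$program_like"
rm -f "$script_like" "$program_like"
```

Pointing it at the files from the exercise would be kind_of /bin/ls and kind_of ~/.bashrc.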

Now instead of less, use strace and see what you can see.
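
For anyone following along, a hedged sketch of that experiment. strace reports system calls rather than script lines, and when pointed at a script it traces the interpreter running it. The -f option follows child processes and -o writes the trace to a file. The trace_syscalls helper and the log path are made up for this example.

```shell
#!/bin/bash
# trace_syscalls LOGFILE COMMAND [ARGS...]
# Runs COMMAND under strace if strace is installed and allowed to trace.
trace_syscalls() {
    log=$1; shift
    if ! command -v strace >/dev/null 2>&1; then
        echo "strace not installed"
        return 0
    fi
    # -f: follow child processes; -o: write the trace to a file
    if strace -f -o "$log" "$@" >/dev/null 2>&1; then
        echo "trace written: $(wc -l < "$log") lines in $log"
    else
        echo "strace could not trace $1"
    fi
}

trace_syscalls "${TMPDIR:-/tmp}/ls.trace" ls /
```

Comparing the trace of a compiled program with the trace of a script makes the "proper tool for the job" point visible: the script trace is mostly the shell starting other processes.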

Next time you go to ask about a command, you need to
Read The Fine Manual (RTFM), and try the command to see what it does.

You never experiment when logged in as root, if possible.
You boot and play in a hot backup partition.

Always as a user, if possible. If afraid of hurting your account,
create a junk account. I do not recommend calling it test.
Log into junk and play around there. You can always delete/create it again.

>> You would do a service network stop,
>> enable your tracing, then do the service network start.
>>

> Thanks very much for pointing that out.


That is a function of /etc/init.d/network, not service.

So, doing a bit of reading in /etc/init.d/network, you would find
stop, start, restart, reload, status were commands available for
service network cmd_here.
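
The general shape of that dispatch looks something like the following. This is a hypothetical miniature, not the contents of /etc/init.d/network; the function names and messages are invented for illustration.

```shell
#!/bin/bash
# Hypothetical miniature init script showing how the sub-commands are routed.
start() { echo "start: bring interfaces up"; }
stop()  { echo "stop: bring interfaces down"; }

dispatch() {
    case "$1" in
        start)   start ;;
        stop)    stop ;;
        restart) stop; start ;;   # restart is just stop followed by start
        reload)  echo "reload: (would re-read configuration)" ;;
        status)  echo "status: (would report interface state)" ;;
        *)       echo "Usage: $0 {start|stop|restart|reload|status}"; return 1 ;;
    esac
}

dispatch restart
```

Reading the case statement near the bottom of a real init script is usually the quickest way to find which sub-commands it accepts.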

> http://www.theregister.com/2007/10/2...neon_bt_fault/


Yep, saw that article on the site when they posted it.
