[Dec 30, 2000, 5:52 pm] voot downtime
voot (www154) crashed today under heavy load, and was brought back online
within 15 minutes.
[Dec 28, 2000, 9:09 pm] quan upgrade completed
The drive upgrade of quan (www120) has been completed. It now has 30.7 GB
of total disk space.
[Dec 28, 2000, 7:08 pm] vsael Downtime
vsael (www173) crashed under heavy load. It has been returned to
normal operation, and had a downtime of about 10 minutes.
[Dec 28, 2000, 3:33 pm] quan upgrade
We are about to begin a drive upgrade on quan (www120) in order to improve
performance and storage. There will be two brief periods of downtime,
each approximately five minutes, at the beginning and end of the
maintenance. The server will remain online at all other times. The
entire upgrade could take up to 24 hours to complete. We will post an
additional notice when the upgrade has been completed.
[Dec 28, 2000, 12:35 pm] jeta Downtime
jern (www142) crashed under heavy load, and had to be manually rebooted. It was brought back up after a thorough disk cleaning. Total downtime was approximately 15 minutes.
[Dec 27, 2000, 9:38 am] berkano downtime
berkano (www147) crashed under heavy load, and was brought back online with 10 minutes
of downtime.
[Dec 26, 2000, 9:31 am] vsael Downtime
vsael (www173) crashed today under heavy load, and had to be manually rebooted. It came up cleanly after a thorough disk check. Total downtime was approximately 15 minutes.
[Dec 25, 2000, 9:50 am] vilya Downtime
vilya (www60) crashed this morning under heavy load and was unable to reboot automatically. After a thorough disk cleaning, the machine was brought back up. Total downtime was approximately one hour.
[Dec 24, 2000, 1:52 pm] omega downtime
omega (www16) crashed, and was brought back online. Total downtime
was a little under ten minutes.
[Dec 22, 2000, 8:27 am] kodh downtime
kodh (www90) crashed under high load. Total downtime was less than 25 minutes.
[Dec 20, 2000, 3:45 pm] naam upgrade completed
We have completed the drive upgrade on naam (www114). It now has a 30.5GB
hard drive. Total downtime was under 5 minutes.
[Dec 20, 2000, 10:58 am] naam upgrade
We have begun a drive upgrade on naam (www114) in order to improve
performance and storage. There will be two brief periods of downtime, each
approximately five minutes, at the beginning and end of the maintenance. The
server will remain online at all other times. The entire upgrade could take
up to 24 hours to complete. We will post an additional notice when the
upgrade has been completed.
[Dec 20, 2000, 3:53 am] gao downtime
gao (www111) crashed under high load and had to be rebooted. Total downtime
was 5 minutes.
[Dec 19, 2000, 1:05 pm] neter Downtime
neter (www132) crashed and was brought back online with
less than 15 minutes of downtime.
[Dec 18, 2000, 10:12 pm] theta Upgrade
The upgraded theta had problems with its Ethernet card after only a few
minutes online. After swapping the card and cable, theta is now back
to normal operation. We apologize for the additional downtime and
inconvenience.
[Dec 18, 2000, 9:16 pm] theta Upgrade - Update
theta (www4) suffered some complications with the current upgrade. It will
be down for the next 20 minutes while we fix the outstanding issues.
[Dec 18, 2000, 8:45 pm] theta upgrade completed
The work on theta (www4) has been completed. This server is upgraded to a
P-III 866 with 256 MB RAM and a 30.7 GB hard drive.
[Dec 18, 2000, 6:37 pm] theta upgrade
theta (www4) will be taken down briefly to complete a systems upgrade.
Downtime should be no more than 10 minutes.
[Dec 17, 2000, 12:16 am] bemnet Downtime
bemnet (www158) crashed under heavy load and was brought back online
after a filesystem check. Downtime was approximately fifteen minutes.
[Dec 14, 2000, 3:58 am] db12 Hardware replacement
We have determined that the hard drive in db12 is dying, with unrecoverable
errors. We are performing an emergency drive swap; the server will be
offline for up to 60 minutes while this is done. No customer data will
be lost.
[Dec 14, 2000, 3:30 am] sowilu Downtime
sowilu (www145) was down early this morning for about 10 minutes. It has
since returned to normal operation.
[Dec 13, 2000, 5:00 pm] fehu Downtime
fehu (www134) crashed under heavy load. Downtime was 10 minutes.
[Dec 13, 2000, 7:43 am] quan downtime
quan (www120) crashed under high load and rebooted itself. Total downtime
was less than 10 minutes.
[Dec 12, 2000, 11:12 pm] xrra Downtime
xrra (www116) crashed under heavy load. Downtime was 15 minutes.
[Dec 12, 2000, 9:05 pm] FrontPage Extensions
In the process of correcting an obscure problem with Apache 1.3.14 under
FreeBSD 4.1.1-STABLE earlier today, we inadvertently broke the FrontPage
extensions for some customer sites. This capability is being restored at
this time, and all FrontPage extensions on -STABLE servers should be in
good working order within the next 30 minutes.
Please accept our apologies for the inconvenience. The original problem
being corrected affected access to large PDF files from certain versions
of Internet Explorer.
[Dec 12, 2000, 1:41 pm] jarre Downtime
jarre (www172) crashed under load and was rebooted. Downtime
was less than 15 minutes.
[Dec 8, 2000, 6:24 pm] kodh Downtime
kodh (www90) crashed under heavy load. Downtime was 20 minutes.
[Dec 8, 2000, 3:55 am] halla Upgrade Completed
The drive swap of halla (www68) has completed. It now has 20.5 GB of total
disk space.
[Dec 7, 2000, 11:42 pm] halla Upgrade Failure
During the drive swap on halla (www68), the drive that was suspected of
having data errors has failed. We are in the process of rebuilding the
server, and expect it to return to normal service within the next hour.
[Dec 7, 2000, 10:38 pm] halla upgrade
We have begun a drive upgrade on halla (www68) in order to improve
storage. There will be two brief periods of downtime, each approximately
five minutes, at the beginning and end of the maintenance. The server
will remain online at all other times. The entire upgrade could take up
to 24 hours to complete. We will post an additional notice when the upgrade
has been completed.
[Dec 7, 2000, 2:44 am] wunjo Downtime
wunjo (www140) crashed due to high load and had to be rebooted. Total
downtime was under 5 minutes.
[Dec 6, 2000, 5:18 pm] theta upgrade
We have begun a drive and system upgrade on theta (www4) in order to
improve performance and storage. There will be two brief periods of downtime,
each approximately five minutes, at the beginning and end of the maintenance.
The server will remain online at all other times. The entire upgrade could take
up to 24 hours to complete. We will post an additional notice when the upgrade
has been completed.
[Dec 6, 2000, 11:13 am] pi downtime
pi (www10) crashed under heavy load, and was brought back online.
Downtime was around 15 minutes.
[Dec 5, 2000, 11:17 am] bemnet downtime
bemnet (www158) crashed under heavy load, and automatically rebooted after a thorough disk cleaning. Total downtime was approximately 15 minutes.
[Dec 4, 2000, 11:46 pm] halla upgrade
halla (www68) was upgraded to an Athlon Thunderbird 1000 MHz, with 256
MB of RAM. Downtime was under five minutes.
[Dec 4, 2000, 11:29 pm] yanta upgrade
yanta (www67) was upgraded to an Athlon Thunderbird 1000 MHz, with 256
MB of RAM. Downtime was under five minutes.
[Dec 4, 2000, 11:12 pm] hyarmen upgrade
hyarmen (www66) was upgraded to an Athlon Thunderbird 1000 MHz, with
256 MB of RAM. Downtime was under five minutes.
[Dec 4, 2000, 10:48 pm] lambe upgrade
lambe (www63) was upgraded to an Athlon Thunderbird 1000 MHz, with
256 MB of RAM. Downtime was under five minutes.
[Dec 4, 2000, 10:25 pm] anga upgrade
anga (www47) was upgraded to an Athlon Thunderbird 1000 MHz, with
256 MB of RAM. Downtime was under five minutes.
[Dec 2, 2000, 5:02 pm] derba downtime
derba (www104) crashed under heavy load. It has been returned to
normal service with a downtime of approximately 5 minutes.
[Dec 2, 2000, 4:50 pm] aster downtime
aster (www96) crashed under heavy load, and required an extensive
manual filesystem cleaning before being returned to service. Downtime
was approximately 15 minutes.
[Dec 1, 2000, 9:42 pm] chi downtime
chi (www14) crashed under heavy load and had to be rebooted.
Total downtime was under 5 minutes.
[Dec 1, 2000, 9:31 pm] Power Distribution Problem
A power distribution failure in one datacenter rack led to brief downtime
for sasi (www51), pyyl (www97), zhun (www107), hwesta (www207), and db15.
A slightly loose connection on an industrial power strip led to arcing and
scoring of the plug; the problem has been rectified, and the affected
servers were out of service for less than ten minutes each.
No power protection system is perfect; this low-level problem in the final
stage of power distribution is difficult to avoid. Our primary power
systems, including transformers, TVSS, UPS, ATS, and genset were not
involved in this problem. We do not expect the problem to recur. Please
accept our apologies for the brief interruption of
service.
[Dec 1, 2000, 12:23 pm] eeoth upgrade completed
The upgrade of eeoth (www85) has been completed. The server is
now a Pentium III, 866 MHz with 256 MB of RAM and 30 GB of space. Downtime
was under ten minutes.
[Dec 1, 2000, 11:57 am] reit upgrade completed
The upgrade of reit (www108) has been completed. The server now
has 30 GB of total disk space. Downtime was under five minutes.
[Dec 1, 2000, 5:58 am] haebrath downtime
haebrath (www156) crashed under high load and had to be rebooted. Total
downtime was less than 10 minutes.
[Nov 30, 2000, 10:51 pm] reit upgrade
We have begun a drive upgrade on reit (www108) in order to improve storage.
There will be two brief periods of downtime, each approximately five minutes,
at the beginning and end of the maintenance. The server will remain online
at all other times. The entire upgrade could take up to 24 hours to complete.
We will post an additional notice when the upgrade has been completed.
[Nov 30, 2000, 10:32 pm] eeoth upgrade
We have begun a drive and system upgrade on eeoth (www85) in order to
improve performance and storage. There will be two brief periods of
downtime, each approximately five minutes, at the beginning and end of
the maintenance. The server will remain online at all other times. The
entire upgrade could take up to 24 hours to complete. We will post an
additional notice when the upgrade has been completed.
[Nov 30, 2000, 10:04 am] anca maintenance
The hardware on anca (www53) has been swapped following the problems
that began after its recent upgrade. We expect this to prevent further
problems. Downtime was under five minutes.
[Nov 30, 2000, 9:03 am] PairList Upgrade Planned
We are pleased to announce the planned upgrade of our PairList
service to Mailman 2.0. This major upgrade will take place on Monday,
December 3, and is outlined at http://www.pairlist.net/upgrade.shtml
[Nov 30, 2000, 2:11 am] anca downtime
anca (www53) crashed once more and was brought back online. We are continuing
to investigate the cause of the recent crashes. Total downtime was
approximately 15 minutes.
[Nov 30, 2000, 12:22 am] anca Downtime
anca (www53) crashed and was brought back online after
a filesystem cleaning. We are currently investigating the cause
of recent crashes on this server.
[Nov 29, 2000, 9:11 pm] uilen Downtime
uilen (www35) crashed under load and required an extensive
manual filesystem cleaning before being returned to service. Downtime
was approximately 40 minutes.
[Nov 29, 2000, 4:23 pm] PHP3 Magic Quotes in -STABLE
Currently, the magic_quotes_gpc feature in PHP 3 under FreeBSD -STABLE
is set to 'on'. However, based on user feedback, we have decided to
restore the previous setting of 'off'. Because PHP 3 still runs as a
CGI, there is no way for user scripts to change this setting, so it is
better to keep the value that was in effect before the -STABLE upgrade.
The new setting will be deployed at 8am Eastern time on Thursday, November
30. If you have any PHP 3 scripts running on -STABLE servers which
depend on this setting (which was only introduced with the -STABLE
upgrade), please correct them at that time.
Please note that these settings can be adjusted for PHP 4, as it runs as
an Apache module. In general, we encourage our users to upgrade their code
to use PHP 4 wherever possible.
[Nov 29, 2000, 3:44 pm] ulwar upgrade completed
The hard drive upgrade on ulwar (www82) has been completed. Due
to complications, the server's hardware upgrade has been postponed.
Downtime was under ten minutes.
[Nov 29, 2000, 12:54 pm] umbar upgrade completed
The upgrade of umbar (www46) has been completed. The server now
has 30 GB of total disk space. Downtime was under five minutes.
[Nov 29, 2000, 12:03 pm] anca downtime
anca (www53) crashed under heavy load and had to be manually rebooted. Total downtime was under 15 minutes.
[Nov 29, 2000, 9:01 am] anca Downtime
anca (www53) crashed and was brought back online
with downtime of approximately 20 minutes.
[Nov 28, 2000, 11:55 pm] ulwar upgrade
We have begun a drive and system upgrade on ulwar (www82). There will be two
brief periods of downtime, each approximately five minutes, at the
beginning and end of the maintenance. The server will remain online at all
other times. The entire upgrade could take up to 24 hours to complete. We
will post an additional notice when the upgrade has been completed.
[Nov 28, 2000, 11:32 pm] umbar upgrade
We have begun a drive upgrade on umbar (www46) to improve storage. There
will be two brief periods of downtime, each approximately five minutes, at
the beginning and end of the maintenance. The server will remain online at
all other times. The entire upgrade could take up to 24 hours to complete.
We will post an additional notice when the upgrade has been completed.
[Nov 28, 2000, 5:51 pm] STABLE Upgrades
The -STABLE upgrades that took place on maborym (www204) and
orvni (www205) today encountered problems which led to several
additional brief intervals of downtime for each server. A workaround
was developed, and the upgrades have been completed normally for both
servers. The problem is specifically related to the large drives being
used (30GB). Please accept our apologies for the downtime; we will ensure
this does not recur in future upgrades.
[Nov 28, 2000, 3:32 pm] maborym upgrade
maborym (www204) is currently undergoing upgrade procedures to
4.1-STABLE. Due to some difficulties with this process, we expect this
server to be down for another 10 minutes while this is completed. Normal
service should return shortly afterwards.
[Nov 28, 2000, 1:08 pm] omega upgrade completed
The upgrade of omega (www16) has been completed. The server now
features 30 GB of total space, and is an Athlon Thunderbird 1000
MHz, with 256 MB of RAM. Downtime was under ten minutes.
[Nov 28, 2000, 10:41 am] anca upgrade completed
The upgrade of anca (www53) has been completed. The server now
features 30 GB of total space, and is an Athlon Thunderbird 1000 MHz, with
256 MB of RAM. Downtime was under five minutes.
[Nov 27, 2000, 11:47 pm] hwesta upgrade
hwesta (www51) was upgraded to an Athlon Thunderbird 1000 MHz,
with 256 MB of RAM. Downtime was under five minutes.
[Nov 27, 2000, 11:21 pm] umbar upgrade
umbar (www46) was upgraded to an Athlon Thunderbird 1000 MHz, with
256 MB of RAM. Downtime was under ten minutes.
[Nov 27, 2000, 10:54 pm] anca upgrade
We have begun a drive and system upgrade on anca (www53). There will be two
brief periods of downtime, each approximately five minutes, at the beginning
and end of the maintenance. The server will remain online at all other times.
The entire upgrade could take up to 24 hours to complete. We will post an
additional notice when the upgrade has been completed.
[Nov 27, 2000, 10:43 pm] ando upgrade
ando (www45) was upgraded to an Athlon Thunderbird 1000 MHz, with
256 MB of RAM. Downtime was under five minutes.
[Nov 27, 2000, 10:18 pm] onn upgrade
onn (www29) has been upgraded to an Athlon Thunderbird 1000 MHz,
with 256 MB of RAM. Downtime was under five minutes.
[Nov 27, 2000, 10:13 pm] omega upgrade
We have begun a drive and system upgrade on omega (www16) in order to
improve performance and storage. There will be two brief periods of
downtime, each approximately five minutes, at the beginning and end of the
maintenance. The server will remain online at all other times. The entire
upgrade could take up to 24 hours to complete. We will post an additional
notice when the upgrade has been completed.
[Nov 25, 2000, 11:41 pm] Network Problems Readdressed
A recurrence of the same internal router failure we experienced Friday
night led to a brief interruption of connectivity for a few customers.
The total duration of the outage was less than five minutes. We will be
shifting traffic away from the affected router within the next 24 hours,
and expect no further disruptions of traffic.
[Nov 24, 2000, 11:17 pm] Network Problems Resolved
The problem causing our internal routing issue has been found and
corrected. All routing is behaving normally again. This issue did not
affect all of our customers, and we sincerely apologize to those who
were affected.
[Nov 24, 2000, 10:45 pm] Network Problems
We are currently experiencing an internal routing problem, which may
prevent some of our customers from reaching their sites. We are working
diligently to investigate and remedy this issue, and hope to have the
problem solved shortly.
[Nov 22, 2000, 10:54 am] or upgrade completed
The upgrade on or (www34) has been completed. The server now has
30 GB of disk space. Downtime was under five minutes.
[Nov 21, 2000, 11:09 pm] or upgrade
We have begun a drive and system upgrade on or (www34) in order to
improve performance and storage. There will be two brief periods of
downtime, each approximately five minutes, at the beginning and end of
the maintenance. The server will remain online at all other times. The
entire upgrade could take up to 24 hours to complete. We will post an
additional notice when the upgrade has been completed.
[Nov 16, 2000, 6:04 pm] anca Downtime
anca (www53) crashed under heavy load. Downtime was under 10 minutes.
[Nov 16, 2000, 4:58 pm] gebo downtime
gebo (www139) crashed under heavy load, and was rebooted after a thorough disk cleaning. Total downtime was approximately 10 minutes.
[Nov 16, 2000, 4:44 pm] pyyl upgrade completed
The upgrade of pyyl (www97) has been completed. The server now
has 30 GB of space and is a Pentium III 866 MHz, with 256 MB RAM. Downtime
was around ten minutes.
[Nov 16, 2000, 5:57 am] vuae downtime
vuae (www93) crashed under high load and had to be rebooted. Total downtime
was under 5 minutes.
[Nov 15, 2000, 8:53 pm] pyyl upgrade
We have begun a drive and system upgrade on pyyl (www97) in order to
improve performance and storage. There will be two brief periods of
downtime, each approximately five minutes, at the beginning and end of the
maintenance. The server will remain online at all other times. The entire
upgrade could take up to 24 hours to complete. We will post an additional
notice when the upgrade has been completed.
[Nov 15, 2000, 6:30 pm] zhun upgrade
We have completed the upgrade of zhun (www107). It is now an Athlon
Thunderbird 1000 MHz, with 256MB RAM, and 30.5GB of disk space.
[Nov 15, 2000, 3:43 pm] STABLE Upgrades
Due to a scheduling error, the servers intended for the -STABLE upgrade on
Thursday, November 16th were in fact upgraded today, November 15th. We
have therefore rescheduled Wednesday's servers for Thursday. Please accept
our apologies for the mix-up; it should not happen again.
We have addressed every outstanding problem we are aware of; if you encounter
any problems as a result of the -STABLE upgrade, please e-mail urgent@pair.com.
The upgrade notice in the Support Forum has been slightly updated:
http://support.pair.com/notices/freebsd35.html
[Nov 15, 2000, 12:44 pm] zhun upgrade
We are about to begin a drive upgrade of zhun (www107). There
will be two short periods of downtime, each approximately five minutes,
at the beginning and end of the upgrade. The server will remain online at
all other times. We will post an additional notice when the upgrade has
been completed.
[Nov 14, 2000, 5:57 pm] vilya downtime
vilya (www60) crashed under heavy load, and was brought back
online. Downtime was around 15 minutes.
[Nov 14, 2000, 3:05 pm] emancholl downtime
emancholl (www37) crashed, and was rebooted. Full service has been restored
with less than 10 minutes of downtime.
[Nov 13, 2000, 4:39 pm] STABLE Upgrades
Three servers were upgraded to FreeBSD -STABLE today, per the posted
schedule. The servers are vsael (www173), biont (www174), and zulle
(www175). The schedule has been updated to indicate the upgrades planned
for later this week.
We are not aware of any outstanding problems that are not documented at
http://support.pair.com/notices/freebsd35.html - if you encounter any
difficulty with your site after this upgrade, please read that page. If
you require assistance, please write to urgent@pair.com and we will address
the problem promptly.
The schedule is available at
http://support.pair.com/notices/stable-upgrade.html
[Nov 13, 2000, 1:31 pm] kodh downtime
kodh (www90) crashed under heavy load, and was brought back
online. Downtime was under 15 minutes.
[Nov 13, 2000, 12:06 pm] ynilo downtime
ynilo (www166) crashed under heavy load, and was brought back
online. Downtime was under 10 minutes.
[Nov 12, 2000, 7:31 pm] kodh downtime
kodh (www90) crashed under high load, and was brought back up
after a manual check. Total downtime was about 15 minutes.
[Nov 10, 2000, 4:15 pm] bemnet downtime
bemnet (www158) crashed under heavy load. After a thorough disk cleaning, it was brought back up. Total downtime was approximately 10 minutes.
[Nov 10, 2000, 1:41 pm] rho upgrade completed
The maintenance on rho (www11) has been completed. The server now
has 20 GB of disk space and was also upgraded to a Pentium III
866 MHz with 256 MB of RAM. Downtime was around 15 minutes.
[Nov 10, 2000, 11:27 am] auma upgrade completed
The upgrade on auma (www86) has been completed. The server now
has 20 GB of space. Downtime was under five minutes.
[Nov 10, 2000, 5:37 am] kodh downtime
kodh (www90) crashed under high load and had to be rebooted.
[Nov 10, 2000, 3:37 am] sether downtime
sether (www95) crashed under high load and had to be rebooted. Total
downtime was a little over 5 minutes.
[Nov 9, 2000, 11:08 pm] rho upgrade
We have begun a drive and system upgrade on rho (www11) in order to improve
performance and storage. There will be two brief periods of downtime, each
approximately five minutes, at the beginning and end of the maintenance. The
server will remain online at all other times. The entire upgrade could take
up to 24 hours to complete. We will post an additional notice when the
upgrade has been completed.
[Nov 9, 2000, 10:44 pm] auma upgrade
We are about to begin a drive upgrade of auma (www86). There will be two
short periods of downtime, each approximately five minutes, at the
beginning and end of the upgrade. The server will remain online at all
other times. We will post an additional notice when the upgrade has been
completed.
[Nov 9, 2000, 2:10 pm] FreeBSD -STABLE Upgrades
Because we are still tracing an unusual bug that affects our ability to
manage servers, we have rescheduled the -STABLE upgrades for Thursday,
November 9 to Monday, November 13.
The -STABLE upgrade unavoidably changes certain aspects of the services
available to our customers. In several specific cases, customers will need
to modify their usage in order to operate normally after the -STABLE
upgrade. Please read about these important changes at
http://support.pair.com/notices/freebsd35.html
[Nov 9, 2000, 11:33 am] thesel downtime
thesel (www99) crashed under heavy load, and automatically rebooted. Total downtime was under ten minutes.
[Nov 9, 2000, 9:49 am] iota downtime
iota (www5) crashed under heavy load, and was brought back online.
Downtime was under 10 minutes.
[Nov 8, 2000, 10:05 pm] kodh upgrade
We have just completed a drive and system upgrade of kodh (www90). It
now has a 30 GB drive, a 1 GHz Athlon Thunderbird processor, and 256 MB of
RAM. The upgrade took longer than normal, but downtime was under 5 minutes.
[Nov 8, 2000, 9:09 pm] raitax Upgrade
raitax (www171) has been upgraded to FreeBSD 4.1.1-STABLE. Please
report any problems to support@pair.com. A complete schedule of future
FreeBSD upgrades can be found at:
http://support.pair.com/notices/stable-upgrade.html
[Nov 8, 2000, 3:43 pm] ilwe downtime
ilwe (www81) crashed and was rebooted. Downtime was less than 10
minutes.
[Nov 8, 2000, 8:36 am] theta Reboot
theta (www4) was rebooted. Downtime was less than 3 minutes.
[Nov 6, 2000, 10:17 pm] kodh upgrade
We have begun a drive upgrade on kodh (www90) in order to improve
performance and storage. There will be two brief periods of downtime, each
approximately five minutes, at the beginning and end of the maintenance. The
server will remain online at all other times. The entire upgrade could take
up to 24 hours to complete. We will post an additional notice when the upgrade
has been completed.
[Nov 6, 2000, 11:57 am] pi downtime
pi (www10) encountered a problem which required a reboot.
Downtime was under five minutes.
[Nov 3, 2000, 10:19 am] wawrra upgrade
The hard drive upgrade of wawrra (www109) has been completed. The
server now has 20 GB of space. Downtime was under five minutes.
[Nov 3, 2000, 12:05 am] kodh
Due to unforeseen complexities with the server, the upgrade of kodh has been
pushed back for at least 24 hours. We are examining the situation and hope
to push ahead with the upgrade this week.
[Nov 2, 2000, 11:10 pm] kodh upgrade
We have begun a drive upgrade on kodh (www90) in order to improve performance
and storage. There will be two brief periods of downtime, each approximately
five minutes, at the beginning and end of the maintenance. The server
will remain online at all other times. The entire upgrade could take up to
24 hours to complete. We will post an additional notice when the upgrade
has been completed.
[Nov 2, 2000, 10:50 pm] wawrra upgrade
We have begun a drive upgrade on wawrra (www109) in order to improve
performance and storage. There will be two brief periods of downtime, each
approximately five minutes, at the beginning and end of the maintenance.
The server will remain online at all other times. The entire upgrade could
take up to 24 hours to complete. We will post an additional notice when
the upgrade has been completed.
[Nov 2, 2000, 10:12 am] fearn Downtime
fearn (www40) crashed under load and was brought back
online with downtime of approximately 15 minutes.
[Nov 1, 2000, 10:47 pm] beeoro upgrade
We're about to begin a drive upgrade of beeoro (www69). There will be two
brief periods of downtime, each approximately five minutes, at the beginning
and end of the maintenance. The server will remain online at all other times.
The entire upgrade could take up to 24 hours to complete. We will post an
additional notice when the upgrade has been completed.
[Nov 1, 2000, 12:54 pm] paat upgrade
paat (www100) received a hard drive upgrade, and now has 19 GB of
space. Downtime was under five minutes.
[Nov 1, 2000, 12:35 pm] aedde upgrade
aedde (www84) received a hard drive upgrade, and now has 14 GB of
space. Downtime was under five minutes.
[Oct 31, 2000, 9:36 pm] paat upgrade
We are about to begin a drive upgrade of paat (www100). There will be two
brief periods of downtime, each approximately five minutes, at the beginning
and end of the maintenance. The server will remain online at all other
times. The entire upgrade could take up to 24 hours to complete.
We will post an additional notice when the upgrade has been completed.
[Oct 31, 2000, 8:58 pm] aedde upgrade
We will begin a drive upgrade of aedde (www84) shortly. There will be two
brief periods of downtime at the beginning and end of the maintenance. The
server will remain online at all other times. The entire upgrade could take
up to 24 hours to complete.
We will post an additional notice when the upgrade has been completed.
[Oct 31, 2000, 9:29 am] emancholl downtime
emancholl (www37) crashed under heavy load and has since been rebooted.
Downtime was about 10-15 minutes.
[Oct 30, 2000, 8:29 pm] thnad Downtime
thnad (www121) was down this evening for approximately 10 minutes. It has
since returned to normal operation.
[Oct 30, 2000, 5:34 pm] bemnet downtime
bemnet (www158) encountered a network error which required the
server to be restarted. Downtime was under five minutes.
[Oct 30, 2000, 3:02 pm] chi upgrade
chi (www14) received a hard drive upgrade, and now has 30 GB total
space. Downtime was under five minutes.
[Oct 30, 2000, 1:27 pm] onn upgrade
onn (www29) received a hard drive upgrade, and now has 14 GB total
disk space. Downtime was under five minutes.
[Oct 29, 2000, 10:13 pm] chi upgrade
We have begun a drive and system upgrade on chi (www14). There will be two
brief periods of downtime, each approximately five minutes, at the beginning
and end of the maintenance. The server will remain online at all other times.
The entire upgrade could take up to 24 hours to complete.
We will post an additional notice when the upgrade has been completed.
[Oct 29, 2000, 9:06 pm] onn upgrade
We have begun a drive and system upgrade on onn (www29) in order to
improve performance and storage. There will be two brief periods of
downtime, each approximately five minutes, at the beginning and end of
the maintenance. The server will remain online at all other times. The
entire upgrade could take up to 24 hours to complete.
We will post an additional notice when the upgrade has been completed.
[Oct 29, 2000, 8:09 pm] gnaaste downtime
gnaaste (www79) crashed under high load and had to be rebooted.
Total downtime was under 10 minutes.
[Oct 29, 2000, 12:58 am] SAVVIS Update
After slightly more than three hours, SAVVIS has restored the
flow of traffic between our network and theirs. No explanation of the
outage has yet been provided.
[Oct 28, 2000, 7:41 pm] SAVVIS Outage
Shortly after 5pm Eastern time, our DS-3 circuit to SAVVIS stopped
passing traffic. This may be a general outage for SAVVIS in either
the Pittsburgh area or their New York City POP. We have requested
escalation of this problem. In the meantime, customer traffic is
unaffected; our other providers are carrying the traffic with no problems.
We will post regarding any further developments.
[Oct 27, 2000, 12:38 pm] vala upgrade
vala (www59) has been upgraded with a new hard drive and now has
approximately 28 GB of space.
[Oct 26, 2000, 11:01 pm] vala upgrade
We have begun a drive and system upgrade on vala (www59) in order to
improve performance and storage. There will be two brief periods of
downtime, each approximately five minutes, at the beginning and end of
the maintenance. The server will remain online at all other times. The
entire upgrade could take up to 24 hours to complete.
We will post an additional notice when the upgrade has been completed.
[Oct 25, 2000, 10:43 pm] sowilu downtime
sowilu (www145) crashed under heavy load and had to be rebooted.
Downtime was under 15 minutes.
[Oct 25, 2000, 2:19 pm] kanat downtime
kanat (www128) crashed under heavy load, and was rebooted.
Downtime was under 10 minutes.
[Oct 25, 2000, 12:13 pm] kodh downtime
kodh (www90) crashed under heavy load, and was rebooted after a thorough disk check. Total downtime was approximately 15 minutes.
[Oct 25, 2000, 11:51 am] iota upgrade
iota (www5) has been upgraded to have 15 GB total disk space, and
is now running as an AMD Athlon 1000 MHz, with 256 MB of RAM.
[Oct 24, 2000, 10:55 am] lepen downtime
lepen (www129) crashed under heavy load, and was automatically rebooted. Total downtime was under 10 minutes.
[Oct 20, 2000, 9:25 pm] elhaz downtime
elhaz (www144) became unresponsive under heavy load and was
rebooted. Downtime was less than 10 minutes.
[Oct 19, 2000, 10:59 am] anca downtime
anca (www53) crashed under heavy load and has since been rebooted. Downtime
was less than 15 minutes.
[Oct 19, 2000, 10:05 am] bemnet downtime
bemnet (www158) crashed under heavy load. It was automatically rebooted after a thorough disk check. Total downtime was approximately 10 minutes.
[Oct 19, 2000, 4:47 am] uilen downtime
uilen (www35) crashed under high load and had to be rebooted. Total
downtime was under 20 minutes.
[Oct 18, 2000, 8:37 am] kayan downtime
kayan (www133) crashed under high load and had to be rebooted. Total
downtime was under 10 minutes.
[Oct 18, 2000, 2:30 am] bhoth downtime
bhoth (www163) crashed under high load and rebooted itself. Total downtime
was under 10 minutes.
[Oct 17, 2000, 7:04 pm] kayan Downtime
kayan (www133) crashed under heavy load. Downtime was 15 minutes.
[Oct 17, 2000, 6:53 pm] Sprint Update
Our DS-3 circuit to Sprint has remained steady since noon today.
We will be working with Sprint engineers in the next few days to
resolve two minor outstanding issues with the circuit. Customer traffic
should not be affected.
[Oct 17, 2000, 1:00 pm] kayan downtime
kayan (www133) crashed under heavy load, and was brought back
online. Downtime was around ten minutes.
[Oct 17, 2000, 11:50 am] Sprint Network Trouble
Since approximately 9:45am Eastern time, we have been seeing packet errors
and brief outages on our DS-3 circuit to Sprint. We have a ticket
open with Sprint, and they are investigating the issue. Some of the
potentially affected traffic has been shifted to Digex in the
interim. We will post further information as it becomes available.
[Oct 17, 2000, 10:25 am] anca downtime
anca (www53) crashed under heavy load. It was brought back online after an extensive disk check. Total downtime was approximately 15 minutes.
[Oct 14, 2000, 8:27 pm] anca Offline
anca was offline for approximately fifteen minutes because of a
network problem, which has now been corrected.
[Oct 13, 2000, 9:55 am] enso downtime
enso (www153) crashed under heavy load, and was brought back
online. Downtime was approximately ten minutes.
[Oct 11, 2000, 4:43 pm] gao downtime
gao (www111) was rebooted to clear up a resource problem.
Downtime was under five minutes.
[Oct 11, 2000, 12:16 pm] kirlian downtime
kirlian (www110) crashed under heavy load. It has been brought back online
after an extensive disk check. Downtime was approximately 20 minutes.
[Oct 10, 2000, 7:16 pm] straif upgrade
straif (www26) has been upgraded to a 15 GB hard drive, and to a
Pentium III, 733MHz with 256MB of RAM.
[Oct 10, 2000, 7:01 pm] kayan Downtime
kayan (www133) crashed under load. It was rebooted. Downtime was 10
minutes.
[Oct 10, 2000, 12:48 am] SAVVIS Resolution
After an overall outage duration of six hours, our SAVVIS
circuit was restored to normal operation around 11:05pm Eastern time.
We will monitor for any further difficulties. We do not yet have
details of the actual problem in their New York POP.
[Oct 9, 2000, 11:40 pm] straif upgrade
We have begun a drive and system upgrade on straif (www26) in
order to improve performance and storage. There will be two brief periods
of downtime, each approximately five minutes, at the beginning and end of
the maintenance. The server will remain online at all other times. The
entire upgrade could take up to 24 hours to complete.
We will post an additional notice when the upgrade has been completed.
[Oct 9, 2000, 11:24 pm] eeoth Downtime
eeoth (www85) crashed due to heavy load. It was rebooted and brought back
online. Downtime was 10 minutes.
[Oct 9, 2000, 10:22 pm] SAVVIS Update
For the past hour, we have been successfully delivering traffic outbound
via SAVVIS. However, SAVVIS is still not accepting our
network routes, and consequently no traffic is flowing in on that circuit.
Because most of our inbound traffic flows through other providers, which
are carrying it easily, this is not much of a problem. The lengthy delay
in fixing their configuration is arguably cause for concern, however.
It is likely that the circuit will go down at least once more while they
fix this configuration. We will post further updates as news develops.
[Oct 9, 2000, 8:46 pm] SAVVIS Update
SAVVIS had our circuit offline for nearly two and a half hours
this evening, following a crash of their New York POP. After having
some trouble locating our customer information, they are now working to
rebuild the BGP session necessary to carry our traffic. The circuit
has briefly been operational for outbound traffic only, but is now being
reconfigured.
Traffic has been carried by our other backbone providers without
difficulty. We will post a further update when it becomes available.
[Oct 9, 2000, 5:56 pm] SAVVIS Trouble
We are currently seeing high latency and packet loss on our circuit to
SAVVIS. This may be affecting performance for a small percentage
of our visitors; we are following up with SAVVIS to determine the
expected duration of the problem.
[Oct 9, 2000, 1:03 pm] kayan downtime
kayan (www133) crashed under heavy load, and was brought back
online. Downtime was under 10 minutes.
[Oct 9, 2000, 1:08 am] theta downtime
theta (www4) had to be rebooted in order to restart the respawn log of the
server. Total downtime was less than 5 minutes.
[Oct 8, 2000, 6:28 pm] thurisaz Downtime
thurisaz (www136) crashed and rebooted. Downtime was approximately 10
minutes.
[Oct 8, 2000, 3:31 am] fearn Downtime
fearn has been offline for approximately one hour; we have traced
the outage to a failed Ethernet controller. The server will be back online
within the next ten minutes.
[Oct 7, 2000, 7:38 pm] derba Downtime
derba (www104) crashed and rebooted. Downtime was 10 minutes.
[Oct 7, 2000, 8:14 am] bemnet Downtime
bemnet crashed under heavy load, and was down for approximately
twenty minutes during extensive filesystem cleaning.
[Oct 6, 2000, 2:34 pm] xi downtime
xi (www8) rebooted, and was down for approximately 15 minutes for
extensive automatic filesystem cleaning. All services have returned as of
this time.
[Oct 6, 2000, 1:48 am] calma Restored
calma has been restored to full service. The server was online
throughout the file restoration process, in order to maximize the
availability of sites as they were being restored. The backup is from
Friday morning. Unfortunately, Web and FTP logs for all hits on Friday
were lost as a result of the drive failure. Also, due to an unfortunate
interaction between our backup system and pair2000, the contents of
virtual mailboxes were not recoverable from backups. Because this server
was converted to pair2000 just one week ago, however, there were very few
mailboxes configured.
We have corrected this interaction. No other customer data was lost.
The server was offline for about 90 minutes, after which another 90
minutes was required to restore data from backups. In many ways, the
pair2000 system simplified the data restoration process.
The drive in calma was old and normally would have been replaced
as part of routine hardware upgrades. We will continue to strive to stay
ahead of these failures; currently we are dealing with parallel upgrades
for pair2000, CPU/memory, larger/newer drives, and FreeBSD 3.5.
We apologize to our customers for this inconvenient emergency.
[Oct 5, 2000, 11:42 pm] calma Update
calma (www43) is in the process of getting the primary drive rebuilt.
Expected time to completion is 2 hours. More information will be posted as
needed.
[Oct 5, 2000, 10:55 pm] calma Drive Failure
calma (www43) suffered a primary drive failure around 10:30 pm EST. We are
in the process of rebuilding the drive. More information will be posted as
it becomes available.
[Oct 5, 2000, 7:04 pm] kayan Downtime
kayan (www133) crashed under heavy load. It has since been brought back
online. Total downtime was approximately 10 minutes.
[Oct 5, 2000, 3:05 pm] omicron downtime
omicron (www9) crashed under heavy load. It has since been brought
back online. Total downtime was under 15 minutes.
[Oct 5, 2000, 12:45 pm] bhoth downtime
bhoth (www163) crashed under heavy load. It automatically rebooted, and came back up with no errors. Total downtime was under five minutes.
[Oct 4, 2000, 6:28 pm] neter maintenance completed
The hard drive swap on neter (www132) has been completed. The
server now has 18 GB total space.
[Oct 4, 2000, 3:37 am] glikk downtime
glikk (www119) crashed under high load and had to be rebooted. Total
downtime was under ten minutes.
[Oct 3, 2000, 4:20 pm] neter maintenance
neter will be taken down shortly for drive maintenance. Downtime
should be under ten minutes. There will later be another short period of
downtime when the drive swap is finalized.
[Oct 3, 2000, 12:04 am] shen upgrade
shen (www88) was upgraded to a Pentium III, 733 MHz with 256 MB of
RAM. Downtime was under ten minutes.
[Oct 2, 2000, 11:44 pm] mildh upgrade
mildh (www87) was upgraded to a Pentium III, 733 MHz with 256 MB
of RAM. Downtime was under ten minutes.
[Oct 2, 2000, 11:28 pm] auma upgrade
auma (www86) was upgraded to a Pentium III, 733 MHz with 256 MB of
RAM. Downtime was under ten minutes.
[Oct 2, 2000, 11:09 pm] nuumen Downtime
nuumen (www55) was down for about 10 minutes tonight. It has since been
brought back to normal service.
[Oct 2, 2000, 10:38 pm] cele upgrade
cele (www73) has been upgraded to a Pentium III 733 MHz, with 256
MB of RAM. Downtime was under 15 minutes.
[Oct 2, 2000, 11:25 am] epsilon maintenance completed
The hardware maintenance on epsilon (www3) has been completed.
The hard drive has been upgraded to have 18 GB total space. Additionally it
was upgraded to an AMD Athlon at 800 MHz, with 384 MB of RAM.
[Sep 29, 2000, 11:49 am] epsilon maintenance
epsilon (www3) is being worked on at this time for a hard drive
upgrade. There will be an additional brief period of downtime later for
the final swap. We apologize for the inconvenience.
[Sep 28, 2000, 11:37 pm] flure upgrade
flure (www75) was upgraded to a Pentium III, 733 MHz with 256 MB
of RAM. Downtime was under 10 minutes.
[Sep 28, 2000, 11:13 pm] gwind upgrade
gwind (www74) was upgraded to a Pentium III, 733 MHz with 256 MB
of RAM. Downtime was under ten minutes.
[Sep 28, 2000, 10:53 pm] dyyme upgrade
dyyme (www71) has been upgraded to a Pentium III 733 MHz, with 256
MB of RAM. Downtime was under ten minutes.
[Sep 28, 2000, 10:33 pm] tama upgrade
tama (www70) has been upgraded to a Pentium III 733 MHz, with 256
MB of RAM.
[Sep 27, 2000, 6:01 pm] falku downtime
falku (www98) crashed under heavy load, and was brought back online. Downtime was under ten minutes.
[Sep 27, 2000, 4:55 pm] eite maintenance completed
The drive maintenance on eite has been completed. The new drive
capacity is 15GB.
[Sep 27, 2000, 3:30 pm] pair2000 Mail Delivery
We have corrected a qmail configuration problem which was delaying
e-mail delivery on a few of our pair2000-enabled servers. Before
this fix, all e-mail delivery on those servers was being delayed for up to 30 minutes.
We have implemented a system fix which will prevent the recurrence of
this problem. Except under conditions of unusual mail volume, all
mail delivery under pair2000 should now be nearly instantaneous.
[Sep 27, 2000, 1:32 pm] eite maintenance
After detecting the beginnings of a problem with the hard drive in eite
(www38), we are beginning a preemptive emergency drive swap of this
server before any user data is lost. There will be two brief downtimes of
approximately 5 minutes each, at the beginning and end of maintenance, while
the hardware is replaced. The server will remain online at all other times.
We will post an additional notice when this maintenance is completed.
[Sep 27, 2000, 1:00 pm] onn downtime
onn (www29) crashed under heavy load, and was brought back online.
Downtime was under ten minutes.
[Sep 26, 2000, 5:02 pm] khurla upgrade completed
The drive upgrade of khurla has been completed, with a new drive
space of 15GB to better accommodate future growth.
[Sep 26, 2000, 4:36 pm] UUnet Resolution
We are severely disappointed with UUnet's handling of our OC-3c upgrade,
and grievously embarrassed by the timing of what should have caused no
interruption of traffic whatsoever. The OC-3c circuit was fully tested
and accepted by all parties involved, at every layer. IP traffic was
passed successfully over it before the changeover.
Unfortunately, when the changeover was initiated around 12:30pm, traffic
refused to flow inbound from UUnet. This was resolved by renumbering the
circuit, on the assumption that there was a conflict with some other
customer circuit. At 2:45pm, UUnet again renumbered the circuit, in order
to establish a more permanent address. At this point, our network was
again blackholed by UUnet, a situation which persisted intermittently until
approximately 4:15pm. During this time, customers and site visitors
attempting to reach our network via UUnet were often unsuccessful.
Reverting to the prior configuration did not resolve the problem.
At this point, UUnet was stumped, and called in additional engineers.
After 90 minutes and another circuit configuration change, traffic was
restored to normal, flowing in both directions on the OC-3c circuit.
We do not know if changes were made elsewhere in UUnet's network.
The DS-3 circuit will continue to act as a warm standby. We are pursuing
remedy for this outage with UUnet, but recognize that nothing short of
eliminating such incidents will truly protect our reputation for technical
excellence and network uptime. As UUnet continues to be the only network
with a history of blackholing our inbound traffic, for whatever technical
reason, we expect to continue to focus our traffic growth on our other
backbone providers, including our pending OC-3c circuit to Sprint as well
as our new AT&T DS-3 service.
pair Networks, Inc offers its most sincere apology to customers and site
visitors affected by this partial outage. Uptime and reliability continue
to be our top goals, and this type of outage should never occur again.
[Sep 26, 2000, 2:40 pm] thesel downtime
thesel (www99) crashed under heavy load. Downtime was approximately 10
minutes.
[Sep 26, 2000, 1:57 pm] UUnet Resolution
After an extensive debugging session with UUnet engineers, our OC-3c
circuit is now up and successfully passing traffic in both directions.
For approximately forty minutes, outbound traffic was working normally,
while inbound traffic was being blackholed by UUnet. This is another
manifestation of internal routing problems at UUnet, an issue we have
repeatedly tried to address with their engineers.
The original problem has not been resolved, but worked around. There
will be one additional brief interruption of OC-3c connectivity while
the interface is reconfigured. Traffic will be carried by the DS-3, as
well as other backbones, in the interim.
[Sep 26, 2000, 12:57 pm] kirlian downtime
kirlian (www110) crashed under heavy load, and was brought back
online. Downtime was around 20 minutes.
[Sep 26, 2000, 12:47 pm] UUnet Upgrade
We are continuing the process of upgrading our UUnet circuit from DS-3
to OC-3c. During the cutover, portions of UUnet's network are apparently
blackholing traffic destined for our network. We are working with UUnet
engineers on this urgent matter, and will post further details as they
become available.
[Sep 26, 2000, 12:31 pm] khurla upgrade
We have begun a drive upgrade on khurla (www112) in
order to improve performance and storage. There will be two brief periods
of downtime, each approximately five minutes, at the beginning and end of
the maintenance. The server will remain online at all other times. The
entire upgrade could take up to 24 hours to complete.
We will post an additional notice when the upgrade has been completed.
[Sep 26, 2000, 3:48 am] yekk downtime
yekk (www124) crashed under heavy load, and required an extensive
filesystem cleaning. Downtime was around 30 minutes.
[Sep 25, 2000, 11:47 pm] pi upgrade
pi (www10) has been upgraded to a Pentium III 733 MHz, with 256 MB
of RAM. Total downtime was under 15 minutes.
[Sep 25, 2000, 11:35 pm] nuumen Downtime
nuumen (www55) crashed under heavy load. Downtime was approximately 10
minutes.
[Sep 25, 2000, 10:44 pm] arda upgrade
arda (www62) was upgraded to a Pentium III 733 MHz, with 256 MB of
RAM. Downtime was under ten minutes.
[Sep 25, 2000, 10:29 pm] roomen upgrade
roomen (www61) was upgraded to a Pentium III 733 MHz, with 256 MB
of RAM. Downtime was under ten minutes.
[Sep 25, 2000, 4:43 pm] humph upgrade complete
Maintenance on humph has been completed. This server is upgraded
to a P-III 733MHz with 15GB of drive space.
[Sep 25, 2000, 4:05 pm] perthro downtime
perthro (www143) crashed under heavy load and required a manual reboot. Total downtime was approximately 10 minutes.
[Sep 25, 2000, 12:33 pm] humph maintenance
humph (www118) will be brought down briefly today for maintenance and
upgrades. Downtime should be less than 15 minutes.
[Sep 25, 2000, 12:08 pm] Networking Changes
We are currently in the process of activating our long-awaited OC-3c
circuit to UUnet. Our existing DS-3 will remain in place as an
emergency failover circuit. During this changeover, no traffic should be
lost, but some customers may experience intermittent latency as traffic
shifts between circuits.
We will post further details once this change has been completed.
[Sep 21, 2000, 7:05 pm] neter Downtime
neter (www132) crashed, requiring a reboot. Downtime was approximately 10
minutes.
[Sep 21, 2000, 1:12 pm] Genuity Upgrade
We have completed an urgent upgrade to our Genuity circuit, which
should alleviate congestion for customers using that link. We still have
a pending order to rehome our circuit from Washington DC to Cleveland OH
for improved performance; this should be complete within the next 30 days.
[Sep 20, 2000, 10:49 pm] nuin maintenance completed
The upgrade for nuin has been completed. This server is now
running as a P-III 733 with 15GB of drive space.
[Sep 20, 2000, 2:26 pm] bemnet downtime
bemnet (www158) crashed under heavy load. It has since been rebooted after
a thorough file system check. Downtime was under 15 minutes.
[Sep 20, 2000, 12:42 pm] nuin upgrade
nuin will be brought down briefly today for scheduled maintenance
in an upgrade of drive space and processor. Each downtime should be no more
than 15 minutes. Details will follow when this maintenance upgrade is
completed.
[Sep 19, 2000, 7:07 pm] Network problems
One of our Cisco 7200's experienced a memory problem, causing errors with
its CEF routing tables. We performed emergency maintenance, and it is now
properly routing traffic again. We expect performance to be stable, but are
closely monitoring the router to be sure.
[Sep 19, 2000, 3:24 pm] Genuity Upgrade
We are seeing some saturation on our DS-3 circuit to Genuity; an
emergency upgrade is being implemented which should alleviate the problem.
This upgrade will take effect Wednesday morning.
[Sep 19, 2000, 12:46 pm] bemnet downtime
bemnet rebooted under heavy load, and returned after a filesystem
check with approximately 10 minutes of downtime.
[Sep 19, 2000, 11:33 am] sether downtime
sether (www95) was rebooted to clear up an unrecoverable high load
condition. Downtime was under two minutes.
[Sep 19, 2000, 6:12 am] coll Downtime
coll (www22) crashed due to a mail loop and was rebooted.
Downtime was approximately 10 minutes.
[Sep 18, 2000, 10:56 pm] vala upgrade
vala (www59) has been upgraded to a Pentium III 733 MHz, with 256
MB of RAM. Downtime was under 10 minutes.
[Sep 18, 2000, 10:28 pm] noldo upgrade
noldo (www56) has been upgraded to a Pentium III, 733 MHz server
with 256 MB of RAM. Total downtime was under 10 minutes.
[Sep 18, 2000, 10:19 am] pi Downtime
pi (www10) crashed under heavy load. It has since been rebooted
after a thorough file system check. Downtime was 20-25 minutes.
[Sep 17, 2000, 8:53 pm] ungwe Mail Delivery
We have identified and corrected a problem with mail delivery on
ungwe (www48) that caused some users' mail to be queued
throughout the day instead of being delivered. No mail was lost,
and at this time all queued mail has been distributed to the
appropriate accounts.
[Sep 16, 2000, 2:43 pm] chi Downtime
chi (www14) crashed under load and was brought back online
with downtime of approximately 10 minutes.
[Sep 16, 2000, 9:54 am] epsilon Downtime
epsilon (www3) crashed under load and was brought back online
with downtime of approximately 10 minutes.
[Sep 15, 2000, 6:45 pm] derba Downtime
derba (www104) suffered a crash due to heavy load. It was brought back
online within 5 minutes.
[Sep 14, 2000, 1:21 pm] jarre downtime
jarre halted under heavy load, and was rebooted. Downtime was less
than 10 minutes.
[Sep 14, 2000, 9:06 am] eeoth Downtime
eeoth crashed under heavy load, and has since been rebooted.
Downtime was approx. 6-8 minutes.
[Sep 13, 2000, 3:47 am] Genuity Maintenance
During the course of routine maintenance, Genuity's switch in Washington DC
was reset, resulting in 3 minutes of downtime and about 15 minutes of
latency to our Genuity DS-3 circuit. Their switch was brought back online
with no complications, and traffic has returned to normal.
[Sep 11, 2000, 9:45 pm] cele upgrade
cele was brought down for approximately 10 minutes to complete an
upgrade of its hard drive. This server now has a capacity of 16GB.
[Sep 11, 2000, 3:55 pm] pyyl Downtime
pyyl (www97) crashed under heavy load, and was brought back online
with less than ten minutes downtime.
[Sep 11, 2000, 8:05 am] pair2000 Deployment
Based on recent bug reports accumulated during the system maintenance
period, we have elected to extend system maintenance for an additional
week, in order to resolve known problems before expanding the pair2000
system to encompass additional users. Our conversion schedule will
resume the following week.
Servers scheduled for this week will be redistributed throughout the rest
of the month's schedule.
[Sep 10, 2000, 8:03 pm] quan Downtime
quan (www120) crashed multiple times in a row, and required an
extensive filesystem check with each crash. As there was no sign of any
load problem, we suspected a hardware problem such as flaky RAM or
a failing motherboard. We have swapped the system to a new Athlon 700MHz,
with 384MB SDRAM. The server appears to be acting normally again, and we do
not expect to see any more stability problems with this server. Total
downtime during the crashes and swap was a little over an hour.
[Sep 10, 2000, 12:50 am] UUnet Resolution
Beginning around 10:15pm Eastern time this evening, our gateway router to
UUnet went out of service. This was a repeat of the incident that
was believed to be resolved at 6pm. The problem was traced to a Gigabit
Ethernet interface present in the router. Once disabled, the router
promptly returned to normal service. For approximately thirty minutes,
some traffic that normally passes through UUnet was delayed or
dropped by the malfunctioning router.
Needless to say, we continue to be disappointed with the behavior of
Gigabit Ethernet on the Cisco 7507 platform. This problem was triggered
by the appearance of Gigabit traffic from our Black Diamond switches; the
switches do not appear to have been culpable. As recently indicated in the
Insider Newsletter, we have
committed to the Juniper routing platform for future expansion; the
Juniper happens to be well-known for wire-speed Gigabit Ethernet
performance and carrier-class reliability.
We offer our sincere apology to any customer affected by this incident.
We will continue to seek to improve reliability, performance, and
redundancy as our network expands.
[Sep 9, 2000, 10:53 pm] UUNet Gateway Maintenance
We are currently conducting emergency maintenance on our UUNet Gateway. We
apologize for any inconvenience our customers experience during this
maintenance, and we hope to have it resolved shortly.
[Sep 9, 2000, 6:09 pm] quan Downtime
quan crashed under heavy load, and was brought back online with
less than ten minutes downtime.
[Sep 9, 2000, 5:46 pm] UUnet Gateway Failure
During testing of our Black Diamond switch configuration earlier today,
an unexplained interaction with the Cisco router serving our gateway to
UUnet led to degraded resources which ultimately resulted in poor
processing by that router. During the day, a limited set of routes were
unavailable; overall traffic levels were not reduced noticeably, but there
was at least one specific customer complaint through which the problem was
identified.
While we were working with the degraded router, it crashed and was brought back
online manually. It is now serving our UUnet traffic successfully,
without any incorrect routes, and will be monitored closely for further
resource problems. The Black Diamond switches have been disconnected from
our LAN, pending an investigation of the origin of this interaction.
We apologize for the inconvenience to any customer who had difficulty or
delays in reaching our network during this period, including the
UUnet gateway outage of approximately fifteen minutes, during which
traffic was carried by alternate providers. We are working carefully on
the switch integration to ensure that there are no interactions such as
the one experienced today. We shall remain careful and cautious.
[Sep 7, 2000, 4:05 am] zatz Downtime
zatz (www122) crashed under heavy load. It was brought back online within 5
minutes.
[Sep 5, 2000, 5:57 pm] omega downtime
omega was rebooted, and given an extensive filesystem check. It
has returned to service with less than 10 minutes of downtime.
[Sep 3, 2000, 3:38 am] kodh downtime
kodh (www90) crashed under heavy load, and had to be brought
back up. Total downtime was 20 minutes.
[Aug 31, 2000, 6:44 pm] pair2000 Mail Delivery
Mail delivery in the pair2000 system has been enhanced to provide
the X-Envelope-To field that some of our users have come to rely
on. This is a nonstandard field that our sendmail-based system currently
provides on most mail deliveries. We have also corrected a glitch that was
causing the incorrect envelope sender to be included on some mail delivered
to virtual mailboxes.
For more information on pair2000, please visit
http://www.pair2000.com/.
[Aug 29, 2000, 10:08 am] unque Upgraded
As a result of the Ethernet upgrade on unque, the server is now a
Pentium III at 733 MHz, with 256MB RAM. This emergency maintenance has
caused unque's pair2000 upgrade to be rescheduled for tomorrow.
[Aug 29, 2000, 9:25 am] unque Emergency Maintenance
The Ethernet card on unque is failing, causing severe traffic
problems on that server. It is being replaced on an emergency basis, and
will be out of service for no more than ten minutes.
[Aug 28, 2000, 11:32 am] tiwaz downtime
tiwaz (www146) crashed under heavy load, and was brought back
online. Downtime was around 10 minutes.
[Aug 28, 2000, 11:11 am] pi downtime
pi (www10) crashed under heavy load. After an extensive
filesystem cleaning it was brought back up, with downtime around 20
minutes.
[Aug 26, 2000, 12:30 pm] omicron Downtime
omicron (www9) crashed under load, and was brought back online with
downtime less than 15 minutes.
[Aug 25, 2000, 5:20 am] cele Downtime
cele crashed under heavy load and was rebooted. Total downtime
was approximately ten minutes.
[Aug 24, 2000, 5:41 pm] neled downtime
neled (www127) crashed under heavy load, and was brought back
online with downtime under ten minutes.
[Aug 24, 2000, 2:32 pm] shen downtime
shen (www88) crashed under high load and had to be rebooted. Total
downtime less than 15 minutes.
[Aug 22, 2000, 4:03 am] ingwaz downtime
ingwaz (www149) crashed under high load and had to be rebooted. Total
downtime less than 15 minutes.
[Aug 19, 2000, 5:04 pm] anca Downtime
anca (www53) ran out of swap space, requiring a reboot. Downtime was 15
minutes.
[Aug 18, 2000, 7:11 pm] dair Downtime
dair (www20) was down for approximately 10 minutes to clear up
a network problem. It has since returned to normal operation.
[Aug 18, 2000, 3:03 am] Genuity Routing
Based on customer feedback, we are making some adjustments to routing
decisions in our network, with respect to the new Genuity DS-3.
As of 3am, we have lowered the inbound preference for this line, shifting
the majority of our inbound traffic towards other providers. Adjustments
to outbound traffic will be made within the next 10 days, based on an
ongoing detailed analysis of traffic flow performance. This should result
in optimal performance for all Internet destinations, based on the routes
available to us from our five backbone providers.
[Aug 18, 2000, 2:43 am] ilceille Resolution
The network interface failure on ilceille was traced to obscure
corrupted configuration data. The error has been corrected and the server
returned to normal service. Total downtime for most domains on this server
was two hours.
This was one of the most unusual problems we've seen. After exhausting
what we felt were all reasonable software solutions, the cabling and
Ethernet card were swapped out. When that failed, an in-depth study was
conducted to finally identify the cause. We do not expect this to recur.
[Aug 18, 2000, 1:41 am] iceille Downtime
iceille (www189) is experiencing network interface problems. We are working
to correct the problem and expect it to be up shortly.
[Aug 17, 2000, 4:07 pm] ipre downtime
ipre crashed due to heavy load, and has been brought back online.
Total downtime was less than 15 minutes.
[Aug 17, 2000, 3:45 pm] emancholl downtime
emancholl (www37) crashed under heavy load, and was brought back
online with downtime around 10 minutes.
[Aug 16, 2000, 11:01 am] epsilon downtime
epsilon (www3) crashed today under heavy load. It was able to reboot automatically. Total downtime was under five minutes.
[Aug 16, 2000, 3:35 am] thesel downtime
thesel (www99) crashed under high load and had to be rebooted.
Total downtime was approximately 20 minutes.
[Aug 15, 2000, 4:38 pm] bemnet downtime
bemnet (www158) crashed under heavy load, and was brought back
online with downtime under ten minutes.
[Aug 15, 2000, 3:23 pm] cele downtime
cele (www73) crashed under heavy load, and was brought back online
with downtime around 10 minutes.
[Aug 14, 2000, 1:40 pm] sowilu downtime
sowilu (www145) crashed under heavy load, and was brought back
online with downtime under 10 minutes.
[Aug 14, 2000, 9:20 am] thuule downtime
thuule (www49) crashed under high load this morning at 9:10am and had to be
rebooted. Total downtime was 9 minutes.
[Aug 14, 2000, 3:24 am] ampa upgrade
ampa (www52) was upgraded to a Pentium III, 733MHz with 256MB of
RAM. Downtime was 5 minutes.
[Aug 14, 2000, 3:00 am] thuule upgrade
thuule (www49) was upgraded to a Pentium III, 667 MHz with 256MB
of RAM. Total downtime for this maintenance was 10 minutes.
[Aug 14, 2000, 2:35 am] quesse upgrade
quesse (www44) has been upgraded to a Pentium III 667 MHz with
256MB of RAM. Total downtime was 5 minutes.
[Aug 11, 2000, 9:59 pm] Digex/Sprint Connectivity Issue
Digex is currently experiencing a connectivity problem with Sprint. This
outage appears to be only affecting customers within home.net, as Digex to
Sprint is the return path from pair. They are aware of the problem and expect
resolution within 2 hours. More information will be posted as it becomes
available.
[Aug 11, 2000, 4:57 pm] calma Downtime
calma (www43) crashed under heavy load and had to be rebooted. Total
downtime was under five minutes.
[Aug 11, 2000, 12:43 pm] pi Downtime
pi (www10) crashed under heavy load and had to be rebooted. A manual disk
clean was needed. Total downtime was about 20 minutes.
[Aug 11, 2000, 2:17 am] bhoth downtime
bhoth (www163) crashed under high load and had to be rebooted.
Downtime was less than five minutes.
[Aug 11, 2000, 1:23 am] aaze upgrade
aaze (www65) was upgraded to a Pentium III 733MHz, with 256MB of
RAM. Total downtime was about 10 minutes.
[Aug 11, 2000, 12:54 am] ungwe upgrade
ungwe (www48) was upgraded to a Pentium III 733MHz with 256MB of
RAM. Total downtime was about 5 minutes.
[Aug 11, 2000, 12:18 am] nwalme upgrade
nwalme (www57) was upgraded to a Pentium III-667 MHz, with 256 MB
of RAM. Downtime was under five minutes.
[Aug 10, 2000, 11:50 pm] calma upgrade
calma (www43) has been upgraded to a Pentium III-733 MHz, with 256
MB of RAM. Downtime was approximately five minutes.
[Aug 10, 2000, 11:12 pm] cuzea upgrade
cuzea (www94) was upgraded to a Pentium III, 733 MHz with 256 MB
of RAM. Downtime was under five minutes.
[Aug 10, 2000, 10:53 pm] silme upgrade
silme (www64) was upgraded to a Pentium III 733 MHz with 256MB
of RAM. Total downtime was about 10 minutes.
[Aug 10, 2000, 10:08 pm] parma upgrade
parma (www42) was upgraded to a Pentium III, 733 MHz with 256MB
of RAM. Total downtime was about five minutes.
[Aug 9, 2000, 10:29 am] ingwaz downtime
ingwaz (www149) crashed due to a mailing loop, and came back up
smoothly. Downtime was around five minutes.
[Aug 9, 2000, 4:20 am] anca downtime
anca (www53) crashed under high load at around 3:50 Eastern, and
had to be rebooted. Total downtime was 26 minutes.
[Aug 9, 2000, 2:17 am] vilya downtime
vilya (www160) was temporarily disabled due to a mailing loop at
around 2:00am Eastern. Downtime lasted approximately 5 minutes.
[Aug 8, 2000, 5:03 pm] Genuity Activation
Our DS-3 circuit to Genuity is fully in service and running
smoothly. Several customers are reporting performance improvements.
Overall, less than ten percent of our traffic is being carried by
Genuity, but for that ten percent, there should generally be an
improvement.
[Aug 8, 2000, 12:26 pm] Genuity Activation
We are in the process of turning up our DS-3 connectivity to Genuity
at this time. Traffic will be shifted between providers several times, but
there should be no significant ill effects. Performance will improve for
certain destinations.
We will post further when the new traffic has been stabilized.
[Aug 8, 2000, 12:49 am] SAVVIS Update
At approximately 11:11 pm EST, our SAVVIS connection started to return to normal service.
We have been informed by a SAVVIS representative that the problem
was caused by an outage in their ATM and Frame Relay switches here in the
Pittsburgh area.
[Aug 7, 2000, 11:08 pm] SAVVIS Outage
Beginning around 6:30pm, SAVVIS lost connectivity to Pittsburgh once again.
This outage, like the previous incident on Sunday, affects all Pittsburgh
customers of SAVVIS. SAVVIS has assured us that the outage is being
handled at the highest possible level, and they are working with their
circuit carrier to attempt to resolve the problem.
We have the turn-up of our DS-3 circuit to Genuity scheduled for Tuesday
morning; in the event that SAVVIS is unable to restore service by that
time, Genuity will be handling that traffic instead. In the meantime,
traffic is flowing over alternate paths with no adverse effect.
We will post more information as it becomes available.
[Aug 7, 2000, 6:43 pm] SAVVIS Outage
Our SAVVIS line went down again approximately ten minutes ago, and
there is currently no estimate for restoration. It is not yet known whether
this problem is related to the previous problem experienced today, but we will
post again once we know more.
[Aug 7, 2000, 8:17 am] SAVVIS Resolution
Our circuit to SAVVIS came back online at about 6:00am Eastern Time.
A representative from SAVVIS informed us that this trunking problem
should not occur again, as they will have the site where the difficulty is
occurring staffed more thoroughly in the future.
[Aug 7, 2000, 4:03 am] SAVVIS Outage
We are currently seeing a drastic reduction in the amount of traffic flowing on our SAVVIS circuit due to a trunk problem in the New York-Chicago corridor, and our service with them has been down for approximately 20 minutes as a result. In the meantime, traffic is taking alternate paths with no degradation of service.
[Aug 6, 2000, 11:04 pm] arda downtime
arda (www62) crashed under heavy load, and was brought back up
cleanly. Downtime was 10 minutes.
[Aug 6, 2000, 2:02 pm] SAVVIS Outage
SAVVIS has traced their trunking problem to a router failure which they
are currently working to repair. They expect to restore service by 5pm
Eastern time. Traffic is flowing over alternate paths during this outage.
[Aug 6, 2000, 8:28 am] SAVVIS Outage
SAVVIS is currently experiencing a trunk problem in the
New York-Chicago corridor, and our service with them has been down for
approximately 15 minutes as a result. They expect to have the matter
resolved within an hour; in the meantime, traffic is taking alternate
paths with no degradation of service.
[Aug 5, 2000, 6:46 pm] jarre downtime
jarre (www172) crashed today under high load. It was brought
back up cleanly. Total downtime was 10 minutes.
[Aug 2, 2000, 3:24 am] tinne Downtime
tinne (www21) crashed under high load at 2:46am and had to be rebooted. It was
brought back to full operation at 3:13am.
[Aug 1, 2000, 3:49 pm] umbar Downtime
umbar (www46) crashed under load, requiring a reboot. Downtime was
approximately 10 minutes.
[Jul 31, 2000, 11:39 pm] nuumen Downtime
nuumen (www55) reset itself due to heavy load. Downtime was less
than 10 minutes.
[Jul 31, 2000, 4:45 pm] theta Downtime
theta (www4) crashed under heavy load, and required extensive filesystem repair before it could be rebooted. Total downtime was approximately 25 minutes.
[Jul 31, 2000, 12:30 am] kodh Downtime
kodh (www90) rebooted under heavy load, and required extended filesystem
cleaning before being returned to normal operation. Downtime was
about 30 minutes.
[Jul 29, 2000, 12:48 am] idad Upgrade
After a third crash in less than an hour, idad has been upgraded
to a Pentium III at 733 MHz with 256MB SDRAM. We do not expect any further
stability problems from this server.
[Jul 28, 2000, 11:18 pm] idad Downtime
idad has crashed and quickly returned to service three separate
times today. As there is no sign of any load problem, we suspect a
hardware issue such as flaky RAM or a failing motherboard. We will be
swapping the system out within the next 72 hours.
[Jul 28, 2000, 9:30 pm] thurisaz Downtime
thurisaz (www136) crashed under heavy load. It has been returned
to normal service and had a downtime of about 5 minutes.
[Jul 28, 2000, 6:40 pm] derba Downtime
derba (www104) crashed under heavy load. It has since been
returned to normal operation with a downtime of under 10 minutes.
[Jul 28, 2000, 1:24 pm] jarre Downtime
jarre (www172) crashed under heavy load, and had to be rebooted. Total downtime was approximately 10 minutes.
[Jul 27, 2000, 3:57 am] idad downtime
idad (www32) crashed under high load and had to be rebooted. Total
downtime was approximately 15 minutes.
[Jul 26, 2000, 1:30 pm] cele Downtime
cele (www73) crashed under heavy load, and had to be brought back up manually. Total downtime was under 15 minutes.
[Jul 25, 2000, 10:09 pm] straif downtime
straif (www26) crashed under heavy load, and was brought back
up. Total downtime was five minutes.
[Jul 23, 2000, 8:19 pm] gao Downtime
gao (www111) was rebooted this evening to clear up a network
problem. Total downtime was less than 5 minutes.
[Jul 23, 2000, 4:37 am] uilen Downtime
uilen crashed under heavy load, and required extensive manual
filesystem cleaning before returning to service. Downtime was
approximately 40 minutes.
[Jul 22, 2000, 7:09 pm] parma downtime
parma (www42) was brought down for a routine upgrade at 6:24PM
EST. There was a problem with the upgrade, and the maintenance has been
deferred. Total downtime was 30 minutes.
[Jul 22, 2000, 5:15 am] Router Maintenance
Between 4am and 5am Eastern time, routine upgrades were performed on one
of our gateway routers. Customer traffic was temporarily rerouted but was
not disrupted. These upgrades move us further towards the deployment of
the Extreme Black Diamond switches.
[Jul 20, 2000, 1:26 am] Sprint Resolution
The traffic reduction was apparently due to routine maintenance on Sprint's part, which has since been completed. All appears to be well now.
[Jul 20, 2000, 1:05 am] Sprint Trouble
We are currently seeing a drastic reduction in the amount of traffic flowing on our Sprint circuit. All customer traffic is flowing over alternate paths at this time, and we are working with Sprint to find and fix the problem. Further details will be posted as they become available.
[Jul 19, 2000, 10:57 am] pair2000 Update
ipre (www181), sothor (www182), and khin (www184)
have been converted to the new pair2000
system. Users on these servers may now control their account via the
Account Control Manager, located at https://acc.pair2000.com/.
Two other servers scheduled for today, aerre (www185) and
linato (www186), have been delayed for one week while we make
improvements to the pair2000 architecture. The current schedule may be
found at http://www.pair2000.com/schedule.html
[Jul 19, 2000, 10:30 am] sowilu Downtime
sowilu (www145) crashed under heavy load at approximately 10:10 EDT.
Downtime was approximately 10 minutes.
[Jul 18, 2000, 11:32 pm] laguz Downtime
laguz (www148) crashed under heavy load. Downtime was less than 10 minutes.
[Jul 16, 2000, 5:02 pm] fearn Downtime
fearn (www40) crashed under heavy load. After it came back up, problems
lingered until they were traced to a misbehaving .procmailrc file. Total
downtime was 30 minutes.
[Jul 16, 2000, 8:48 am] idad downtime
idad (www32) crashed under high load at 8:33AM EST. Total
downtime was 10 minutes.
[Jul 14, 2000, 7:58 am] Miva Upgrade
Miva Empresa, the CGI binary used by Miva Merchant, has been upgraded
to version 3.71 on all commerce servers. Miva Corporation reports that
this new version patches several security problems their testing
uncovered.
[Jul 14, 2000, 7:16 am] UUnet Update
UUnet continues to have serious problems with one of their core
routers in the Washington, DC area. Although we are assured that the
matter has been escalated, and it appears to affect all of their traffic
on the East Coast, there is still no projected window of repair, nor is
there any updated information on the official UUnet Status page
at http://www.noc.uu.net/.
pair Networks, Inc. prides itself on informing its
customers honestly and directly, even in the case of significant failures
such as the two switch failures in the past 24 hours. We
are severely disappointed that UUnet cannot acknowledge the
severity of any problem, provide proper escalation and feedback, and
notify the public, or at least their customers, through the channels
they have set up for that purpose.
[Jul 14, 2000, 7:11 am] Switch Failure
One of our Extreme Networks switches went offline at approximately 5:50am
today. The switch was completely non-responsive and had apparently lost
power. The switch was swapped with one of our emergency spares, and is
currently back in normal operation. Total downtime for the affected
servers was approximately 50 minutes.
The power supply of the switch has apparently failed; the A/C fuse was
completely vaporized. Yesterday's switch problem appears to be related
to faulty Flash memory. The power supply problem on a different switch
this morning appears to be completely unrelated. We do not believe that
either failure is related to the recent software upgrades performed on
the switches; both failures seem to be hardware-related.
We apologize for this repeated incident. We continue to have the greatest
faith in the Extreme hardware, and will be replacing the failed hardware
promptly.
[Jul 14, 2000, 6:12 am] Network Errors
In a separate incident, not related to the network errors experienced on
July 12, one of our switches has physically failed, affecting approximately
20 servers. The switch is currently being swapped, and the servers should
be back online shortly.
[Jul 14, 2000, 2:17 am] UUnet Trouble
Our UUnet traffic has taken a sudden downward spike five times in
the past twelve hours. When contacted, UUnet reported
difficulties with a linecard in one of their core routers. The problem
does not yet appear to be resolved, but we will continue to monitor the
situation and report any further problems.
[Jul 13, 2000, 11:27 am] Switch Replaced
The Extreme Networks switch that was failing has been replaced, and all
affected servers are back in normal operation. Including time to
reconfigure some servers, the swap took approximately 25 minutes. We
will continue to monitor the new switch closely to ensure there are no
further problems.
[Jul 13, 2000, 10:12 am] Switch Failure
The lingering server problems have been traced primarily to a failing
Extreme Networks switch. That switch is now being replaced on an emergency
basis. Total downtime for affected servers should be less than ten
minutes. Further information will be posted when the matter is resolved.
[Jul 13, 2000, 9:50 am] Server Maintenance
We have discovered lingering network performance issues on a number of
user servers, resulting from the new configuration of our switches after
last night's software upgrade. We will be reconfiguring servers, and in
some cases, taking them offline briefly for a network card upgrade, in
order to eliminate this problem. User impact will be minimized to the
greatest extent possible.
[Jul 13, 2000, 7:28 am] idad downtime
idad (www32) crashed under high load at around 6:18am EST. Downtime
was approximately 20 minutes.
[Jul 12, 2000, 11:43 pm] ydhu downtime
In a separate incident, not related to the network troubles, ydhu
(www83) rebooted under heavy load. After a manual filesystem check,
it has been brought back online. Approximate downtime was 20 minutes.
[Jul 12, 2000, 11:40 pm] Network Problems Resolved
The network problem we encountered earlier while upgrading our network
switches has been resolved. Total downtime on affected servers was
approximately 1 hour, and all user servers are back in full operation at
this time.
[Jul 12, 2000, 11:01 pm] Network Problems
During the course of routine maintenance, we upgraded software on our
switches. The new software version did not install correctly on two of the
switches, and we are rolling back to the previous revision now.
[Jul 12, 2000, 10:50 am] pi downtime
pi (www10) crashed under heavy load and required an extensive
filesystem cleaning. Downtime was around 25 minutes.
[Jul 11, 2000, 11:24 am] uumor downtime
uumor (www159) was rebooted to clear up a high load condition.
Downtime was approximately one minute.
[Jul 10, 2000, 7:16 pm] Network Maintenance
On Wednesday night, 7/12/00, from approximately 8-10pm we will be
performing further maintenance on our network equipment. There will be less
than a minute of total interruption of connectivity for each affected server.
[Jul 10, 2000, 4:16 am] cele downtime
cele (www73) crashed under high load and had to be rebooted 3 times.
[Jul 10, 2000, 1:44 am] bemnet downtime
bemnet (www158) crashed under high load and rebooted normally. Total downtime
was less than 20 minutes.
[Jul 6, 2000, 11:16 pm] iota downtime
iota (www5) crashed under heavy load, and was brought back online with 10
minutes downtime.
[Jul 6, 2000, 7:52 am] kodh Resolved
kodh's primary drive suffered an apparent failure at approximately
6:30am. After reseating all cables, the drive came back online and booted
after an extensive filesystem cleaning. We do not believe the problem will
persist. Total downtime was approximately fifty minutes.
[Jul 6, 2000, 6:50 am] Downtime for kodh
A drive has failed on kodh (www90). We are currently working to restore it
from backup. More information will be posted as it is available.
[Jul 6, 2000, 6:38 am] idad Downtime
idad (www32) crashed under heavy load and was brought back online with downtime
of around 30 minutes.
[Jul 3, 2000, 3:33 pm] Network Maintenance
We will be performing routine maintenance on our network switches
Thursday night (7/6/00) from 8-10pm. Customer traffic should not be
adversely affected.
[Jul 1, 2000, 11:33 am] xi Cleanup
Cleanup of all files on xi has been completed. Some files were
restored from Friday's backup, while others were recovered directly from
the failing drives. Logs for June 30th have been generated, and all mail
has been restored (older mail was temporarily missing). Any user who
encounters problems should report them to support@pair.com or
urgent@pair.com as appropriate.
We apologize for the difficulty caused by this drive failure. We believe
damage was minimal, and the server is now back in regular service with a
faster drive, more memory, and a more powerful CPU.
[Jul 1, 2000, 9:49 am] emancholl Downtime
emancholl (www37) crashed under heavy load, and required manual
disk cleaning before being brought back online. Downtime was 50 minutes.
[Jul 1, 2000, 1:28 am] xi emergency maintenance
Restoration of files from xi's failed hard drive is continuing,
and is about 3/4 done. User access to the server has been restored, though
not all files have been restored yet.
In the process of the maintenance, we took the opportunity to upgrade
xi to a Pentium III 667 MHz with 256 MB of RAM.
[Jun 30, 2000, 8:36 pm] xi maintenance
During the drive swap on xi to replace its failing /u3 drive, the
drive died. We are currently restoring the data from backups, and should have
it fully operational again shortly. Another notice will be posted at that
time.
[Jun 30, 2000, 2:51 pm] xi Maintenance
Effective immediately, we are beginning a drive swap on xi, which
is showing drive errors on the /u3 partition. Initial downtime will be
approximately ten minutes, and the swap will be completed on Saturday,
with another brief downtime. This will result in more disk space, memory,
and CPU capacity for xi. Customer traffic should not be
significantly affected.
[Jun 30, 2000, 2:47 pm] one.pairlist.net Maintenance
In light of recent crashes and data corruption on
one.pairlist.net, it is being replaced with a temporary server
effective immediately. The new server is somewhat less powerful (Pentium
II at 450MHz), but will be upgraded as load on the system requires.
Downtime will be less than ten minutes.
[Jun 30, 2000, 2:38 pm] news.pair.com Failure
The power supply on news.pair.com failed. The system drive was
promptly moved to a new chassis, and is now running as a Pentium III at
667MHz, with 256MB RAM. We will be building an entirely new news server
in the near future.
[Jun 30, 2000, 1:57 pm] pairlist.net Outage
While working to resolve a mailing loop on a particularly large mailing
list, a piece of software supporting Mailman on pairlist.net became
corrupted. It was promptly reinstalled, and the offending list has been
suspended from service.
There was a brief period during which mail to all lists bounced. We are
planning an upgrade to Mailman 2.0 as soon as it is released.
[Jun 29, 2000, 11:40 pm] ifin upgrade
ifin (www36) has been upgraded to a Pentium III 667 MHz, with 256
MB of RAM. Downtime was under five minutes.
[Jun 29, 2000, 11:16 pm] or upgrade
or (www34) was upgraded to a Pentium III 667 MHz, with 256 MB of
RAM. Downtime was under five minutes.
[Jun 29, 2000, 10:51 pm] ebad upgrade
ebad (www33) has been upgraded to a Pentium III 667 MHz, with 256
MB of RAM. Downtime was under five minutes.
[Jun 29, 2000, 10:27 pm] edad upgrade
edad (www31) has been upgraded to a Pentium III 667 MHz, with 256
MB of RAM.
[Jun 29, 2000, 6:50 pm] thuule downtime
thuule (www49) crashed at 6:09 pm EST; it was brought back online with 15
minutes of downtime.
[Jun 26, 2000, 11:24 pm] ur upgrade
ur (www30) has been upgraded to a Pentium III 667 MHz, with 256 MB
of RAM. Downtime was under five minutes.
[Jun 26, 2000, 10:53 pm] muin upgrade
muin (www23) has been upgraded to a Pentium III 667 MHz, with 256
MB of RAM. Downtime was under five minutes.
[Jun 26, 2000, 7:47 pm] glikk upgrade
glikk (www119) was taken down briefly for a drive upgrade. Total
disk space on this server has been increased to 20.5GB.
[Jun 25, 2000, 8:57 pm] ungwe downtime
ungwe (www48) crashed around 1AM. It was brought back online with 30
minutes of downtime.
[Jun 24, 2000, 3:28 pm] xi Downtime
xi (www8) crashed, requiring a manual disk cleaning. It was brought back
online with 15 minutes of downtime.
[Jun 22, 2000, 5:33 pm] UUnet Resolution
UUnet's line has returned to normal service as of about 5:00PM EST.
A UUnet representative has contacted us to explain the cause of the
failure.
As stated previously, we will be working even faster now to add extra
connectivity, so as to reduce the effect of future outages.
[Jun 22, 2000, 12:07 pm] UUnet cont.
The problem has been identified as lying with UUnet's Pittsburgh POP, not
with any of our equipment. Work is currently in progress, and we are
awaiting status reports from UUnet on the progress of this outage.
[Jun 22, 2000, 11:49 am] UUnet outage
We have lost contact with UUnet's Pittsburgh POP. We are currently working
with UUnet to identify the source of the problem and repair it. We will post
additional details as they become available.
[Jun 20, 2000, 11:07 am] UUnet Restored to Service
After an outage of almost exactly seventeen hours, UUnet's
connectivity to Pittsburgh has been restored. We apologize for the
inconvenience to our customers and their site visitors. We are working
hard to expand our connectivity to further reduce the possible effects of
any future outages.
[Jun 20, 2000, 10:36 am] UUnet Update
We are now in the 17th consecutive hour of UUnet's complete
Pittsburgh outage. The latest report from UUnet is that as of
9:30am Eastern, fiber splicing has been completed, and end-to-end testing
has commenced. They expect a return to normal service sometime this
morning.
We have accelerated our Genuity DS-3 to next Monday, June 26, and
expect also to have our Sprint OC-3c online as early as the end of
July. We are guiding our bandwidth expansion decisions by UUnet's
apparent problems in maintaining connectivity to our city. It is
unacceptable for their connectivity here to be vulnerable, as it
apparently is, to a single point of failure.
[Jun 20, 2000, 1:58 am] UUnet Update
WorldCom is reporting an outage of eleven OC-48 long-haul circuits,
effectively partitioning UUnet's Pittsburgh POP from the rest of
their network. This is the same problem that occurred in March of this
year. We are taking steps with UUnet to reduce the impact of such
future outages, and will be favoring other providers not only during the
outage, but in future purchasing decisions as well.
We will post further information when it becomes available.
[Jun 19, 2000, 6:20 pm] UUnet Outage
Our UUnet circuit is currently down. We are working with a technician to
bring our line back up. Currently, we only have information about an outage
in the Pittsburgh area, and will post more updates once we have more data.
[Jun 19, 2000, 2:30 pm] idad maintenance completed
The hard drive upgrade of idad (www32) has been completed. The new
total disk space on this server is 20.5GB.
[Jun 19, 2000, 12:41 pm] UUnet Traffic
Due to an unfortunate two-week delay in the final cross-connect of our
OC-3c circuit to UUnet, we are currently experiencing saturation
and latency on our UUnet DS-3 circuit. We have shifted some traffic
to Digex and Sprint in order to compensate; this will reduce
but not eliminate the problem.
We expect to be able to turn up the new circuit on the morning of Tuesday,
June 20. The new circuit will eliminate the problem condition. We also
expect to have our new DS-3 circuit to Genuity active within the
next week, and the OC-3c circuit order for Sprint was recently
accelerated to a target date of late July.
We apologize for the temporary inconvenience. We are doing everything
possible to manage the bandwidth demand.
[Jun 16, 2000, 3:15 pm] idad maintenance
idad (www32) was taken down briefly for a hard drive upgrade.
There will be one more brief downtime to complete this process once the
swap is finished.
[Jun 14, 2000, 11:46 am] jarre downtime
jarre (www172) crashed under heavy load, and came back without incident
after a filesystem check.
[Jun 13, 2000, 12:25 pm] pi downtime
pi (www10) crashed under heavy load and was brought back online
with downtime around 15 minutes.
[Jun 13, 2000, 10:03 am] db3 Upgrade
db3 has been upgraded to FreeBSD 3.4 and MySQL 3.23.11. All
database servers not running under 3.4 will be upgraded in the near
future, in accordance with our database
upgrade plan.
[Jun 13, 2000, 8:50 am] db3 Maintenance
db3 will be taken down briefly this morning for
maintenance.
[Jun 11, 2000, 7:07 am] Digex Resolution
The problem with inbound traffic on Digex was traced to the
resolution of a pre-existing problem, not the beginning of a problem.
We are not seeing any routing or traffic problems with Digex at
this time.
[Jun 11, 2000, 12:03 am] Digex Trouble
Beginning around 8:45pm Eastern time, a precipitous drop in inbound
traffic from Digex was observed. The missing traffic, which has
not appeared on alternate inbound paths, appears to be destined for hosts
in our 209.68.0.0 netblock. We have been working with Digex
technical support in order to determine if there is a routing problem or
filter in their network, but so far nothing has been determined.
Any customer who believes they may be experiencing problems as a result
of this is requested to send a traceroute to urgent@pair.com for prompt
handling.
[Jun 9, 2000, 9:40 am] gebo downtime
gebo (www139) crashed under heavy load and was brought back
online. Downtime was around 10 minutes.
[Jun 8, 2000, 6:24 pm] cuzea downtime
cuzea (www94) crashed under heavy load and has been brought back
online. Downtime was approximately 15 minutes.
[Jun 8, 2000, 11:35 am] chi Downtime
chi (www14) crashed under load. Downtime was approximately 15
minutes.
[Jun 7, 2000, 6:29 pm] UUnet Update
As of 5:45pm, traffic to and from UUnet has returned to normal
levels. UUnet has advised us that two of their backbone routers
went out of service, precipitating this incident.
We will proceed with our plans to shift inbound traffic to avoid this
type of problem with UUnet.
[Jun 7, 2000, 4:55 pm] UUnet Problems
Beginning around 4:40pm Eastern, we have observed a major drop in inbound
UUnet traffic, and no corresponding rise in traffic through other
providers. This implies that, once again, UUnet's network is
partitioned and their internal routing is not properly dropping our
routes. This means that UUnet peering points around the world will
accept traffic destined for pair Networks, but then fail
to deliver that traffic, effectively blackholing much of our traffic.
We will be accelerating efforts to resolve this technical problem with
UUnet; we have expressed our displeasure numerous times before.
In the near future, we will likely shift our inbound traffic primarily to
Sprint in order to compensate for this effect.
Further information about the current brownout will be posted as it
becomes available.
[Jun 6, 2000, 5:35 pm] parma Web Service
Some web sites on parma (www42) experienced intermittent
downtime this morning and afternoon due to a configuration error.
This has been corrected, and the situation should not recur.
[Jun 6, 2000, 3:17 pm] Sprint Problems
Sprint is advising all of its customers of a major outage on the
East Coast, which may affect latency on some portions of their network.
Although we have not seen any significant change in traffic, we are passing
the notice along to any customers who may be affected.
[Jun 6, 2000, 2:00 pm] db3 Maintenance
db3 was taken down briefly to install a hard drive, as the
first step towards its upgrade to FreeBSD 3.4 and MySQL 3.23
(as described on the database upgrade
plan). Downtime was approximately 10 minutes.
[Jun 6, 2000, 10:18 am] UUnet OC-3c Upgrade
Beginning at approximately 8pm Eastern time today, we will be interrupting
our UUnet connectivity briefly as we perform the first stage of our
long-awaited OC-3c upgrade to UUnet. After installation and testing
of the necessary equipment, the circuit switchover will be accomplished
later this week. This will provide a significant increase in our capacity
to UUnet, and allow us to improve overall route selection.
Please note that we have additional network upgrades scheduled in the
near term, including DS-3 to Genuity, and OC-3c to Sprint.
Details are published at
http://support.pair.com/notices/upgrades.html.
[Jun 6, 2000, 6:54 am] falku upgrade complete
The drive swap on falku (www98) was completed at 6:50am EST.
[Jun 6, 2000, 6:43 am] falku upgrade
falku (www98) was down from 1:35am EST to 1:46am EST for the new
hard drive installation. A second reboot will be performed at 6:45am EST
to bring the new hard drive into live service.
[Jun 6, 2000, 12:55 am] falku upgrade
falku (www98) will be down for a short period tonight while we
perform a hard drive upgrade to solve space issues on that server.
Expected downtime is under 30 minutes, and will begin around 1:30am EST.
[Jun 5, 2000, 4:11 pm] vuae reboot
vuae (www93) was rebooted to clear some filesystem problems resulting
from a mail loop. Downtime was less than 5 minutes, and it has since
resumed normal operation.
[Jun 5, 2000, 3:26 pm] uilen Downtime
uilen (www35) was down for 25 minutes this afternoon after a
mailing loop and an extensive file cleaning. It is now back up and
functioning normally.
[Jun 5, 2000, 6:18 am] xi Reboot
xi (www8) was rebooted to clear a conflict. Downtime was
approximately 5 minutes.
[Jun 5, 2000, 4:56 am] kappa Downtime
kappa (www6) crashed under load and was brought back online
after a systems check. Downtime was approximately 30 minutes.
[Jun 4, 2000, 5:56 pm] tinne Downtime
tinne (www21) was down this afternoon for approximately 10
minutes. It has since returned to normal service.
[Jun 4, 2000, 3:48 am] onn Downtime
onn (www29) crashed and was brought back online after a manual
filesystem check. Downtime was approximately 30 minutes.
[Jun 2, 2000, 5:13 pm] beith Upgraded
The drive swap on beith has been completed, along with a chassis
swap to upgrade the system. beith is now a Pentium III at 600
MHz, with 256MB SDRAM and 20GB drive space.
[Jun 2, 2000, 12:35 pm] kodh downtime
kodh (www90) crashed under heavy load and was restored to service
with downtime of around 15 minutes.
[May 27, 2000, 11:05 am] beith Maintenance
One of beith's hard drives is reporting intermittent errors; we
have begun the process of swapping both drives for a single larger model.
This will be completed by Monday, and involves no appreciable downtime.
[May 26, 2000, 9:07 pm] epsilon Downtime
epsilon (www3) crashed under heavy load. Downtime was 15 minutes.
[May 22, 2000, 5:30 pm] UUNet problems
We have noticed problems with our UUNet backbone dropping some traffic.
According to Technical Support, a number of their backbone routers lost
their BGP sessions. They are still working on it and were not able to
offer a specific estimate on when the situation would be improved. We have
received a master ticket for the situation and will continue to monitor it
closely.
[May 19, 2000, 8:49 pm] cicka downtime
cicka (www167) rebooted under heavy load. It was brought back online with
downtime under 10 minutes.
[May 19, 2000, 9:33 am] UUnet Status
After extensive testing last night, UUnet has been unable to
identify the problem with our circuit, which is experiencing brief outages,
long enough to interrupt routing, every two to six hours. We will be
working with them again today to attempt to resolve the issue. This does
not have a major effect on customer traffic, but is not acceptable to us,
either.
[May 18, 2000, 6:47 pm] UUNet Problems
We are currently experiencing a series of short (10-20 second) outages on
our DS-3 to UUNet. They are investigating the problem and it should be
resolved by morning. Customer traffic is flowing over alternate circuits
during these brief outages, and should not be adversely affected. Further
information will be posted as it becomes available.
[May 15, 2000, 11:20 am] pi downtime
pi (www10) crashed under heavy load and was brought back online
with downtime around 20 minutes.
[May 15, 2000, 2:12 am] uilen Downtime
uilen (www35) crashed under heavy load. It was brought back online within 5
minutes.
[May 13, 2000, 1:23 am] anca downtime
anca (www53) rebooted under heavy load. It was quickly brought
back online with downtime under 10 minutes.
[May 11, 2000, 3:26 pm] pi downtime
pi (www10) crashed under heavy load and was brought back online
with downtime around 20 minutes.
[May 11, 2000, 12:50 am] SAVVIS Network Problems
From approximately 8pm to midnight tonight, we saw some problems with
our DS3 to SAVVIS due to faulty filters on their gateway router. Customer
traffic should not have been adversely affected, though at times some
traffic would have been routed to SAVVIS and not accepted for delivery.
The problem has been found and corrected, and we will be monitoring the
circuit closely for the next few days to ensure that it does not recur.
[May 10, 2000, 3:48 pm] dyyme downtime
dymme (www71) was down this afternoon for approximately 10 minutes. It has
since returned to normal operation.
[May 10, 2000, 2:04 pm] UUNet Update
According to UUNet, the problems reported earlier were related to the
reload of a backbone in the DCA area. They believe that everything
should now be stable.
[May 10, 2000, 12:43 pm] UUnet Problems
UUnet traffic has dropped by more than 50% in the past fifteen
minutes, leading to an overall drop in outbound traffic. This suggests
another significant outage has taken place somewhere in UUnet's
network, and because of their non-dynamic internal routing, they may be
blackholing some traffic destined for our network.
We will be pursuing this issue with UUnet management. Outages
elsewhere in the UUnet network should not have this degree of
effect; nor do other backbones experience the blackhole problem. We will
post more information on this brownout as it becomes available.
[May 8, 2000, 5:51 pm] tinco maintenance completed
The drive maintenance on tinco has been completed, with an upgrade
to 20.5GB of drive space. If any problems are detected as a result of this
operation, please let us know at support@pair.com.
[May 5, 2000, 2:37 pm] tinco maintenance
tinco was brought down briefly for drive maintenance. There will be
another short downtime once this is completed, which will result in an
overall upgrade to drive space for this server. If any problems are found
with user files during this transfer, please let us know at
urgent@pair.com.
[May 5, 2000, 9:34 am] pi downtime
pi (www10) crashed under heavy load and was brought back online
with downtime around 30 minutes.
[May 4, 2000, 12:04 pm] psi downtime
psi (www15) crashed under heavy load, and was brought back online
with downtime under 10 minutes.
[May 4, 2000, 7:59 am] pi Logs
Logs for May 2 and May 3 were generated on pi early this
morning. There was an unusual configuration problem which should not
recur.
[May 3, 2000, 2:01 am] gao downtime
gao (www111) was brought down for a physical relocation after
Saturday's maintenance. Downtime was less than 10 minutes.
[May 1, 2000, 11:28 pm] UUnet Problems
UUnet is reporting serious problems throughout their network;
serious enough that they have promptly posted the information on their
NOC page. The impact on our network has been a temporary loss of inbound
traffic via UUnet, which caused a corresponding drop in overall outbound
traffic. The duration of the event was approximately fifteen minutes,
and traffic is now returning to normal.
This sort of incident is made worse by the behavior in UUnet's
network that we have pointed out before; when internal connectivity to a
destination is lost, routes for that destination are not promptly
withdrawn. Consequently, UUnet edge nodes will happily accept
traffic destined for our network even when there is no way for UUnet
to deliver it to us; this prevents other networks from selecting an
alternate route and defeats the stability otherwise provided by BGP.
Unfortunately, many networks rely on UUnet to reach our network
because of the otherwise superior performance it provides.
We will post further details if any are provided by UUnet.
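The route-withdrawal problem described above can be illustrated with a toy model. This is a minimal sketch, not BGP's actual best-path algorithm: it assumes, hypothetically, that a remote network picks the route with the shortest AS path and nothing else, and the AS paths and reachability flags are invented for illustration. It compares what happens when UUnet keeps announcing our prefix during an internal outage versus when it withdraws the route.

```python
# Toy model: a remote network hears our prefix from two upstreams and picks
# one. Hypothetical simplification: only AS-path length is compared
# (ties broken by name); real BGP best-path selection has many more steps.

def best_route(announcements):
    """Return the upstream with the shortest AS path, or None if no route."""
    if not announcements:
        return None
    return min(announcements, key=lambda up: (len(announcements[up]), up))

def deliverable(upstream, reachable):
    """True if the chosen upstream can actually carry traffic to us."""
    return upstream is not None and reachable.get(upstream, False)

# Hypothetical AS paths from the remote network to our prefix.
routes = {
    "UUnet": ["AS-UUNET", "AS-PAIR"],
    "Sprint": ["AS-TRANSIT", "AS-SPRINT", "AS-PAIR"],
}
# UUnet's backbone is partitioned, so it cannot deliver to us right now.
reachable = {"UUnet": False, "Sprint": True}

# Case 1: UUnet keeps announcing the route (no withdrawal): traffic is blackholed.
stale = best_route(routes)

# Case 2: UUnet withdraws the route: the remote network falls back to Sprint.
withdrawn = {up: path for up, path in routes.items() if up != "UUnet"}
fresh = best_route(withdrawn)

print(stale, deliverable(stale, reachable))
print(fresh, deliverable(fresh, reachable))
```

In the first case the shorter UUnet path still wins, so traffic is handed to a network that cannot deliver it; only once the route is withdrawn does the alternate path through another provider get selected.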
[Apr 30, 2000, 1:38 am] gao Update
We have discovered that some users on gao did not have their
accounts fully restored. We will be working through the night to identify
the affected accounts and restore them from backup. We do have current
backups of all data; the most recent level was from Saturday morning, less
than 24 hours before the crash.
[Apr 29, 2000, 11:11 pm] Network Resolution
As of 10:05pm Eastern time, all network service has been restored to
normal. The ultimate cause was traced to problems with CEF, Cisco
Express Forwarding, on two of our four internal routers. CEF routing
tables on these routers became corrupted sometime on Friday, although
initially the impact was very small, with only a handful of addresses
affected. When the problem was discovered at 5:30pm today, attempts to
diagnose and remedy the problem led to considerably worse results, with
nearly all of our network affected at one point.
Although the attempted cure was at least temporarily worse than the
disease, several other problems, which may have been related, were
cleared up as we worked on the main problem. One reconfiguration required
some work with our upstreams to ensure that they were receiving our
announcements correctly. Several pieces of equipment, as well as cabling,
were swapped during this time as well.
The problem has been eliminated from our network as far as we can tell.
Unfortunately, there is no specific way to monitor for CEF discrepancies,
but if this type of problem should recur, we will know immediately where
to look. As an aside, we have been successfully using CEF on these routers
since July 1999. Although many providers report problems (CEF is often
referred to as "Customer Enragement Feature"), this is the first incident
we've experienced. Unfortunately, it was a major incident.
The routers affected this evening happen to be among the equipment scheduled
to be replaced by the carrier-class Black Diamond switches we already have
on order. Details are available at
http://support.pair.com/notices/upgrades.html
Just to make things more interesting, the primary hard drive on
gao died shortly after the routing problems began. The incidents
are completely unrelated, but poorly timed. gao's data has been
restored from backup on a new drive, and it is now running with no
problems.
Any customer who observes any ongoing problem reaching any site is
encouraged to send a traceroute to urgent@pair.com. We will investigate
any possibly related problems immediately.
We apologize for the interruption in service and its potentially
significant impact. We recognize the critical importance of maintaining
the best possible network service at all times, and we are committed to
doing so, now and in the future.
[Apr 29, 2000, 9:10 pm] Network Update
As of approximately 8:45pm, the original source of failure has been
eliminated. We have continuing suspicions about the behavior of one
particular router, however, and may be swapping it out later this evening.
There is an ongoing problem with traffic routing from certain addresses
to UUnet's network. This is a BGP issue, and the fault lies with
UUnet. We are working with them to resolve the issue at this
moment.
Further details will be posted.
[Apr 29, 2000, 7:56 pm] Network Update
We have narrowed down the internal network problem to a specific central
switch which appears to be operating unreliably. We are in the process of
swapping this switch out for a spare. Although it appears to be behaving normally, the
switch is resolutely dropping 60-80% of traffic destined for specific IPs,
without rhyme or reason. This is a switch that was already planned for
upgrade.
gao is still being rebuilt, and should be back online within the
next two to three hours. Its primary drive failed entirely, and a new
drive is being built from backups.
[Apr 29, 2000, 6:27 pm] Network Troubles
Beginning around 5:30pm today, we discovered unusual network behavior
involving an internal router. Initially, only two dedicated servers
were affected, as well as some internal systems. In the course of our
investigation, one problem was corrected, but overall the matter has
worsened, with more servers being affected.
We are continuing to work diligently on this problem, and will post more
information as it becomes available. We believe this problem is solely
internal and can be resolved without further interruptions of service.
[Apr 29, 2000, 6:14 pm] gao emergency maintenance
gao's hard drive has failed under load and is currently being replaced.
Estimated downtime to copy the data from backups to the new drive is
approximately 1-3 hours.
[Apr 28, 2000, 12:39 pm] UUnet Resolution
After a brownout of approximately one hour overall, all traffic through
UUnet appears to have returned to normal. We apologize for the
inconvenience; because of UUnet's internal routing scheme, backbone
outages do not effectively propagate withdrawn routes. This means that
other portions of UUnet's network will happily accept traffic
destined for a customer network such as our own, and then fail to deliver
that traffic due to a backbone outage. We have observed this behavior
before, and are considering ways to work around it in the future.
[Apr 28, 2000, 12:05 pm] UUnet Further Troubles
The recovery of UUnet's routing was short-lived. They have
indicated to us that they are currently having more serious backbone
problems, and a ticket has been opened. As further information becomes
available, we will post it here.
[Apr 28, 2000, 11:47 am] UUnet Resolution
UUnet has advised us that one of their national transit routers
failed, interrupting a portion of their network until routing sessions
could be re-established. When this happens, some traffic flowing through
UUnet will be delayed or diverted for a period of 30-45 minutes.
We are now seeing traffic levels recover to normal.
[Apr 28, 2000, 11:43 am] UUnet Trouble
We are seeing a noticeable drop in traffic across all circuits, due to
apparent inbound problems on UUnet's network. We are watching this
carefully and will post further information as it becomes available.
[Apr 27, 2000, 12:07 pm] pi downtime
pi (www10) crashed under heavy load and was brought back online with
downtime around 20 minutes.
[Apr 25, 2000, 4:11 pm] pi downtime
pi (www10) crashed under heavy load and was brought back online
with downtime around 20 minutes.
[Apr 25, 2000, 2:32 pm] theta downtime
theta (www4) crashed under heavy load, and was brought back online
with downtime around 20 minutes.
[Apr 23, 2000, 7:14 pm] theta Downtime
theta crashed under heavy load and was promptly brought
back online. Total downtime was less than fifteen minutes.
[Apr 22, 2000, 10:41 am] Digex Network Warning
Digex/Intermedia has warned us of a problem with their
network in the Washington DC area. An electrical fire at a Metro station
has removed at least one circuit from service, but their technicians will
not be allowed in the area to address the problem until Sunday morning.
Consequently, there may be increased latency due to load on other portions
of their network. We have not seen any problems from our end, but this
information may be useful to any customers affected.
[Apr 21, 2000, 10:11 pm] Digex Slowdown
Intermedia/Digex has informed us that due to fiber damaged in a fire in
Washington, DC we can expect to see some additional latency in their network.
This means that some customer traffic might be noticeably slower until they
are able to repair the damaged fiber on Sunday morning. Despite the slight
slowdown, however, all traffic will continue to reach its destination as
normal.
[Apr 17, 2000, 12:17 pm] pi downtime
pi (www10) crashed under heavy load. After an extensive
filesystem cleaning it was brought back online, with downtime around 20
minutes.
[Apr 13, 2000, 9:22 pm] omega Downtime
omega (www16) crashed under heavy load. Downtime was 15 minutes.
[Apr 13, 2000, 1:24 pm] UUNet Performance
We've observed some performance problems this afternoon on our DS-3 to
UUNet. UUNet has informed us that they are due to the link between their
Pittsburgh Router and one of their gateway routers having gone down
briefly. We will be closely monitoring the circuit for the rest of the day
to ensure that customer traffic is not adversely affected.
[Apr 13, 2000, 12:45 pm] bemnet Downtime
bemnet (www158) crashed under load and was brought back online
with downtime of approximately 10 minutes.
[Apr 12, 2000, 7:34 pm] nuin downtime
nuin (www18) crashed under heavy load and was brought back online
after a file system cleaning. Downtime was approximately 10 minutes.
[Apr 11, 2000, 4:25 pm] yanta downtime
yanta (www67) crashed under heavy load, and was brought back
online with downtime around 10 minutes.
[Apr 11, 2000, 12:47 pm] mu Maintenance Completed
The drive maintenance on mu has been completed, after another
10 minutes of downtime. mu is now a Pentium III at 600 MHz,
with 256MB SDRAM and a 20GB drive.
[Apr 11, 2000, 6:16 am] tinco Upgrade
tinco has been upgraded to a Pentium III at 600 MHz with 256MB
SDRAM. Downtime for the upgrade was approximately ten minutes.
[Apr 11, 2000, 4:18 am] mu Emergency Maintenance
Emergency maintenance has begun on mu, to replace its two
older hard drives with a new model. One of the drives is experiencing
unrecoverable errors. When this process is complete, within several
hours, the server CPU and RAM will also be upgraded.
[Apr 11, 2000, 3:34 am] cuzea Downtime
cuzea crashed under heavy load, and was brought up after
an extensive filesystem cleaning. Total downtime was approximately
30 minutes.
[Apr 11, 2000, 1:12 am] Sprint Resolution
At 12:35am Eastern time, our Sprint circuit was restored to normal
service. Sprint had significant difficulties identifying the problem
in the routing equipment on their end. We have been assured that the
outage, nearly ten hours in duration, is an unusual situation and will
not likely be repeated. We will continue to monitor the circuit closely.
[Apr 10, 2000, 9:22 pm] Sprint Outage
The Sprint circuit is still down, with no ETA for repair. Sprint has now
escalated the problem several times, but has yet to find its source other
than that it is with one of their gateway routers. They assure us that they
are working to get it back online as soon as possible.
Until then, customer traffic is continuing to flow over our other circuits.
We will continue to post more information as it becomes available.
[Apr 10, 2000, 4:14 pm] Sprint Outage
We've just discovered that the Sprint outage is worse than we thought.
While they are accepting some outbound traffic from us, it is then never
leaving their network. This is effectively causing some customer traffic to
be lost entirely. We are still working with them to isolate and fix the
problem as quickly as possible, and will post another notice as soon as we
have more information.
[Apr 10, 2000, 3:54 pm] Sprint Outage
We are currently seeing a drastic reduction in the amount of traffic
flowing on our Sprint circuit. All customer traffic is flowing over
alternate paths at this time, and we are working with Sprint to find and
fix the problem. Further details will be posted as they become available.
[Apr 10, 2000, 5:06 am] Sprint Network Outage - Resolution
Connectivity through Sprint was restored at 4:48 am EST. Total
downtime for the connection was 159 minutes.
[Apr 10, 2000, 2:58 am] Sprint Network Outage
At approximately 2:00 am EST, connectivity through Sprint was lost. The
issue has been escalated to Sprint for resolution. Further details will be
posted when they become available.
[Apr 8, 2000, 4:44 pm] emancholl Upgrade
emancholl has been upgraded to a Pentium III at 600 MHz, with
256MB SDRAM and a 15GB drive.
[Apr 8, 2000, 3:38 pm] harma Upgrade
The drive swap and system upgrade of harma has been completed
as of 3:35pm Eastern time. The server is now a Pentium III at 600 MHz,
with 256MB SDRAM and a 20GB drive.
[Apr 8, 2000, 12:52 pm] harma Maintenance
Due to recent errors, we are replacing the hard drive in harma.
There will be ten minutes of downtime later today, as the swap is
completed.
[Apr 7, 2000, 10:38 pm] emancholl Downtime
emancholl (www37) suffered a crash and was rebooted. Downtime was 10
minutes.
[Apr 7, 2000, 4:11 pm] cicka Reboot
cicka (www167) required a reboot after a manual reconfiguration problem.
After the reboot, it has returned to regular service. Downtime was just
over 15 minutes.
[Apr 7, 2000, 1:17 pm] enda downtime
enda (www80) crashed under heavy load and was brought back online
with downtime around 20 minutes.
[Apr 5, 2000, 2:13 pm] Routing Interruption
One of our gateway routers went down briefly at 1:35pm Eastern time
today. Slight reconfiguration was required when it returned to service.
We are investigating possible causes.
[Apr 4, 2000, 5:01 pm] Digex Problems
Beginning around 4:50pm, we have observed severe latency on Digex's
backbone, as well as a sharp decrease in routes announced and traffic
passed. We believe there is a general problem on their network. We will
post more information as it becomes available. In the meantime, alternate
paths are handling traffic as necessary.
[Apr 4, 2000, 3:34 pm] kodh downtime
kodh (www90) crashed under heavy load and was brought back online
with downtime around 20 minutes.
[Apr 2, 2000, 11:51 am] Server Upgrades
ngetal, eite, and ceirt have been upgraded
to Pentium III 600 MHz systems with 256MB SDRAM each. This is part of
our ongoing upgrade plan.
[Mar 29, 2000, 11:25 pm] pyyl Downtime
pyyl (www97) crashed due to heavy load. Downtime was 10 minutes.
[Mar 29, 2000, 5:47 am] onn Downtime
onn crashed four times between 4am and 5:30am Eastern time today.
Although we initially suspected a hardware problem and were preparing to
swap onn into a new chassis, the crashes were in fact traced to
an extremely high network load, more than 100 times beyond the normal
traffic carried by that server. This load was caused by an extremely
popular file being posted to an FTP account and downloaded through HTTP.
Please note that FTP is more suitable for lengthy downloads. Also, any
customer who needs services in the 200GB/day range is asked to please
contact us beforehand for an appropriate QuickServe solution.
[Mar 28, 2000, 9:51 pm] UUnet Resolution
Our UUnet circuit is once again routing traffic. The problem was tied to
a Pittsburgh router that has since been repaired. Traffic should again be
normal for all customers.
[Mar 28, 2000, 7:05 pm] UUnet Network Status
At approximately 6:42PM EST this evening, UUnet suffered an outage that
brought down our connection with them. All traffic is currently being
routed through our other providers, and they appear to be holding up.
Customers may experience some slowdowns until this is resolved, at which
point we will post a follow-up.
[Mar 27, 2000, 5:27 pm] pi downtime
pi (www10) crashed under heavy load and was brought back online,
after an extensive filesystem cleaning, with downtime around 20 minutes.
[Mar 27, 2000, 4:19 pm] UUnet Network Status
UUnet has reportedly suffered a major fiber cut somewhere east of
here. Although Pittsburgh was not partitioned from their network in this
case, it did cause a routing flap and brief interruption of our
UUnet service. It appears to be returning to normal at this time,
although customers may still be directly affected if they are on
UUnet's network in the affected area.
[Mar 27, 2000, 9:26 am] iota Downtime
iota (www5) crashed under load and was brought back online after
a filesystem check. Downtime was approximately 15 minutes.
[Mar 26, 2000, 9:26 am] gort Upgraded
gort has been upgraded to a Pentium III at 600 MHz, with 256MB
SDRAM. We expect to upgrade the other few remaining Pentium Pro servers
within the next two weeks.
[Mar 26, 2000, 8:33 am] gamma Upgraded
The drive upgrade on gamma has been completed. The server has
also been upgraded to a Pentium III at 600 MHz, with 256MB SDRAM. We
will be continuing these upgrades for all older servers over the coming
weeks; an article will be posted shortly with more details.
[Mar 25, 2000, 8:26 am] gamma Emergency Maintenance
gamma will be taken down briefly to install a new hard drive.
Not only is it due for an upgrade, but one of the old drives is beginning
to show errors. Total downtime should be less than fifteen minutes. The
swap will continue while the server is online, and be completed with
brief downtime again later today or tomorrow.
[Mar 23, 2000, 8:37 am] pi Downtime
pi (www10) crashed under heavy load, and was rebooted. Downtime was
about 35 minutes.
[Mar 22, 2000, 2:58 pm] Network Solutions Template Change
Effective this week, we have begun using the newest e-mail template,
version 6.0, that Network Solutions (NSI) provides for domain
registrations and modifications. This template will be required by NSI
effective April 1, 2000. We adopted it as soon as our testing was
completed, primarily because it allows new domains to opt-out of NSI's
new bulk marketing programs, under which they sell contact information
to third parties without any other form of consent.
We have selected the opt-out by default, as we expect this to be the
preferred setting. If you prefer to be marketed to by third parties,
please contact Network Solutions.
[Mar 22, 2000, 4:57 am] iota Downtime
iota (www5) crashed under heavy load. It has since returned to normal service. Downtime was about 15 minutes.
[Mar 20, 2000, 10:37 pm] straif Downtime
straif (www26) crashed due to load. It was brought back up in under 15
minutes.
[Mar 15, 2000, 3:59 pm] nuumen downtime
nuumen crashed under heavy load, and was rebooted. Downtime was
approximately 10 minutes.
[Mar 15, 2000, 2:19 pm] gamma downtime
gamma crashed under heavy load, and required an extensive file
system check upon reboot. Downtime was approximately 10 minutes.
[Mar 15, 2000, 12:12 am] SAVVIS Maintenance
Connectivity through our SAVVIS circuit has returned at this time. The
problem centered on their New York router, but after we discussed the
problem with SAVVIS support, they were unable to give us any further
information about the cause.
[Mar 14, 2000, 11:46 pm] SAVVIS Emergency Maintenance
At around 11:07 pm EST, we started having trouble with our SAVVIS
connection. SAVVIS support claimed that the outage was part of an
emergency maintenance window that was expected to last 5
minutes, but has since lasted almost 30. We will post more information as
it becomes available.
[Mar 14, 2000, 2:46 pm] gort downtime
gort crashed, requiring a reboot and file systems check. Downtime
was approximately 10 minutes.
[Mar 14, 2000, 2:16 am] othala Downtime
othala (www150) crashed and required a manual disk check before rebooting.
Downtime was 10 minutes.
[Mar 13, 2000, 12:02 am] ilwe Downtime
ilwe (www81) was down this evening for approximately 10 minutes.
It has since returned to normal operation.
[Mar 4, 2000, 6:45 pm] UUnet Resolution
Just before 6pm Eastern time, our UUnet connectivity returned to
normal status, for a total outage duration of nine hours, as predicted by
UUnet technical support.
All routing now appears to be normal; if there have been any other effects,
we will post further information.
[Mar 4, 2000, 3:07 pm] hwesta Downtime
hwesta (www51) was rebooted due to load issues. Downtime was 5 minutes.
[Mar 4, 2000, 11:58 am] UUnet Update
UUnet reports that they have a total of ten OC-48 trunks out of
service in Pennsylvania, completely isolating their Pittsburgh presence
from the backbone. Due to the magnitude of the problem, no resolution is
expected any sooner than 6pm Eastern time today, which would represent a
remarkably long total outage time of at least nine hours.
Customer traffic continues to flow normally through alternate providers
with no disruption. We will post updated information when it becomes
available.
[Mar 4, 2000, 9:06 am] UUnet Outage
UUnet has lost all outbound connectivity in Pittsburgh, as of 8:45am
Eastern time today. Although we can reach their gateway router directly
over our circuit, it is isolated from the rest of their network. Luckily,
the outage is severe enough that UUnet's internal routing has been
updated, and all customer traffic is now comfortably flowing through
SAVVIS, Sprint, and Digex. In fact, we can reach
UUnet's network easily via SAVVIS, as they are connected
to UUnet in New York City.
We will post updated information as it becomes available from their NOC.
[Mar 3, 2000, 8:19 pm] theta Downtime
theta (www4) was rebooted under severe load and required extensive
filesystem checks and an additional reboot. It has now returned to normal
service. Downtime was approximately 30 minutes.
[Mar 3, 2000, 1:10 pm] nwalme downtime
nwalme (www57) crashed under heavy load and was brought back
online with downtime around 15 minutes.
[Mar 3, 2000, 10:11 am] ailm Downtime
ailm (www28) crashed and required a manual filesystem cleaning
before being brought back online. Downtime was approximately 10 minutes.
[Mar 2, 2000, 2:48 pm] auma maintenance
auma was taken down briefly to replace a failing network card. It
has since returned to normal service, with approximately 10 minutes of
downtime.
[Mar 2, 2000, 12:01 am] idad Downtime
idad (www32) was down this evening for approximately 5 minutes.
It has since returned to normal operation.
[Mar 1, 2000, 10:20 am] Denial of Service Attack
Beginning at 9:48am Eastern time today, pair Networks
came under a severe denial of service attack, with more than
100Mbps of traffic directed into its network from multiple attacking
sources.
Because a chain is only as strong as its weakest link, our network was
affected; a portion was effectively knocked offline for eleven minutes.
Without being more specific (to the benefit of attackers), we are pleased
to report that the weakest link was previously scheduled to be replaced
with an extremely powerful alternative, which is now on order. We will
expedite this upgrade as much as possible. We will also be accelerating the
scheduled improvements for our gateway routers, which were originally
planned to accommodate OC-3 circuits, but will also benefit in protecting
against this type of attack.
There is no absolute protection against denial of service attacks; the only
cure is better defense of other servers and networks, and the only
short-term fix is to have so much capacity that it can't be overwhelmed
(currently this may not be possible; witness the attacks on Yahoo and eBay).
Rest assured that we will do everything possible to defend our network and
build it out to be able to survive any type of attack. For the record, we
have sustained eight smaller attacks over the past two weeks, with no
impact on performance or connectivity for any customer.
[Mar 1, 2000, 12:39 am] enda maintenance completed
The previously mentioned drive upgrade for enda has been
completed. Downtime was less than 10 minutes.
[Feb 28, 2000, 11:02 pm] falku Downtime
falku (www98) crashed due to high load and was brought back online after a
manual filesystem check. Downtime was approximately 10 minutes.
[Feb 25, 2000, 8:03 pm] naudhiz Downtime
naudhiz (www141) suffered a brief outage from a crash caused by high server
load. Downtime was 15 minutes.
[Feb 25, 2000, 2:24 pm] pi downtime
pi crashed under heavy load, and required a filesystem check after rebooting. Downtime was approximately 10 minutes.
[Feb 24, 2000, 2:06 pm] arda downtime
arda (www62) crashed under heavy load and was brought back online
with downtime under ten minutes.
[Feb 23, 2000, 7:32 am] auma Downtime
auma (www86) crashed under load and was brought back online after
a filesystem cleaning. Downtime was approximately 10 minutes.
[Feb 23, 2000, 2:40 am] anca Downtime
anca (www53) was down this evening for approximately 5 minutes.
It has since returned to normal operation.
[Feb 19, 2000, 11:10 pm] bemnet Maintenance
The system memory swap for bemnet has been completed. We will continue to monitor the situation in case further hardware work is necessary.
[Feb 19, 2000, 9:55 pm] bemnet Maintenance
We have investigated the recent erratic behavior of bemnet,
including spontaneous reboots, and concluded that there is a problem
with the system memory. bemnet will be taken down within the
next 24 hours for approximately five minutes, in order to swap the system
memory. If this does not correct the problem, a complete swap to a new
motherboard and CPU will be required. We will post further information
as it becomes available.
[Feb 18, 2000, 9:21 am] enda Maintenance
enda (www80) was taken down briefly to prepare for a hard drive
upgrade that had been postponed earlier. The upgrade is being done to
increase available disk space. Downtime was approximately 10 minutes.
[Feb 15, 2000, 6:11 pm] vala maintenance completed
vala (www59) is back up after the replacement of its ethernet
card. Downtime was under five minutes.
[Feb 15, 2000, 3:32 pm] vala maintenance
vala (www59) will be taken down at approximately 6:00 PM Eastern
to replace a defective ethernet card. Downtime should be approximately five
to ten minutes.
[Feb 15, 2000, 11:25 am] ailm downtime
ailm (www28) crashed under heavy load and was brought back online
with downtime around 10 minutes.
[Feb 15, 2000, 1:57 am] SAVVIS Maintenance
SAVVIS has just announced another emergency maintenance window to start
sometime in the next 2 hours and last approximately 30 minutes. Customer
traffic may be affected, but not significantly.
[Feb 14, 2000, 9:25 am] paat Downtime
paat (www100) crashed under load and was brought back online
following a manual filesystem cleaning. Downtime was approximately 15
minutes.
[Feb 11, 2000, 5:56 pm] bemnet downtime
bemnet (www158) crashed under heavy load and was brought back
online with downtime around 10 minutes.
[Feb 11, 2000, 12:14 pm] SAVVIS Emergency Maintenance
After three more outages in SAVVIS's New York City POP, they have just
announced another emergency maintenance window for 12:45pm Eastern time,
approximately 30 minutes from the time of this posting. The outage may
take as long as 30 minutes. Customer traffic will be affected, but not
dramatically.
We would like to state publicly that we are pursuing termination of our
service with SAVVIS, based on a poor service record, a lack of
forthright information about problems, and an absence of future growth
capacity. We expect to shift traffic towards Sprint and our new
GTE circuit, once it is online.
[Feb 10, 2000, 1:37 pm] db2 Resolution
The problems on db2 have been identified and corrected. The
MySQL daemon on db2 has been stable for the past several hours,
and we will continue to monitor it.
[Feb 10, 2000, 12:31 pm] SAVVIS Maintenance
SAVVIS has advised us that there is a problem with a line card
on the New York City router we are connected to, which likely explains
the eleven brief outages we've experienced with them in the past week.
They will be replacing that card on an emergency basis today or tonight,
which will disrupt our connectivity through SAVVIS for fifteen
minutes or less. Hopefully this will eliminate the problem in general.
[Feb 10, 2000, 10:29 am] db2 Problems
db2 experienced intermittent problems with its MySQL daemon
overnight, which have begun to recur this morning. We are currently
working to correct the problem, as well as to relocate databases to
other servers as part of our database upgrade plan.
[Feb 10, 2000, 4:23 am] Digex Resolution
The problems with Digex are now cleared up. Traffic should again be
normal for all users.
[Feb 10, 2000, 3:12 am] Digex Trouble
We are currently experiencing problems with our leased line to Digex. We
have been told by Digex that they are aware of the problem and are
working on it. We will keep on top of this and post more information as it is
available.
[Feb 9, 2000, 3:36 pm] MySQL Upgrades
The daemons on all MySQL database servers were upgraded this afternoon to
install a security patch. Downtime for each daemon was less than 2
minutes.
[Feb 9, 2000, 3:52 am] quan Maintenance
quan will be shut down briefly at approximately 4:05am to have its ethernet
card replaced due to hardware errors observed with the current one.
Downtime should be less than 10 minutes.
[Feb 9, 2000, 1:08 am] Network Troubles
Due to the Internet-wide backlash from the recent Distributed Denial of
Service attacks against several major websites, the Internet at large is
seeing massive amounts of congestion and poor routing. This may cause some
customers to see poor performance when reaching their sites hosted at pair,
or even not be able to reach them at all in some extreme circumstances.
We are working with our upstream providers to try and restore things to normal
as quickly as possible, at least from our point of view. Unfortunately as this
is a problem affecting the entire Internet, it is mostly out of our hands.
We'll post more information as it becomes available.
[Feb 8, 2000, 4:10 am] UUnet Brownout
At 3am Eastern time, we observed a sharp decrease in traffic to and from
UUnet. The traffic shifted to SAVVIS and Sprint,
which implies significant changes in BGP route announcements. At the
same time, UUnet's Pittsburgh gateway became almost entirely
non-responsive to anything but routed traffic. The possible causes
include a configuration error at UUnet's Pittsburgh gateway, or
major outages elsewhere in UUnet's network, thereby affecting route
distribution.
We did not observe any such outages, and UUnet's first explanation
was, incredibly, that the problem was on our end and was not in their
network. By the time we received a second explanation an hour later,
the network routing had suddenly returned to normal. That explanation was
that they are doing extensive maintenance in East Coast POPs, and that we
should expect to see intermittent problems until 6am.
Of course, we receive daily notification from UUnet about planned
maintenance everywhere within their global network, from software upgrades
in Saint Louis to modem swaps in Melbourne, and for at least the last two
weeks, no such round of maintenance has been planned. The announcement for
today's maintenance, in fact, lists only Stockholm, Sweden. Then again,
that announcement has been the same three days in a row, so perhaps it is
erroneous.
Customer traffic was not significantly affected, although our faith in the
forthrightness and reliability of one of our upstreams has been shaken.
We are honest and direct with our customers, and truly wish that our
providers would do the same for us.
[Feb 7, 2000, 12:44 am] SAVVIS Trouble
Once again we are having trouble with our DS-3 to SAVVIS. The problem
appears to be similar in nature to what we experienced last Tuesday (2/1),
though so far there have only been 2 brief 5-10 minute outages. Customer
traffic has not been adversely affected by these outages, as we have plenty
of extra capacity online now with our other providers.
We are working with SAVVIS to resolve the problem, and will post further
details as they become available.
[Feb 4, 2000, 5:31 pm] Internal Routing Brownout
The internal routing maintenance announced for last night was considered
the difficult portion of some network changes we're making. Those changes
went smoothly and caused no disruptions of traffic. The portion considered
"easy", however, took place around 4:30pm today, and did disrupt
traffic destined to approximately two dozen servers for approximately ten
minutes.
The changes are intended to improve the overall stability of the internal
network; we are also evaluating a possible network upgrade that would
improve our internal routing capacity by a factor of several hundred,
while also simplifying our network design. We do not expect any further
disruptions due to configuration changes or unreliable Cisco features.
We will post details of the proposed network upgrade as soon as we have
completed our evaluation of the required technology.
We apologize for the interruption of traffic to affected servers and
customer sites.
[Feb 3, 2000, 8:14 pm] Network Maintenance Complete
The scheduled network maintenance is complete, with no problems.
As expected, customer traffic was not affected.
[Feb 3, 2000, 3:57 pm] uilen downtime
uilen (www35) crashed under heavy load and was brought back online
with downtime around 15 minutes.
[Feb 3, 2000, 3:21 pm] Network Maintenance
Starting at approximately 8pm tonight we will be briefly taking down each
of our internal routers for maintenance. Customer traffic should not be
adversely affected. Each router's downtime will be approximately 5 minutes.
[Feb 3, 2000, 2:09 pm] Problems with Network Solutions
We feel obligated to warn our customers that we have consistently been
having problems with response time and correctness when dealing with
Network Solutions in recent weeks. We are frequently given
incorrect information to rely on and pass along to our customers.
This includes issues such as what e-mail addresses to send templates to,
how templates should be filled out, and even (most recently) which
template should be used.
Competition has been introduced for domain registrations under the
generic top-level domains of .COM, .NET, and .ORG.
Network Solutions operates a central registry, separate from its
operations as a registrar, and that registry deals with competing
registrars, accredited by ICANN, to accept domain registrations.
Many of the competing registrars have different procedures, behaviors,
and pricing, when compared to Network Solutions' registrar business.
At the present time, our systems do not directly support these alternate
registrars. Also, our Gold Premier status with Network Solutions
is important to us, as it allows our customers to register domains with
deferred payment, and provides us with a contact channel through which we
can at least attempt to rectify some of the problems mentioned above. We
do, however, feel that the current situation is unsuitable, and that
competition in domain registration is essential. Consequently, we have
taken several steps. First, we have received our own accreditation from
ICANN, and expect to be handling domain registration directly as
soon as April of this year. Second, our pair2000 software system,
due for user debut on March 15, will include support for one or more
competing registrars.
In the meantime, we will continue to assist customers who encounter
problems using Network Solutions' registrar services. We have
been advised that Template 5.0 is our best choice, although Network
Solutions now suddenly accepts only Template 6.0. We hope to have this
resolved today.
Customers who wish to use alternate registrars will need to handle
those registrations directly with such registrars, and present the
domains to us as "transfers from other NIC". If you receive
a template designed for use with Network Solutions but do not
use them as your registrar, you may simply copy the nameserver
information from that template.
We look forward to improving this situation and eliminating unpleasant
experiences with domain registration.
[Feb 2, 2000, 9:53 pm] upsilon Downtime
upsilon (www13) crashed due to heavy load. Downtime was 10 minutes.
[Feb 1, 2000, 6:49 pm] SAVVIS Network Trouble
Our SAVVIS line is once again experiencing problems and is currently
down. We are on the phone right now with their technicians to attempt
to resolve this.
In the meantime, our Digex line is now functional, and it is helping
to compensate for the loss. As a result, most users
should not see significant problems in reaching us.
We are, nevertheless, working right now to bring our SAVVIS line
back up and will post another notice once we have more to report.
[Feb 1, 2000, 5:58 pm] SAVVIS Update
The most recent SAVVIS outage has continued for fifteen minutes,
and appears to be a more serious problem than originally realized. We
are attempting to get further information from SAVVIS.
[Feb 1, 2000, 5:50 pm] SAVVIS Network Trouble
We are seeing trouble with SAVVIS today; after two brief outages
during their overnight maintenance window, their New York connectivity has
gone offline entirely twice today. Their explanation so far is that they
are still having problems with their New York backbone connectivity,
similar to the incident on January 27. The outages have been brief and
have had minimal effect on customer connectivity. We will post further
information if it becomes available.
[Jan 31, 2000, 10:34 am] db4 Reboot
db4 was rebooted to clear a resource-usage problem -- it was brought
back online within 10 minutes.
[Jan 30, 2000, 6:53 am] Network Problems
Beginning around 4:30am Eastern time today, one of our internal routers
experienced severe instability due to a low memory condition. The memory
was exhausted by a routing problem elsewhere on the Internet. Unfortunately,
the failure mode caused the router to "flap" (go offline and online again
repeatedly) rapidly, which prevented other routers from successfully
taking over its traffic.
Within an hour, we had returned routing to normal. We have reconfigured
the affected router so that it carries less traffic and uses memory more
efficiently. We will be upgrading memory on all of our internal routers
to their maximum physical capacity this week, in order to reduce the
possibility of any similar failure in the future. We also have additional
routing equipment already on order to handle anticipated growth in demand.
Approximately one-third of customer sites experienced a network "brownout"
during the event. We apologize for the inconvenience, and are taking every
step possible to ensure the problem does not recur.
[Jan 27, 2000, 1:19 pm] Network troubles
A sharp drop in traffic to and from our network alerted us to an incident
in progress. So far, we have heard of a major fiber cut between here and
New York which may be the cause. This caused some network instability in
the overall region while it was worked around, and hampered our
connectivity for approximately 20 minutes. Currently, we appear to have
full service again. We will post additional details as more information
becomes available.
[Jan 26, 2000, 1:30 pm] chi downtime
chi (www14) became unresponsive after suffering high load due to runaway CGI processes. It came back after a clean reboot and file system check, and downtime was less than 15 minutes.
[Jan 25, 2000, 11:07 pm] SAVVIS Emergency Maintenance
SAVVIS has announced a major emergency maintenance window for
this evening, spanning 10pm to 3am Eastern time. Their connectivity in St.
Louis, New York City, and Chicago will be severely reduced for multiple
periods of approximately 20 minutes each. Customer traffic on these routes
may be significantly degraded during that time. We are connected via
New York City.
We apologize for the short notice; we were only notified by
SAVVIS within the last hour. We did observe performance problems
in SAVVIS' network today, but compensated by rerouting some traffic
manually. We hope that the maintenance will eliminate those problems
and allow us to fully utilize our SAVVIS connectivity once again.
[Jan 25, 2000, 3:26 am] coll Downtime
coll (www22) became unresponsive due to a lack of system resources and
needed to be rebooted. Downtime was less than 15 minutes.
[Jan 24, 2000, 8:53 am] Emergency Maintenance
We are currently rebooting all user servers, in order to address two
recently-discovered security weaknesses. Downtime for each server will
be less than five minutes, and there will be no other effect on user
functionality.
[Jan 24, 2000, 4:50 am] UUNet Maintenance
UUNet will be performing maintenance on their Pittsburgh router during their
maintenance window on the morning of Tuesday 1/25 beginning around 3:00am EST.
Customer traffic will continue to flow on our other circuits and should not
be adversely affected.
[Jan 22, 2000, 9:16 pm] uilen Downtime
uilen (www35) crashed due to heavy load. Downtime was 30 minutes.
[Jan 21, 2000, 6:37 pm] theta Downtime
theta (www4) rebooted under heavy load. Downtime was around 15 minutes.
[Jan 21, 2000, 1:35 pm] UUnet Routing
While we were performing some emergency rebalancing of traffic this
afternoon around 1pm, UUnet traffic was impaired for approximately three
minutes. We have corrected the problem, and have now shifted more traffic
towards UUnet. Our bandwidth expansion plans now consist of
bringing Digex online within the next two weeks, having GTE
online by the end of February, and activating our UUnet OC-3c in
early March.
[Jan 20, 2000, 9:17 am] PSInet Problems
PSInet is having major peering problems with Sprint; this was first
noticed yesterday, and the problem has recurred during periods of high traffic today.
Sprint is working with them on the issue; hopefully they will
upgrade their private peering to accommodate the level of traffic.
The impact on customers is that interactive response from Earthlink
or other PSInet-connected ISPs may be poor at certain times of the
day. We will attempt to redirect our outbound traffic through another
backbone within the next 24 hours, but we cannot affect the congestion on
the inbound route.
[Jan 19, 2000, 12:39 pm] enda Upgrade
enda (www80) will be taken down briefly this afternoon to
begin the process of replacing the hard drive. Another brief period
of downtime will be required when the swap is completed.
[Jan 19, 2000, 10:27 am] db4 MySQL Downtime
The MySQL daemon was down for approximately 20 minutes this morning after
a failure occurred in the automated process for monitoring it. The problem
was caught by external monitoring and corrected.
[Jan 18, 2000, 1:41 pm] gamma Downtime
gamma rebooted under high load, and after some extensive file
system checks came back with no lingering effects. Downtime was
approximately 10 minutes.
[Jan 18, 2000, 11:48 am] uilen downtime
uilen (www35) crashed under heavy load and was brought back online
with downtime under 15 minutes.
[Jan 17, 2000, 8:56 pm] Omicron Downtime
Omicron (www9) rebooted under severe load; it has been returned to service
after a normal reboot.
[Jan 14, 2000, 8:33 pm] ruis Downtime
ruis (www27) died under load. Downtime was less than 15 minutes.
[Jan 14, 2000, 5:29 am] pi Downtime
Around 4:15am, pi crashed, apparently under heavy load.
However, it refused to come back online. What was originally believed
to be a hard drive problem was traced to faulty memory. After replacing
the SDRAM on pi, it has booted without problems. The total downtime
was approximately 60 minutes.
[Jan 12, 2000, 12:04 am] hwesta Downtime
hwesta (www51) crashed under heavy load. Downtime was less than 10 minutes.
[Jan 11, 2000, 4:04 am] UUnet Maintenance
UUnet has completed their maintenance and traffic is flowing
normally again.
[Jan 11, 2000, 3:38 am] UUnet Maintenance
UUnet's scheduled maintenance is taking longer than expected; it
seems that they have run into difficulties. As of 3:35am Eastern time,
our connectivity to UUnet has been offline for 30 minutes, and
UUnet does not expect to restore that connectivity for at least
another 30 minutes. Customer traffic is flowing normally through alternate
paths, and is not being significantly affected.
[Jan 9, 2000, 2:39 pm] UUnet Connectivity
At approximately 2:15pm Eastern time today, we lost connectivity with
UUnet. The link came back up within a few minutes, but it took
another fifteen minutes for UUnet to reestablish connectivity
between Pittsburgh and the rest of their network. Hopefully this is
not going to be a recurring problem in their POP; if so, they will likely
take it down for hardware replacement.
Customer traffic was delayed by the brownout; a clean outage would
have been smoother. All is now returning to normal.
[Jan 8, 2000, 2:33 pm] vala Reboot
vala (www59) was rebooted to alleviate a low-memory condition under heavy
load. It has returned to regular service at this time.
[Jan 7, 2000, 6:57 pm] Maintenance Windows
Our upstream providers have announced plans to utilize their
maintenance windows as follows: UUnet Jan 11 Tuesday morning;
Digex Jan 12 Wednesday morning; Digex Jan 18 Tuesday morning.
None of these outages are expected to be long-term, and none should
affect customer traffic significantly.
[Jan 6, 2000, 9:26 pm] gebo Downtime
gebo (www139) froze. Downtime was less than 10 minutes.
[Jan 4, 2000, 10:02 am] vala Reboot
vala (www59) was rebooted this morning to clear up a resource
allocation problem. Downtime was less than 5 minutes, and the server
has returned to normal service.
[Jan 3, 2000, 3:59 pm] vala Downtime
vala (www59) was down for approximately 10 minutes this afternoon.
It has since returned to normal service.
[Jan 1, 2000, 1:34 pm] zatz Drive Swap
zatz has had its primary drive swapped for a larger model, with
two brief downtimes of approximately five minutes each. The original drive
was reporting errors.
[Jan 1, 2000, 1:02 am] Y2K Rollover
There are no Y2K-related incidents to report with regards to
pair Networks servers, network, or other operations.
As of 1am Eastern time, all operations are normal.
There is an unrelated prefailure indication on the hard drive for
zatz; it will be replaced during normal hours on Saturday, January
1st, 2000.
Happy New Year!