[Dec 30, 1999, 11:45 am] pi Downtime
pi (www10) crashed and was brought back online after a manual
filesystem cleaning. Downtime was approximately 20 minutes.
[Dec 30, 1999, 1:23 am] theta Downtime
theta (www4) crashed under heavy load and required an extensive
filesystem cleaning before being returned to normal service. Downtime was
approximately 40 minutes.
[Dec 29, 1999, 11:09 am] upsilon downtime
upsilon (www13) crashed under heavy load and was brought back
online with downtime around 10 minutes.
[Dec 26, 1999, 2:45 pm] anca Downtime
anca crashed due to a runaway mailing list, and was brought back
online within ten minutes.
[Dec 20, 1999, 2:37 am] emancholl Maintenance
The primary drive swap on emancholl (www37) has now been
completed. Combined downtime for the two outages was approximately
20 minutes. Available disk space has been more than doubled on the
machine, as well.
[Dec 19, 1999, 7:47 pm] emancholl Maintenance
The primary drive of emancholl (www37) has begun to show signs of
failure. We have begun the process of replacing the drive, which will require
two brief periods of downtime. We expect the server to continue to function
normally while the swap is taking place.
[Dec 18, 1999, 11:38 am] UUNet Problem Resolution
UUNet has advised us that all routing problems have been resolved
overnight. Our circuit with them is functioning normally, and no customer
traffic was affected by the problem.
[Dec 17, 1999, 4:23 pm] UUNet Problems
UUNet is currently experiencing some routing problems in their Pittsburgh
POP. They're working on it and will update us as soon as the problem is
resolved. Customer traffic should not be adversely affected.
[Dec 17, 1999, 12:47 pm] Log Generation Update
Effective Monday, Dec 20th, our log generation program will begin
generating logfiles in the format www.YYYYMMDD, rather than
www.YYMMDD. Because we know that a few users have automated
scripts to process their nightly logs, for the remainder of 1999, a
symbolic link from www.YYMMDD to www.YYYYMMDD will
also be generated.
We had intended to replace the log generation system with the new log
system of pair2000 before now, but unfortunately this will not
yet be possible.
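For users updating their nightly scripts, the naming change described above can be sketched as follows. This is a minimal illustration only, not our log generation program: the function names, the directory layout, and the `legacy_link_last_year` cutoff parameter are all assumptions made for the example.

```python
import os
import tempfile
from datetime import date


def log_names(day):
    """Return (new_name, legacy_name) for one day's logfile."""
    new_name = day.strftime("www.%Y%m%d")     # four-digit year form
    legacy_name = day.strftime("www.%y%m%d")  # old two-digit year form
    return new_name, legacy_name


def publish_log(log_dir, day, contents, legacy_link_last_year=1999):
    """Write the log under the new name; optionally link the old name to it.

    legacy_link_last_year is a hypothetical cutoff: the compatibility
    link is only created for dates up to and including that year.
    """
    new_name, legacy_name = log_names(day)
    new_path = os.path.join(log_dir, new_name)
    with open(new_path, "w") as fh:
        fh.write(contents)

    if day.year <= legacy_link_last_year:
        legacy_path = os.path.join(log_dir, legacy_name)
        if os.path.islink(legacy_path):
            os.remove(legacy_path)  # replace a stale link from an earlier run
        # Relative target, so the link still resolves if the directory moves.
        os.symlink(new_name, legacy_path)
    return new_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        publish_log(log_dir, date(1999, 12, 20), "example log line\n")
        print(sorted(os.listdir(log_dir)))
```

A script that opens the old two-digit name keeps working while the link exists, but should be switched to the four-digit name before the link is discontinued.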
[Dec 14, 1999, 4:52 pm] SAVVIS Maintenance
SAVVIS has advised of a scheduled maintenance window early on the
morning of Thursday, December 16th. This should begin no earlier than 4am
Eastern time, and be completed within two hours. During that timespan,
our SAVVIS circuit may go out of service. This is a software
upgrade on their switching equipment. Customer traffic should not be
significantly affected.
[Dec 13, 1999, 11:29 pm] emancholl reboot
emancholl (www37) ran out of swap space forcing a reboot. Total downtime
was less than 15 minutes.
[Dec 13, 1999, 3:21 pm] UUnet Maintenance
UUnet has advised of a scheduled maintenance window early on the
morning of Tuesday, December 14th. This should begin no earlier than 3am
Eastern time, and be completed within four hours. During that timespan,
our UUnet circuit may go out of service. This is a software
upgrade on their routing equipment. Customer traffic should not be
significantly affected.
[Dec 13, 1999, 10:35 am] emancholl Downtime
emancholl (www37) crashed and was brought back online after
approximately 15 minutes downtime.
[Dec 12, 1999, 3:24 pm] theta Downtime
theta (www4) crashed this afternoon under heavy load and was brought back
online. Downtime was approximately 20 minutes.
[Dec 12, 1999, 4:46 am] epsilon Downtime
epsilon (www3) crashed early this morning under heavy load, and
required an extensive manual filesystem cleaning before being brought back
online. Downtime was approximately 45 minutes.
[Dec 10, 1999, 1:09 pm] Uilen Maintenance Completed
The emergency drive swap on uilen has been completed, and the
server is fully operational in its new chassis.
[Dec 9, 1999, 1:41 pm] thurisaz Logs
Logs on thurisaz (www136) were not generated for some users last
night due to a configuration error. They are currently being regenerated
in full, and should be distributed to all users within the hour.
[Dec 9, 1999, 1:09 pm] uilen Drive Maintenance
Although we completed the move with no drive failures, uilen is
now reporting hard drive errors. We will be performing an online swap
of its data to a new drive; this will require two brief downtimes, barring
any greater failures, and will result in an upgraded system as well.
This will be completed within the next four hours.
[Dec 9, 1999, 11:49 am] New Address and Fax
As a result of our datacenter move, we have a new mailing address and
fax number:
pair Networks, Inc
2403 Sidney St, Suite 510
Pittsburgh, PA 15203
fax (412) 381-9997
Please use this information for all future contact, bill paying, etc.
Mail or faxes to the old address and number may be delayed for at least
one business day.
[Dec 9, 1999, 11:07 am] vilya Downtime
vilya mysteriously turned itself off. It's being brought back
online now. This could be a power supply problem; if it recurs, the
server will be swapped into a new chassis. Downtime was approximately
10 minutes.
[Dec 9, 1999, 11:02 am] mildh downtime
mildh (www87) crashed under heavy load and was brought back online
with downtime around 10 minutes.
[Dec 9, 1999, 10:18 am] Sprint Now Online
The configuration problem on the Sprint IDSU was resolved this
morning, and our circuit to their network is now fully online and routing
traffic. We will continue to tweak traffic levels today, but we expect
that there will be no further performance problems at this point.
We are still awaiting Digex's circuit installation, projected for
the first week of January. We also have the OC-3c upgrade still on order
with UUnet, and expect to order several additional circuits this
month as well, in anticipation of future growth. We should never be
caught short on circuit availability again.
[Dec 9, 1999, 1:33 am] idad Downtime
idad (www32) was down this evening for approximately 20 minutes.
It has since returned to normal service.
[Dec 8, 1999, 6:31 pm] Sprint Status
We have been unable to correct a configuration problem with the
Sprint-supplied IDSU for their circuit. We will resume these
efforts in the morning, when the engineer assigned is back on shift.
This is the only minor remaining obstacle to getting this circuit
running.
[Dec 8, 1999, 12:32 pm] Sprint Update
Sprint has been unable to resolve the interconnection issue
between their facility and MFS. Changing one port to another requires
calling in the original install group. At this point, all IP configuration
has been completed; as soon as the circuit physically comes up and can
pass traffic, our BGP session should come up and we will begin passing
Sprint traffic via Sprint's network, which will relieve
the strain on our SAVVIS and UUnet circuits.
We expect the line problem to be resolved by late afternoon. More
information will be posted before 5pm today.
[Dec 8, 1999, 10:13 am] Sprint Status
Last evening, Sprint installation engineers discovered a
provisioning error on their DS-3 circuit to this facility; simply put,
a "9" on a form faxed to them by MFS was misread as an "8". Consequently,
the line testing failed from their end to MFS. You might imagine that
correcting this is really simple. However, with a large phone company,
nothing is ever simple. When you get FOUR phone companies involved
on provisioning a circuit, simple goes right out the window.
We will know within the hour whether or not this circuit will be ready
to turn up today.
[Dec 7, 1999, 6:01 pm] SAVVIS Resolution
After a brief misconfiguration on their part, SAVVIS has restored
our circuit to New York and allocated sufficient bandwidth to serve our
network until the Sprint circuit can be activated, which hopefully
will take place tomorrow.
At this point, network congestion issues should be relatively minimal,
and decreasing as the evening goes on. We have also worked all day to
resolve IP address conflicts and other problems that arose when the
servers came back online after the move. Any customer who still has
any service down should please contact us at urgent@pair.com.
We apologize for the difficulties with our network; although the problems
arose from various broken promises from various providers, we naturally
feel responsible to our customers. We have committed to several network
upgrades in this facility which will make bandwidth shortfalls impossible.
Also, please remember that we will not be moving our equipment again;
this was a rather unusual network event.
[Dec 7, 1999, 5:17 pm] SAVVIS Maintenance
We're sorry to report that our SAVVIS circuit will be down for
approximately 15 minutes, beginning shortly. SAVVIS did not have
the capacity to support our upgraded circuit, and consequently will be
reattaching us to their New York POP, instead of Chicago, in order to
take advantage of increased capacity there. We believe that the capacity
they are offering will handle our network load until the Sprint
circuit is ready.
Unfortunately we have no choice; SAVVIS sold us a circuit upgrade
that they could not possibly support in their network, and now they have
to backpedal. This upgraded circuit was the cause of the problems both
for our connectivity through SAVVIS, and many other SAVVIS
customers in this area.
We will post further information once the line is reconnected to New York.
[Dec 7, 1999, 4:29 pm] SAVVIS Update
The SAVVIS circuit has been stabilized, with approximately
10% packet loss and excellent latency of 15ms. At this point, we are
barely saturating the UUnet circuit, but traffic is gradually
decreasing overall, so this problem will go away. The Sprint
circuit is scheduled to be activated on Wednesday morning, which should
forestall further saturation of our UUnet connectivity.
SAVVIS is trying to force us to downgrade our service with them;
apparently, a large part of the problem stems from the fact that they
sold us a service which they were not fully prepared to provision.
However, we feel we have reached a compromise level at which performance
is acceptable.
Once the Sprint circuit is operational on Wednesday, we expect
that network performance will significantly improve for the long term.
We apologize for the problems that plagued the network today; they were
brought on by SAVVIS, but the situation was primed by the fact
that other providers (specifically Digex and Sprint) failed
to come through on their target date promises to us.
[Dec 7, 1999, 3:17 pm] SAVVIS Update
The SAVVIS line gradually improved as it was worked on, to the
point of 30ms latency and 25% packet loss. However, as they've continued
work, the line has returned to 450ms latency and 40% packet loss. They
are at a loss to explain either condition, which is rather disappointing
to say the least.
We are continuing to monitor the situation and adjust traffic levels as
best possible on our end. We will post further information as it becomes
available.
[Dec 7, 1999, 2:32 pm] SAVVIS Update
SAVVIS has reduced the performance problem on their circuit from
50% packet loss and 400ms latency to 30% packet loss and 130ms latency.
We have not heard further from them, but they are still investigating.
We have rebalanced our traffic so that the UUnet line is no
longer saturated. At this point, customer traffic should be noticeably
improved for many destinations. We will post further regarding resolution
of the SAVVIS line.
[Dec 7, 1999, 1:25 pm] SAVVIS Update
SAVVIS is servicing our line. Although it has improved briefly
twice, overall the problem is as bad as ever. We are still working
with them to resolve the problem.
[Dec 7, 1999, 11:48 am] SAVVIS Update
SAVVIS has identified a potential configuration problem with our
line, and expects to have the problem resolved within the next half hour.
We will post as further information becomes available.
[Dec 7, 1999, 11:04 am] SAVVIS Update
While we are waiting for resolution of the SAVVIS circuit
problem, we have shifted traffic towards the UUnet circuit.
Unfortunately, this saturates the circuit and is causing packet loss
there as well. We are also working to bring our Sprint circuit
up today, which should alleviate the situation somewhat.
[Dec 7, 1999, 10:32 am] SAVVIS Problems
We have worked with SAVVIS this morning to identify a serious
problem with the provisioning of their circuit in the new facility.
It appears that the new circuit was provisioned through Chicago instead
of New York, and is experiencing 50% packet loss even at low load.
As this is one of only two circuits operational, this is affecting
customer traffic significantly. We have escalated the matter with
SAVVIS and hope to have more information shortly.
[Dec 3, 1999, 12:10 pm] ns0 Restored
ns0.ns0.com has been restored to full service. No reduction in
customer traffic was detected on our network, so the redundancy of
primary and secondary nameservers covered for the problem.
We will be rebuilding and upgrading our ns0.com servers after
the move is complete.
[Dec 3, 1999, 10:07 am] ns0 Failure
The primary drive on ns0.ns0.com has catastrophically failed.
We are replacing the server from a recent backup, and its service should
be restored within the next hour.
This server provides secondary nameservice to many of our customers'
domains. However, as the primary nameserver for each customer remains
operational during this outage, the general effect should be minimal; this
is a loss of redundancy, not a complete loss of service.
We will post further when the matter is resolved.
[Dec 2, 1999, 2:29 pm] pi Downtime
pi (www10/ssl3) rebooted due to heavy load, and required a manual
filesystem check before returning to service. Downtime was approximately
20 minutes.
[Dec 2, 1999, 8:41 am] sether (www95) Downtime
sether (www95) crashed under heavy load,
and was brought back online.
Total downtime was less than 15 minutes.
[Dec 1, 1999, 7:08 am] gamma downtime
As a result of high load, gamma crashed and was down for
approximately 15 minutes early this morning. It has since returned
to normal service.
[Nov 30, 1999, 8:36 pm] Network Solutions Procedural Change
Network Solutions recently informed their Premier Partners that they
will be moving to a new Domain Modification Template, version 5.0, and that
they will be accepting the previous version, 4.0, up until December 15th.
However, we have heard from at least one customer who received a message
from Network Solutions stating that the change is effective
immediately, and that their 4.0 template was rejected.
We immediately discussed the issue with our contact, as this follows on the
heels of a number of changes that didn't go smoothly. The interim solution
that was suggested is as follows:
Starting immediately we will be using template 5.0 for all domain name
submissions, both new and transfer. Any customer that is in the midst of a
transfer that is having problems is urged to contact us at
domain@pair.com to receive the new template.
One important note about this must be brought to our users' attention.
This template will tell you to send it in to
registrations@networksolutions.com. DO NOT send it in to this
address. Instead, please continue to use
hostmaster@internic.net. We have been assured by our contact that
if this changes we will be notified. Again, only use
hostmaster@internic.net.
We apologize for these latest problems with InterNIC, and will do
everything that we can to reduce the impact they will have on our users.
[Nov 30, 1999, 11:28 am] paat Downtime
paat was down for approximately 15 minutes late this morning. It
has since returned to normal service.
[Nov 28, 1999, 4:57 pm] beith Downtime
beith (www17) was rebooted to restore normal system services this
afternoon. The server was unavailable for less than 15 minutes.
[Nov 27, 1999, 7:27 pm] Enda Reboot
enda (www80) was rebooted due to serious responsiveness problems, and has
returned to normal service. Downtime was approximately 15 minutes.
[Nov 27, 1999, 3:17 am] upsilon Downtime
upsilon was down this evening for approximately 20 minutes. It has since
returned to normal service.
[Nov 25, 1999, 6:21 pm] rho Downtime
rho (www11) was down this evening for approximately 10 minutes. It
has since returned to normal service.
[Nov 23, 1999, 10:23 pm] SAVVIS Routing Problems
We have recently noticed several problems with routes announced to us by
SAVVIS - for nearly four hours on Monday, November 22, and
approximately one hour on Tuesday, November 23, traffic delivered to and
from SAVVIS has dropped precipitously - by as much as 40%.
SAVVIS' engineers have explained this, and I must stress that we
are not making this up, as "a few less people decided to log on
to the Internet". Of course, this is absurd, but it's the answer
we were given after escalating the matter beyond their first-level techs,
so we are faithfully reproducing it here.
The more reasonable cause is, of course, that SAVVIS stopped
announcing a significant number of routes to us, presumably because there
was an outage or peering problem elsewhere in their network which caused
those routes to no longer be valid through their network. This makes
perfect sense when you consider that our traffic delivered to other
providers went up in perfect proportion during their slowdown.
Incidentally, customer traffic was not significantly affected.
It is unclear whether SAVVIS is unaware of any problems that may
have occurred on these two days, or unwilling to report them, but in
either case, providing an answer that is patently balderdash, after being
given two opportunities to investigate the problem, is a disservice to
pair Networks and its customers. Drawing a conclusion
from this incident is left to the reader.
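The reasoning in the entry above can be illustrated with a short sketch. All traffic figures below are hypothetical and chosen only for the example; they are not measurements from our network, and the 5% tolerance is likewise an assumption.

```python
def classify_drop(before, during, tolerance=0.05):
    """Compare per-provider traffic before and during an incident.

    before and during map a provider name to traffic in arbitrary units.
    If the total is roughly unchanged, traffic merely moved between
    providers ("rerouted"); if the total itself moved, demand changed.
    """
    total_before = sum(before.values())
    total_during = sum(during.values())
    change = (total_during - total_before) / total_before
    if abs(change) <= tolerance:
        return "rerouted"
    return "demand drop" if change < 0 else "demand rise"


# Hypothetical figures: one provider falls 40%, the others absorb the difference.
before = {"SAVVIS": 100, "UUnet": 80, "Digex": 60}
rerouted = {"SAVVIS": 60, "UUnet": 103, "Digex": 77}
# Hypothetical figures: the same 40% fall with nothing picked up elsewhere.
fewer_users = {"SAVVIS": 60, "UUnet": 80, "Digex": 60}

if __name__ == "__main__":
    print(classify_drop(before, rerouted))
    print(classify_drop(before, fewer_users))
```

If fewer people had simply logged on, the total would have fallen as in the second case; a constant total with one provider's share shifted to the others points to withdrawn routes instead.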
[Nov 22, 1999, 3:49 pm] kodh maintenance completed
The hard drive upgrade for kodh has been completed, which will better allow for future growth of accounts on this server.
[Nov 22, 1999, 3:29 pm] kodh Drive Upgrade
kodh (www90) is being taken down for an emergency drive upgrade.
Downtime is expected to be less than 10 minutes.
[Nov 20, 1999, 2:56 am] unque
unque (www54) crashed due to low swap space. Downtime was under 15 minutes.
[Nov 18, 1999, 2:58 pm] omicron downtime
omicron (www9) crashed under heavy load and was brought back
online with downtime around 15 minutes.
[Nov 18, 1999, 1:54 pm] epsilon Downtime
epsilon (www3) was down this afternoon for about 10 minutes. It
has since returned to normal service.
[Nov 18, 1999, 9:39 am] gort Mail Delivery
Mail delivery on gort (www24) was interrupted this morning when
sendmail began deferring connections. The daemon has been restarted and is
working properly at this time. Any mail sent to users on gort
during the problem should be received over the course of the day, as remote
mail servers retry their deliveries.
[Nov 17, 1999, 11:18 pm] theta Downtime
theta.pair.com (www4) was rebooted under heavy load due to a serious mail
loop. Downtime was approximately 20 minutes.
[Nov 16, 1999, 8:26 pm] gao Downtime
gao (www111) crashed under heavy load. Downtime was approximately 15
minutes.
[Nov 16, 1999, 7:00 pm] Gao maintenance
The maintenance on gao has been completed.
[Nov 16, 1999, 6:30 pm] Digex/Intermedia Maintenance
Digex/Intermedia has scheduled an emergency maintenance window
for Wednesday morning between 4am and 5am Eastern time. They will be
rebooting several routers on their network, including their equipment in
Pittsburgh. Customer connectivity should not be significantly affected.
[Nov 16, 1999, 6:26 pm] gao maintenance
gao (www111) will undergo a hard drive upgrade shortly. This will result
in two brief downtimes while the server reboots with the new configuration.
This upgrade will allow gao to better accommodate the growing needs
of users by increasing the size of the /var partition.
[Nov 15, 1999, 4:13 pm] Network Solutions Update
Network Solutions has discovered the cause of the recent problems
we and our customers have been having with template submissions, and the
result is that for the time being, the correct address for all customers to
use to modify their domain names is once again hostmaster@internic.net.
Please use this address for the submission of all domain name
modification templates. If you previously submitted a template to
Network Solutions at any address, and you have not received
back even an autoresponder, we advise that you resend it to the above
address, even if you sent it there previously as well. Any customers who
need assistance with any of this should contact support@pair.com.
[Nov 11, 1999, 5:36 pm] iota downtime
iota (www5) crashed under heavy load and was brought back online
with downtime under ten minutes.
[Nov 10, 1999, 11:10 pm] Network Solutions Troubles
Network Solutions (formerly known as InterNIC), our current default
registrar for .COM, .ORG, and .NET domains, is experiencing significant
problems processing new and modify requests for our customers' domains.
Specifically, some submissions are disappearing without receiving tracking
numbers, while others are receiving incorrect error messages or advice to
use their Web interface.
We do not know if these problems are affecting all registrants, but have
no reason to believe otherwise. Any customer who does not receive a
tracking number in response to a submission to any @internic.net
or @networksolutions.com address is encouraged to resend their
template to registrations@networksolutions.com as well as
hostmaster@internic.net. Please advise us of the difficulties
so that we can better document them for our Premier contact.
We will post further information as it becomes available.
[Nov 10, 1999, 5:25 pm] anca downtime
anca (www53) crashed under heavy load and was brought back online
with downtime around 10 minutes.
[Nov 9, 1999, 3:37 pm] straif downtime
straif (www26) crashed under heavy load and was brought back
online with downtime under 15 minutes.
[Nov 8, 1999, 2:26 pm] SAVVIS Scheduled Maintenance
SAVVIS has scheduled a maintenance window for 3am to 5am on the morning
of Tuesday, November 9th, Eastern time. During that time, they will be
upgrading router hardware and software. Our customer traffic will follow
alternate paths and not be significantly affected.
[Nov 6, 1999, 3:32 pm] derba Upgrade Completed
The upgrade on derba (www104) has been completed. Downtime
was less than 10 minutes.
[Nov 6, 1999, 3:09 pm] derba Upgrade
derba (www104) will be taken down briefly during the next hour
in order to install additional hard drive space. Expected downtime is
less than 15 minutes.
[Nov 5, 1999, 10:41 am] pi Downtime
pi (www10) crashed under heavy load and required a manual
filesystem cleaning before being brought back online. Downtime was
less than 20 minutes.
[Nov 5, 1999, 9:34 am] Database Policy
A new policy on database usage has been published today and takes effect
immediately. We will take action on known problem cases on Monday. A
detailed announcement has been posted to pair.announce on news.pair.com.
The full policy can be found at:
http://support.pair.com/policy/dbresource.html
[Nov 4, 1999, 5:24 pm] gao Downtime
gao (www111) crashed under load and was brought back online. Downtime
was less than 10 minutes.
[Nov 4, 1999, 2:57 pm] dyyme Downtime
dyyme (www71) was down this afternoon for approximately 10
minutes. It has since returned to normal operation.
[Nov 4, 1999, 2:15 pm] theta downtime
theta (www4) crashed under heavy load and was brought back online
with downtime around 10 minutes.
[Nov 3, 1999, 4:58 pm] Network Solutions Domain Modifications
The domain name internic.net will eventually no longer be
available for use with the domain registration business of Network
Solutions. This has always been true for contractors holding the
InterNIC contract. We have been advised by our Premier Contact that
as soon as possible, all domain templates should be sent to this address:
registrations@networksolutions.com
The old address, hostmaster@internic.net, may no longer function
reliably, and will eventually not work at all.
This change is already reflected in the messages we are sending to
customers for domain modifications.
[Nov 1, 1999, 6:04 pm] idad downtime
idad (www32) crashed under heavy load and was brought back online
with downtime under 15 minutes.
[Nov 1, 1999, 2:35 pm] psi downtime
psi (www15) crashed under heavy load and was brought back online
with downtime around 10 minutes.
[Oct 31, 1999, 1:49 pm] roomen downtime
roomen (www61) crashed under heavy load and was brought back
online with downtime under 10 minutes.
[Oct 29, 1999, 5:38 pm] Digex Emergency Maintenance
Digex/Intermedia is reporting that emergency maintenance will be
required in their Pittsburgh POP Saturday morning (tonight), between 4am
and 5am Eastern time. Their router is suffering from memory problems,
and will be serviced during that period. There should be no significant
impact on customer traffic.
[Oct 29, 1999, 3:30 pm] UUnet Outage
UUnet suffered a router crash in Pittsburgh at approximately 1:15pm
Eastern time today. Routing was restored to normal within 15 minutes, and
in the meantime, customer traffic followed alternate paths without major
problems.
[Oct 29, 1999, 1:02 pm] omicron downtime
omicron (www9) crashed under heavy load and was brought back
online with downtime around 15 minutes.
[Oct 28, 1999, 5:27 pm] NSI Maintenance
We have been informed of upcoming maintenance at Network Solutions:
The Ingres II database will be down for an hour starting 22:00 EDT
on 28 October 99, Thursday Night, lasting until 23:00 EDT on 28 October
99. All services should be returned to normal no later than 23:00
Thursday evening.
Affected Services:
1. The registration database will be unavailable.
2. All registration and reservation templates will queue.
3. Online payments will not be available.
The system will also be down this weekend. The outage window will last
approximately 9 hours, starting at 22:00 EDT on 30 October 99, and lasting
until 07:00 EDT on 31 October 99. All services should be up and available
no later than 08:00 EDT.
The affected services are as follows:
1. System services will be slightly degraded during that period.
2. All registration and reservation templates will queue.
3. Online payments will be available.
[Oct 28, 1999, 10:36 am] uilen Downtime
uilen (www34) crashed under load and was brought back online
after a manual filesystem cleaning. Downtime was approximately 15
minutes.
[Oct 27, 1999, 4:54 pm] ailm downtime
ailm (www28) crashed under high load and was brought back online
with downtime under 15 minutes.
[Oct 25, 1999, 4:58 pm] gao downtime
gao (www110) was found to be inaccessible due to high load. After
a reboot and filesystem check, it has returned to service. Downtime was
less than 15 minutes.
[Oct 25, 1999, 11:15 am] pi Downtime
High load forced a reboot of pi (www10); after a manual filesystem
cleaning at boot, the system has returned to normal operation. Downtime was
approximately 15 minutes.
[Oct 22, 1999, 10:27 am] thesel Logs
Web logs on thesel (www99) were delayed overnight due to a
configuration error. Full logs for yesterday have been regenerated, and log
generation will resume as normal with tonight's distribution.
[Oct 22, 1999, 12:25 am] thesel Downtime
thesel.pair.com (www99) was down for approximately 25 minutes early this
morning due to a failed power supply. The power supply was replaced, and
the server is back to normal operation.
[Oct 21, 1999, 2:15 pm] theta Downtime
High load caused a major error with theta (www4), which
required a manual rebooting to restore normal service. Downtime was
approximately 20 minutes.
[Oct 20, 1999, 4:55 pm] UUnet Scheduled Maintenance
UUnet has scheduled a router upgrade for their Pittsburgh POP.
The maintenance will begin at 4am Eastern time on the morning of Thursday,
October 21st (that's "tonight"), and should impact our circuit for less
than an hour. This upgrade was originally scheduled for October 26th,
but UUnet has rescheduled it on short notice.
As usual, customer traffic should not be significantly affected.
[Oct 20, 1999, 9:21 am] ilwe Restored to Service
ilwe has been restored to full service as of 9:15am Eastern time.
All user data was transferred to a new, fully-tested drive, with no loss.
The old drive will be removed from the server at a later date; it is
currently not active and therefore cannot bring the server down.
We apologize for the outage necessitated by this emergency drive
replacement. We are continuing to pursue server configurations which
will avoid this type of problem entirely (yes, we are very familiar with
RAID).
[Oct 20, 1999, 6:29 am] ilwe Emergency Maintenance
The primary drive on ilwe has failed, starting around 5:35am
Eastern time. It will take approximately two hours to fully rebuild the
drive, during which time it will be inaccessible for customer use. We will
post updates on this procedure as it progresses.
[Oct 19, 1999, 12:26 pm] New Relay Server
As planned, service has been switched over to the new relay server.
If you need to use the old relay server until you can make necessary
changes in your or your clients' email configuration, it is available at
relay-old.pair.com, and will remain there for 1 month.
These changes might not affect you immediately, as some DNS changes will
need to propagate first.
[Oct 17, 1999, 9:00 pm] enda downtime
enda (www80) ceased responding this evening and required a
manual reboot, after which it returned to normal operations. Downtime was
less than 15 minutes.
[Oct 16, 1999, 1:51 pm] anca Downtime
anca (www53) was down this afternoon for about 15 minutes. It has
since returned to normal service.
[Oct 15, 1999, 6:57 pm] Resolution of InterNIC Changes
The controversial changes recently outlined in a posted notice, which were
instituted entirely at the request of Network Solutions (formerly
known as InterNIC), have been resolved. We are no longer required
to submit domain modifications on behalf of our customers. However,
customers may no longer send templates to the pair@internic.net
address. Instead, all modification templates must go to the traditional
hostmaster@internic.net address. We will continue to process new
domain registrations ourselves on behalf of customers, automatically.
This change is now reflected in our online documentation and customer
contact messages. We apologize for the hasty nature of the original
change, as well as the lack of clarification regarding who instituted those
changes. The original idea was put forth by Network Solutions,
not by pair Networks.
[Oct 15, 1999, 4:45 pm] anca downtime
anca (www53) crashed under heavy load, and was brought back online
with downtime under ten minutes.
[Oct 15, 1999, 3:05 pm] unque downtime
unque (www54) crashed under heavy load, and was brought back
online. Downtime was less than 10 minutes.
[Oct 15, 1999, 1:49 pm] theta downtime
theta (www4) crashed under heavy load, and was brought back online
with downtime around 15 minutes.
[Oct 15, 1999, 1:33 pm] ailm Downtime
ailm (www28) crashed under heavy load, and was brought back online.
Downtime was less than 10 minutes.
[Oct 14, 1999, 8:11 am] theta Downtime
theta (www4) crashed under heavy load and was brought back online.
Downtime was around 20 minutes.
[Oct 13, 1999, 6:42 pm] Major InterNIC domain transfer change
Beginning within the next few days, there will be a major change to the
way that InterNIC domains are transferred to accounts at pair Networks.
In the old system, the customer would be e-mailed a domain transfer
template which they would need to send to InterNIC to initiate the
transfer.
In the new system, pair Networks will actually mail in the proposed
changes to the domain whenever a domain transfer in an upgrade or signup
is processed. Once this transfer template is sent in, the current
contacts for the domain will receive a request for acknowledgement ("ACK")
and will need to send in this acknowledgement before the domain will be
transferred. If the contacts take no action, the ticket will remain open
with InterNIC indefinitely. In this way, the transfer will be possible at
any time, but the customer will be able to transfer the domain at their
leisure, for example if they do not want changes on the new site to be
"live" yet.
This system is still experimental, and we will certainly appreciate and pass
along any feedback on it that customers would like to offer; please send
such comments to support@pair.com.
[Oct 12, 1999, 6:00 am] Digex Resolved
Our Digex connectivity was restored to normal around 4:35am Eastern
time, after an outage of approximately three hours. Customer traffic was
carried on alternate paths and therefore not significantly affected.
We are awaiting official explanation of the outage, but the preliminary
evaluation by Digex is that Bell Atlantic performed unscheduled
maintenance, possibly on an emergency basis, on their SMDS
equipment, thereby shutting down a number of circuits without prior
warning.
[Oct 12, 1999, 2:53 am] Digex Outage
Digex has lost connectivity to multiple customers in the Pittsburgh area, due to unscheduled
maintenance by Bell Atlantic, as of 1:30am Eastern time. Traffic to and
from our network is using alternate routes with no difficulty. We will continue to
monitor this situation, and provide further information as it becomes available. The present
estimated time of completion by Bell Atlantic is 8am.
[Oct 11, 1999, 5:32 pm] InterNIC problems continued
We have just received the following notification from our InterNIC contact:
--
The Ingres database is scheduled for maintenance this evening for
performance tuning. The database will be down starting 22:00 EDT on 11
October 99, Monday Night, lasting until 22:30 EDT on 11 October 99. All
services should be returned to normal no later than 23:00 on 11 October 99.
Affected Services:
1. Registration database will be unavailable.
2. All registration and reservation templates will queue.
3. Online payments will not be available.
--
As before, please contact domain@pair.com with details of any registration
or transfer problems you find during this time.
[Oct 11, 1999, 5:26 pm] relay Server Delayed
Based on feedback from resellers, we have delayed the deployment of the new
relay server to Monday, October 18th. At that time, the existing relay
server will be moved to relay-old.pair.com (this alias will begin working
within 24 hours), where it will remain in operation for at least 30 days.
[Oct 11, 1999, 12:32 pm] anca downtime
anca (www53) crashed under heavy load and was brought back online.
Downtime was around 10 minutes.
[Oct 10, 1999, 8:51 pm] glikk logs
A log generation problem on glikk (www119) caused some users
to receive no web logs from Wednesday through this morning. The error has
been corrected, and all missing web logs have been regenerated.
[Oct 10, 1999, 8:10 pm] cele Mail Problems
cele experienced some problems with sendmail today, and as a
result some user mail may not have been delivered to the server. The
problem appears to now be fixed, and mail delivery is back to normal.
[Oct 9, 1999, 6:57 pm] Internal Network Trouble
During routine maintenance of an internal LAN router, a configuration
error caused the router to drop from the network, leading to repercussions
for servers that depend upon that router. The maintenance was related to
allowing these routers to cover for each other in the case of failure;
unfortunately, this was not possible due to the error.
The router was brought back online within fifteen minutes, and only certain
servers and certain sites on those servers were affected. It should now
be impossible for this problem to recur; the necessary reconfiguration for
optimal redundancy has been completed.
[Oct 9, 1999, 8:10 am] Relay Server Problems
Beginning on the afternoon of Friday, October 8th, our old relay.pair.com
server, which is still in service until Monday, began to experience severe
load problems caused by an apparent denial-of-service attack from a Los
Angeles-based spammer. The attacker was filtered within three hours, but
it took most of the rest of the evening for backlogged mail to be delivered
to intended recipients. The relay server is now back to normal operation,
and the replacement of relay will proceed on schedule on Monday, October
11th.
Please accept our apologies for failing to post an explanation of this
issue on Friday; this was an unintentional oversight that led to customer
concern that we were not aware of or handling the issue. This was not the
case; the missed posting was simply a case of human error.
[Oct 8, 1999, 6:15 pm] UUNet Maintenance
UUNet will be performing maintenance on their Pittsburgh router during their
maintenance window on the morning of Tuesday 10/12 beginning around 3:00am EST.
Customer traffic will continue to flow on our other circuits and should not be
adversely affected.
[Oct 8, 1999, 5:58 pm] InterNIC problems
Starting last weekend and continuing on and off through at least this
coming weekend, mail to InterNIC has been experiencing some difficulty. As
a result, some registration and transfer attempts sent to them may be lost.
Any customer who believes that a domain name that they attempted to
register with InterNIC through us hasn't been completed should contact us
at domain@pair.com. Similarly, if you have a domain name transfer
which should have been completed but hasn't yet, contact us for assistance
with completing it.
We will post here when there has been any resolution to this problem.
[Oct 7, 1999, 4:44 pm] pi downtime
pi (www10) crashed under heavy load and was brought back online.
Downtime was around 15 minutes.
[Oct 7, 1999, 11:32 am] anca reboot
anca (www53) was rebooted due to a memory error while in a low-disk
condition; the server has come up after a short filesystem cleaning, and
disk space has been cleared. Normal service has been restored at this
time. Server downtime was approximately 15 minutes.
[Oct 6, 1999, 4:29 pm] theta downtime
theta (www4) crashed under heavy load, and was brought back
online. Downtime was around 20 minutes.
[Oct 2, 1999, 5:21 am] epsilon Downtime
epsilon (www3) was down for approximately 10 minutes after crashing under load.
[Oct 1, 1999, 5:00 pm] UUnet Update
UUnet is now reporting that they are affected by fiber cuts in
Wyoming and the Bay Area. Although no significant traffic drop has been
observed, we can confirm the Bay Area problem by examining performance to
certain remote sites.
This does not affect pair Networks service directly, but may
affect some of our customers and their site visitors.
[Oct 1, 1999, 2:09 pm] UUnet Resolution
UUnet has advised us that they have recently suffered fiber cuts
in New York and Chicago, and that the most likely cause of the problem
is the brief loss of international traffic flowing through New York.
Traffic returned to normal levels within forty-five minutes of the
incident.
[Oct 1, 1999, 1:37 pm] UUnet Traffic
At 1:30pm Eastern time today, we observed a sudden and significant drop
in inbound UUnet traffic, which in turn results in reduced outbound
traffic. The most likely cause is a cut or failure somewhere within
UUnet's network. We are contacting their NOC at this time in order
to determine the cause and their projected resolution.
Customer traffic will automatically shift to other circuits. We will post
more information as it becomes available.
[Sep 30, 1999, 11:13 am] tinne downtime
tinne (www21) crashed under high load, and after an extensive
filesystem cleaning was brought back up. Downtime was around 15 minutes.
[Sep 29, 1999, 8:48 pm] Dhaame Reboot
Dhaame (www77) was restarted because of a high load condition severely
degrading performance. It has been rebooted and returned to normal service.
[Sep 29, 1999, 5:42 pm] Severe Network Event
Beginning around 5:06pm Eastern time today, a severe network event
impacted accessibility of a large portion of our network for approximately
twenty minutes.
Another network, which we will not yet identify, issued nearly ten
thousand new route announcements, all of them apparently invalid, to
their upstream provider, who dutifully propagated them to its peers,
including UUnet and SAVVIS, who in turn propagated those
announcements to us.
The sudden sharp increase in routing table size caused one of our
internal gateway routers to shut down its routing capabilities on a
low memory condition. This was not expected behavior, but it was
diagnosed promptly and the router was re-enabled. This was insufficient
to fully correct the problem; consequently, we placed filters on our
border routers to deny all route announcements from the offending
network. This removed the unusual routes from our network and allowed
traffic to again flow freely.
This type of misconfiguration, in which one network disrupts the
connectivity of thousands of others through inadvertent bogus
announcements, has happened before and is difficult to prevent. Some
of the fault may lie with the network's upstream, who did not filter
their announcements for known valid blocks. Once the announcement leaves
the upstream, other networks have no reliable and automatic method of
recognizing invalid announcements. Thus the problem easily spreads
throughout the Internet and can disrupt traffic or crash routers in many
other networks.
Although service was fully restored within twenty minutes, we have
reason to believe that the original source of the problem has impacted
other networks and will continue to do so for some further time this
evening. We will seek further information on the incident from public
sources and our upstream providers. Our network will remain filtered
until such time as we can confirm that the problem has been rectified to
our satisfaction.
Please note that not all pair Networks servers were
severely affected, although most were at least somewhat affected by the
overall disruption of traffic to our upstream providers.
We will continue to monitor this incident and post further updates as
they become available.
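As an illustration only of the upstream filtering discussed in this entry
(checking announcements against known valid blocks), here is a minimal
Python sketch. The block list and function names are hypothetical, the
addresses are documentation ranges, and real filtering is done in router
configuration rather than in a script like this.

```python
import ipaddress

# Hypothetical blocks that a customer network is known to own.
# These are reserved documentation ranges, not real assignments.
KNOWN_VALID_BLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_valid_announcement(prefix, valid_blocks=KNOWN_VALID_BLOCKS):
    """Return True if the announced IPv4 prefix lies inside a known valid block."""
    announced = ipaddress.ip_network(prefix)
    return any(announced.subnet_of(block) for block in valid_blocks)


def filter_announcements(prefixes, valid_blocks=KNOWN_VALID_BLOCKS):
    """Keep only the announcements covered by the known valid blocks."""
    return [p for p in prefixes if is_valid_announcement(p, valid_blocks)]
```

An upstream applying a check of this kind would pass a customer's own
prefixes and drop everything else before it reached other networks.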
[Sep 29, 1999, 3:40 pm] Ohio Fiber Cut
A massive fiber cut has been reported in Ohio, affecting cross-country
OC-192 links for a number of backbone providers, including AboveNet,
GTE/BBNplanet, and MFS (MCI WorldCom's facilities division). We have not
seen any significant changes in traffic, but this cut could affect some of
our customers and visitors to their sites.
The cut will surely be repaired as soon as possible. A news story on
the cut is available
here.
[Sep 28, 1999, 1:14 pm] theta Downtime
theta crashed and was down for approximately 15 minutes this
afternoon. It has since rebooted and is back to normal service.
[Sep 28, 1999, 10:37 am] pi Downtime
pi (www10) crashed under heavy load. Downtime was less than 15 minutes.
[Sep 28, 1999, 12:13 am] nuin Downtime
nuin (www18) was down for approximately 20 minutes this evening.
It has since returned to normal service.
[Sep 27, 1999, 10:07 am] xi Downtime
xi (www8) was down for approximately 5 minutes after
a maintenance reboot.
[Sep 26, 1999, 6:19 pm] gao upgrade
Due to a shortage of drive space on gao, this server was taken down
for a drive upgrade. Downtime was less than 10 minutes for this operation, and
the new total drive space for gao is now 15.2GB.
[Sep 26, 1999, 12:29 am] wawrra Downtime
wawrra (www109) was down for approximately 10 minutes this
evening. It has returned to normal service.
[Sep 25, 1999, 4:14 pm] gao Downtime
gao (www111) crashed under severe load and
has been brought back online after a manual filesystem
cleaning. Downtime was approximately 30 minutes.
[Sep 24, 1999, 9:31 am] gao Relocation Complete
The relocation of gao (www111) has been completed.
Actual downtime was approximately 15 minutes.
[Sep 24, 1999, 7:57 am] gao Relocation
gao (www111) will be taken down briefly this
morning for a physical relocation. Downtime is expected
to be less than 10 minutes.
[Sep 24, 1999, 7:19 am] lissa Downtime
lissa spontaneously rebooted this morning. Downtime was less than 5 minutes.
[Sep 24, 1999, 3:30 am] lissa Downtime
lissa (www72) crashed under heavy load. Total downtime was less than 10 minutes.
[Sep 21, 1999, 4:53 pm] Digex/Intermedia Scheduled Maintenance
Digex/Intermedia will be utilizing their maintenance window
between the night of Thursday, September 23rd and the morning of Friday,
September 24th. Between 4am and 8am Eastern time, our connectivity to
their network may be interrupted. Network performance for our customers
should not be significantly affected.
[Sep 20, 1999, 3:22 pm] pi Downtime
pi (www10) was down briefly due to high load. Total downtime was around 10 minutes.
[Sep 20, 1999, 3:23 pm] upsilon Downtime
upsilon (www13) was rebooted for routine maintenance. Downtime
was approximately 6 minutes.
[Sep 20, 1999, 5:26 am] lissa Downtime
lissa was down briefly this morning due to a high load. Total downtime was around 10 minutes.
[Sep 18, 1999, 2:11 am] Savvis Network Resolution
At approximately 12:30AM EST Savvis's trunk line repair was completed.
All traffic through our network should be flowing normally again.
[Sep 18, 1999, 1:43 am] anca Downtime
anca was down briefly this evening and has returned to service. Total
downtime was approximately 5 minutes.
[Sep 17, 1999, 7:05 pm] Savvis Network Issues
Savvis has had two cuts in trunk lines, between the New York and
Philadelphia areas, and between the Philadelphia and D.C. areas. This
outage may cause slowdowns in access for some customers, but new routes
utilizing our other providers are currently handling the traffic load.
More information will be posted when there is any change.
[Sep 16, 1999, 8:06 pm] theta downtime
theta (www4) was rebooted under heavy load and disk activity due to a mail
loop. The server required a manual filesystem check when it was returned to
service. Downtime was approximately 40 minutes. The system has returned to
normal service at this time.
[Sep 15, 1999, 11:02 am] anca downtime
anca (www53) was down for approx. 15 minutes due to heavy load.
It has been brought back online with no errors.
[Sep 14, 1999, 6:18 pm] sether downtime
sether (www95) crashed under heavy load, and was brought back
online. Downtime was around 10 minutes.
[Sep 14, 1999, 9:08 am] Resolution of pi Incident
As our customers are surely aware, the last week has brought a string of
unfortunate incidents to pi, all related to hard drive failures.
What started as a simple upgrade of available space turned into a nightmare
that required repeated downtimes.
The problem began when the replacement drive was found to be running
extremely slowly, as well as seeking very noisily. This was considered a
sign of failure, although the drive had been tested before installation.
Unfortunately, the problem wasn't recognized until the majority of user
data had been moved to the new drive. We chose to re-replace the noisy
drive with a new drive, which unfortunately had not gone through our
burn-in procedures.
Because the drive was running so slowly, the only
way to complete this transfer within an acceptable timeframe was to disable
most of pi's services during the transfer. That took about six
hours, and was the primary reason for the three-day credit issued to all
pi customers. When the transfer was completed and the system
brought up with the new drive, we found that it was also very noisy.
The problem was then recognized as a
failure of the IDE controller, so we moved the new drive to a different
chassis, and the problem went away.
Unfortunately, this left all user data running on an untested drive. As
fate or Murphy would have it, the new drive failed on Monday afternoon,
with extensive bad blocks (more than could be remapped) on the /usr partition.
As the new backups were not yet complete, and we didn't want to fall back
to even older backups, this left us with no choice but to once again take
the server offline for a manual transfer of user data from the failed drive
to another new, recently tested drive. This took approximately five hours
to complete. The server was again offline briefly this morning to put it
back into correct physical position, and now appears to be operating
normally. Another fresh backup has been started.
In the midst of all this, the original replacement drive for pi,
which had been tested, was used as an emergency replacement for
nuumen.
The entire incident has led to several significant developments.
- A full month's credit will be posted to all pi customers,
above and beyond the three-day credit that was posted earlier.
- We are accelerating our testing of a hardware-based IDE RAID solution,
which is relatively inexpensive but virtually eliminates the problems
associated with this failure mode.
- We will be maintaining a more extensive supply of pre-tested drives.
Our new datacenter has provisions specifically to support this.
- We will be publishing detailed plans for addressing future hard drive
problems, which will become more and more critical as the majority of
our installed base of drives approaches two to three years of age. These plans
will be available in the Support Forum by the end of September.
We would like to extend an apology to all pi customers and site
visitors who were affected by these extended downtimes. We will be working
to ensure that such outages are no longer possible in the future, or in
the worst cases, are at least less painful.
[Sep 13, 1999, 8:42 pm] pi Progress
pi (www10) has returned to normal operation. If you encounter any problems
with the system please e-mail urgent. Further information will be posted
shortly.
[Sep 13, 1999, 4:04 pm] pi Drive Failure
Even though it is on its third drive subsystem, in a completely new case and
motherboard, the drive on pi is again showing signs of failure.
We will be working with the server in offline mode in order to restore its
backup as quickly as possible. The new drive has gone through an extensive
burn-in procedure, and should be online within the next two hours.
[Sep 10, 1999, 3:15 pm] Scheduled Maintenance for SAVVIS
SAVVIS has announced a major scheduled maintenance window for this
coming Saturday night until Sunday morning, from 2am to 8am Eastern time
on September 12th. The expected duration of the outage should be no more
than 5 minutes, but in the worst case, it could take four hours. In either
case, customer traffic should not be significantly affected.
The maintenance will complete the merger of SAVVIS' network with
that of their new parent, Bridge Information Systems.
[Sep 10, 1999, 12:26 pm] nuumen Maintenance Completed
As of 12:25pm Eastern time, nuumen is back to normal operation
with a much larger hard drive. The data transfer was completed with no
further drive failures, and downtime was minimal.
There may be another brief downtime within the next 48 hours, as the
server will be relocated to a normal rack-mounted position.
[Sep 10, 1999, 9:48 am] nuumen Maintenance
nuumen will be down briefly again this morning so that drive
maintenance can continue. Although the failed drive has been running
comfortably in a new system, we believe that it should be replaced. Unless
a further failure occurs, most of the replacement operation will occur
while the server is still in service.
In the worst case of failure, there is a backup available from Thursday
evening. We will post further notices as there is progress to report.
[Sep 9, 1999, 9:47 pm] nuumen in Service
nuumen has currently returned to service. While it is in service
it will be undergoing maintenance to replace the hard drive. There will be
a notice when this occurs.
[Sep 9, 1999, 9:43 pm] nuumen Downtime
nuumen (www55) is currently experiencing hard drive problems. We
are working to resolve this problem. Further updates will be posted as
needed.
[Sep 7, 1999, 5:16 pm] theta Upgrade Completed
Upgrades to the hard drive space for theta (www4) have been
completed. Downtime was approximately 10 minutes while we completed the
final hardware swap. Total drive space on this server now exceeds 25GB,
which should be sufficient to cover customer needs for some time to come.
[Sep 7, 1999, 7:32 am] pi Maintenance Completed
As of 7:30am Eastern time today, all maintenance has been completed on
pi. The server was providing temporary service and burn-in
since last night around 10:30pm; overall, the outage was effectively
seven hours long. We will be issuing an appropriate credit to customers
on this server.
We apologize for the extended downtime necessitated by this difficult
failure mode.
[Sep 6, 1999, 6:11 pm] pi Emergency Maintenance
Emergency maintenance has been required on pi this afternoon,
which has significantly reduced the availability of the server to our
customers. The maintenance consists of an emergency drive swap, which
should be complete by 11pm Eastern time, as well as a motherboard swap
(we believe the motherboard controller may be faulty as well).
Once this incident is resolved, we will be investigating our burn-in
procedures and considering a customer credit for the associated downtime.
Please accept our apologies for the inconvenience. Managing this incident
is further hampered by the fact that as one of our older servers,
pi not only has a larger number of customers, but also a larger
quantity of accumulated files for those customers.
[Sep 6, 1999, 1:03 pm] pi Performance
pi is currently experiencing performance problems due to the
partial failure of its new drive. We are working urgently to move all
customer data to another replacement drive. We hope to have the server
restored to normal service by Tuesday morning.
[Sep 3, 1999, 3:27 pm] pi Maintenance
pi (www10) was down for approximately 20 minutes this afternoon
while we completed the upgrade to its hard drive. It has since returned to
normal service.
[Sep 2, 1999, 5:40 pm] theta Downtime
theta (www4) was down for 20 minutes this afternoon.
It has returned to normal service.
[Sep 1, 1999, 9:02 pm] drive upgrades
Servers pi (www10) and theta (www4) will be receiving
hard drive upgrades over the next couple of days. There will be two short
downtimes for both of these servers as the new hardware is put into place.
[Sep 1, 1999, 3:21 am] pi Downtime
pi (www10) was down for approximately 15 minutes this morning.
It has returned to normal service.
[Sep 1, 1999, 3:26 am] epsilon Downtime
epsilon (www3) was down for approximately 15 minutes this morning.
It has returned to normal service.
[Aug 30, 1999, 7:19 pm] enda Downtime
enda (www80) was down for approximately 15 minutes after a crash
due to a bad mail loop. It has returned to normal service at this time.
[Aug 27, 1999, 3:20 am] omega Downtime
omega (www16) was down for approximately 20 minutes after a crash
due to a bad mail loop. It has returned to normal service at this time.
[Aug 26, 1999, 3:53 pm] omicron Upgrade
Work on omicron (www9) has been completed, resulting in another
brief downtime while we swapped hardware. This server is now upgraded to a
P-III 550 MHz with 16.8GB total disk space.
[Aug 26, 1999, 2:50 pm] UUnet Upgrade
There was a brief interruption of service on our UUnet circuit,
approximately three minutes, around 2:45pm Eastern time today, as the
circuit was upgraded to additional capacity. This is the last expected
upgrade until our new datacenter comes online in November.
[Aug 26, 1999, 1:32 pm] nuumen Downtime
nuumen (www55) was down for approximately 15 minutes after a crash
due to a bad mail loop. It has returned to normal service at this time.
[Aug 25, 1999, 11:10 pm] omega Downtime
omega (www16) was down for approximately 10 minutes after a crash
due to heavy load. It has returned to normal service at this time.
[Aug 24, 1999, 9:27 am] kodh Downtime
kodh was down for approximately 10 minutes while
a severely run-away user process was cleaned up. It has
returned to normal service at this time.
[Aug 23, 1999, 5:22 pm] omicron Upgrade
omicron was taken down briefly for an upgrade to server hardware.
A new hard drive is currently being added, and once this is completed
omicron will be rebooted once more to finish the procedure. The new
omicron is planned to become a P-III 550 MHz.
[Aug 23, 1999, 10:05 am] onn Downtime
onn crashed during a mail loop, and was brought back online in
approximately fifteen minutes.
[Aug 21, 1999, 2:32 pm] Digex Outage
Digex and Bell Atlantic technicians, working together in
Pittsburgh, have resolved the SMDS connectivity problem. Our circuit to
Digex has been restored to normal operations as of 11am Eastern time
today. The total outage duration was approximately 13 hours. Customer
traffic should not have been significantly affected.
[Aug 21, 1999, 6:59 am] Digex Outage
The outage of our circuit with Digex has now continued for nine
hours. Digex reports that the problem is in Bell Atlantic's SMDS
host equipment. This equipment failed approximately two years ago, as
well, and it took several multi-hour outages to get the problem corrected.
Traffic flows should not be significantly affected during the weekend.
We will post further information as it becomes available.
[Aug 20, 1999, 10:23 pm] Digex Outage
Digex has lost all customer connectivity in their Pittsburgh point
of presence, as of approximately 9:55pm Eastern time. Traffic to and from
our network is using alternate routes with no difficulty. We will continue
to monitor this situation, and provide further information as it becomes
available. At present, Digex has no estimated time to repair.
[Aug 20, 1999, 5:21 am] Network Maintenance
Digex is performing maintenance within the PIT1 Intermedia
Point of Presence between 4am and 8am EST. Network connectivity
may be intermittently affected.
[Aug 18, 1999, 6:20 pm] theta Downtime
theta (www4) crashed under heavy load, and was brought back
online. Downtime was approximately ten minutes.
[Aug 16, 1999, 12:42 pm] omicron Downtime
omicron (www9) crashed under heavy load, and was brought back
online. Downtime was approximately ten minutes.
[Aug 14, 1999, 9:08 am] ungwe Downtime
ungwe (www48) was down this morning for approximately 2 hours.
It was rebooted and returned to normal service after the problem was
determined and solved.
[Aug 13, 1999, 11:56 pm] omega Downtime
omega (www16) was down this evening for approximately 30 minutes.
It was rebooted and returned to normal service after a manual file system
check.
[Aug 10, 1999, 9:41 pm] pi Downtime
pi (www10) was down this evening for approximately 10 minutes.
It has rebooted and returned to normal service.
[Aug 10, 1999, 6:42 pm] ailm Downtime
ailm (www28) was down this evening for approximately 10 minutes.
It has since returned to normal service.
[Aug 9, 1999, 8:59 am] emancholl Downtime
emancholl (www37) crashed, and was down for approximately 10 minutes. After
a normal reboot, it returned to service.
[Aug 4, 1999, 10:46 am] epsilon Downtime
epsilon crashed, and was down for approximately 10 minutes. After
a normal reboot, it returned to service.
[Aug 3, 1999, 6:19 am] SAVVIS Downtime
The problem with SAVVIS, which was caused by unannounced maintenance
on their end, has been resolved.
[Aug 3, 1999, 5:59 am] SAVVIS Downtime
The router on the other end of our SAVVIS circuit apparently went
down around 5:15am Eastern time. We are currently awaiting further
information on the estimated time to repair. In the meantime, all traffic
will continue to flow on alternate paths.
[Aug 3, 1999, 2:39 am] falku Downtime
falku (www98) was down tonight for approximately 15 minutes.
It has returned to normal service.
[Jul 30, 1999, 5:12 pm] UUnet Routing
UUnet has advised us of limited outages and routing abnormalities
in their network, which may affect some customer traffic. We are seeing
fluctuating traffic levels, with reduced UUnet flows matched by
increasing flows on other circuits. This implies that there is relatively
little outage impact, except that some customers and site visitors may
experience suboptimal routing until the problem is rectified by UUnet.
[Jul 29, 1999, 12:55 pm] SSL Upgrade Patch
The secure server patch that restores the value of the REMOTE_HOST
variable has now been deployed to all SSL-enabled servers.
[Jul 29, 1999, 5:20 am] Power Emergency
As a result of two lines of severe thunderstorms moving through the area,
Pittsburgh is currently facing a power emergency. Around 2am Eastern time
this morning, during the second round of thunderstorms, commercial
three phase power to our facility was knocked out of service. We are
presently running on generator power alone.
Although our facility can theoretically operate on generator power
indefinitely, it is possible, although unlikely, that we will face a
further emergency such as a generator failure under load, or an inability
to refuel repeatedly during an emergency that affects hundreds of thousands
of power customers (thus increasing demand for diesel services). If such
an emergency occurs, we will be forced to follow load-shedding protocols,
taking our routers and services offline in anticipation of a complete power
loss.
We are working to avoid that relatively unlikely scenario, but would like
for our customers to be aware of the problem. Under these emergency
conditions, it is possible that our service will not be restored for
several days. We are working diligently to ensure that the generator
remains online so that no customer is affected. Further information will
be posted as it becomes available.
[Jul 27, 1999, 9:51 am] db & db3 Maintenance
The MySQL servers db and db3 were rebooted this
morning for normal maintenance. Both were returned to service
within 5 minutes.
[Jul 26, 1999, 11:03 am] SSL Upgrade Patch
The SSL servers on fearn (www40 / ssl7) and quan
(www120 / ssl13) have been upgraded again to patch a problem in
which the REMOTE_HOST variable returns a null value, instead of the
remote IP address as it had before. If no problems are reported, all
SSL servers will receive the patch this week.
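For customers whose scripts read REMOTE_HOST, a defensive fallback to the
standard CGI variable REMOTE_ADDR is one way to tolerate a null value.
The following Python sketch is only an illustration; the function name is
hypothetical, and it is not part of the server patch itself.

```python
import os


def remote_host(environ=None):
    """Return REMOTE_HOST if it is set and non-empty, else REMOTE_ADDR.

    `environ` defaults to the process environment, which is where a CGI
    script normally finds these variables.
    """
    environ = os.environ if environ is None else environ
    # An empty or missing REMOTE_HOST falls through to REMOTE_ADDR.
    return environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR", "")
```

Passing a plain dictionary instead of the real environment makes the
fallback easy to check outside the web server.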
[Jul 23, 1999, 9:20 am] chi Reboot
chi (www14) was rebooted this morning for brief maintenance.
Downtime was less than ten minutes, and it has returned to normal
service.
[Jul 22, 1999, 1:56 pm] rho Downtime
rho (www11) was down this afternoon for approximately 10 minutes.
It has returned to normal service.
[Jul 21, 1999, 3:56 am] Network Filtering
Due to a recent flooding attack on our network, we have been forced to
temporarily place filters on certain types of traffic. Although ping
and traceroute will not work normally for some customers as a result,
most customers and site visitors will otherwise be unaffected. We will
have this resolved entirely as soon as possible. We are currently tracing
the source of the attack.
[Jul 20, 1999, 9:04 pm] coll Downtime
coll crashed under heavy load this evening and was down for approx. 15 minutes.
It has since returned to normal operation.
[Jul 20, 1999, 1:07 pm] UUnet Network
There was a brief interruption of our UUnet connectivity at
approximately 12:30pm Eastern time today. This was unscheduled and
appeared to originate in UUnet's facilities. The circuit has
returned to normal. We are also not seeing any further problems with
our SAVVIS circuit since the problems of the last week.
[Jul 20, 1999, 7:41 am] SSL Upgrades
The remaining 4 SSL-enabled user servers -- straif (ssl5),
fearn (ssl7), anca (ssl8), and cele
(ssl9) -- have been upgraded to our new Raven SSL configuration.
[Jul 20, 1999, 3:16 am] pi Downtime
pi (www10) was effectively down for approximately 45-60
minutes while emergency maintenance was performed on it. During
this time, users were likely to see errors while trying to access
their accounts. It has returned to normal service at this time.
[Jul 19, 1999, 2:56 pm] SAVVIS Network Performance
We are again seeing major problems with our SAVVIS circuit,
beginning with a 45-minute outage that started around 1:45pm. We are
working with their senior engineering staff to resolve the issue, once and for
all. At present, traffic is being passed successfully through
SAVVIS, but there is a continuing problem with packet loss.
[Jul 19, 1999, 5:59 am] beith Downtime
beith (www17) was down for approximately 20 minutes this morning.
It has since returned to normal operation.
[Jul 16, 1999, 8:11 am] unque Downtime
unque (www54) crashed this morning under heavy
load resulting from a mail loop. After the loop was
cleaned up in single-user mode, the server was returned
to full operation. Downtime was approximately 15-20
minutes.
[Jul 15, 1999, 3:36 pm] SAVVIS Resolution
From approximately 10am to 3pm Eastern time today, our SAVVIS
connectivity was effectively unusable. During this time, traffic took
alternate paths, largely through Digex, but also through
UUnet. In the case of the Digex circuit, this resulted in
some saturation which reduced network performance.
After extensive troubleshooting in conjunction with our staff,
SAVVIS determined that the problem was in their recently-expanded
trunking from Pittsburgh to New York City. Approximately one week ago,
our circuit to SAVVIS had been rehomed to a new portion of their
trunk, and it seems likely that since that time, there has been a hidden
performance problem. That problem only came to the fore when the BGP
session was reset today and refused to come back up.
After switching our circuit back to the original trunk, the problem
immediately went away. Our traffic to SAVVIS has now returned to
normal, and they will be working to repair the other trunk before putting
it back into service. We will not be placed on the other trunk again,
even after repairs are completed.
We are disappointed that the problem existed, and would like to pass on our
apology to all customers and site visitors. We were pleased with the quick
response and tenacity of SAVVIS staff in tracking down the problem,
even when all likely causes seemed to have been eliminated. We will
continue to monitor the circuit for performance problems.
[Jul 15, 1999, 10:45 am] SAVVIS Routing
We are currently seeing a major loss of routing via SAVVIS. We are
working with their Network Center to resolve the problem, which appears to
be on their end. In the meantime, traffic will continue to flow through
UUnet and Digex until the routing can be resolved.
[Jul 15, 1999, 10:32 am] ando upgrade
The hard drive upgrade for ando has been completed. Any changes
made to user files since this process began would have been lost. However,
all other normal operations should be intact. If there are any problems,
please contact support@pair.com.
[Jul 14, 1999, 7:05 pm] Correction
The previous notice of drive upgrade concerned the server ando,
not anca.
[Jul 14, 1999, 6:56 pm] anca Upgrade
The server anca has been shut down to begin a swap to a newer,
larger drive. There will be two periods of downtime for this, each lasting
10-15 minutes. Until this is complete, users should not make any web page
changes, as those will be lost. Completion time is expected to be about
two hours.
anca is currently in the middle of its first reboot. It will
be up shortly, and we will then post when the second reboot has been
completed.
[Jul 14, 1999, 12:36 pm] SSL Upgrade
The secure servers on omicron and kodh have been upgraded
to our new Raven SSL configuration.
[Jul 14, 1999, 10:45 am] Reminder to be Careful
We'd like to remind our customers that we will never ask for your
password, under any condition. Occasionally, an impostor will send
out e-mail to one or more sites, attempting to impersonate the hosting
service and requesting the user's password, on the contrived excuse that
some database information was lost or corrupted. This is completely bogus
and cannot ever possibly be true. Please forward any such e-mail that you
might receive to abuse@pair.com immediately upon receipt, and do not reply
to the sender.
[Jul 12, 1999, 11:04 am] SSL Upgrade
The secure servers on theta and pi have been upgraded
to our new Raven SSL configuration.
[Jul 11, 1999, 2:44 pm] anca Downtime
anca was down this afternoon for approximately 10 minutes. It has
since returned to normal service.
[Jul 10, 1999, 3:52 pm] edad Downtime
edad was down this afternoon for approximately 15 minutes. It has
since returned to normal service.
[Jul 8, 1999, 10:22 pm] kappa Downtime
kappa was down this evening for approximately 10 minutes. It has
since returned to normal service.
[Jul 7, 1999, 1:58 pm] SSL Upgrade
The SSL Servers on flaid and enda have been upgraded
to use the Raven SSL module, built with Apache 1.3.6.
[Jul 6, 1999, 3:34 pm] omega Upgrade
The SSL Server on omega has been upgraded to use the Raven SSL module,
built with Apache 1.3.6. This upgrade brings the secure server setup in line
with our regular servers, in terms of features. Over the next week, all SSL
servers will receive this upgrade.
[Jul 5, 1999, 2:36 am] pyyl Downtime
pyyl was down this evening for approximately 10 minutes due
to a large mail loop.
[Jun 30, 1999, 11:48 am] pine Upgrade
The upgrade of pine to version 4.10 has now been completed on
all user servers.
[Jun 29, 1999, 8:35 pm] lissa Downtime
lissa was down this evening for approximately 10 minutes. It is
currently up and in service.
[Jun 29, 1999, 9:42 am] lissa Maintenance
After three unexplained crashes in the past twenty-four hours, we are
suspecting hardware problems on lissa. It will be taken down
again, for approximately ten minutes, to swap into another system.
[Jun 29, 1999, 6:11 am] lissa Downtime
lissa was down this evening for approximately 10 minutes. It
has returned to normal operation.
[Jun 24, 1999, 10:56 am] pine Upgrade
The pine mailreader has been upgraded to version 4.10 on onn.
Any customer on onn who experiences unexpected behavior from pine
or its associated programs pico and pilot is asked to report it
to support@pair.com. If no problems are
uncovered, the new version will be deployed to all user servers in the near future.
[Jun 22, 1999, 4:28 pm] lissa Downtime
lissa was down this afternoon for approximately 10 minutes. It
has returned to normal operation.
[Jun 20, 1999, 1:51 am] Telnet and POP3 Access Problems
Starting at various times in the evening of Saturday, June 19th, an
intrusive scan was directed at several user servers by a third party.
Although no direct damage was done, an unfortunate interaction between
the scanning technique and a security tool used internally caused all
incoming connections to Telnet and POP3 to be blocked. In some cases,
the interruption was as long as eight hours. However, other services,
including FTP, WWW, mail delivery, SSL, SSH, and FrontPage, were completely
unaffected.
We have identified and blocked the scanner. We are currently enhancing
our monitoring to catch this unusual case (in which everything looks normal
internally, but fails to respond to the outside world), and modifying our
security tool to avoid this type of interaction entirely in the future.
We apologize for the inconvenience and the delay in identifying and
correcting the problem for the affected servers.
[Jun 14, 1999, 12:55 pm] db3 Downtime
db3 experienced downtime this morning after its MySQL process "ran away"
and caused the server to hang. It has been back online for
approximately an hour, and appears to once again be stable.
[Jun 12, 1999, 9:36 am] ceirt Incident
A particularly vicious mail loop on ceirt caused reduced
performance for approximately forty-five minutes. This has now been
cleaned up, and normal service restored.
[Jun 12, 1999, 3:47 am] coll downtime
coll crashed under heavy load. It was brought back online
with downtime of approximately 15 minutes.
[Jun 11, 1999, 9:35 am] coll downtime
coll crashed under heavy load. It was brought back online
with downtime of approximately 15 minutes.
[Jun 10, 1999, 11:28 am] sether downtime
sether crashed due to high load. Downtime was approximately
ten minutes.
[Jun 10, 1999, 10:09 am] relay Maintenance
Because of apparent spam sent by a customer through our customer-only
relay system, the relay server is currently overwhelmed and also under
attack. Consequently, our relay service will be down briefly to perform
necessary maintenance.
We are also accelerating plans to convert the relay service to a
POP-before-SMTP authenticated service; more details will be posted on this
as soon as possible.
[Jun 9, 1999, 1:44 am] gamma downtime
gamma crashed under heavy load, and was brought back online.
Total downtime was approximately 15 minutes.
[Jun 8, 1999, 7:03 pm] UUnet outage resolved
The problems we experienced this afternoon with UUnet have been
rectified. We have been told by UUnet that they are still looking
into the cause of this, and will give us a more detailed report in the
next few days.
In the meantime the circuit is again performing as normal. Until such
time as we receive their report we will be keeping a closer eye on this
circuit for any possible recurrences.
[Jun 8, 1999, 5:22 pm] UUnet outage
UUnet is reporting the outage of a router near our location, and
is currently working on the problem. Customers may experience slowdowns while
this occurs, but our other circuits should be able to deliver all traffic.
We will post more information here as we receive it.
[Jun 6, 1999, 10:20 pm] db.pair.com Update
We have located the error in our MySQL monitoring system, and believe it
to be corrected at this time. We apologize for the inconvenience, and
will be testing both MySQL and our monitoring software extensively over
the next few days.
[Jun 6, 1999, 9:35 pm] db.pair.com Downtime
db.pair.com was down for about two hours this afternoon. The
problem seems to have been a hung mysql daemon that would respond just
enough to satisfy our monitoring programs, but not allow anyone to connect.
The server seems to now be functioning, but we're taking a closer look into
exactly what caused this problem.
[Jun 5, 1999, 6:12 pm] kodh Maintenance
kodh will be going down shortly for a hardware upgrade. Downtime
is expected to be less than 10 minutes.
[Jun 4, 1999, 12:06 pm] Continued Network Upgrades
The servers listed below will all be
shut down briefly late this evening in order to upgrade their network cards.
Downtime for each server is expected to be no more than 10 minutes.
The servers to be upgraded tonight are:
oore, halla, beeoro, tama, dyyme, lissa
[Jun 4, 1999, 4:00 am] Erroneous Overusage Reports
On Thursday, June 3rd, overusage reports were sent out containing completely
incorrect information. Instead of covering the month of May, the charges
reflected usage in the first two days of June only, and this brief period
caused a calculation error to come into play, as well.
Customers are advised to please disregard any overusage reports received
that refer to usage in June. Fully corrected reports which apply only to
May will be sent out today.
[Jun 3, 1999, 7:45 am] Database Server Upgrades
db and db2 will be taken down briefly this
morning for hardware upgrades. Downtime should be less than 10 minutes in
each case.
[Jun 1, 1999, 12:22 pm] UUnet Outage
Our UUnet circuit was out of service for approximately one minute
this morning, causing a routing flap and effective brownout of our service
for some customers. We have not yet received notification from
UUnet regarding the cause or nature of this outage. We are now
seeing traffic return to normal.
[May 27, 1999, 8:13 am] UUnet Update
The official explanation of yesterday afternoon's misrouting of
209.68.53.0/24 is that it was inadvertently injected into
UUnet's routing mesh by an engineer in Atlanta. Once contacted,
UUnet promptly corrected the problem, and would like to apologize
for the effect on our network and customers. This was not caused by a
UUnet customer, but by a configuration error by a UUnet
engineer. It was an honest mistake, and we were glad we were able to
minimize the impact.
[May 27, 1999, 8:10 am] Network Updates
During a scheduled UUnet maintenance window this morning, our
UUnet connectivity was up and down for approximately an hour.
During this time, we also took the opportunity to upgrade the capacity
of several of our routers, in a fashion that did not affect customer
traffic. There was no significant effect on customer traffic from these
events, except in cases where UUnet accepted traffic for our network
but was unable to deliver through the downed circuit. This is a routing
issue with UUnet's network which we are unfortunately unable to
control.
[May 26, 1999, 9:31 pm] xi downtime
xi spontaneously rebooted this evening. Total downtime was
approximately 10 minutes.
[May 26, 1999, 4:55 pm] straif downtime
straif spontaneously rebooted this afternoon. Total downtime was
approximately 15 minutes.
[May 26, 1999, 4:38 pm] Digex Network
We have observed extremely poor performance through Digex in the
D.C. area, including through MAE-East. In order to improve performance
overall for customers and site visitors, we will be performing a new
routing evaluation this weekend, in order to select optimal global routes
for use with our new router configuration. We expect significant
performance improvements for affected customers by early next week.
[May 26, 1999, 4:34 pm] theta downtime
theta crashed due to heavy load, and was brought back online.
Downtime was around 15 minutes.
[May 26, 1999, 2:13 pm] Network Resolution
Once notified of the problem, UUnet acted swiftly to remove the
offending and incorrect announcement from their network, which has restored
service to normal for customers in the 209.68.53.0/24 block.
We are awaiting further details from their NOC regarding the incident, and
will post a wrap-up here when those details become available.
This was not an attack, or any error in our network. This problem was
caused by another UUnet customer apparently making a configuration
mistake that affected us because of UUnet's own insufficient
configuration to protect against the possibility.
[May 26, 1999, 2:00 pm] Network Update
We have narrowed down the problem to the 209.68.53.0/24 block.
None of our other addresses appear to be affected. The UUnet
customer causing this problem is located in Atlanta. The fault also
lies largely with UUnet for not correctly filtering the
announcements of their customer. We are working with their NOC to have
the problem rectified immediately. Unfortunately, because the problem lies
in UUnet's routing mesh, there is little we can do to affect the
situation. We will attempt to announce a more specific route ourselves,
but UUnet does have filters in place on our
announcements.
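As background on why a more specific announcement matters: routers forward
each packet along the longest matching prefix, so an announcement covering
only 209.68.53.0/24 wins over any shorter prefix that contains it. The sketch
below illustrates that selection rule only. The /20 aggregate, the next-hop
labels, and the best_route helper are assumptions made up for the example,
not details of our announcements or of UUnet's configuration.

```python
import ipaddress


def best_route(destination, routes):
    """Return the (prefix, next_hop) pair with the longest prefix matching destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the most specific covering prefix wins.
    return max(matches, key=lambda item: item[0].prefixlen)


# Hypothetical routing table; the /20 aggregate and labels are illustrative only.
routes = [
    (ipaddress.ip_network("209.68.48.0/20"), "legitimate aggregate"),
    (ipaddress.ip_network("209.68.53.0/24"), "erroneous announcement"),
]

for destination in ("209.68.53.10", "209.68.54.10"):
    prefix, next_hop = best_route(destination, routes)
    print(destination, "->", prefix, "via", next_hop)
```

Under these assumptions, only addresses inside the /24 follow the erroneous
announcement, while neighboring addresses in the same aggregate are
unaffected. The same rule is why announcing an even more specific route
could, in principle, draw the traffic back.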
[May 26, 1999, 1:56 pm] Network Event
At this time, a UUnet customer is causing massive routing problems
in UUnet's network which is affecting our traffic. We are working
to determine if the problem is specific to our routes (i.e., they are
announcing more specific routes for our netblocks), or more general. In
either case, we are working with UUnet's NOC to have this corrected
immediately.
[May 25, 1999, 12:24 pm] ungwe Downtime
ungwe crashed today under heavy load. It was brought back online with
less than 15 minutes downtime.
[May 23, 1999, 6:32 pm] gort Downtime
gort crashed today under load. Downtime was less than 5 minutes.
[May 20, 1999, 5:29 pm] beith Downtime
beith crashed today under load, and was brought back online after a
filesystem cleaning. Downtime was less than 5 minutes.
[May 20, 1999, 10:53 am] ampa Downtime
ampa crashed today under load, and was brought back online after a
filesystem cleaning. Downtime was less than 15 minutes.
[May 18, 1999, 1:50 am] omicron Downtime
omicron crashed today under heavy load. Total downtime was
less than 30 minutes.
[May 14, 1999, 9:22 am] db Downtime
db crashed today under heavy load. Total downtime was
less than 30 minutes.
[May 13, 1999, 1:55 pm] gamma Downtime
gamma crashed today under heavy load. Total downtime was
approximately 10 minutes.
[May 12, 1999, 8:50 pm] Network Upgrades
Tonight's upgrades have been completed on the following servers:
enda, gnaaste, scire, dhaame, wredhor, flure, gwind, and
cele. The remaining servers scheduled for tonight will be
upgraded between now and 9 AM Thursday morning.
[May 12, 1999, 4:52 pm] ngatwo Downtime
ngatwo is temporarily out of service while we replace a
failed power supply. We believe this supply may have failed due to
slightly elevated temperatures caused by the temporary failure of one
of our cooling units. This would only occur if the power supply in
question was already marginal. We expect to have ngatwo back
online within fifteen minutes.
[May 12, 1999, 10:34 am] ShopSite Upgrades Update
We've temporarily suspended upgrades of ShopSite Manager and Pro
to the new version, while we resolve a problem one customer has reported.
After the problem is solved and we're sure it will not affect other
customers, we'll again proceed with the upgrades. All upgrades should
still be completed by the end of May.
[May 12, 1999, 10:29 am] Continued Network Upgrades
The servers listed below will all be
shut down briefly late this evening in order to upgrade their network cards.
Downtime for each server is expected to be no more than 10 minutes.
The servers to be upgraded tonight are:
enda, gnaaste, scire, dhaame, wredhor, flure, gwind, cele, silme, aaze,
lambe, yanta, hyarmen
[May 11, 1999, 5:12 pm] UUnet Upgrade
We are in the process of upgrading our primary UUnet circuit. At
this time, we are seeing unusually high ping times to UUnet's
network from certain portions of our network. We do not believe this is
affecting throughput or performance, but we are working to identify and
resolve the cause within the next 24 hours.
[May 12, 1999, 10:30 am] Network Upgrades
Continuing with our network upgrades, the servers listed below will all be
shut down briefly late this evening in order to upgrade their network cards.
Downtime for each server is expected to be no more than 10 minutes.
More servers will be upgraded over the course of the next 2 weeks, and will
be announced here ahead of time.
The servers to be upgraded tonight are:
ilwe, ulwar, ydhu, aedde, eeoth, auma, mildh, zais, shen, db2, and db3
[May 11, 1999, 5:06 am] Network Upgrades
The second phase of our major network upgrade project has been completed
this morning. There was a very brief interruption of UUnet
connectivity around 4:15am Eastern time. There will be another brief
interruption within the next 48 hours, as our circuit is upgraded in
coordination with UUnet's NOC. We are also planning a circuit
upgrade for our SAVVIS connectivity.
Each of these upgrades should lead to improved performance for some
customers, as well as improved capacity for future growth in our network
overall.
[May 10, 1999, 4:27 pm] theta Downtime
theta crashed under heavy load, and was brought back online.
Downtime was approximately 10 minutes.
[May 10, 1999, 1:55 pm] gamma Downtime
gamma crashed under heavy load, and was brought back online.
Downtime was approximately 10 minutes.
[May 10, 1999, 8:55 am] lissa Downtime
lissa crashed under load, and was brought back online after
a filesystem check. Downtime was approximately 10 minutes.
[May 9, 1999, 6:04 pm] upsilon Downtime
upsilon crashed under heavy load, and has been brought back online.
Downtime was approximately 10 minutes.
[May 8, 1999, 10:06 pm] beeoro Downtime
beeoro crashed under heavy load, and has been brought back online.
Downtime was approximately 10 minutes.
[May 7, 1999, 6:31 pm] Server Performance Issues
We are aware of recent performance degradations for several servers,
including omicron, omega, and anca. In each
case, we have identified customers who are reaching levels of CGI
activity and overall traffic that make them ideal candidates for upgrades
to High-Volume or QuickServe, and we are currently making
these arrangements. We expect, therefore, to have these problems resolved
within the next week.
[May 7, 1999, 11:03 am] Commerce SS / ShopSite Upgrades
Over the course of the next week, we will be upgrading the ShopSite Manager
and Pro stores of each Commerce SS account holder to version 4.1. This is
a free upgrade, and important as the new version is the only ShopSite that
is Y2K certified by OpenMarket.
[May 1, 1999, 11:13 am] UUnet Update
As of this morning, we are no longer seeing any significant performance
problems with our UUnet traffic. Pings have returned to the 8ms
range. The unofficial word is that, among many fiber cuts which took
place yesterday, a train derailment caused UUnet to lose much
of their connectivity between DC and Pittsburgh, specifically. As traffic
levels lowered overnight, the problem improved, but the delay in actually
performing the necessary repairs was caused by hazardous conditions at
the site.
[Apr 30, 1999, 5:48 pm] UUnet Update
We are seeing a gradual improvement in UUnet performance, with ping
times dropping from 600ms to 250ms, and packet loss from 40% to 20%.
However, we still do not consider the problems to be resolved, and we still
have not received any official explanation of the problem.
We will post more information as it becomes available.
[Apr 30, 1999, 2:50 pm] UUnet Problems
We are currently seeing extreme packet loss within UUnet's network,
which has caused a dramatic drop in customer traffic. We are attempting to
open a ticket with UUnet to have the problem corrected; we will
drop the circuit in order to use alternate paths if the problems persist.
[Apr 22, 1999, 1:11 pm] Utility Power Restored
At this time, utility power has been restored to our location. While the
power was down our backup generator and UPS array kept all services
operational.
[Apr 22, 1999, 12:06 pm] Utility Power Incident
We are currently experiencing a utility power outage. As usual, our backup
generator kicked in immediately, and no downtime has occurred or will occur.
[Apr 20, 1999, 1:39 pm] theta Downtime
theta crashed under heavy load, and has been brought back online.
Downtime was approximately 5 minutes.
[Apr 20, 1999, 9:49 am] pi Error
An error in a disk balancing process caused the web directories for
approximately 20 customers on pi to be unavailable for varied periods of
time overnight. This has been corrected, and should not recur.
[Apr 19, 1999, 8:11 am] zirx Glitch
Due to a permissions glitch, web and mail service on zirx were
interrupted for around 10 minutes this morning. The problem was quickly
corrected, and the server was returned to full operating status.
[Apr 19, 1999, 7:41 am] Further Maintenance
ruis will be taken down briefly this morning for a hardware
swap. Downtime is expected to be less than 15 minutes.
[Apr 16, 1999, 9:01 am] Server Maintenance
huath was taken down briefly for a physical relocation, and was
brought back online within less than 10 minutes. anca was also
taken down briefly, so that an additional hard drive could be installed.
[Apr 15, 1999, 4:50 pm] SAVVIS Network Incident
SAVVIS has advised us that a major fiber cut in New York City
is adversely affecting their network, causing greatly increased packet
loss due to saturation of alternate links they are presently using.
The cut apparently took place around 3:15pm Eastern time. Some of the
affected traffic has been rerouted for delivery via Digex, but
some customers may continue to experience increased packet loss and
latency via SAVVIS until the cut has been repaired.
We will post more information as it becomes available.
[Apr 15, 1999, 8:39 am] huath Upgrade
huath has been upgraded to a Pentium III 500 MHz with 256 MB
of RAM. Due to some complications during the upgrade, downtime was
around 30 minutes. huath will be taken down again for
approximately 5-10 minutes at a later date for a physical relocation.
[Apr 14, 1999, 1:02 pm] gamma Downtime
gamma crashed under heavy load, and is being brought back online.
Downtime should be under 20 minutes.
[Apr 11, 1999, 5:13 pm] xi Downtime
xi crashed under heavy load, and was brought back online.
Downtime was less than 10 minutes.
[Apr 7, 1999, 10:38 am] Procmail Upgrade
The new version of procmail deployed on the six servers mentioned yesterday
has been upgraded again from version 3.13 to 3.13.1, following its release.
Four more servers, emancholl, eite, ceirt, and fearn have
also been upgraded. If no problems are found, version 3.13.1 will be
deployed to all user servers this week.
[Apr 6, 1999, 4:38 pm] Procmail Upgrade
Procmail has been upgraded to 3.13 on servers 41-46 (tinco, parma,
calma, quesse, ando, and umbar). If no problems are found,
it will be distributed everywhere this week.
[Apr 6, 1999, 2:40 pm] UUnet Maintenance
UUnet has advised us that they will be performing router upgrades
in the Pittsburgh POP during their maintenance window on the morning of
Wednesday, April 7th. This normally takes place around 4am Eastern time,
and should not significantly disrupt customer traffic.
At present, we are still aware of a few isolated cases in which
UUnet is not properly delivering traffic that they are advertising
routes for. This is the same problem that affected Sprynet and
Concentric for several days. We are awaiting a diagnosis and
resolution of this problem from UUnet.
[Apr 6, 1999, 8:10 am] db Reboot
db was rebooted in order to install a new kernel which should
help guard against the problem reported earlier. Downtime was less than
5 minutes.
[Apr 6, 1999, 5:12 am] db.pair.com downtime
db.pair.com was down briefly this morning, and has been brought
back to normal operations.
[Apr 5, 1999, 4:35 pm] khurla
Khurla logs have not been generated correctly recently. They are
presently being fixed, and older logs regenerated.
[Apr 5, 1999, 12:02 pm] Digex Update
Problems with our Digex circuit have been resolved, and service
has returned to normal.
[Apr 5, 1999, 10:43 am] UUnet Update
We believe that as of 10:45am Eastern time today, there should be no
further problem reaching our network from Compuserve,
Sprynet, or any other network that was affected over the
weekend. If you have trouble reaching our network after reading this
message, please send a new traceroute to "urgent@pair.com"
as soon as possible. If there is any problem remaining, we need to
document it for UUnet to debug.
We do not have any confirmed information on the cause of this routing
problem. We believe it may be related to attack filtering, and we are
evaluating alternate methods of protection against future attacks.
[Apr 5, 1999, 10:13 am] UUnet Update
We are continuing to receive reports of trouble reaching our network
from Compuserve-related networks. As a further temporary
workaround, we have statically routed as many such networks as we have
been able to identify, to Digex. We are working with UUnet
to attempt to better identify and rectify the problem.
We apologize for the inconvenience; this is an extremely frustrating
situation for us as well.
[Apr 4, 1999, 10:26 pm] UUnet Update
We have identified two significant networks with which UUnet has had
trouble providing connectivity to our network. Sprynet, now a
division of Mindspring, was a problem for most of the weekend,
but this appears to have cleared up. Concentric continues to be
unreachable via UUnet from our network, although UUnet is
still advertising those routes to us. We have worked around this problem
by statically routing via Digex to reach Concentric. This
is a temporary solution, until UUnet can diagnose and correct their
routing problem, which will hopefully happen on Monday.
Please note that over the weekend, filters were also put in place which
prevent many pings and traceroutes from completing correctly. These filters
are necessitated by unrelated network flooding attacks that we have again
been experiencing. This means that in some cases, you may be able to reach
our services (Web, mail, news, etc), even when a traceroute or ping indicates
that you cannot. Please verify any failed traceroute or ping by trying to
actually connect to one or more of our Web sites. If you still have
trouble, please send the failed traceroute to "urgent@pair.com".
We apologize for the inconvenience caused by these routing problems; we
are unhappy with UUnet's handling of the situation and will pursue
that matter with their management. Updates will be provided here as they
become available.
[Apr 3, 1999, 6:08 pm] UUnet routing
We have confirmed that there are problems in UUnet's local routing
mesh, and that their network engineers are working now to correct it. This
affects a small number of networks and ISPs whose route to us is
unavailable. However, the problem does not appear to be widespread, and
is affecting only a handful of ISPs at this time. We will post more details
as they become available.
Anyone who is unable to reach us via web or
telnet is encouraged to send traceroute results to urgent@pair.com demonstrating their
problems. While we are unable to correct this ourselves, this information
may be useful in our contacts with UUnet in resolving this issue.
[Apr 2, 1999, 7:33 pm] psi Downtime
psi's network card died and had to be replaced. It was shut down
cleanly and brought back online with no further problems. Downtime was
less than 10 minutes.
[Apr 2, 1999, 3:27 pm] InterNIC Maintenance
We have been advised by Network Solutions, on very short notice, that
in order to update their systems in anticipation of the upcoming
ICANN registrar tests, there may be intermittent disruptions of their
service, beginning at 10pm Eastern time tonight (April 2nd),
and continuing until some time on April 3rd. According to their
notification, no submissions should be lost, but delays can be
expected.
[Apr 2, 1999, 2:40 pm] UUnet Routing
We have become aware of routing problems in portions of UUnet's
network, most likely related to unknown incidents behind three major
drops in traffic levels from UUnet overnight. We have received
no report of any maintenance or failure incidents from UUnet,
and at present traffic levels are normal, although we still have a
few lingering, unconfirmed reports of difficulty reaching our network
through UUnet.
This situation marks the third time we've had this type of problem with
UUnet, in which portions of their network accept traffic destined
for us, but appear to forget how to deliver it to us. This is not a
problem that originates in our network, nor can it be solved within our
network. It appears to be a problem in UUnet's routing mesh,
and we have continually encouraged them to identify and correct this
type of problem. We will post any further information, if it becomes
available.
[Apr 1, 1999, 11:38 pm] rho Restoration Completed
We have finished restoring all files on rho. The restoration
appears to have gone smoothly, with all services functioning normally.
Nevertheless, any user on rho seeing significant problems with
their account should write to urgent@pair.com so that we may
investigate.
[Apr 1, 1999, 8:16 pm] rho update - correction
It seems the damage is less than we had thought. All user web directories
are intact. Only user home directories and public_ftp sites were affected.
Nevertheless, we should have everything back to normal shortly.
[Apr 1, 1999, 7:51 pm] rho update
rho is back up again, after some hardware swapping. Any files
that were previously located under the /u2 partition will be
inaccessible until we have completed our backup recovery of the drive.
This includes the web files of some users on this server. We expect this
process to be completed later this evening.
[Apr 1, 1999, 7:20 pm] rho
rho crashed under heavy load. Unfortunately, this was just enough
to cause the secondary drive to finally fail. We are currently recovering
rho from our most recent backup tapes onto a new hard drive, so
access will be down until this is completed. We will post updates on the
progress as necessary.
[Apr 1, 1999, 10:46 am] rho and noldo
We have delayed action on rho so that the swap can proceed at
a more appropriate time. We currently believe that the errors are not
critical, but the drive will be replaced at the earliest opportunity
during off-hours.
noldo was taken off-line briefly for a physical relocation,
which was necessitated by its own recent drive swap.
[Apr 1, 1999, 10:10 am] rho Maintenance
We have detected serious faults on rho's second hard drive.
rho will be taken down shortly for an emergency upgrade and
drive swap. Total downtime should be less than fifteen minutes, and
the drive swap will be completed by the end of the day.
[Apr 1, 1999, 7:30 am] eite Downtime
eite crashed under heavy load, and required a manual filesystem
cleaning before being brought back online. Downtime was approximately
30 minutes.
[Apr 1, 1999, 3:18 am] Network Update
An emergency network upgrade has been completed and the attack has been
blocked from our network. Performance has returned to normal for all
servers, with a total interruption of less than ten minutes overall,
and approximately thirty minutes of disruption for the affected server.
Coincidentally, this upgrade also allowed us to eliminate our plans for
a network maintenance window in the near future, and we will now be
proceeding quickly with the remainder of our network upgrades.
For more information, please visit the Network Upgrade page.
[Apr 1, 1999, 2:18 am] Network Attack
We are currently enduring a network attack which is degrading performance
to one particular server. We will be making a few emergency changes to
alleviate the attack. More details will be posted as they become
available.
[Mar 31, 1999, 5:18 pm] analog Upgrade
The newest version of Analog is now available on all servers as
analog311. We invite current users of analog3 to use
analog311 during this testing period. If any problems are found,
please report them to support@pair.com. Once we are
satisfied, we will be replacing the current analog3 with the
newest version.
We also suggest that current users of analog and analog2
consider migrating over to the current analog3. Older versions of
this software are no longer supported, and we will be phasing out their
use over time in favor of the most recent supported version of this software.
Analog3 offers significantly improved speed and reliability over
previous versions, as well as numerous bug fixes.
[Mar 31, 1999, 9:06 am] analog Upgrade
During testing of updated versions of analog2 and
analog3, a 2.9 beta version was inadvertently deployed to all
user servers as a replacement for the analog2 binary. As the
2.9 beta is really an early version of analog3, this caused most
uses of analog2 overnight to fail.
This was not an intentional upgrade; it was scheduled only for test
deployment on one or two servers. The reasons for updating these binaries
are:
- to remove links to webtechs.com, a domain which lapsed from
registration and has subsequently been re-registered with pornographic
content
- to upgrade analog3 to a more current version
We apologize for the inconvenience this has caused. At the present time,
the original analog2 binary has been restored.
[Mar 29, 1999, 10:15 pm] calma Downtime
calma was down this evening due to a runaway CGI script. Total
downtime was approximately 10 minutes.
[Mar 28, 1999, 1:50 am] mu Downtime
mu crashed under heavy load at approximately 11pm. Downtime was less
than 15 minutes.
[Mar 27, 1999, 11:06 am] gao SSL Update
gao, which is operating as ssl12, has recently had
problems supporting directory indices and CGI for some users. In the
process of updating the server configuration, we have corrected this
problem for all users. We will be making other configuration updates,
but this particular problem should not recur.
[Mar 27, 1999, 7:43 am] uilen Downtime
uilen suffered from a very slow mailing loop overnight, which
peaked around 7am this morning and crashed the server. After a brief
cleanup of the offending loop, uilen is back in normal service.
[Mar 25, 1999, 11:03 am] edad Downtime
edad crashed under load due to a mailing loop caused by a user who
had removed bounce protection from his procmail files. The server has been
brought back online and the procmail recipe in question shut down until
such time as the user can return bounce protection to it. Downtime was
approximately 15 minutes.
[Mar 25, 1999, 10:57 am] theta Downtime
theta crashed under heavy load, and has been brought back online.
Downtime was approximately 15 minutes.
[Mar 25, 1999, 9:07 am] noldo Update
To expand upon the last update, any user on noldo who has
CGI-created files owned by nobody should double-check them at their
earliest convenience. If you encounter any files that need ownership
changed, please let us know at support@pair.com. Symbolic links should
also be checked, as some of them may have been lost.
[Mar 25, 1999, 6:16 am] noldo Update
We believe that noldo has been restored nearly to its original
state. Mailbox contents have been merged, and no mail, old or new, was
lost. A few users may still encounter permissions problems, and it is
also possible that a few files are missing; there was some minor corruption
in the backup. However, the backup was very current, so most users will
not see any problems.
We apologize for the inconvenience and extended downtime during the tape
restore. We are investigating ways of supplementing our tape backup
schedule, but have no firm plans to announce at this time.
[Mar 24, 1999, 11:49 pm] noldo Update
Restoration on noldo has been mostly completed. All user files
have been restored from the most recent backups, except for the accounts'
mailboxes, and some users' web directories which are still in the process
of being restored. There was also a problem with file permissions which we
are now fixing. Any users who see any problems with their restored
accounts should contact us at urgent@pair.com.
[Mar 24, 1999, 6:27 pm] noldo Update
Mail spools on noldo were also lost during the drive crash.
These will be restored from backup after the home and Web directories
are restored. At the present time, new mail is being delivered to
users' previously empty mailboxes. When we restore the old mail, we
will append this new mail to the restored mailboxes. Please do not
be alarmed; the missing messages will be restored.
[Mar 24, 1999, 5:57 pm] noldo Update
noldo is now online, but Web content and most users' home
directories are still being restored from tape. The content will
gradually reappear on the drive; please be patient with this procedure.
[Mar 24, 1999, 5:32 pm] noldo Drive Failure
During the copying procedure on noldo, the original drive suffered
a complete failure. We are currently working to restore the uncopied data
from tape; the backup being used is less than twelve hours old. We expect
to have noldo back online within the next two to three hours. We apologize
for the inconvenience.
[Mar 24, 1999, 4:22 pm] noldo driveswap
noldo's drive has failed and we are currently performing an emergency
drive swap. We estimate total downtime for this to be less than half an hour.
[Mar 24, 1999, 12:09 pm] noldo downtime
noldo crashed under heavy load, and was brought back online
with no visible lingering effects. Downtime was under 15 minutes.
[Mar 24, 1999, 5:53 am] UUNet maintenance
UUnet has completed their scheduled maintenance, in addition to
having corrected the problem that arose Tuesday evening. At this time,
we expect our UUnet circuit to remain stable.
[Mar 23, 1999, 7:28 pm] UUNet problems resolved
Routing through UUnet has returned to normal, and with it so has
routing through our Savvis line. Barring tomorrow's scheduled
maintenance, we don't foresee any more problems with this.
[Mar 24, 1999, 10:42 am] idad Downtime
idad crashed under heavy load and was brought back online with no
problems. Downtime was approximately 15 minutes.
[Mar 23, 1999, 5:57 pm] UUNet Network Problems
UUnet has just notified us of a sudden outage which is resulting in
a network slowdown. Customers going through them will experience some
delays, as will some customers going through Savvis, since they will
be carrying the bulk of the re-routed traffic until this is resolved.
We will post any new information we receive on this.
[Mar 23, 1999, 1:24 pm] UUNet Scheduled Maintenance
UUnet has scheduled downtime for their maintenance window on the
morning of Wednesday, March 24th. Our connectivity to UUnet may
be offline between 4am and 7am Eastern time as a result. This should not
significantly affect customer traffic overall.
[Mar 21, 1999, 1:47 pm] theta Downtime
theta crashed at 11:30am and was down for approximately 30 minutes
due to drive problems. This system has been returned to normal service.
[Mar 18, 1999, 3:25 pm] anca Downtime
anca was down briefly this afternoon. Total downtime was
approximately 10 minutes.
[Mar 17, 1999, 10:38 am] anca Downtime
anca crashed and required a manual filesystem cleaning. After a
thorough check in single-user mode, it has been returned to service.
Downtime was approximately 20 minutes.
[Mar 16, 1999, 4:26 pm] Network Degradation
During peak hours of network activity today, a flooding attack was
initiated against several servers from a compromised server at a
competitor-owned site. We have worked with our upstreams and the
compromised site to trace and eliminate the source of this attack; however,
at several points during the day, there were five to ten minute periods of
severe network performance degradation as a result of the attack.
We are also working with our switch vendors to ensure that this type of
attack cannot be as effective in the future; what should have affected only
one server instead reduced performance to many servers on our network.
[Mar 16, 1999, 10:16 am] MySQL Upgrade
MySQL has been upgraded to version 3.22.19 on the db
and db2 servers. db3 has been running this version since
being brought online.
[Mar 16, 1999, 7:23 am] iota Downtime
iota crashed under a heavy mail loop and required a manual
filesystem cleaning before being brought back online. Downtime was less
than 20 minutes.
[Mar 11, 1999, 2:34 pm] UUnet Scheduled Maintenance
UUnet has advised us that during their maintenance window on the
morning of Tuesday, March 16th, they will be upgrading routing equipment
in their Pittsburgh POP. For a brief period starting at 3:00am Eastern
time, we can expect to lose UUnet connectivity. This should not
significantly affect customer traffic.
[Mar 11, 1999, 9:14 am] or Downtime
or crashed under the strain of a mailing loop and was brought
back online quickly with no ill effects. Downtime was less than 10
minutes.
[Mar 11, 1999, 4:30 am] omicron Downtime
omicron crashed early this morning and showed signs of hardware
problems after an extensive filesystem cleaning. The RAM, motherboard, and
Ethernet card were all swapped out, and the server is now back online and
being closely monitored. Downtime was approximately 150 minutes.
[Mar 10, 1999, 1:38 pm] pi downtime
pi crashed and required an extensive manual filesystem cleaning
before being brought back online. Downtime was less than 15 minutes.
[Mar 8, 1999, 4:35 pm] anca downtime
anca crashed under heavy load. After a clean reboot, it returned to
service with less than 15 minutes of downtime.
[Mar 8, 1999, 7:48 am] ceirt Files
We have two reports from users on ceirt of files edited or
uploaded Friday night not appearing on the restored server. We are
currently investigating this, but wish to advise all users on
ceirt who may have edited their site Friday night to double
check the edited files.
[Mar 8, 1999, 7:11 am] ceirt Telnet Glitch
We have identified and corrected a problem that may have prevented some
users of ceirt from accessing their accounts via telnet, rlogin, or ssh
for the past few hours.
[Mar 7, 1999, 11:45 pm] ceirt Restoration Complete
The new ceirt is now up and running. Backups were restored, and
include all files up until Saturday morning. If any users discover any
previous files missing, corrupted, or otherwise not as they should be,
please contact us at urgent@pair.com so that we can take care of
whatever the problem might be.
[Mar 7, 1999, 3:42 pm] ceirt Update
Restoration of ceirt is partially complete, and we expect the new drive to
be fully operational in about 2 hours. We will post here once the new
ceirt is ready.
[Mar 7, 1999, 8:36 am] ceirt Downtime
ceirt's hard drive failed at approximately 7AM this morning. We
are working right now to construct a new server from the most recent
backups, and will post here when we have a progress report.
[Mar 4, 1999, 7:28 pm] fearn Upgrade
fearn will be going down at 7:45PM to add more disk space. Total
downtime is expected to be approximately 5 minutes.
[Mar 3, 1999, 7:16 pm] pi downtime
pi was down for about 20 minutes this evening due to file system
inconsistencies. They've since been cleaned up and the server has returned
to normal operation.
[Mar 2, 1999, 6:42 pm] reit downtime
reit will be having a physical relocation at 7:00 PM Eastern time.
Downtime is expected to be less than five minutes. We apologize for the
short notice in this matter.
[Mar 2, 1999, 11:37 am] PHP Upgrade
The upgrade to PHP has been deployed to all remaining servers, as
has the XML::Parser Perl module.
[Mar 1, 1999, 7:39 pm] db2.pair.com Resets
The mysql daemon on db2.pair.com was down for about 20 minutes.
It has been brought back online and has returned to normal.
[Mar 1, 1999, 3:20 pm] PHP 3.0.6 Deployment
PHP has been upgraded to version 3.0.6 on servers gamma
through nuumen (#2 through #55). The remaining user servers will be
upgraded tomorrow morning, barring any problems.
[Feb 28, 1999, 3:39 pm] relay Downtime
Due to unexpected problems, relay.pair.com ceased relaying mail
sometime yesterday. Mail sent during this time through our
relay.pair.com mail server may not have been delivered. The
server has since been brought back up to operational status, and is once
again relaying mail normally. We are currently investigating the cause of
this outage to prevent similar incidents in the future.
[Feb 25, 1999, 8:53 pm] idad Downtime
idad was down for approximately 10 minutes this evening. It has
returned to normal service.
[Feb 25, 1999, 1:42 pm] Digex Planned Maintenance
Digex/Intermedia has advised us that between 4am and 8am on Monday,
March 1st, upgrades to their Pittsburgh POP may interrupt their DS-3 service
for our network. We expect this will have minimal customer impact, as
traffic will flow through alternate circuits.
[Feb 24, 1999, 11:37 am] PHP 3.0.6 Deployment
Two more servers, chi and beith, have been added to the
test of PHP 3.0.6. Barring any problems, this upgrade is scheduled for
deployment to all servers Monday morning, so that we are better staffed to
handle any complications.
[Feb 24, 1999, 9:54 am] alpha Relocation
alpha was down briefly this morning for a physical relocation.
It is now restored to full working order.
[Feb 23, 1999, 2:45 pm] PHP 3.0.6 Deployment
PHP 3.0.6 has been deployed to 2 additional servers, auma and
onn, for testing. If tests continue on their present course it
will be deployed to all user servers this week.
[Feb 23, 1999, 12:58 pm] Upgrades Notice
Due to the hard drive failure on alpha, an indeterminate number
of upgrades may have been lost. If you submitted an upgrade between
approximately 8:00 PM Eastern, Sunday, Feb. 21, and 10:00 PM Eastern,
Monday, Feb. 22, you may need to resubmit it. If resubmitting, please
clearly state in the comments section that this is a DUPLICATE
upgrade request submitted due to the alpha downtime, so that our staff
can check if it was already put through prior to processing it.
[Feb 22, 1999, 10:54 pm] Alpha Downtime
alpha crashed today at approximately 4:30pm EST due to a hard
drive failure. Since alpha is the home of www.pair.com and
support.pair.com, both were down for approximately 6 hours while we
replaced the old alpha with a newly configured system. As a result,
the upgrade and signup systems are currently offline while we continue to
restore files and functionality. More details will be posted as they become
available.
[Feb 22, 1999, 1:14 pm] PHP 3.0.6 Deployment
We believe we have solved the earlier inconsistency with the crypt
function, and have again deployed PHP version 3.0.6 to the servers
iota and thuule for testing. If no problems are
reported, it will be distributed to all servers this week.
[Feb 22, 1999, 1:13 pm] swish-e Upgrade
The swish-e indexing tool has been upgraded to version 1.3 on
all servers.
[Feb 21, 1999, 11:14 am] omicron downtime
omicron crashed under heavy load, and required an extensive
filesystem cleaning before being brought back online. Downtime was less
than 30 minutes.
[Feb 20, 1999, 7:15 pm] sether troubles
Users may have experienced some difficulty in reaching
accounts hosted on sether recently. The cause was a network
attack, which tied up transfers to the server for approximately 20 minutes.
This attack has apparently ceased, though we have installed additional
monitoring software to more quickly track and eliminate this should it
happen again in the near future.
[Feb 19, 1999, 8:55 pm] sether downtime
sether is currently experiencing some network difficulties. We are
working on correcting this right now and will post once this has been
fixed.
[Feb 11, 1999, 9:00 am] idad Downtime
A runaway script caused idad to crash; the server was brought
back online with approximately 10 minutes of downtime and has returned to
full service.
[Feb 11, 1999, 8:33 am] theta Downtime
theta crashed this morning under heavy load. After a clean
reboot, it returned to service with less than 15 minutes of downtime.
[Feb 9, 1999, 10:41 pm] Details of Routing Problem
As best we have been able to determine, beginning around 6:30pm Eastern
time today, UUnet began having problems in our area. A fiber cut
was rumoured to have taken place in Indiana, but we do not yet know if this
is related. The direct consequence of these problems was that
approximately 30% of the traffic that normally flows to us through
UUnet was instead sent nowhere, although UUnet continued to
advertise our routes and was successfully delivering the remaining 70% of
the traffic load. The traffic flow returned to normal around 10pm.
This type of behavior could be caused by inconsistencies in their routing
mesh, or more likely by overloaded circuits due to unplanned reroutes (this
would be consistent with a fiber cut problem). When we have a definitive
explanation from UUnet, we will post the details.
In a case such as this, BGP is insufficient for managing the outage, as the
routes remain in place and most traffic continues to flow. Even with
manual handling, disabling the circuit would have more negative effect
than maintaining the partial traffic flow.
We will work with UUnet to identify the cause of this problem in
their network, and report to our customers as soon as possible.
[Feb 9, 1999, 10:17 pm] UUnet Routing
We are aware of problems with routing to our network via UUnet;
although everything appears normal from our testing, a significant number
of customer complaints have prompted the opening of a ticket with
UUnet. We will post more details as they become available.
[Feb 9, 1999, 10:12 am] beith Downtime
beith crashed and was brought back online after a filesystem
cleaning. Downtime was less than 15 minutes.
[Feb 6, 1999, 11:17 am] mu Downtime
mu crashed under heavy load and was brought back online promptly.
[Feb 5, 1999, 3:15 am] ebad Downtime
ebad crashed as a result of a user-induced mail forwarding loop,
and required extensive filesystem cleaning.
Total downtime was approximately forty minutes.
[Feb 2, 1999, 11:11 am] bellat Downtime
bellat crashed as a result of a user-induced mail forwarding loop.
It was brought back online within five minutes.
[Feb 1, 1999, 2:12 pm] MySQL Upgrade
MySQL on both db and db2 has been upgraded to version
3.22.15. Additionally, access to the mysqlimport utility has
been restored.
[Jan 26, 1999, 6:37 pm] theta Downtime
theta rebooted this evening. Total downtime was approximately 20
minutes.
[Jan 26, 1999, 8:23 am] Network Upgrades
Please note that our network upgrade page has been updated.
http://support.pair.com/notices/network.html
[Jan 26, 1999, 3:10 am] Pi Downtime
Pi went down for unknown reasons. Total downtime was about 15
minutes.
[Jan 25, 1999, 11:26 am] MySQL Update
We are still awaiting the release of version 3.22.15, and will upgrade to
it soon after it becomes available. Additionally, we now know that use
of the LOAD DATA INFILE command can also cause database crashes.
Please refrain from using it until we are able to upgrade.
[Jan 23, 1999, 5:00 pm] db2.pair.com emergency maintenance
A failed power supply required an emergency swap of db2.pair.com
to new hardware. After this was completed, db2.pair.com has
returned to full operating status as a P-II 450. Total downtime was
approximately 15 minutes.
[Jan 23, 1999, 11:58 am] MySQL Bug
We have confirmed a bug in MySQL 3.22.14 that can cause the server to
crash during the use of the mysqlimport utility. This program
has been disabled and will be reenabled early next week when we upgrade
to version 3.22.15 of MySQL, which fixes the bug. All other MySQL
utilities (and PHP and Perl access) should work as normal in the interim.
[Jan 21, 1999, 11:57 am] Customer Notice
We have today sent out a customer notice, to all billing contacts, briefly
explaining that we have recently converted our form of business to a
corporation. As indicated in the notice, this has no effect on our
operations, our customers, our services, our employees, or our owners. In
effect, nothing has changed but a legal designation, a mere technicality
that influences how we file our taxes, for example. The notice is only
being sent on the advice of our corporate attorney. Please do not be
alarmed.
[Jan 21, 1999, 11:44 am] PHP 3.0.6 Deployment
The deployment of PHP 3.0.6 has been halted while we investigate and
resolve a difference in the crypt() function from the 3.0 installation.
All servers have been reverted to 3.0 at this time.
[Jan 21, 1999, 7:56 am] Network Maintenance
This morning's network maintenance has been completed. The upgrade caused
three service interruptions of up to three minutes each. Our peak routing
capacity has been improved, which should resolve slowdowns that some
customers have experienced in the past week. We have additional upgrades
planned within the next two to four weeks, but for now, routing should remain
stable. We apologize for today's inconvenience.
[Jan 21, 1999, 7:10 am] Network Maintenance
We are working with SAVVIS and Cisco to complete several
network upgrades today; our gateway will be down for approximately ninety
seconds, but this is an important step towards expanding our routing
border.
[Jan 20, 1999, 7:32 pm] Server Downtime
xi and tau both crashed under heavy load and were
promptly brought back online.
[Jan 20, 1999, 5:18 pm] UUnet Upgrade
Service on our UUnet DS-3 circuit was interrupted for nine minutes
this evening during a routine upgrade, due to a configuration glitch in
the CSU. The problem was identified and the configuration was quickly
corrected. We now have increased capacity online with UUnet, and
an additional DS-3 circuit on the way.
[Jan 19, 1999, 1:20 pm] PHP & Perl
PHP 3.0.6 has been installed on four user servers -- onn,
arda, thuule, and nish. Barring any problems,
it will be installed on all user servers in the near future. This new
compilation includes many bug fixes, as well as Freetype and XML
support.
We have also installed the XML-Parser 2.19 Perl module on these 4 servers.
It will be installed everywhere in conjunction with the PHP binary
distribution.
[Jan 18, 1999, 11:31 am] msqldump Fixed
The msqldump utility, for use with mSQL 1.0.16, has been
fixed; the bug was introduced by recent security updates. Please note that
support for mSQL 1 will eventually be discontinued; we recommend
that projects that rely on it be reworked for MySQL if at all
possible.
[Jan 15, 1999, 1:37 pm] MySQL Client Upgrades
The MySQL client program upgrades, which correspond with the current version
of the MySQL daemons (3.22.14), have been deployed to all user servers at
this time.
[Jan 15, 1999, 1:38 pm] mSQL 1 Upgrade/Problems
mSQL 1 was recently recompiled on all servers to guard against a
number of recently published denial-of-service exploits. As a result, some
users have seen segmentation faults when using msqldump and
relshow, which we are working to correct. Also, some scripts which
specify the server as the host name may not work properly unless the host name
is omitted entirely.
All of this furthers our resolve to phase out mSQL 1 within the immediate
future.
[Jan 15, 1999, 12:48 pm] Problems at InterNIC
InterNIC has officially advised us that they have been experiencing
some very significant problems with their internal systems recently and at
present, for unspecified reasons, and that until these problems are
resolved, service levels will be much worse than usual. New registrations
and modifications may be delayed significantly. Please continue to advise
us of problems you encounter; we will work with our Premier Partner contact
to try to resolve as many cases as possible.
[Jan 15, 1999, 12:47 pm] Pittsburgh Weather
Weather conditions have been severe in Pittsburgh for the past two weeks,
with extensive rounds of snow, freezing rain, and sleet. This has had no
impact on our online operations, of course, but has led to difficulties
for staff members trying to get on-site. Furthermore, a number of staff
members have fallen ill with flu, bronchitis, and even pneumonia. As a
result, customers may experience slower response on billing, upgrades, and
new signups, but we are working hard to maintain reasonable levels of
service in all critical areas.
[Jan 14, 1999, 2:14 pm] MySQL Client Upgrades
The updated MySQL client programs have been installed on the following
servers: chi, onn, anca, pyyl. Barring any problem reports
by the users of those servers, they will be deployed to all user servers
on Friday.
[Jan 14, 1999, 11:59 am] db2.pair.com MySQL Upgrade
As no major problems were seen with the MySQL upgrade on
db.pair.com, db2.pair.com has also been upgraded to
MySQL version 3.22.14. An upgrade of the MySQL client programs will
be made in the near future, to bring them in line as well.
[Jan 12, 1999, 9:40 pm] xi Downtime
xi was down for approximately 10 minutes this evening, and is back
up now. A few of the more active sites on xi are currently being
offered Quickserve arrangements. This should help balance out
xi's load.
[Jan 11, 1999, 1:00 pm] MySQL Upgrade
MySQL on db.pair.com has been upgraded to version
3.22.14. If you see any unusual behavior, please contact us at
support@pair.com. Barring any
problems, db2.pair.com and the client-side mysql programs
will be upgraded shortly to bring them in line.
[Jan 10, 1999, 6:28 pm] xi Downtime
xi rebooted this evening due to heavy load. Total downtime was
approximately 15 minutes.
[Jan 5, 1999, 7:21 pm] epsilon Downtime
epsilon spontaneously rebooted this evening. Total downtime was
approximately 15 minutes.
[Jan 4, 1999, 9:21 am] UUnet Maintenance
UUnet has informed us of planned maintenance in their Pittsburgh
facilities, beginning at 4am Eastern time on Thursday, January 7th. We
do not expect customer traffic to be significantly affected.