Category: computers

Time For A New Printer

Published September 20, 2008

Old parallel port printer cable connecting into the back of my work laptop, a Dell D610.

I guess I really need a new printer. After almost a year of limping along with a failing HP LaserJet 5L at home I’m finding I can’t even convince it to print any more. Today I was able to get it to print a test page while manually guiding the paper deep into the feed mechanism, but then I was unable to print properly from either my Mac or work laptop via lpr (to the JetDirect), or straight from my work laptop via parallel port.

Two pages bearing print did eventually leave the printer when connected via parallel port, but only half of the PDF I needed to print (a free admission ticket to Addison Oaks for riding the mountain bike trails) was actually rendered correctly. Oh well. I guess it’s time to go to Kinkos.

Busy Weekend

Published September 19, 2008

This weekend looks to be very busy. I’m still at work, don’t know when I’ll be leaving, and likely will have to put in some time on either Saturday evening or early Sunday morning.

The new hard disks for my server are going to be delivered today, so hopefully the wipe of the failing ones (with DBAN) will be complete by the time I arrive home so that I’ll be able to do the dump and restore, check out the install, then get on with more burn-in.

I’d originally planned on riding both the Tour De Troit and the Addison Oaks Fall Classic this Saturday and Sunday (respectively), but I just don’t think I want to schedule things that tightly. So, maybe I’ll get out and ride a bit, but it definitely won’t be anything planned or structured.

Now, to get this stuff at work wrapped up. Thankfully Danielle brought me some really, really yummy lunch from Rangoli Express so that I didn’t have to leave for lunch today. It was really, really, really good.

(No, I’m not neglecting work right now… I’m just waiting for some other folks so I can keep going with stuff that I’m doing.)

+12 Hours of Breakin

Published September 18, 2008

Breakin, having run for 12h 28m 33s after swapping RAM around.

Yesterday I ordered a pair of Seagate Barracuda ES.2 ST3500320NS disks to replace the two which failed on Tuesday. Today I called Newegg about my RMA for the old ones and the old controller and was able to get the 15% restocking fee waived for both the controller and drives. Hopefully the drives will arrive tomorrow and I can dump | restore the OS and such, then start Breakin running so that it can thrash the drives for a few days.

Speaking of Breakin, I disconnected the disks from the machine (but left them mostly fitted in the case so as not to disrupt airflow) and started Breakin running this morning before I left for work. When I arrived home it was still running, unlike last week when it regularly failed with MCEs. This is good, as I had been unable to get it to run for this long before.

SMART Issues

Published September 16, 2008

When I got home I started running SeaTools, Seagate’s disk diagnostics utility for Windows, on ad4, which had begun failing earlier. It reported that both ad4 and the other hard drive were just fine. However, when booting into FreeBSD afterwards I found that both drives were now indicating that Seek_Error_Rate was past threshold. The OS booted very slowly, then kicked ad6 out of the mirror set.

I tried connecting the drives to another, standalone SATA controller (some plain old Maxtor bundle-in one) with new SATA cables and saw the same problem.

So, I’m not sure what to do. Here’s every issue I’ve had with the new server and its resolution:

Issue: Server locking up hard, unexpectedly. MCEs on console.
Resolution: Ensure that only matched RAM is used and that all RAM tests good during burn-in.

Issue: Slow performance / absurd latency while using 3ware disk controller.
Resolution: Identified GIANT-LOCK on driver, moved to using software mirroring.

Issue: One of the original two Western Digital disks used, which were part of a gmirror set, has started giving block errors.
Resolution: Replace disks with brand new Seagate pair.

Issue: Both of the new Seagate drives began failing with excessive Seek_Error_Rate within a few hours of each other after extensive burn-in.
Resolution: Unsure.

I can’t help but wonder if one of the Seagates beginning to fail was contributing to the latency observed with the 3ware controller, but as neither was throwing SMART errors at the time, I discount this.

My current thought is that I should order a pair of server-grade disks, burn them in as before (~50 hours of constant activity), copy the data to them, then see if things will keep working. The failed disks and the unwanted 3ware controller will go back to Newegg, and hopefully things will work right.

I don’t know what other option I have besides scrapping the whole idea of moving servers, but I’d really rather not do that. If anyone else has any ideas, I’d love to hear them…

New Hard Disk Is Failing

Published September 16, 2008

root@banstyle:~# smartctl -H /dev/ad4
smartctl version 5.38 [amd64-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  7 Seek_Error_Rate         0x000f   013   012   030    Pre-fail  Always   FAILING_NOW 38293929828058

root@banstyle:~#

I can’t win. Now one of the brand new hard disks in the server is getting a bunch of seek errors.
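
For what it’s worth, the FAILING_NOW flag follows directly from the numbers in that row: smartctl treats an attribute as failed when its normalized VALUE is at or below THRESH, and here 013 is well under 030. A minimal Python sketch of that check, assuming the whitespace-separated column layout shown in the output above; parse_attribute and is_failing are names made up for illustration and are not part of smartmontools.

```python
def parse_attribute(line):
    """Split one row of smartctl's attribute table into named fields."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),    # normalized current value
        "worst": int(fields[4]),    # lowest normalized value seen so far
        "thresh": int(fields[5]),   # vendor-defined failure threshold
        "type": fields[6],
        "when_failed": fields[8],
        "raw": fields[9],
    }


def is_failing(attr):
    # An attribute counts as failed once its normalized value is at or
    # below the threshold.
    return attr["value"] <= attr["thresh"]


# The failed attribute row copied from the smartctl output above.
row = "7 Seek_Error_Rate 0x000f 013 012 030 Pre-fail Always FAILING_NOW 38293929828058"
attr = parse_attribute(row)
print(attr["name"], attr["value"], attr["thresh"], is_failing(attr))
```

The raw value is left as a string on purpose, since its encoding is vendor-specific and I wouldn’t want to read too much into it.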

3ware 8006-2LP Sucks Under FreeBSD 7.0-RELEASE

Published September 14, 2008

Results from using Bonnie++ on FreeBSD 7.0 with a 3ware controller (twe), gmirror, and just a single local disk.

As mentioned here I got my new server working with a 3ware 8006-2LP and a pair of new 500GB disks. While it was working fine, I noticed that when updating the FreeBSD ports collection the update would occasionally pause, consuming no CPU, with the update process having a status of sbwait. I understand this to mean that the process is waiting on a blocked socket.

It turns out that the twe(4) driver is what is known as GIANT-LOCKED, which I believe means that it uses the old SMP locking mechanism in FreeBSD:

twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0x8c00-0x8c0f mem 0xfc7ffc00-0xfc7ffc0f,0xfb800000-0xfbffffff irq 28 at device 3.0 on pci1
twe0: [GIANT-LOCKED]
twe0: [ITHREAD]
twe0: 2 ports, Firmware FE8S 1.05.00.068, BIOS BE7X 1.08.00.048

Best I can tell, this means that the disk controller’s driver has to wait for the kernel to free up other resources and tell the driver that it can go ahead before it does any work. The driver works well enough, but with a lot of latency.
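
Finding which devices carry the [GIANT-LOCKED] flag in the boot messages is easy to script. A rough Python sketch, assuming the "name: [GIANT-LOCKED]" line format seen in the twe0 messages above; the function name is made up for illustration.

```python
import re


def giant_locked_devices(dmesg_text):
    """Return device names whose boot messages include [GIANT-LOCKED]."""
    pattern = re.compile(r"^(\w+): \[GIANT-LOCKED\]\s*$", re.MULTILINE)
    return pattern.findall(dmesg_text)


# The twe0 boot messages quoted above.
sample = """twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0x8c00-0x8c0f mem 0xfc7ffc00-0xfc7ffc0f,0xfb800000-0xfbffffff irq 28 at device 3.0 on pci1
twe0: [GIANT-LOCKED]
twe0: [ITHREAD]
twe0: 2 ports, Firmware FE8S 1.05.00.068, BIOS BE7X 1.08.00.048"""

print(giant_locked_devices(sample))
```

On a live box the text would come from the output of dmesg (or the saved boot messages) rather than a pasted string.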

This understanding matches what I observed, which was the aforementioned lengthy pauses when doing things which required a bunch of disk IO. In order to prove this understanding out, I set up a test hard disk running a stock FreeBSD 7.0 amd64 installation from which I could run Bonnie++, a file-based disk benchmarking suite.

In my testing I used the following three scenarios:

· One 120GB IBM Deskstar PATA drive (IC35L120AVVA07) connected to the motherboard booting the OS, listed in the results as banstyle_deskstar.
· Two 500GB Western Digital SATA drives (WD5000AAKS-40TMA0) connected to the motherboard with software RAID 1 via gmirror(8), listed in the results as banstyle_gmirror.
· Two 500GB Seagate SATA drives (ST3500320AS) connected to the 3ware 8006-2LP using the twe(4) driver in hardware RAID 1, listed in the results as banstyle_twe.

The result ended up being that all three configurations are generally around the same speed for throughput, but the 3ware controller had an absurd amount of latency. If one looks at the HTML version of the Bonnie++ output here (or the PNG here or above), one can see that it was giving nearly three SECONDS of latency for random seeks and writes using write(2). This is insane.
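
To get a feel for this kind of number, per-call latency can be measured by timing each write individually and keeping the worst case alongside the average. What follows is only a rough Python sketch of that idea; it is not how Bonnie++ takes its measurements, and the function name, block size, and count are arbitrary choices for illustration.

```python
import os
import tempfile
import time


def write_latencies(directory, block_size=8192, count=50):
    """Time individual synced writes; return per-call latencies in seconds."""
    block = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to flush the block to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)  # clean up the scratch file
    return latencies


with tempfile.TemporaryDirectory() as scratch:
    latencies = write_latencies(scratch)
    mean_ms = 1000 * sum(latencies) / len(latencies)
    max_ms = 1000 * max(latencies)
    print("mean %.3f ms, max %.3f ms" % (mean_ms, max_ms))
```

Pointing the directory argument at a filesystem on each array in turn would give comparable figures. The maximum is the one that matters for stalls like these, since an average can hide a handful of multi-second pauses.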

The only thing I can think to attribute this to is the GIANT-LOCK in twe(4). I guess this means that I’m going to have to go back to gmirror(8) for software RAID and return the card. How disappointing.

(If anyone reading this disagrees with these findings or wishes to comment on them, please don’t hesitate to do so here or by emailing me directly.)

Black and Shiny

Published September 14, 2008

Set up to polish my boots in the laundry room. One boot is done.

After eating some really nice Skillet Baked Ziti (recipe from America’s Test Kitchen) that Danielle made for dinner I avoided working on my server by polishing my boots. As you can see above or at this close-up of the toes of my boots, they needed it.

Now I get to go back to figuring out why twe(4) in FreeBSD 7.0 seems sluggish. It may just be my perception, so I’m double-checking this by comparing the new 3ware-based array to the old gmirror(8) version. Or, it may be that it’s one of three drivers (the other two are ohci(4) and atkbd(4)) which indicate that they are GIANT-LOCKED, which means that they use the old SMP locking method.

Control

Published September 14, 2008

Danielle and I finally watched Control (Official Site · IMDB · Wikipedia), which she had received from Netflix last week. While it was a bit slow and (obviously) predictable, I enjoyed it.

I think that tonight I also got banstyle.nuxx.net working properly again. Over the past two days I did a bunch of extensive testing with spare RAM, Breakin, and a white board, and I think that I may have narrowed down the problem. I believe that the MCEs I was seeing were caused by a combination of a failing DIMM and modules which were the same in part number but not in actual chip content. There may actually be a bad slot there too, but I’m not certain of that.

I’ve winnowed the box down to 6GB of matched, tested RAM and it seems to pass all the tests I’ve thrown at it thus far. With the discovery that ad6 is dying as well I ordered a 3ware 8006-2LP and two Seagate ST3500320AS 500GB disks. Those were fitted into the server and I then dumped the partitions from ad4 to it and everything seemed to be working fine, but occasionally slowly. Jumpering the board to force the first PCI-X slot to 66MHz (to match the PCI 8006-2LP) and turning on bus mastering for IDE transfers on the PCI slots seems to have sorted this out.

SMART tests and a number of hours of Breakin have shown the disks to be okay, so come Monday morning I’ll attempt to get a good 36 hours of burning in happening. If this all goes well the server will be back in place on Wednesday, with everything moved (shifted?) back over by Thursday evening.

If you are interested, here is a photo of my workbench just after dumping the partitions from one half of the old mirror to the new mirror set. Due to a bug in dump (or UFS) on FreeBSD 7.0 I had 6.3 booting off of an external USB drive, running dump to throw data from one disk to another, a partition at a time.

After that photo was taken fstab was edited, everything booted up great, and then the new drives each passed an extended offline SMART test.

Time Machine Network Backup Speedup / Fix

Published September 11, 2008

I just acquired a new external disk enclosure and 750GB disk for hanging off of an AirPort Extreme and using for Time Machine backups of my main machine. From this I currently have ~480GB of data to back up, and for some reason the initial large backup repeatedly fails when I attempt to do it over the network.

The easy way around this is to first do the backup to the drive when it is connected locally and then hang it off of the AirPort Extreme to continue the incremental backups. The problem is that this doesn’t work as one would expect, because when an initial Time Machine backup is made to a local disk the backup ends up in a series of subdirectories, which is a different format from the one used via the network.

When the backup is made to a volume hanging off of an AirPort Extreme a .sparsebundle file is created containing the backup; essentially a disk image stored on the network. Therefore, if you make a Time Machine backup locally and then try to use it via an AirPort Extreme the .sparsebundle file will be created on the disk in parallel to the now-useless directory structure.

So, how do you work around this? Easy. Hook the external disk up to the AirPort Extreme then either let the backup fail or cancel it, which will leave the incomplete .sparsebundle file on the disk. Disconnect the drive from the AirPort Extreme, connect it to your Mac, and point Time Machine to that volume. If it finds an appropriate .sparsebundle on the volume (which it will, since it’s already there) it’ll use that instead of creating the aforementioned subdirectory structure.

The backup will then happen quite quickly, and after it completes you can just hang the drive back off of the AirPort Extreme, redirect Time Machine to back up to that network volume, and things will continue via the network.
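
Since the whole trick hinges on a .sparsebundle already sitting at the top level of the volume, it is worth checking for one before pointing Time Machine at the locally connected disk. A small Python sketch of that check; the function name is just for illustration and the mount point in the usage comment is hypothetical.

```python
import os


def find_sparsebundles(volume_path):
    """Return the names of top-level .sparsebundle entries on a volume."""
    return sorted(
        name for name in os.listdir(volume_path)
        if name.endswith(".sparsebundle")
    )


# Usage (hypothetical mount point; substitute the real volume):
# print(find_sparsebundles("/Volumes/Backup"))
```

An empty list would mean the network backup attempt never got far enough to create the image, in which case a local backup would presumably fall back to the subdirectory layout.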

UPDATE: Since 10.5.5 was applied to my machine I have been unable to use this backup method and have had to resort to making the entire initial backup via network.

ad6 is Dying Too!

Published September 11, 2008

Error messages on the console showing that ad6 is actually failing hard. Good thing I ordered replacement disks.

It’s a good thing I received a 3ware 8006-2LP and a pair of Seagate 500GB disks today, because one of the two drives in the mirror set on my new server is just about to fail. To make matters worse, the failing disk is ad6, and ad4 is the one I’d accidentally broken the other night, so I’ve been desperately waiting for the disks to finish syncing so that everything would be backed up.

This failed at ~10:00am this morning, which kept me from rebooting the box remotely to run more stress testing and (hopefully) replicating last night’s error.

Now that the data is sync’d I’ll wait for it to finish fscking then I’ll shut it down cleanly and begin running Breakin again.
