Archive for the 'Job' Category

OnTAP version issues

Last week (on the 24th) I performed a routine OnTAP upgrade across 5 of my 3170 filers, going from 7.3.5 to 7.3.5.1P5. The upgrade was meant to prevent a system panic that had already happened twice on our standalone 3170 SnapVault target system, under Bug 446493. Here is the bug description:

Much disk and shelf hardware can be managed by an ANSI-standard technology called SCSI Enclosure Services (SES).  To support SES-related processing, the SES subsystem of Data ONTAP schedules various periodic actions, using an internal timeout mechanism.

Due to a software defect, under certain conditions, the SES subsystem may set an excessive number of timers, with new timers being set before old ones expire.  If this continues, an internal callout table will fill up, triggering an interruption of processing.

One condition in which the problem can occur is during initial setup and configuration of the storage system:  when the “cluster-setup wizard” is run, it asks the administrator for configuration input as follows:

Do you want to create a new cluster or join an existing cluster? {create, join}: (Login timeout will occur in 60 seconds)

When installing releases in which the defect is present, if this prompt is allowed to time out, an interruption may occur at some later time.

However, the callout table can also fill up during routine production, if storage events occur in rapid succession, such that SES scans are rapidly invoked.  Such events may include:

  • a continuing series of disk errors
  • breakages in disk-communication links
  • adding new shelves
  • power-cycling shelves
  • removing a power supply
  • shelf firmware updates
  • shelf faults
  • takeover/giveback

We were hitting this bug under the “continuing series of disk errors” event, which caused the SES scans to fill the timer-callout table. When this happened the system would panic and reboot. After the upgrade I performed all standard checkouts, and everything appeared to be functioning normally and within standards, so I closed the upgrade process and marked it as successful.
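To picture how a callout table can fill up when timers are set faster than they expire, here is a toy model in Python. It is only a sketch: the table size and the per-tick rates are invented numbers, the function name is hypothetical, and nothing here reflects how Data ONTAP actually implements its timers.

```python
def ticks_until_full(table_size, set_per_tick, expire_per_tick):
    """Count ticks until pending timers reach table_size.

    Returns None when timers expire at least as fast as they are set,
    because in that case the table never fills.
    """
    net_growth = set_per_tick - expire_per_tick
    if net_growth <= 0:
        return None
    pending = 0
    ticks = 0
    while pending < table_size:
        pending += net_growth  # new timers minus expired timers
        ticks += 1
    return ticks


# Invented numbers: 100 slots, 5 timers set and 3 expiring per tick.
print(ticks_until_full(100, 5, 3))  # 50
print(ticks_until_full(100, 3, 3))  # None
```

The point of the model is simply that any sustained imbalance, however small, eventually exhausts a fixed-size table.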

Then on the 26th we attempted to perform an allocation using the NMC. Everything appeared to go well through all the steps: all of the checks passed and we hit commit, only to have the process come back with an error message indicating that it had failed. After opening a support call with NetApp and providing both screen captures of the error received and steps to reproduce, it was determined that we had run into another bug. This time we hit Bug 474612. Here is the bug description:

system-cli API returns cli-result-value with an invalid return status. This invalid return status may break OnCommand and other third party applications utilizing NMSDK.

Basically, the NMC executes the requested commands, but the filer CLI returns a response code that the NMC is not expecting and in fact does not recognize. When this happens the NMC does not know how to proceed, and the command fails without performing the provisioning. Apparently this issue was introduced in OnTAP 7.3.5.1P4 and still exists in 1P5, so the solution is to downgrade OnTAP to 7.3.5.1P3. In this version of OnTAP bug 446493 is resolved and bug 474612 has not yet been introduced (as of this writing, bug 474612 is NOT resolved in any version of OnTAP).
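The version choice can be written down as a small lookup. The Python sketch below encodes only what this post states: 7.3.5 is exposed to bug 446493, 7.3.5.1P4 and 7.3.5.1P5 carry bug 474612, and 7.3.5.1P3 has neither. The table and function names are hypothetical (this is not a NetApp tool), and other releases are left out because their status is not covered here.

```python
# Bugs this post reports as present in each release; nothing else is assumed.
KNOWN_BUGS = {
    "7.3.5": {"446493"},
    "7.3.5.1P3": set(),
    "7.3.5.1P4": {"474612"},
    "7.3.5.1P5": {"474612"},
}


def safe_releases(must_avoid):
    """Return the releases whose recorded bugs include none of must_avoid."""
    avoid = set(must_avoid)
    return [rel for rel, bugs in KNOWN_BUGS.items() if not (bugs & avoid)]


print(safe_releases(["446493", "474612"]))  # ['7.3.5.1P3']
```

Keeping a table like this per environment makes it easier to check a proposed target release against every bug that has already caused an outage.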

After performing the downgrade tonight to OnTAP 7.3.5.1P3 I performed all normal checkouts and additionally performed the pending allocation via the NMC to verify functionality and the absence of bug 474612 in this version of OnTAP. Happily the allocation went through without issue, and from what we can tell all aspects of the filers are functioning normally. Our next OnTAP upgrade will have to be to 7.3.6.

The problems with combining FC and SATA drives on the same NetApp

The idea of tiered storage is something many businesses are seriously exploring nowadays, and combining and leveraging it with and for “cloud” operations is a major focus. The idea behind tiered storage is that you have different levels of disk with different performance characteristics, with the main focus being speed and performance. We recently looked into the possibility of adding some tiering to one of the NetApp environments I manage. The idea was to use 300GB FC (Fibre Channel) as our Tier 1 NAS disk and some 1TB SATA as our Tier 2. On the surface this seems like a good idea for a few reasons:

  1. The Tier 1 disk is sufficient to run Oracle databases over NFS if those databases are configured properly.
  2. The Tier 2 disk is much cheaper and would be perfect for housing non-critical, non-performance-intensive shares, such as home directories, at a fraction of the cost.
  3. The mix of available disk would allow us to tailor the allocations to the actual needs of each project based on its performance requirements.

So as I began to look into this course of action I discovered a few things that completely negate this idea, at least for having it housed in one filer (or even a clustered pair):

  • When using FC connectivity to disk shelves, FC drives and SATA drives must go on different loops.  This means that if you add a shelf of SATA to an open FC port on a filer, you will not be able to add any FC drives to that loop.
  • All write operations are committed to disk as a group, regardless of what the destination aggregate is for those writes.  So, on a system with SATA and FC disk, write operations destined for FC drives may be slowed down by the writes going to the SATA drives, if the SATA drives are busy.
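The second point can be illustrated with a deliberately simplified model in Python: if writes are committed as a group, the group is only done when its slowest member is done. The flush times below are invented for illustration, the function name is hypothetical, and the model ignores everything else the filer does, so treat it as a sketch of the reasoning rather than a description of OnTAP internals.

```python
def commit_time_ms(flush_ms_by_tier):
    """A grouped commit finishes only when its slowest member finishes."""
    return max(flush_ms_by_tier.values())


# Invented flush times (milliseconds) for one grouped commit.
fc_only = {"fc": 4.0}
mixed_busy_sata = {"fc": 4.0, "sata": 15.0}

print(commit_time_ms(fc_only))          # 4.0
print(commit_time_ms(mixed_busy_sata))  # 15.0
```

In this model the FC writes themselves are no slower, but they are not considered committed until the busy SATA writes in the same group have landed.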

The first point, the dedicated loops, isn’t such a big deal if you are planning on adding a full loop’s worth of SATA shelves (6 shelves per loop) and you have an open FC port with nothing else attached (or can move other shelves to fill in open spots on other loops to free up an FC port). So while the dedicated loop can be resolved and may or may not be an issue depending on your setup, it’s the second point that poses the most trouble in our environment (and I would assume most others as well).

Running the risk of impacting performance on your Tier 1 disk is not acceptable. The applications running on that tier are there for a reason: they need the performance of those faster disks. But how do you know whether you will hit that impact? Maybe it won’t apply. Good question. Perhaps this won’t be an issue for your environment. So ask yourself this: do you know the exact details of your workloads? No, of course you don’t. You may know that there are databases on some of the exports, or that certain exports are used for regular CIFS or NFS shares for home directories, but you most likely do not know all the intimate details of each application’s workload. Without that precise knowledge it is nearly impossible to quantify the potential impact ahead of time, and thus this possible latency becomes a real concern.

Because of these factors we chose not to mix FC and SATA shelves on a single (or clustered) system, and I recommend the same. If you need to have multiple tiers you still have options:

  • Implement SAN as your Tier 1 storage and utilize NAS as Tier 2
  • Implement a Tier 1 NAS environment and a Tier 2 NAS environment on separate hardware (read: separate physical systems, either single-headed or clustered)
  • Look into an appliance that can handle different types of disk in the same housing without impact and configure tiering therein.

Tiering your storage is a great idea and allows for a lot of flexibility and possible cost savings for the customer in terms of charge backs for disk utilization. Even so, you still need to keep performance in mind, and for me, the possible performance impact is not worth the risk of mixing these shelf types on a single head.

NCDA Boot Camp, Day 6

I just passed NS163 and now officially have my NCDA! It will take a couple of weeks for them to ship out my certificate, but the system has been updated with my passing score, so I am officially certified. Not just me though: everyone in the class passed both exams, so there are now 8 new NCDAs in the field. Congratulations to all the other guys, and to Jon Presti for being a great instructor and making sure we all had the knowledge needed to pass the exams and get certified. Now to figure out what to do with the rest of my day, because my flight doesn’t leave until tomorrow. Had I known we would be done by 2pm today, I would have scheduled an early evening flight. Oh well, looks like I get to go explore Las Colinas a little.

NCDA Boot Camp, Day 3

Had the first of two exams today, NS153, and I am happy to say that I passed! I am now halfway to my NCDA. After the exam we began covering the material that will be on Saturday’s NS163 exam. After class I was talking with Jon and found out that he wasn’t going to be having dinner tonight because he was short on cash (due to forgetting to process a lot of expense reports, oops), so I told him he should come over to my hotel, which is literally next door to his, because we have free dinner in the lobby. We had some good conversation at dinner about the class, the material, and NetApp in general. Apparently this is the first time Jon has hung out with any of his students outside of class, which I thought was odd because he is by far one of the most straightforward, down-to-earth, and straight-up cool techs I have ever met. Well, time to study some more!

NCDA Boot Camp, Day 2

So my first test is tomorrow, and in preparation we filled out the necessary forms for Prometric. As soon as I got my form I had to laugh: according to the paper, Prometric still thinks they are a Thomson company. I then had to explain to the class why I was laughing: 1) Thomson doesn’t exist anymore, we are now Thomson Reuters, and 2) when we were Thomson, we sold Prometric in 2007. Jon said he was going to send an email to his contact to see about getting updated forms. :) The second day of the class went smoothly, and as I predicted some of the conversations have been very interesting. Also as I predicted, Jon is really awesome: he knows the material very well and has no problem admitting if he doesn’t have the answer to a question, but he typically gets the answer while we are doing the labs, either by doing some quick research or by calling one of his contacts. I can see Jon being a great contact throughout my career.

NCDA Boot Camp, Day 1

Made it in to class this morning, bright and early at 8am…and went into the wrong building. I wasn’t the only one though; the entire class, including the instructor, did. It had to do with the way we were given the address and how the buildings are actually set up inside with the suites. Long complicated explanation short, I found the class. Interestingly enough, the instructor was one of the last people to arrive (on time I want to add, as I was just really early), and he (Jon Presti) seems really damn cool and I think he is going to be a great instructor. We have decided collectively to begin class at 8:30 instead of 8:00, which for me means another half hour to study before class because, let’s face it, I will be up well in advance. Anyway, the first day was really good. There are 8 guys in the class counting me, and according to Jon, this is the first time he has had a class completely comprised of guys from enterprise-level storage environments. Based on some of the conversations we had today in class, I am thinking the conversations about storage are going to be very interesting and informative. I found out today that of the two exams I need to take to get my NCDA certification, one will be on Wednesday morning and the other will be on Saturday morning. The one on Saturday is supposed to be the more difficult of the two, so I am certain I will be spending all of my “free time” studying. Well, one day down, time to hit the books!

Texas, I Am In You

Flew in to Texas this afternoon (plane landed about 2:30pm Central / 3:30pm EST), and after finding my way out of the terminal I made my way to the bus for the rental car, where I ended up getting a Dodge Caliber with a GPS system as my rental. First impression is that it is a decent car but not something I would want to own, as I feel very cramped in it. I found my way to the hotel, TownePlace Suites, without too much of an issue. I say without much of an issue because it took me a little while to understand what the GPS (a Garmin Nuvi) was saying, mainly because every time I had to use an on or off ramp from a major highway (by the way, Dallas and Las Colinas seem to be nothing but concrete and asphalt!) it sounded like it was saying to take the “sledge road”, and that confused me, but luckily I could tell by the map what it wanted. After finding the hotel and getting checked in I decided I had better find my way to the class location so I would at least have an idea of how to get there in the morning. Turns out it’s not very difficult to get there, but the main road is closed due to construction, of course, so I have to take a detour that puts me back out on the main road (Walnut Hill) about 15 yards from the driveway to the class location. Thankfully I have the GPS though, because it was able to re-route and find a way back to the class location, and seeing as it is pretty straightforward I won’t need to use the GPS in the morning. After finding the class I decided to go have dinner at a sushi restaurant called The Blue Fish that Chuck recommended from when he was down here, and I have to say it was very good! After dinner I came back to the hotel to relax and prepare for class. I have a feeling this is going to be a long week.

Renaming Volume Groups

I’m anticipating a project coming up at work that will hopefully allow me to rearrange some file systems on my TSM server and improve some performance. However, one of the things I want to do as part of this project is to rename some volume groups on the housing AIX server so they make a little more sense in the overall scheme of things. Problem is I was not positive as to how this should be done, so I needed to do some research. I found the commands that I will need to use to implement and make the changes, in theory:

First I need to know what volume groups are on the system; this can be found using the lsvg command:

# lsvg
rootvg
vg04
raweagantsm02
raweagantsm01
raweaganarch

Next I need to know which disks are in each volume group; this can be found using the lspv command (truncated output):

# lspv
hdisk0 00cdfe5b0855e5f5 rootvg active
hdiskpower0 00cdfe5b448b684a vg04 active
hdiskpower13 00cdfe5bd1d8fdd5 raweagantsm02 active
hdiskpower92 00cdfe5b4494c652 raweagantsm01 active
hdiskpower96 00cdfe5b448dddcf raweaganarch active
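
If the lspv output needs to be checked in a script rather than by eye, the mapping from volume group to disks can be pulled out with a few lines of Python. This is only a sketch: it assumes the whitespace-separated layout shown above (disk, PVID, volume group, state), works on pasted text rather than calling lspv itself, and the function name is hypothetical.

```python
from collections import defaultdict

# Sample text copied from the truncated lspv output above.
LSPV_OUTPUT = """\
hdisk0 00cdfe5b0855e5f5 rootvg active
hdiskpower0 00cdfe5b448b684a vg04 active
hdiskpower13 00cdfe5bd1d8fdd5 raweagantsm02 active
hdiskpower92 00cdfe5b4494c652 raweagantsm01 active
hdiskpower96 00cdfe5b448dddcf raweaganarch active
"""


def disks_by_vg(lspv_text):
    """Group disk names by the volume group named in the third column."""
    groups = defaultdict(list)
    for line in lspv_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        disk, _pvid, vg = fields[:3]
        groups[vg].append(disk)
    return dict(groups)


print(disks_by_vg(LSPV_OUTPUT)["vg04"])  # ['hdiskpower0']
```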

Next I will need to offline the volume group that I want to rename using the varyoffvg command (any file systems in the volume group have to be unmounted first):

# varyoffvg vg04

Now we need to export the volume group so we can later import it with the new name. To export a volume group use the exportvg command:

# exportvg vg04

Now an lspv would show all disks previously associated with the exported volume group as having no volume group:

# lspv
hdisk0 00cdfe5b0855e5f5 rootvg active
hdiskpower0 00cdfe5b448b684a None
hdiskpower13 00cdfe5bd1d8fdd5 raweagantsm02 active
hdiskpower92 00cdfe5b4494c652 raweagantsm01 active
hdiskpower96 00cdfe5b448dddcf raweaganarch active

Now I can import the old volume group with the new name using the importvg -y command (the -y <volume_group_name> tells the system what to name the new volume group; if this is omitted the system will automatically generate a new one). Any disk that was part of the volume group can be given as the argument:

# importvg -y raweagantsm03 hdiskpower0

Now an lsvg should show the new volume group:

# lsvg
rootvg
raweagantsm03
raweagantsm02
raweagantsm01
raweaganarch

Additionally an lspv will show the disk now being part of the new volume group:

# lspv
hdisk0 00cdfe5b0855e5f5 rootvg active
hdiskpower0 00cdfe5b448b684a raweagantsm03 active
hdiskpower13 00cdfe5bd1d8fdd5 raweagantsm02 active
hdiskpower92 00cdfe5b4494c652 raweagantsm01 active
hdiskpower96 00cdfe5b448dddcf raweaganarch active
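
To keep the order of operations straight, the rename can be laid out as a dry-run plan. The Python sketch below only builds and prints the three commands from the steps above (varyoffvg, exportvg, importvg -y); it does not run anything, the function name is hypothetical, and it assumes the file systems in the volume group have already been unmounted.

```python
def rename_vg_commands(old_name, new_name, disk):
    """Return the AIX commands, in order, for renaming a volume group.

    disk can be any physical volume that belongs to the volume group.
    """
    return [
        f"varyoffvg {old_name}",
        f"exportvg {old_name}",
        f"importvg -y {new_name} {disk}",
    ]


for cmd in rename_vg_commands("vg04", "raweagantsm03", "hdiskpower0"):
    print(cmd)
```

Printing the plan first makes it easy to review with someone else before anything is typed on the server.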

Hopefully this works the way I think it should. I have spoken with my local AIX guru and told him my plans and everything seems to check out. Once the new disk comes in for the rest of the project I’ll work on implementing the above.

Success, decrypted

The past few days have been pretty busy for me with the work disaster recovery test going on. It was nice to be back in Philadelphia, as I really enjoy this city; I am always amazed at the beauty of City Hall and the Masonic Temple. However, this time, work did not afford me the opportunity to do any sightseeing. The test went well though; in reality it went better than anticipated. The new LTO4 tape drive encryption I implemented went very smoothly. I was able to configure the TS3500 library to communicate with our TKLM server without any issues, which was a very pleasant surprise. Once that was done and the AIX boxes were built we brought up the TSM server, and I have to admit, once the server started reading the first encrypted tape to restore TSM’s database I was elated. Had we not been able to read the encrypted data on that tape we would have been in a world of trouble, especially since I had already moved the entire environment’s data to encrypted tapes. Once the TSM server was completely up and running all of the necessary restores ran really smoothly, and as a bonus the new LTO4 tapes wrote the data back out faster than anticipated. In the end it was a successful test and a pretty decent validation of all my hard work to implement the new encryption method.

Off to Philadelphia

Due to some unfortunate scheduling I had to cut my Father’s Day with Kylie short (dropped her back off at Mommy’s house around noon), as I have to leave for a DR test in Philadelphia which starts tomorrow morning at 8am. My flight leaves at 1:45, which means I have to be at the airport by 1 at the latest. I tried to get a later flight out so I would have more time to spend with Kylie, but the price went up by an additional $600 for the later flight, so sadly I am not able to push the flight back. The only thing that makes it anywhere remotely OK is that Kylie has been with me since Thursday night, so I have had a good amount of time with her the last few days. So off I go once again to Philadelphia; at least it is a city that I really enjoy being in.




Copyright © 2005 - 2011 King's Pride