Friday, 24 September 2010

Growing an md device

I am not quite sure why people stumble over this so often, but I will spend a couple of lines on the topic:

I am trying to grow my md device. I have done this before and it used to work, but this time I get the following error:

 Elbereth:~ # mdadm /dev/md2 --grow --size=max
 mdadm: Cannot set device size for /dev/md2: Device or resource busy

Short answer: You forgot your bitmap. Or at least there is a good chance you did.

Longer one: Currently, you cannot grow an md device that holds an internal bitmap. This is not too tragic: you can still remove the bitmap, grow your device and put the bitmap back. As long as you do not have a major failure during the process, you'll be just fine.
What if you happen to have the above-mentioned failure? Well, do not focus on the bitmap too much: with a bad failure during a grow operation, bitmap or not, you are going to have quite a bit of trouble anyway.

Practically speaking:

mdadm --grow --bitmap=none /dev/mdX

to remove it, and then grow your array as usual.
Once done, put it back with:

mdadm --grow --bitmap=internal /dev/mdX
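
Put together, the whole dance could look like the sketch below. The function name grow_md and the RUN switch are made up for this example: RUN defaults to "echo", so the commands are only printed, and you would set RUN="" to really run them as root. The "mdadm --wait" step is my own assumption (growing usually kicks off a resync of the new space, and the bitmap may refuse to come back while that is running), so drop it if your setup does not need it.

```shell
# Sketch: drop the internal bitmap, grow the array, put the bitmap back.
# RUN is a hypothetical switch: "echo" (the default) only prints the
# commands, RUN="" executes them for real.
RUN="${RUN-echo}"

grow_md() {
    md="$1"    # for example /dev/md2

    # 1. Remove the internal bitmap, otherwise the grow fails with EBUSY.
    $RUN mdadm --grow --bitmap=none "$md" || return 1

    # 2. Grow to the maximum size the component devices allow.
    $RUN mdadm --grow --size=max "$md"
    grow_rc=$?

    # 3. Assumption: let any resync of the new space finish first.
    #    --wait reports failure when there was nothing to wait for,
    #    so its exit status is ignored here.
    $RUN mdadm --wait "$md" || true

    # 4. Restore the bitmap even if the grow itself failed.
    $RUN mdadm --grow --bitmap=internal "$md" || return 1
    return "$grow_rc"
}
```

Calling grow_md /dev/md2 with the default RUN just lists the four commands, which is a cheap way to double check the device name before doing it for real.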

Have fun!

Monday, 13 September 2010

Bits and pieces of a Monday

I should definitely start playing with SuseStudio a bit more.
Surprisingly enough, browsing through the appliances, there is nothing ready-made for a quick setup of a dummy Openais node. Admittedly, in this case a simple build is not enough, but I guess this gives me something to play with this week.
We'll see if I manage to banish procrastination for long enough to actually get this done.

A couple of more or less proper topics deserving an entry of their own are in the pipeline, but given the time, and it being Monday, for today I'll keep it short and confused.
So, here you go with random bits and pieces of what I've been asked. Probably nothing noteworthy, but you never know...

Given an NFS mount on my [opensuse|SLES|SLED] provided by my $filer (you haven't heard me saying Netapp here...), I see failures all over the place because a firewall is dropping UDP packets.
Captain Obvious suggests moving to TCP, but unfortunately "mount -o tcp" does not really help, because the mountd traffic still goes over UDP.

As if that weren't enough, on RH all my traffic goes through TCP as expected. What's wrong?
Quick answer

The tcp option is an alternative to specifying proto=tcp. What you are missing is mountproto=tcp.

Long answer
man nfs
/Using the mountproto mount option
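
As a quick illustration, the sketch below only prints the mount command with both options set. The server name "filer", the export /vol/data and the mount point /mnt/data are placeholders I made up, and the two helper functions are not part of any tool.

```shell
# Both halves are needed: proto=tcp moves the NFS traffic itself to TCP,
# mountproto=tcp does the same for the MOUNT protocol (the mountd part).
nfs_tcp_opts() {
    echo "proto=tcp,mountproto=tcp"
}

# Print (not run) the mount command for a given export and mount point.
nfs_mount_cmd() {
    echo "mount -t nfs -o $(nfs_tcp_opts) $1 $2"
}

# Hypothetical server, export and mount point.
nfs_mount_cmd filer:/vol/data /mnt/data
```

The same option string can go into the options column of the matching /etc/fstab entry.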

On my Openais cluster I want to run several DomU resources, but for whatever reason I don't want automatic memory allocation.
I disabled it using the cluster GUI, but I still see changes in the amount of memory given to my DomUs. What's wrong with me and with them?

With you, not sure, but with your cluster and DomUs there is nothing wrong.
With the GUI, there was (this is now fixed upstream; by the time you read this, the answer will be: make sure you are up to date!).
The problem was simply an invalid value provided by the GUI, which let you choose between "True" and "False" while the resource agent was expecting a "0" to deactivate the feature.

I guess that is enough ranting for now, time to go back to my evening powered by Milky Oolong, with Robochicken in the background.

Wednesday, 8 September 2010

Ode to autoreadonly

Have you ever come across unused md devices seen as autoreadonly?
 
This particular status is given to md devices that have seen no IO activity (as in, they never had any since the array was assembled).
If you are wondering why an md device should be started if no IO is taking place on it, you are probably right, with one exception: it is legitimate to have swap on an md device.

This doesn't really cause any problem by itself: as soon as IO starts, the device automatically wakes from this state. But there is something you should take care of.
Let's assume you are doing an autoinstallation using (surprise, surprise) Autoyast.
Without a filesystem specified for those newly created mds, AY will do exactly what it should: set them up without further action.
They will start in the (in)famous autoreadonly status, with one peculiarity: sync pending.
This is correct: the sides of the mirror have never synced and are currently read-only.
In practice it also means that you don't really have a working mirror.

If you do not take care of it manually, for example by issuing an mdadm --readwrite to trigger a sync, you'll be in for quite some pain if anything happens to your storage.
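
To spot the affected arrays, something along these lines might do. It is only a sketch: list_autoreadonly is a made-up helper that reads mdstat-formatted text on stdin and relies on the "(auto-read-only)" marker the kernel prints on the array line of /proc/mdstat.

```shell
# Print the md devices that the given mdstat text reports as auto-read-only.
# Array lines look like:
#   md1 : active (auto-read-only) raid1 sdb2[1] sda2[0]
list_autoreadonly() {
    awk '$2 == ":" && /\(auto-read-only\)/ { print "/dev/" $1 }'
}

# Typical use, as root (kept as a comment so nothing is changed by accident):
#   list_autoreadonly < /proc/mdstat | while read -r md; do
#       mdadm --readwrite "$md"
#   done
```

Reading from stdin rather than straight from /proc/mdstat keeps it easy to try on a saved copy first.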

Anything else? Well, do not forget that if by any chance your menu.lst contains a "resume=" option pointing to an md device, you'll get an autoreadonly status for free.
Either go for noresume, use a fake device to resume from or, if you have a real device that is not md, use that one. If you change this early enough, you can easily forget about all this rant and live happily ever after.

AY-related hint: try <chrooted config:type="boolean">true</chrooted> in your chroot script.
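
A rough way to check whether you are affected is sketched below. resume_on_md is a made-up helper, and it only catches the plain /dev/mdN spelling, not md devices referenced by ID or UUID.

```shell
# Return 0 if the given kernel command line resumes from /dev/md*.
resume_on_md() {
    case " $1 " in
        *" resume=/dev/md"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, checking the running kernel (left as a comment):
#   resume_on_md "$(cat /proc/cmdline)" && echo "resume= points at an md device"
```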

Wednesday, 1 September 2010

The wonderful world of SCSI error return codes

This is not meant as anything too serious; most importantly, it is a note to self.
Somehow this is one of those topics I cannot commit to long-term memory, so every time I need it, I have to look it up.
Hopefully this will fix my memory allocation, or at least give me a quick way to find what I was looking for.

So, after this little disclaimer, let's get going.
The first important thing to remember: *the* file to look at is
/usr/src/linux/include/scsi/scsi.h and nothing else.

Given the classic
Sep 1 15:20:01 Elbereth kernel:sd 0:0:1:0: SCSI error: return code = 0x08000002
the return code can be read as four bytes, most significant first: driver byte, host byte, message byte, status byte.

So, in this case we have 08 - 00 - 00 - 02

Let's check it against the above mentioned file (I mean it, look into it!):

-- driver byte codes

#define DRIVER_BUSY 0x01
#define DRIVER_SOFT 0x02
#define DRIVER_MEDIA 0x03
#define DRIVER_ERROR 0x04
#define DRIVER_INVALID 0x05
#define DRIVER_TIMEOUT 0x06
#define DRIVER_HARD 0x07
#define DRIVER_SENSE 0x08

-- host byte codes

#define DID_OK 0x00 /* NO error */
#define DID_NO_CONNECT 0x01 /* Couldn't connect before timeout period */
#define DID_BUS_BUSY 0x02 /* BUS stayed busy through time out period */
#define DID_TIME_OUT 0x03 /* TIMED OUT for other reason */
#define DID_BAD_TARGET 0x04 /* BAD target. */
#define DID_ABORT 0x05 /* Told to abort for some other reason */
#define DID_PARITY 0x06 /* Parity error */
#define DID_ERROR 0x07 /* Internal error */
#define DID_RESET 0x08 /* Reset by somebody. */
#define DID_BAD_INTR 0x09 /* Got an interrupt we weren't expecting. */
#define DID_PASSTHROUGH 0x0a /* Force command past mid-layer */
#define DID_SOFT_ERROR 0x0b /* The low level driver just wish a retry */
#define DID_IMM_RETRY 0x0c /* Retry without decrementing retry count */
#define DID_REQUEUE 0x0d /* Requeue command (no immediate retry) also
* without decrementing the retry count */
#define DID_TRANSPORT_DISRUPTED 0x0e /* Transport error disrupted execution
* and the driver blocked the port to
* recover the link. Transport class will
* retry or fail IO */
#define DID_TRANSPORT_FAILFAST 0x0f /* Transport class fastfailed the io */
#define DRIVER_OK 0x00 /* Driver status */

-- message byte codes

#define COMMAND_COMPLETE 0x00
#define EXTENDED_MESSAGE 0x01
#define EXTENDED_MODIFY_DATA_POINTER 0x00
#define EXTENDED_SDTR 0x01
#define EXTENDED_EXTENDED_IDENTIFY 0x02 /* SCSI-I only */
#define EXTENDED_WDTR 0x03
#define EXTENDED_PPR 0x04
#define EXTENDED_MODIFY_BIDI_DATA_PTR 0x05
#define SAVE_POINTERS 0x02
#define RESTORE_POINTERS 0x03
#define DISCONNECT 0x04
#define INITIATOR_ERROR 0x05
#define ABORT_TASK_SET 0x06
#define MESSAGE_REJECT 0x07
#define NOP 0x08
#define MSG_PARITY_ERROR 0x09
#define LINKED_CMD_COMPLETE 0x0a
#define LINKED_FLG_CMD_COMPLETE 0x0b
#define TARGET_RESET 0x0c
#define ABORT_TASK 0x0d
#define CLEAR_TASK_SET 0x0e
#define INITIATE_RECOVERY 0x0f /* SCSI-II only */
#define RELEASE_RECOVERY 0x10 /* SCSI-II only */
#define CLEAR_ACA 0x16
#define LOGICAL_UNIT_RESET 0x17
#define SIMPLE_QUEUE_TAG 0x20
#define HEAD_OF_QUEUE_TAG 0x21
#define ORDERED_QUEUE_TAG 0x22
#define IGNORE_WIDE_RESIDUE 0x23
#define ACA 0x24
#define QAS_REQUEST 0x55

-- status byte codes

#define SAM_STAT_GOOD 0x00
#define SAM_STAT_CHECK_CONDITION 0x02
#define SAM_STAT_CONDITION_MET 0x04
#define SAM_STAT_BUSY 0x08
#define SAM_STAT_INTERMEDIATE 0x10
#define SAM_STAT_INTERMEDIATE_CONDITION_MET 0x14
#define SAM_STAT_RESERVATION_CONFLICT 0x18
#define SAM_STAT_COMMAND_TERMINATED 0x22 /* obsolete in SAM-3 */
#define SAM_STAT_TASK_SET_FULL 0x28
#define SAM_STAT_ACA_ACTIVE 0x30
#define SAM_STAT_TASK_ABORTED 0x40

That concludes our search. Byte by byte, 08 - 00 - 00 - 02 reads as:

#define DRIVER_SENSE 0x08
#define DID_OK 0x00 /* NO error */
#define COMMAND_COMPLETE 0x00
#define SAM_STAT_CHECK_CONDITION 0x02
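
For the lazy (me), the byte splitting can be sketched in a few lines of shell. decode_scsi_result is a made-up helper; it only does the shifting and masking, so you still look the values up in scsi.h yourself.

```shell
# Split a 32-bit SCSI return code into its four bytes,
# most significant first: driver, host, message, status.
decode_scsi_result() {
    code="$1"    # for example 0x08000002
    printf 'driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n' \
        $(( ($code >> 24) & 0xff )) \
        $(( ($code >> 16) & 0xff )) \
        $(( ($code >> 8) & 0xff )) \
        $(( $code & 0xff ))
}

decode_scsi_result 0x08000002
# prints: driver=0x08 host=0x00 msg=0x00 status=0x02
```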

That usually translates into pointing your finger at your storage guy and, for once, asking him to tell you what is wrong with the device. After that, be nice and bring him some chocolate.